Pia Pachinger

she / her

I'm a fourth-year PhD student at TU Wien (Vienna, Austria) supervised by Allan Hanbury and Julia Neidhardt. I am interested in safety alignment, perspectivism and human label variation in Natural Language Processing, social bias in NLP, low-resource settings, and data resource creation.

Please reach out to me if you want to discuss any of these topics!

Contact me by reordering    @    tuwien.ac.at    pia.pachinger

An AI Reading of the Library of Babel
Baltazar Pérez, Pia Pachinger, Simón López Trujillo  

Publications

The size of AustroTox

2024 ACL Findings
AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection 
Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Wojciech Kusa, Allan Hanbury, Julia Neidhardt

We create the first dataset with a focus on Austrian German toxic language online and the first German dataset on toxicity featuring annotated spans facilitating explainability and error analysis. We find that LLMs not fine-tuned on AustroTox fail to recognize country-specific vulgar language and the targets of toxic statements (as of January 2024).
 

2025 EMNLP NLPerspectives
A Disaggregated Dataset on English Offensiveness Containing Spans
Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Julia Neidhardt, Allan Hanbury

We re-annotate a subset of posts of the Jigsaw Toxic Comment Classification Challenge and provide disaggregated toxicity labels and spans. We find that five annotations per instance allow for fine distinctions between genuine disagreement and that arising from annotation error or inconsistency. Disagreement is especially high in cases of toxic statements involving non-human targets.

Data Paper

2023 EACL C3NLP 
Toward Disambiguating the Definitions of Abusive, Offensive, Toxic, and Uncivil Comments
Pia Pachinger, Anna Maria Planitzer, Julia Neidhardt, Allan Hanbury

We find that researchers studying harmful language online treat abusiveness, offensiveness, and toxicity interchangeably as sub-concepts of one another. While social science literature frequently employs incivility with similar meaning, computer science research underutilizes this term, suggesting both fields would benefit from terminological unification. We compile and analyse the distinct definitions researchers use for these concepts to facilitate future unification efforts across and within disciplines.

Paper Video

2022 TU Wien
A Recommender System for Scientific Referees Based on Bibliographic Databases and Knowledge Graphs
Pia Pachinger, Georg Gottlob, Joël Ouaknine, Glenn Starkman, Matt Rainey, Emanuel Sallinger

We implement multiple recommender systems for scientific expert search using two-step coauthorship and publication venue overlap. Qualitative evaluation by 15 established computer science researchers showed the best system recommended a mean of 5 excellent and 2.8 suitable experts per 10 recommendations.

Paper

Current Presentations

Awards

Education

Employment

Academic Service

2025 Supervision of Gerald Weber's Master's Thesis on Information Extraction for the Engineering Domain


Reviewer for LREC-COLING 2024, WOAH (ACL) 2025

2019 Faculty of Mathematics, University of Vienna
Python for Mathematicians

2023 Faculty of Informatics, TU Vienna
Natural Language Processing and Information Extraction

2020 University of Vienna 
Member of the curricular working group for the new Data Science master's programme

2023 Faculty of Informatics, TU Vienna
Advanced Information Retrieval

2023 Faculty of Linguistics, Paris Lodron University Salzburg
Language Technology and Language Data

Teaching

2018 - 2020 Faculty of Mathematics, University of Vienna
Introduction to Wolfram Mathematica

2025 Faculty of Informatics, TU Vienna
Interdisciplinary Project in Data Science

Research Visits

2018 University of Bergen, Norway
Collaboration with Morten Brun on Topological Data Analysis

2017 National University of Colombia
Collaboration with Francisco Gómez on Topological Data Analysis