Pia Pachinger

she / her

I'm a fourth-year PhD student currently interested in perspectivism and human label variation in Natural Language Processing, social bias in NLP, low-resource language varieties and domains, data resource creation, and domain-specific information extraction.

I love all areas of NLP, from foundations to application. Please reach out to me if you want to discuss!

Contact me by reordering @ tuwien.ac.at pia.pachinger

An AI Reading of the Library of Babel
Simón López Trujillo*, Pia Pachinger*, Baltazar Pérez*

Publications

The size of AustroTox

2024 ACL Findings
AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection
Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Wojciech Kusa, Allan Hanbury, Julia Neidhardt

We create the first dataset with a focus on Austrian German toxic language online and the first German dataset on toxicity featuring annotated spans facilitating explainability and error analysis. We find that LLMs not fine-tuned on AustroTox fail to recognize country-specific vulgar language and the targets of toxic statements (as of January 2024).

Here is the data, here is the poster.

2023 EACL C3NLP
Toward Disambiguating the Definitions of Abusive, Offensive, Toxic, and Uncivil Comments
Pia Pachinger, Anna Maria Planitzer, Julia Neidhardt, Allan Hanbury

We find that researchers studying harmful language online treat abusiveness, offensiveness, and toxicity interchangeably as sub-concepts of one another. While social science literature frequently employs incivility with similar meaning, computer science research underutilizes this term, suggesting both fields would benefit from terminological unification. We compile and analyse the distinct definitions researchers use for these concepts to facilitate future unification efforts across and within disciplines.

2022 TU Vienna
A Recommender System for Scientific Referees Based on Bibliographic Databases and Knowledge Graphs
Pia Pachinger, Georg Gottlob, Joël Ouaknine, Glenn Starkman, Matt Rainey, Emanuel Sallinger

We implement multiple recommender systems for scientific expert search using two-step coauthorship and publication venue overlap. Qualitative evaluation by 15 established computer science researchers showed the best system recommended a mean of 5 excellent and 2.8 suitable experts per 10 recommendations.

Current Presentations

Workshop on Online Abuse and Harms (ACL) 2025
Alignment by Disagreement? Toward Investigating LLMs' Adaptation to Personal and Sociodemographic Variability in the Perception of Toxicity
Pia Pachinger, Anna Maria Planitzer, Allan Hanbury, Julia Neidhardt, Sophie Lecheler

International Communication Association Conference 2025
Like Walking a Tightrope: User-Centric Perspectives on Automated Content Moderation
Anna Maria Planitzer, Sophie Lecheler, Svenja Schäfer, Pia Pachinger (presented by Anna)

COMPTEXT 2025
Incorporating User Perceptions of Online Norm Violations in Toxicity Detection Models
Pia Pachinger, Anna Maria Planitzer, Allan Hanbury, Julia Neidhardt, Rebekah Wegener, Sophie Lecheler

Prizes

2024 NAACL, Student Research Workshop
Best PhD proposal
User-Centric Offensive Text Detection in Culture-Specific Contexts: A PhD Proposal
Pia Pachinger

2022 Inria, Paris
Design + interaction + AI hackathon
Second Prize for That's life.
Artifact publicly exhibited at Le Bis in Paris
Anaïs Cambou*, Anthonin Gourichon*, Fengyu Li*, Xiaoning Meng*, Pia Pachinger* (* equal contribution)

Education

PhD in Informatics, TU Vienna, 2022 – 2026
Natural Language Processing, User-Centric Offensive Text Detection in Culture-Specific Contexts

Master in Data Science, TU Vienna
Machine Learning and Statistics, Natural Language Processing and Visual Analytics
GPA 3.7 / 4.0

Bachelor in Mathematics
University of Vienna

Languages
German (native)
English (C1)
Spanish (C1)
Italian (very bad still :) )
Python

Employment

2019 - 2020 Centre for Cyber Security
Austrian Institute of Technology
Freelance researcher
Pre-training and evaluation of CNNs and LSTMs for anomaly detection in time series of system log data

2022 - 2026 Data Science Group, TU Vienna
Prae-Doc researcher
TACo: User-centric content moderation
BrAIn: Domain-specific information extraction
VHH: Evaluation of machine translation models

2021 - 2022 Databases and Artificial Intelligence Group, TU Vienna
Student researcher
Implementation of recommender system for scientific referees

2018 - 2020 Faculty of Mathematics, University of Vienna
Teaching Assistant

Further Activities

2025 Supervision of Gerald Weber's Master's Thesis on Information Extraction for the Engineering Domain

Reviewer for LREC-Coling 2024, WOAH (ACL) 2025

2023, Reimagining recommender systems together with Anna Merl and Ignacio Pérez Messina. Here is a demo.

2020 University of Vienna
Member of the curricular working group for the new data science master studies

Teaching

2025 Faculty of Informatics, TU Vienna
Interdisciplinary Project in Data Science

2023 - 2024 Faculty of Informatics, TU Vienna
Natural Language Processing and Information Extraction

2023 Faculty of Informatics, TU Vienna
Advanced Information Retrieval

2023 Faculty of Linguistics, Paris Lodron University Salzburg
Language Technology and Language Data

2019 Faculty of Mathematics, University of Vienna
Python for Mathematicians

2018 - 2020 Faculty of Mathematics, University of Vienna
Introduction to Wolfram Mathematica

Visits

2018 University of Bergen, Norway
Collaboration with Morten Brun on Topological Data Analysis

2018 National University of Colombia
Collaboration with Francisco Gómez on Topological Data Analysis

2016 - 2017 Autonomous University of Madrid
Erasmus

A long time ago: Guardería de Don Bosco, San José, Costa Rica
Volunteer at a kindergarten and boarding school for socially deprived children and adolescents