2024 ACL Findings
AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection
Pia Pachinger, Janis Goldzycher, Anna Maria Planitzer, Wojciech Kusa, Allan Hanbury, Julia Neidhardt

We present AustroTox, the first dataset focused on toxic language in Austrian German online content and the first German toxicity dataset with annotated spans, which facilitate explainability and error analysis. We find that, as of January 2024, LLMs not fine-tuned on AustroTox fail to recognize country-specific vulgar language and the targets of toxic statements.

Paper

Data

Code

Video