Steinborn, Victor (2024): Multilingual and multimodal bias probing and mitigation in natural language processing. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Steinborn_Victor.pdf (6 MB)
Abstract
Gender bias is a key global challenge of our time: the United Nations Sustainable Development Goals call for the elimination of all forms of gender-based discrimination. Because it is ubiquitous both online and offline, gender bias is also prevalent in the training data of Natural Language Processing (NLP) models, which learn and internalize it. This bias then resurfaces when models are probed or used in downstream tasks such as automatic recruitment, leading to gender-based discrimination that negatively affects people's lives. Gender bias is thus problematic because it harms individuals. There is a growing body of research attempting to combat gender bias in language models; however, this research remains narrow in scope, focusing largely on English and on occupational biases. In this thesis, we attempt to move beyond the current insular state of gender bias research in language models and improve the coverage of the languages and biases being studied. Specifically, we undertake three projects that broaden the scope of current gender bias research in NLP. The first project builds a dataset for investigating languages beyond English; our methodology makes it easy to extend the dataset to any language of choice. In addition, we propose a new analytical bias measure that may be used to evaluate bias given a model's prediction probabilities. In the second project, we demonstrate that learned gender stereotypes regarding politeness can bleed into cyberbullying detection systems, which may disproportionately fail to protect women when the system is attacked with honorifics. In this project, we focus on Korean and Japanese NLP models; however, our results raise the question of whether systems in other languages fall prey to the same biases. In the third project, we demonstrate that visual representations of emoji can evoke harmful text generation that disproportionately affects different genders, depending on the emoji chosen.
| Document type: | Dissertations (Dissertation, LMU München) |
|---|---|
| Subject areas: | 300 Social sciences; 300 Social sciences > 310 Statistics |
| Faculties: | Fakultät für Mathematik, Informatik und Statistik |
| Language of thesis: | English |
| Date of oral examination: | 15 April 2024 |
| First reviewer: | Schütze, Hinrich |
| MD5 checksum of the PDF file: | 81b8bd550daf90222d114720cc3640a3 |
| Shelf mark of the printed edition: | 0001/UMC 30725 |
| ID code: | 34165 |
| Deposited on: | 10 Oct 2024 10:43 |
| Last modified: | 10 Oct 2024 10:43 |