Multilingual and multimodal bias probing and mitigation in natural language processing
Author: Steinborn, Victor
Year: 2024
Language: English
Publisher: Universitätsbibliothek der Ludwig-Maximilians-Universität München
Citation: Steinborn, Victor (2024): Multilingual and multimodal bias probing and mitigation in natural language processing. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics.
Full text: Steinborn_Victor.pdf (PDF, 6MB)

Abstract

Gender bias is a key global challenge of our time according to the United Nations Sustainable Development Goals, which call for the elimination of all forms of gender-based discrimination. Since it is ubiquitous both online and offline, gender bias is also prevalent in the training data of Natural Language Processing models, which therefore learn and internalize this bias. The bias then reappears when models are probed or used in downstream tasks such as automatic recruitment, leading to gender-based discrimination that negatively affects people's lives. Thus, gender bias is problematic because it harms individuals. There is a growing body of research attempting to combat gender bias in language models; however, the diversity of this research is quite limited, focusing largely on English and on occupational biases. In this thesis, we attempt to move beyond the current insular state of gender bias research in language models and to improve the coverage of the languages and biases being studied. Specifically, we undertake three projects that aim to broaden the scope of current gender bias research in Natural Language Processing (NLP). The first project builds a dataset for investigating languages beyond English; our methodology makes it easy to extend the dataset to any language of choice. In addition, we propose a new analytical bias measure that may be used to evaluate bias, given a model's prediction probabilities. In the second project, we demonstrate that learned gender stereotypes regarding politeness may bleed into cyberbullying detection systems, which may disproportionately fail to protect women if the system is attacked with honorifics. In this project, we focus on Korean and Japanese NLP models; however, our results raise the question of whether systems in other languages can fall prey to the same biases. In the third project, we demonstrate that visual representations of emoji may evoke harmful text generation that disproportionately affects different genders, depending on the emoji choice.
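
As a rough illustration of the kind of probability-based probing the abstract refers to (and not the dataset or the bias measure proposed in the thesis), the sketch below compares a masked language model's prediction probabilities for gendered pronouns in a stereotype-laden template, using the Hugging Face transformers fill-mask pipeline. The model name, template, and gap score are illustrative assumptions.

    # Minimal sketch of probability-based gender bias probing with a masked LM.
    # This is NOT the measure proposed in the thesis; it only shows the kind of
    # prediction probabilities such a measure would take as input.
    # Model choice and template are illustrative assumptions.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # Stereotype-laden template; the pronoun slot is masked.
    template = "[MASK] works as a nurse."

    # Restrict predictions to the two pronouns we want to compare.
    results = fill_mask(template, targets=["he", "she"])
    for r in results:
        print(f"P({r['token_str']!r} | template) = {r['score']:.4f}")

    # A simple, hypothetical gap score: difference in probability mass
    # assigned to the two pronouns for this template.
    scores = {r["token_str"]: r["score"] for r in results}
    print("probability gap (she - he):", scores["she"] - scores["he"])

Running this for many templates and languages, and aggregating the resulting probability gaps, is the general style of analysis that multilingual bias measures of this kind build on.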