Gahman, Paul (2023): Thematizer: a computational approach to thematic theory and analysis. Dissertation, LMU München: Fakultät für Sprach- und Literaturwissenschaften
License: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 (CC-BY-NC-ND)
Gahman_Paul.pdf (8 MB)
Abstract
Thematic theory, comprising the concepts of theme, rheme and thematic progression, concerns the interplay between word order, information status, propositional content and discourse function. Contemporary research on thematic theory has begun to use computational means to analyze the thematic structure of texts. However, deficiencies in both the theoretical treatment and the computational operationalization of thematic theory have limited writers’ access to thematic structure. The present work set out to address these deficiencies by identifying remaining gaps in thematic theory and by developing the software Thematizer, which automatically analyzes texts in terms of themes, rhemes and thematic progression. To develop and train Thematizer, 30 texts each of five types were used: Wikipedia articles, L1 university texts, L2 university texts, blog articles and lyrics. The accuracy of Thematizer, measured with the F1 score, was then validated with ten novel test texts. All 160 texts were first analyzed manually so that the software’s results could be compared against the manual analyses. The resulting F1 scores for Thematizer’s parsing functionality were then used as a metric for how well it operationalizes thematic theory computationally. In turn, the degree of operationalization indicated how accessible thematic theory is to writers. In the identification of themes and rhemes, Thematizer achieved an F1 score of 85.8% for training texts and 92.0% for test texts (cf. 89.1% gold standard). For the identification and classification of marked themes, both the training texts (F1 = 94.9%) and the test texts (F1 = 93.4%) exceeded the gold standard of 89.1%. Finally, for the classification of thematic progression patterns, only the training texts (F1 = 80.2%) exceeded the gold standard of 79.2%; the test texts yielded F1 = 75.9%.
These findings indicate that Thematizer successfully operationalized the identification and classification of marked themes but only partially operationalized the identification of themes and rhemes in text. Thematic progression, however, was operationalized inconsistently: the F1 scores that Thematizer achieved varied widely and often fell below the gold standard. Automated means facilitated both the operationalization of thematic theory and, thereby, its accessibility, which represents a marked advancement in the computational treatment of theme, rheme and thematic progression. Ultimately, the present work was able to advance thematic theory both conceptually and computationally. Including unmarked themes alongside marked themes enriches thematic analyses by readily tracing GIVEN discourse topics through a text. Further delineating marked themes into separate types and semantic subclasses reveals their functional, logical and contextualizing contribution to the discourse messages that follow. Thematizer’s web interface visualizes the results of the thematic analyses embedded within the user’s text, which additionally allows greater interaction with thematic structure. Thematizer’s ability to analyze multiple documents and present their results simultaneously facilitates intertextual analyses, which previous tools did not support. The option to export the results of the thematic analyses also gives users control over their own texts for subsequent use in their own research. Finally, the analytical results that Thematizer delivers can enable users to reflect further on the structural and logical development of their text.
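As an illustration of the evaluation described in the abstract, the following minimal sketch computes an F1 score by comparing automatically produced annotations against manual ones. The abstract does not state how Thematizer matches annotations, so the exact-match comparison of `(sentence index, span text, label)` tuples, the function name `f1_score` and the sample data are all assumptions made for this example, not the dissertation's actual procedure.

```python
from typing import Hashable, Iterable


def f1_score(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> float:
    """Return the F1 score of predicted items against manually annotated items.

    Assumption: an item counts as correct only if it matches a gold item exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0  # also covers empty inputs and avoids division by zero
    precision = true_positives / len(pred_set)  # share of predictions that are correct
    recall = true_positives / len(gold_set)     # share of gold items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (sentence index, span text, label)
manual = [
    (0, "Thematic theory", "theme"),
    (0, "concerns the interplay", "rheme"),
    (1, "However", "marked theme"),
]
automatic = [
    (0, "Thematic theory", "theme"),
    (0, "concerns the interplay", "rheme"),
    (1, "However, deficiencies", "marked theme"),  # span boundary differs from manual
]

score = f1_score(manual, automatic)
print(f"F1 = {score:.1%}")
```

Here two of the three automatic annotations match the manual ones, so precision and recall are both 2/3. A more lenient matching rule (for example, partial span overlap) would give different scores; the choice of rule is part of what such an evaluation has to fix in advance.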
Document type: | Dissertations (Dissertation, LMU München) |
---|---|
Subject areas: | 400 Language; 400 Language > 410 Linguistics |
Faculties: | Fakultät für Sprach- und Literaturwissenschaften |
Language of the thesis: | English |
Date of oral examination: | 17 November 2023 |
First reviewer: | Sanchez-Stockhammer, Christina |
MD5 checksum of the PDF file: | 021d49862cb47fd26ea1bd22247b8eb2 |
Shelf mark of the printed edition: | 0001/UMC 30241 |
ID code: | 33248 |
Deposited on: | 06 Mar 2024 13:44 |
Last modified: | 06 Mar 2024 13:48 |
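
The MD5 checksum listed in the record can be used to check that a downloaded copy of the PDF is intact. The sketch below assumes the file has been saved locally; the function names `md5_of_file` and `matches_record` and the local file path are hypothetical and not part of the repository.

```python
import hashlib

# Checksum as listed in the record for the PDF file.
EXPECTED_MD5 = "021d49862cb47fd26ea1bd22247b8eb2"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large file never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str) -> bool:
    """Return True if the file's MD5 digest equals the checksum in the record."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_record("Gahman_Paul.pdf")` would return `True` for an unmodified download saved under that (assumed) name in the current directory. MD5 is suitable here for detecting accidental corruption, not for guarding against deliberate tampering.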