Gahman, Paul (2023): Thematizer: a computational approach to thematic theory and analysis. Dissertation, LMU München: Faculty for Languages and Literatures
Licence: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 (CC-BY-NC-ND). Full text: Gahman_Paul.pdf (8 MB)
Abstract
Thematic theory, comprising the concepts of theme, rheme and thematic progression, is concerned with the interplay between word order, information status, propositional content and discourse function. In contemporary research on thematic theory, researchers have begun to use computational methods to analyze the thematic structure of texts. However, deficiencies in both the theoretical treatment and the computational operationalization of thematic theory have limited writers' access to thematic structure. The present work set out to address these deficiencies by identifying remaining gaps in thematic theory and by developing the software Thematizer, which automatically analyzes texts in terms of themes, rhemes and thematic progression. Thematizer was developed and trained on 30 texts each of five types: Wikipedia articles, L1 university texts, L2 university texts, blog articles and lyrics. Its performance, measured with the F1 score, was then validated on ten novel test texts. All 160 texts were first analyzed manually so that the software's output could be compared against these analyses. The resulting F1 scores for Thematizer's parsing functionality served as a metric for how well it operationalizes thematic theory computationally, and this degree of operationalization in turn indicated how accessible thematic theory is to writers. In identifying themes and rhemes, Thematizer achieved an F1 score of 85.8% on the training texts and 92.0% on the test texts (gold standard: 89.1%). In identifying and classifying marked themes, it exceeded the 89.1% gold standard on both the training texts (F1 = 94.9%) and the test texts (F1 = 93.4%). Finally, in classifying thematic progression patterns, only the training texts (F1 = 80.2%) exceeded the 79.2% gold standard; the test texts reached F1 = 75.9%.
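The comparison against manual analyses can be sketched with the standard F1 score, the harmonic mean of precision P and recall R, i.e. F1 = 2PR / (P + R). The sketch below is not Thematizer's code: the names `f1_score`, `REPORTED` and `exceeds_gold` are hypothetical, and what counts as a true positive (for example, whether a theme span must match the manual annotation exactly) is an assumption the abstract does not settle. The F1 and gold-standard values are the ones reported above, and the script simply checks which text sets scored above the gold standard for each task.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), computed from true/false positive and false negative counts."""
    if tp == 0:
        return 0.0  # no correct matches: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# F1 scores and gold standards (in %) as reported in the abstract.
REPORTED = {
    "themes/rhemes": {"gold": 89.1, "training": 85.8, "test": 92.0},
    "marked themes": {"gold": 89.1, "training": 94.9, "test": 93.4},
    "thematic progression": {"gold": 79.2, "training": 80.2, "test": 75.9},
}


def exceeds_gold(task: str) -> dict[str, bool]:
    """For one task, report whether each text set scored above its gold standard."""
    row = REPORTED[task]
    return {split: row[split] > row["gold"] for split in ("training", "test")}


if __name__ == "__main__":
    for task in REPORTED:
        print(f"{task}: {exceeds_gold(task)}")
```

For instance, 8 correct matches with 2 spurious and 2 missed items give P = R = 0.8 and thus an F1 of approximately 0.8. Running the script shows that marked themes exceed the gold standard on both text sets, while themes/rhemes and thematic progression each exceed it on only one.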
These findings indicate that Thematizer successfully operationalized the identification and classification of marked themes but was only able to partially operationalize the identification of themes and rhemes. Thematic progression, however, was operationalized inconsistently, as shown by the wide range of F1 scores Thematizer achieved, which often fell below the gold standard. Automation thus facilitated both the operationalization of thematic theory and writers' access to it, which represents a marked advance in the computational treatment of theme, rheme and thematic progression.

Ultimately, the present work advanced thematic theory both conceptually and computationally. Including unmarked themes alongside marked themes enriches thematic analyses because it makes it easy to trace GIVEN discourse topics through a text. Further delineating marked themes into separate types and semantic subclasses reveals their functional, logical and contextualizing contribution to the discourse messages that follow. Visualizing the results of the thematic analyses directly within the user's text in Thematizer's web interface also lets users interact more closely with thematic structure. Because Thematizer can analyze multiple documents and present their results simultaneously, it supports intertextual analyses that previous tools did not offer. The option to export the analysis results also gives users agency over their own texts, allowing them to reuse the results in their own research. Finally, the results that Thematizer delivers can help users reflect further on the structural and logical development of their texts.
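To make the idea of thematic progression and of tracing GIVEN topics concrete, here is a toy sketch, not Thematizer's method. It assumes each clause has already been split into a (theme, rheme) pair and uses shared content words as a crude stand-in for referential links. It labels only the two most commonly cited progression patterns (constant theme and simple linear progression, in the typology usually attributed to Daneš); the names `content_words` and `classify_progression`, the stopword list and the overlap heuristic are all illustrative assumptions. Thematizer's actual pattern inventory and matching criteria are not described in the abstract.

```python
import re

# Tiny illustrative stopword list (an assumption, not a linguistic resource).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "was", "it", "this", "its"}


def content_words(text: str) -> set[str]:
    """Lowercased ASCII word tokens, minus the illustrative stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def classify_progression(clauses: list[tuple[str, str]]) -> list[str]:
    """Label each transition between consecutive (theme, rheme) clauses."""
    labels = []
    for (prev_theme, prev_rheme), (theme, _rheme) in zip(clauses, clauses[1:]):
        current = content_words(theme)
        if current & content_words(prev_theme):
            labels.append("constant")  # theme picks up the previous theme
        elif current & content_words(prev_rheme):
            labels.append("linear")    # theme picks up the previous rheme
        else:
            labels.append("other")     # no lexical link found by this heuristic
    return labels


if __name__ == "__main__":
    clauses = [
        ("Thematizer", "analyzes texts for themes and rhemes."),
        ("Thematizer", "also classifies thematic progression."),
        ("Thematic progression", "describes how themes develop."),
        ("Lyrics", "were also analyzed."),
    ]
    print(classify_progression(clauses))
```

For the four example clauses, the script prints `['constant', 'linear', 'other']`. A purely lexical heuristic like this misses pronouns, synonyms and derived themes, which is one reason a real implementation would need coreference information or a parser rather than word overlap.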
Item Type: | Theses (Dissertation, LMU Munich) |
---|---|
Subjects: | 400 Language; 400 Language > 410 Linguistics |
Faculties: | Faculty for Languages and Literatures |
Language: | English |
Date of oral examination: | 17. November 2023 |
1. Referee: | Sanchez-Stockhammer, Christina |
MD5 checksum of the PDF file: | 021d49862cb47fd26ea1bd22247b8eb2 |
Call number of the printed copy: | 0001/UMC 30241 |
ID Code: | 33248 |
Deposited On: | 06. Mar 2024 13:44 |
Last Modified: | 06. Mar 2024 13:48 |