Thematizer

Thematizer: a computational approach to thematic theory and analysis

Thematic theory, comprising the concepts of theme, rheme and thematic progression, concerns the interplay between word order, information status, propositional content and discourse function. In contemporary research on thematic theory, researchers have begun to use computational means to analyze texts with respect to thematic structure. However, deficiencies in both the theoretical treatment and the computational operationalization of thematic theory have limited writers’ access to thematic structure. The present work set out to address these deficiencies by identifying remaining gaps in thematic theory and by developing the software Thematizer, which automatically analyzes texts in terms of themes, rhemes and thematic progression. To develop and train Thematizer, 150 texts (Wikipedia articles, L1 and L2 university texts, blog articles and lyrics) were used. The accuracy of Thematizer, measured with the F1 score, was then validated with ten novel test texts. All 160 texts were first analyzed manually so that the results could be compared against those the software yielded. The resulting F1 scores for Thematizer’s parsing functionality were then used as a metric for its operationalization of thematic theory via computational means. In turn, Thematizer’s degree of operationalization informed the degree to which thematic theory is accessible to writers. In the identification of themes and rhemes, Thematizer achieved an F1 score of 85.8% for training texts and 92.0% for test texts (cf. 89.1% gold standard). The identification and classification of marked themes exceeded the gold standard of 89.1%, with an F1 score of 94.9% for the training texts and 93.4% for the test texts. Finally, for the classification of thematic progression patterns, only the training texts (F1 = 80.2%) exceeded the gold standard of 79.2%; the test texts yielded F1 = 75.9%.
These findings indicate that Thematizer successfully operationalized the identification and classification of marked themes but only partially operationalized the identification of themes and rhemes in text. Thematic progression, however, was operationalized inconsistently: the F1 scores that Thematizer achieved ranged widely and often fell below the gold standard. Automated means facilitated both the operationalization of thematic theory and, thereby, access to it, which represents a marked advancement in the computational treatment of theme, rheme and thematic progression. Ultimately, the present work was able to advance thematic theory both conceptually and computationally. The inclusion of unmarked themes in conjunction with marked themes enriches thematic analyses by readily tracing GIVEN discourse topics through a text. Further delineation of marked themes into separate types and semantic subclasses reveals their functional, logical and contextualizing contribution to the discourse messages that follow. Visualizing the results of the thematic analyses within the user’s own text in Thematizer’s web interface additionally affords greater interactivity with thematic structure. Thematizer’s ability to analyze multiple documents and present their results simultaneously facilitates intertextual analyses that previous tools did not support. The option to export the results of the thematic analyses also gives users agency over their own texts for subsequent use in their own research. Finally, the analytical results that Thematizer delivers can enable users to reflect further on the structural and logical development of their text.
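The F1 scores reported in the abstract compare the software's output against a manual analysis. The following is a minimal sketch of how such a score can be computed, not the method used in the dissertation: the abstract does not say how annotations were matched, so exact matching of labelled spans is an assumption here, and the function name `f1_score` and the sample annotations are hypothetical.

```python
from typing import Hashable, Iterable


def f1_score(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> float:
    """Harmonic mean of precision and recall over two sets of annotations.

    Assumption: an annotation counts as correct only if it matches a
    manual (gold) annotation exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (sentence index, label, text span).
gold = [
    (0, "theme", "The cat"),
    (0, "rheme", "sat on the mat"),
    (1, "theme", "It"),
    (1, "rheme", "purred loudly"),
]
predicted = [
    (0, "theme", "The cat"),
    (0, "rheme", "sat on the mat"),
    (1, "theme", "It purred"),  # boundary error: theme is too long
    (1, "rheme", "loudly"),     # boundary error: rheme is too short
]

print(f"F1 = {f1_score(gold, predicted):.1%}")  # prints "F1 = 50.0%"
```

In this toy case two of the four predicted spans match the manual analysis, so precision and recall are both 0.5 and the F1 score is 50.0%. A more lenient matching rule (for example, partial overlap of spans) would yield higher scores for the same output.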

Gahman, Paul

17. Nov. 2023

2023

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-332482

Gahman, Paul (2023): Thematizer: a computational approach to thematic theory and analysis. Dissertation, LMU München: Fakultät für Sprach- und Literaturwissenschaften

License: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 (CC-BY-NC-ND)
PDF
Gahman_Paul.pdf
8MB

DOI: 10.5282/edoc.33248

URN: urn:nbn:de:bvb:19-332482

Abstract

Document type:	Dissertations (Dissertation, LMU München)
Subject areas:	400 Language; 400 Language > 410 Linguistics
Faculties:	Fakultät für Sprach- und Literaturwissenschaften
Language of the thesis:	English
Date of oral examination:	17 November 2023
1st referee:	Sanchez-Stockhammer, Christina
MD5 checksum of the PDF file:	021d49862cb47fd26ea1bd22247b8eb2
Shelf mark of the printed edition:	0001/UMC 30241
ID Code:	33248
Deposited on:	06 Mar. 2024 13:44
Last modified:	06 Mar. 2024 13:48
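
The MD5 checksum listed in the record can be used to check that a downloaded copy of the PDF is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_record` are hypothetical, and the sketch assumes the file has been saved locally under the name given in the record.

```python
import hashlib

# MD5 checksum of the PDF file as listed in the record.
EXPECTED_MD5 = "021d49862cb47fd26ea1bd22247b8eb2"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file's MD5 digest with the checksum from the record."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_record("Gahman_Paul.pdf")` returns `True` only if the local file is byte-for-byte identical to the deposited one. MD5 is suitable for detecting accidental corruption during download, not for protection against deliberate tampering.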