Wiegrebe, Simon (2026): Statistical frameworks for modeling longitudinal and time-to-event outcomes: with applications to epidemiology and genetics. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Wiegrebe_Simon.pdf (494 kB)
Abstract
Time-dependent data, such as longitudinal and time-to-event data, are particularly informative because they enable both between- and within-subject analyses. Yet analyzing this type of data introduces new challenges beyond those inherent in cross-sectional data. While numerous methods exist to model time-dependent data, their application to complex, high-dimensional settings and their combination with machine learning techniques remain underexplored. This dissertation presents statistical frameworks for analyzing longitudinal and time-to-event outcomes, specifically tailored to high-dimensional data and the incorporation of machine learning techniques, with a focus on their applications in epidemiology and genetics.
The first part of this dissertation presents approaches for modeling longitudinal data in genetics, where the predictor space is high-dimensional. For many (disease) traits, progression (that is, trait change) is of primary interest but difficult to investigate based solely on between-subject comparisons from cross-sectional data. The first contributing article identifies linear mixed models (LMMs) as a well-calibrated and scalable statistical method with type I error control and high power for modeling genetic effects on trait change. The article further demonstrates that modeling genetic effects on trait change as an interaction with time or age is advantageous compared to directly modeling the effect on previously computed trait change outcomes, because trajectories of arbitrary length can be incorporated and effect size estimates are unbiased. LMMs are subsequently used to identify novel genetic variants associated with kidney function decline in a large-scale UK Biobank dataset. The second contributing article shows that, under certain assumptions, genetic-by-age interactions from cross-sectional data can be indicative of genetic associations with longitudinal trait change. It proposes a two-stage approach: genome-wide pre-screening for genetic-by-age interaction in (abundant) cross-sectional data, followed by testing the identified variants for longitudinal change in (scarce) independent longitudinal data.
The second part of this dissertation focuses on analyzing time-to-event data by integrating machine learning and deep learning techniques. The third contributing article provides a comprehensive overview of deep learning-based methods for survival analysis, organized according to both deep learning-specific and survival-specific aspects. The fourth contributing article presents a methodological comparison of different reduction techniques for time-to-event data, which transform survival tasks into standard regression or classification tasks. This allows for the use of a broad variety of estimation techniques, in particular facilitating the use of machine learning algorithms. The fifth contributing article combines these two topics by developing a concrete time-to-event method based on the piecewise exponential additive model (PAM), which is both deep learning-based and reduction-based.
The third part of this dissertation revisits the task of modeling longitudinal data, but now from the angle of multi-stage disease histories, which are increasingly being derived from longitudinal data. One example is chronic kidney disease, whose multiple stages are defined by clinically meaningful thresholds of a quantitative trait (estimated glomerular filtration rate). While multi-state models are natural candidates for analyzing such multi-stage disease history data, this type of analysis comes with new challenges: dependent left-truncation, multiple time scales, index event bias, and interval-censoring. The final contributing article shows via simulations how a modeling framework based on multi-state PAMs is capable of addressing most of these challenges. This framework is then applied to model transition probabilities for chronic kidney disease onset and progression, as well as genetic variant associations with them, using the same UK Biobank dataset as in the first contributing article.
| Document type: | Dissertations (Dissertation, LMU München) |
|---|---|
| Subject areas: | 300 Social sciences; 300 Social sciences > 310 Statistics |
| Faculties: | Fakultät für Mathematik, Informatik und Statistik |
| Language of the thesis: | English |
| Date of oral examination: | 22 January 2026 |
| First referee: | Küchenhoff, Helmut |
| MD5 checksum of the PDF file: | 38d40daafe39f42786953859dae57c1c |
| Shelf mark of the printed edition: | 0001/UMC 31728 |
| ID Code: | 36479 |
| Deposited on: | 06 Feb 2026 16:08 |
| Last modified: | 06 Feb 2026 16:08 |