Statistical frameworks for modeling longitudinal and time‑to‑event outcomes: with applications to epidemiology and genetics
Wiegrebe, Simon
2026
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Wiegrebe, Simon (2026): Statistical frameworks for modeling longitudinal and time‑to‑event outcomes: with applications to epidemiology and genetics. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
Abstract

Time-dependent data, such as longitudinal and time-to-event data, are particularly informative because they enable both between- and within-subject analyses. Yet analyzing this type of data introduces new challenges beyond those inherent in cross-sectional data. While numerous methods exist to model time-dependent data, their application to complex, high-dimensional settings and their combination with machine learning techniques remain underexplored. This dissertation presents statistical frameworks for analyzing longitudinal and time-to-event outcomes, specifically tailored to high-dimensional data and the incorporation of machine learning techniques, with a focus on their applications in epidemiology and genetics.

The first part of this dissertation presents approaches for modeling longitudinal data in genetics, where the predictor space is high-dimensional. For many (disease) traits, progression (that is, trait change) is of primary interest but difficult to investigate based solely on between-subject comparisons from cross-sectional data. The first contributing article identifies linear mixed models (LMMs) as a well-calibrated and scalable statistical method with type I error control and high power for modeling genetic effects on trait change. The article further demonstrates that modeling genetic effects on trait change as an interaction with time or age is advantageous compared to directly modeling the effect on previously computed trait change outcomes, because trajectories of arbitrary length can be incorporated and effect size estimates are unbiased. LMMs are subsequently used to identify novel genetic variants associated with kidney function decline in a large-scale UK Biobank dataset. The second contributing article shows that, under certain assumptions, genetic-by-age interactions from cross-sectional data can be indicative of genetic associations with longitudinal trait change and proposes a two-stage approach: genome-wide pre-screening for genetic-by-age interaction in (abundant) cross-sectional data, followed by testing identified variants for longitudinal change in (scarce) independent longitudinal data.

The second part of this dissertation focuses on analyzing time-to-event data by integrating machine and deep learning techniques. The third contributing article provides a comprehensive overview of deep learning-based methods for survival analysis according to both deep learning- and survival-specific aspects. The fourth contributing article presents a methodological comparison of different reduction techniques for time-to-event data, which transform survival tasks into standard regression or classification tasks. This allows for the use of a broad variety of estimation techniques, in particular facilitating the use of machine learning algorithms. The fifth contributing article combines these two topics by developing a concrete time-to-event method based on the piecewise exponential additive model (PAM), which is both deep learning- and reduction-based.

The third part of this dissertation revisits the task of modeling longitudinal data, but now from the angle of multi-stage disease histories, which are increasingly being derived from longitudinal data. One example is chronic kidney disease, whose multiple stages are defined by clinically meaningful thresholds of a quantitative trait (estimated glomerular filtration rate). While multi-state models are natural candidates for analyzing such multi-stage disease history data, this type of analysis comes with new challenges: dependent left-truncation, multiple time scales, index event bias, and interval-censoring. The final contributing article shows via simulations how a modeling framework based on multi-state PAMs is capable of addressing most of these challenges. This framework is then applied to model transition probabilities of, and genetic variant associations with, chronic kidney disease onset and progression, using the same UK Biobank dataset as in the first contributing article.
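
To illustrate what a reduction technique for time-to-event data can look like, the following is a minimal sketch of the data transformation commonly used for piecewise exponential models; it is not taken from the dissertation. Follow-up time is split at fixed cut points, and each subject contributes one row per interval at risk, with an event indicator and a log-exposure offset, so that a Poisson-type regression can be fit to the expanded data. The function names `to_ped` and `piecewise_hazards`, the cut points, and the toy data are all assumptions made for illustration. Without covariates, the Poisson maximum-likelihood estimate of each interval's hazard reduces to events divided by time at risk, which is what the second function computes.

```python
import math


def to_ped(times, events, cuts):
    """Expand (time, event) pairs into piecewise exponential data rows.

    cuts: increasing interval end points; interval j is (cuts[j-1], cuts[j]],
    with the first interval starting at 0. Follow-up beyond the last cut
    point is treated as censored at that cut point.
    (Hypothetical helper, for illustration only.)
    """
    rows = []
    for subject, (t, d) in enumerate(zip(times, events)):
        start = 0.0
        for j, end in enumerate(cuts):
            if t <= start:
                break  # subject is no longer at risk in this interval
            exposure = min(t, end) - start  # time at risk within interval j
            # The event indicator is 1 only in the interval containing the event.
            status = int(d == 1 and t <= end)
            rows.append({
                "id": subject,
                "interval": j,
                "status": status,
                "offset": math.log(exposure),
            })
            start = end
    return rows


def piecewise_hazards(rows, n_intervals):
    """Intercept-only Poisson MLE per interval: events / time at risk."""
    n_events = [0] * n_intervals
    exposure = [0.0] * n_intervals
    for r in rows:
        n_events[r["interval"]] += r["status"]
        exposure[r["interval"]] += math.exp(r["offset"])
    return [e / x if x > 0 else float("nan") for e, x in zip(n_events, exposure)]


# Toy data (made up): four subjects, one of them censored at t = 1.5.
times = [0.5, 1.5, 2.0, 2.5]
events = [1, 0, 1, 1]
cuts = [1.0, 2.0, 3.0]

rows = to_ped(times, events, cuts)
hazards = piecewise_hazards(rows, len(cuts))
for row in rows:
    print(row)
print(hazards)
```

In an actual analysis, the expanded rows would typically be passed to a regression or machine learning method that supports a Poisson-type loss with an offset, with covariates and a flexible baseline hazard over the intervals, rather than being summarized interval by interval as in this sketch.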