Statistical approaches for modeling network and public health data
The rapid technological advancement that characterized the past few decades has brought about an increasingly large amount and variety of data. This wealth of data naturally comes with further complexity, thus requiring increasingly sophisticated and efficient methodologies to extract valuable information from it. In this context, statistical models can serve as effective tools to obtain interpretable insight from the data while adequately quantifying and accounting for the underlying uncertainty. This thesis deals with the statistical modeling of two broad data categories that are prominent in modern times: network data and public health data. After an introductory Part I, the thesis comprises a total of eleven contributions, which can be divided into three further parts.
Part II, composed of four contributions, deals with the statistical analysis of network data. Networks can broadly be defined as groups of interconnected people or things. This thesis focuses mostly on social and economic networks, and on statistical models aimed at capturing and explaining the mechanisms leading to the formation of ties between actors within the network. We specifically concern ourselves with two broad model families, namely latent variable models and exponential random graph models. The first two contributions in this part introduce and compare several models from these classes, and showcase them by applying them to real-world network data. The following two contributions extend and apply these models to answer substantive questions in the social sciences. More specifically, the third contribution extends exponential random graph models to a massive dynamic bipartite network of patents and inventors in order to explore the drivers of innovation, while the fourth uses latent distance models to map the network of popular Twitter users discussing the COVID-19 pandemic, with the goal of investigating polarization on the platform.
Part III, which also comprises four contributions, addresses statistical challenges related to the real-time monitoring and modeling of public health data. More specifically, this part tackles questions that emerged during the early stages of the COVID-19 pandemic, mainly by adapting and extending the class of generalized additive mixed models (GAMMs). The fifth contribution develops a statistical model that uses data on reported fatal infections to predict how many of the registered infections will turn out to be lethal in the near future, thereby enabling effective monitoring of the current state of the pandemic. The sixth contribution instead focuses on all reported infections, and proposes a model to nowcast locally detected (but not yet centrally reported) cases by accounting for expected reporting delays, as well as to forecast infections at the regional level in the near future. The seventh contribution proposes a statistical tool to study the dynamics of the case-detection ratio over time, allowing for comparisons of infection figures between different pandemic phases. The part concludes with the eighth contribution, which further demonstrates the effectiveness of GAMMs by applying them to three relevant pandemic-related issues: the interdependence of infections across different age groups of school children, the nowcasting of COVID-19-related hospitalizations, and the modeling of the weekly occupancy of intensive care units.
Finally, Part IV, composed of three contributions, focuses on the principled estimation of excess mortality, which can generally be defined as the number of deaths from all causes during a crisis beyond what would have been expected had the crisis not occurred. More specifically, the ninth contribution develops a point-estimation method that uses a corrected version of classical life tables to calculate age-adjusted excess mortality, and applies it to obtain estimates for the first year of the COVID-19 pandemic (i.e., 2020) in Germany. The tenth contribution applies the same method to provide updated age-specific estimates for 2021. The eleventh contribution extends the method to incorporate uncertainty quantification, and deploys it at a broader scale to obtain estimates for 30 developed countries in the first two years of the COVID-19 crisis. The results are further compared with existing estimates published in other major scientific outlets, highlighting the importance of proper age adjustment to obtain unbiased figures.
De Nicola, Giacomo
2024
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
De Nicola, Giacomo (2024): Statistical approaches for modeling network and public health data. Dissertation, LMU München: Faculty of Mathematics, Computer Science and Statistics
PDF: De_Nicola_Giacomo.pdf (22MB)