Handling of realistic missing data scenarios in clinical trials using machine learning techniques

Missing data are a common challenge in the design and analysis of clinical trials: they are data that are needed for the main analyses but were not collected. If missing data are not properly imputed or handled, they may reduce the statistical power of important analyses, bias or confound the estimation of the treatment effect, and lead to underestimation of the variability in the target variable. Rubin’s 1976 paper defines three types of missingness. (1) MCAR (missing completely at random): when data are MCAR, “the probability of missingness does not depend on observed or unobserved measurements”; for example, subjects drop out of the trial for reasons unrelated to their health status. (2) MAR (missing at random): when data are MAR, “the probability of missingness depends only on observed measurements conditional on the covariates in the model”; for example, younger subjects, who may consider themselves healthier and see no need to have their blood pressure measured, may be more likely to have missing blood pressure values. (3) MNAR (missing not at random): when data are MNAR, “the probability of missingness depends on unobserved measurements”; for example, subjects leave the trial because of “lack of efficacy” (i.e., they are not convinced of the effectiveness of the study drug and hence drop out of the trial). Although all three types of missing data are well defined, it is very difficult to determine the association between missingness and unobserved outcomes in real-world data; in other words, it is very difficult to justify the MAR assumption in any realistic situation. As the EMA suggested in 2010, a combined strategy can be used, e.g., treating discontinuations due to “lack of efficacy” as MNAR data and discontinuations due to “lost to follow-up” as MAR data.
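The three mechanisms can be made concrete with a small simulation. The following is a minimal sketch and is not taken from the dissertation: the function name `simulate_missingness`, the age range, the blood pressure model, and the logistic missingness probabilities are all assumptions chosen for illustration. It follows the blood pressure example above and compares the mean of the observed values with the mean of the complete data under each mechanism.

```python
import math
import random


def simulate_missingness(n=10_000, seed=0):
    """Return the mean blood pressure of the full data and of the
    observed part under hypothetical MCAR, MAR and MNAR mechanisms."""
    rng = random.Random(seed)
    ages = [rng.uniform(20, 80) for _ in range(n)]
    # Assumed toy outcome model: blood pressure rises with age.
    bp = [100 + 0.5 * a + rng.gauss(0, 10) for a in ages]

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Probability that each subject's value is missing.
    p_missing = {
        # MCAR: the same probability for everyone.
        "MCAR": [0.3] * n,
        # MAR: depends only on the observed covariate (younger -> more missing).
        "MAR": [sigmoid((50 - a) / 10) for a in ages],
        # MNAR: depends on the value that would have been measured.
        "MNAR": [sigmoid((y - 125) / 10) for y in bp],
    }

    result = {"full": sum(bp) / n}
    for name, probs in p_missing.items():
        observed = [y for y, p in zip(bp, probs) if rng.random() >= p]
        result[name] = sum(observed) / len(observed)
    return result


summary = simulate_missingness()
for name, mean in summary.items():
    print(f"{name:5s} mean observed blood pressure: {mean:.1f}")
```

In this toy setting the MCAR mean stays close to the full-data mean, while the MAR and MNAR means are shifted. The MAR shift could in principle be corrected by adjusting for age, because age is observed; the MNAR shift cannot be corrected from the observed data alone.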
Many statistical methods have been developed to handle missing data under the prerequisite assumption of either MNAR or MAR. In the real world, however, missing data often arise from a mixture of different missingness mechanisms. This violates the basic assumption of these methods (i.e., either MNAR or MAR) and degrades their performance (Enders, 2010). To handle the missing data problem in real-life situations (e.g., MNAR and MAR mixed together in the same dataset), we propose a missing data prediction framework based on machine learning techniques. As Breiman pointed out in his 2001 paper, in a statistical (machine) learning exercise, “the goal is not interpretability, but accurate information”. Along this line of thought, our methods handle MNAR data by focusing on (giving larger sample weights to) the missing part, and handle MAR data by looking for precise individual (subject-level) information. The MNAR problem is treated as an imbalanced machine learning exercise, i.e., minority cases are oversampled to compensate for the data that are MNAR in certain areas.
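The abstract does not specify which oversampling algorithm the framework uses. As an illustration of the general idea only, the sketch below implements plain random oversampling: records from the smaller class are duplicated at random until every class is as large as the largest one. The function name `random_oversample`, the class labels, and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random (with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Keep every original row, then draw extra copies with replacement.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 8 "completer" records and 2 "dropout" records.
rows = [[float(i)] for i in range(10)]
labels = ["completer"] * 8 + ["dropout"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After oversampling, both classes contain eight records, so a model trained on the balanced data gives the under-represented group as much weight as the majority. Passing larger per-sample weights for the minority records to a learner would be an alternative way to achieve a similar emphasis.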

clinical trials, missing data, machine learning, imbalanced learning, clustering

Haliduola, Halimuniyazi

03. Mar. 2023

2023

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-314928

Haliduola, Halimuniyazi (2023): Handling of realistic missing data scenarios in clinical trials using machine learning techniques. Dissertation, LMU München: Medizinische Fakultät

PDF: Haliduola_Halimuniyazi.pdf (6MB)

DOI: 10.5282/edoc.31492

URN: urn:nbn:de:bvb:19-314928

Document type:	Dissertations (Dissertation, LMU München)
Keywords:	clinical trials, missing data, machine learning, imbalanced learning, clustering
Subject areas:	600 Technology, Medicine, Applied Sciences; 600 Technology, Medicine, Applied Sciences > 610 Medicine and Health
Faculties:	Medizinische Fakultät (Faculty of Medicine)
Language of the thesis:	English
Date of oral examination:	3 March 2023
First reviewer:	Mansmann, Ulrich
MD5 checksum of the PDF file:	221d5e7a08d1292364cb53d10f355ed7
Shelf mark of the printed edition:	0700/UMD 21032
ID Code:	31492
Deposited on:	31. Mar. 2023 14:03
Last modified:	31. Mar. 2023 14:03