Sauer, Christina (2025): Optimistic bias in the evaluation of statistical methods: illustrations and possible solutions. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
Preview
PDF: Sauer_Christina.pdf (12 MB)
Abstract
Benchmark studies are an important tool for assessing the properties of statistical methods by evaluating and comparing them on simulated or real data. Conducting such studies requires researchers to make many choices, for example which methods to compare and which data and performance measures to use for the assessment. From applied research, which examines the models produced by methods rather than the methods themselves, it is well known that such flexibility, combined with the inherent non-neutrality of researchers, may lead to results biased in the direction of their expectations. This deviation can be referred to as optimistic bias and may, for example, manifest as false positive rejections in hypothesis testing. In light of this, there is concern that optimistic bias may also occur in benchmark studies. Such bias is particularly likely to arise in studies that accompany the proposal of a new method, where researchers are clearly not neutral, potentially leading to false claims of superiority. This thesis adds to existing work by broadening the discussion of how optimistic bias can arise in benchmark studies, while also addressing the possibility that performance differences between studies result from factors other than optimistic bias. Furthermore, it provides additional strategies to reduce optimistic bias. To this end, the cumulative thesis comprises four contributions.

The first contribution considers the often-overlooked role of preprocessing steps, such as variable selection or transformation, in the generation and evaluation of prediction models. By formalizing these choices as preprocessing hyperparameters, it highlights their impact and potential for misuse. Although it is the only contribution situated in applied rather than methodological research, its insights are relevant to both contexts, as the evaluation procedures it discusses closely parallel those used in benchmark studies.
The second contribution extends an existing benchmark study to empirically illustrate how results can vary when different design and analysis decisions are made, and how this variability can easily be exploited to obtain favorable results. Like the first contribution, it examines important but rarely addressed choices, specifically the handling of missing performance values and the derivation of method rankings. It further proposes an approach for visualizing the results obtained from different benchmark variants.

The widely noted tendency of newly proposed methods to perform best in the benchmark studies accompanying their introduction is the focus of the third contribution. Through a cross-design validation experiment, in which two methods are reevaluated using each other's original benchmark study setup, it explores the roles of optimistic bias, researcher expertise, and mismatches between original and subsequent study settings in explaining performance differences.

Finally, the fourth contribution focuses on the choice of data in benchmark studies, in particular the generation of data using parametric simulations. A common approach is to base these simulations on real datasets, yet in practice only one or two datasets are typically used, and the rationale for their selection is often unclear. In addition to formalizing real-data-based parametric simulations, the fourth contribution promotes a more systematic procedure for selecting real datasets, clarifying the data settings to which the benchmark study's conclusions are intended to generalize and increasing their representativeness for that scope.
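To make the point about missing performance values concrete, here is a minimal sketch, with entirely hypothetical numbers not taken from the thesis, of how two common handling strategies can produce different "winning" methods from the same raw benchmark results:

```python
# Hypothetical illustration: three methods evaluated on four datasets,
# where method C failed to produce a result on dataset 3 (np.nan).
import numpy as np

perf = np.array([
    [0.75, 0.72, 0.74, 0.73],    # method A
    [0.74, 0.71, 0.73, 0.72],    # method B
    [0.76, 0.73, np.nan, 0.74],  # method C (failed on dataset 3)
])
methods = ["A", "B", "C"]

# Variant 1: discard every dataset on which any method is missing,
# then rank by mean accuracy over the remaining datasets.
complete = ~np.isnan(perf).any(axis=0)
winner_drop = methods[int(np.argmax(perf[:, complete].mean(axis=1)))]

# Variant 2: count a failed run as accuracy 0 and average over all datasets.
winner_zero = methods[int(np.argmax(np.nan_to_num(perf, nan=0.0).mean(axis=1)))]

print(winner_drop, winner_zero)  # prints "C A": different winners
```

Here variant 1 declares method C the best, while variant 2 declares method A the best, which illustrates why such seemingly minor design decisions deserve explicit justification and sensitivity analysis.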
| Document type: | Dissertations (Dissertation, LMU München) |
|---|---|
| Keywords: | optimistic bias, metascience, researcher degrees of freedom, benchmark studies, method evaluation |
| Subject areas: | 300 Social sciences > 310 Statistics |
| Faculties: | Fakultät für Mathematik, Informatik und Statistik |
| Language of thesis: | English |
| Date of oral examination: | 19 December 2025 |
| First referee: | Boulesteix, Anne-Laure |
| MD5 checksum of the PDF file: | a53d6ab8d62e04a998ff2ba37712a72d |
| Shelf mark of the printed edition: | 0001/UMC 31720 |
| ID Code: | 36395 |
| Deposited on: | 06 Feb 2026 15:50 |
| Last modified: | 06 Feb 2026 15:50 |