Paulo Galhoz, Ana Cláudia (2025): Development of machine learning and biostatistical models for cancer pharmacogenomics screens. Dissertation, LMU München: Fakultät für Biologie
PDF: Paulo_Galhoz_Ana_Claudia.pdf (12 MB)
Abstract
Cancer is a complex genetic disease emerging from the accumulation of somatic alterations that drive tumour growth. This disease is remarkably heterogeneous, comprising several subtypes driven by distinct mutational events and with individual response mechanisms. Notably, this complexity makes the disease hard to research and contributes to it being one of the deadliest worldwide. High-throughput drug screens have empowered numerous targeted and combination therapies for personalised patient treatment by revealing potentially relevant biomarkers. The application of large-scale genomic datasets, such as the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Therapeutics Response Portal (CTRP), has sparked the need for suitable bioinformatic tools to properly mine, model and analyse cancer biomarkers in the data. In this dissertation, I focused on three aims towards cancer biomarker discovery and developed distinct algorithms to address each: Aim 1, analysing drug resistance mechanisms using statistical frameworks; Aim 2, investigating synergistic drug combinations in cells with uncontrolled proliferation markers using curve fitting methodologies; and Aim 3, identifying new cancer-specific driver genes using a network-based approach.
Aim 1: To investigate acquired resistance to treatment in initially responsive cell lines, I developed an outlier statistical model that identifies unexpectedly resistant cell lines in the GDSC and CCLE drug screens. This method not only reproduced known biomarkers in lung adenocarcinoma, but also outperformed a standardised outlier detection method. Furthermore, the proposed hierarchical statistical framework was tested in terms of false discovery rate bounds.
Aim 2: I modelled drug responses with unexpectedly increased cell viability, which are missed by standard methodologies, and proposed to leverage drug-induced uncontrolled proliferation as the basis for a new synergistic combination therapy with drugs that act on fast-proliferating cells, e.g., DNA-damaging agents. Building on this, I developed two mathematical frameworks based on Gaussian and linear models to capture cancer-type biomarkers of increased viability. Promising candidates in lung cancer were tested in additional drug screen experiments and potential synergistic drug combinations were hypothesised.
Aim 3: I proposed the weighted Protein-Protein Interaction (wPPI) tool, based on PPI networks combined with Gene Ontology and Human Phenotype Ontology datasets, to infer new tissue-specific genes closely related to cancer driver genes. Subsequently, the gene expression profiles of the highest-scoring candidates were used to develop drug response machine learning models in breast cancer. The performance of these models was assessed and compared with that of models built on several other gene feature sets, namely unspecific tissue-specific genes and genes prioritised with other network-based methodology.
In summary, this dissertation introduces innovative and robust computational methodologies to advance tissue-specific cancer biomarker discovery. These approaches address multiple challenges associated with limited statistical power in precision oncology, including the investigation of rare phenomena and the insufficient understanding of key players in cancer progression. As an overarching goal, these methodologies are envisioned to not only enhance insights into the complex mechanisms underlying cancer, but also contribute to the design of refined targeted therapeutic strategies.
Document type: | Dissertations (Dissertation, LMU München) |
---|---|
Keywords: | Cancer, Pharmacogenomics Screens, Machine Learning, Biostatistical Models, Cell lines |
Subjects: | 500 Natural sciences and mathematics; 500 Natural sciences and mathematics > 570 Life sciences, biology |
Faculties: | Faculty of Biology |
Language of the thesis: | English |
Date of oral examination: | 14 July 2025 |
First referee: | Menden, Michael |
MD5 checksum of the PDF file: | 4dc064275d64a021973b0cc9305e7870 |
Shelf mark of the printed edition: | 0001/UMC 31401 |
ID Code: | 35623 |
Deposited on: | 19 Aug 2025 08:55 |
Last modified: | 19 Aug 2025 09:05 |