Development of machine learning and biostatistical models for cancer pharmacogenomics screens
Cancer, Pharmacogenomics Screens, Machine Learning, Biostatistical Models, Cell lines
Paulo Galhoz, Ana Cláudia
2025
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Paulo Galhoz, Ana Cláudia (2025): Development of machine learning and biostatistical models for cancer pharmacogenomics screens. Dissertation, LMU München: Fakultät für Biologie

Abstract

Cancer is a complex genetic disease that emerges from the accumulation of somatic alterations driving tumour growth. The disease is remarkably heterogeneous, comprising several subtypes that are driven by distinct mutational events and have individual response mechanisms. This complexity makes cancer hard to research and contributes to its being one of the deadliest diseases worldwide. High-throughput drug screens have enabled numerous targeted and combination therapies for personalised patient treatment by revealing potentially relevant biomarkers. The use of large-scale genomic datasets, such as the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Therapeutics Response Portal (CTRP), has created a need for suitable bioinformatic tools to mine, model and analyse cancer biomarkers in the data. In this dissertation, I focused on three aims towards cancer biomarker discovery and developed distinct algorithms to address each of them: Aim 1, analysing drug resistance mechanisms using statistical frameworks; Aim 2, investigating synergistic drug combinations in cells with uncontrolled proliferation markers using curve-fitting methodologies; and Aim 3, identifying new cancer-specific driver genes using a network-based approach.

Aim 1: To investigate acquired resistance to a treatment in initially responsive cell lines, I developed an outlier statistical model that identifies unexpectedly resistant cell lines in the GDSC and Cancer Cell Line Encyclopedia (CCLE) drug screens. This method not only reproduced known biomarkers in lung adenocarcinoma but also outperformed a standardised outlier detection method. Furthermore, the proposed hierarchical statistical framework was tested in terms of false discovery rate bounds.

Aim 2: I modelled drug responses with unexpectedly increased cell viability, which standard methodologies miss, and proposed leveraging drug-induced uncontrolled proliferation for new synergistic combination therapies with drugs that act on fast-proliferating cells, e.g., DNA-damaging agents. Building on this, I developed two mathematical frameworks, based on Gaussian and linear models, to capture cancer-type biomarkers of increased viability. Promising candidates in lung cancer were tested in additional drug screen experiments, and potential synergistic drug combinations were hypothesised.

Aim 3: I proposed the weighted Protein-Protein Interaction (wPPI) tool, which is based on PPI networks combined with Gene Ontology and Human Phenotype Ontology datasets, to infer new tissue-specific genes closely related to cancer driver genes. Subsequently, the gene expression profiles of the highest-scoring candidates were used to develop machine learning models of drug response in breast cancer. The performance of these models was assessed and compared with that of models built on several other gene feature sets, namely unspecific tissue-specific genes and genes prioritised with another network-based methodology.

In summary, this dissertation introduces innovative and robust computational methodologies to advance tissue-specific cancer biomarker discovery. These approaches address multiple challenges associated with limited statistical power in precision oncology, including the investigation of rare phenomena and the insufficient understanding of key players in cancer progression. As an overarching goal, these methodologies are envisioned not only to enhance insights into the complex mechanisms underlying cancer, but also to contribute to the design of refined targeted therapeutic strategies.