Kluge, Michael (2021): Combining automated processing and customized analysis for large-scale sequencing data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Kluge_Michael.pdf (20 MB)
Abstract
Extensive application of high-throughput methods in life sciences has brought substantial new challenges for data analysis. Often, many different processing steps have to be applied to a large number of samples. Workflow management systems support scientists here by automatically executing the corresponding large analysis workflows. The first part of this cumulative dissertation concentrates on the development of Watchdog, a novel workflow management system for the automated analysis of large-scale experimental data. Watchdog's main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection and manual intervention into workflow execution. A graphical user interface enables users without programming experience to construct workflows from a pre-defined toolset, and a community sharing platform allows scientists to share toolsets and workflows efficiently. Furthermore, we implemented methods for resuming the execution of interrupted or partially modified workflows and for the automated deployment of software using package managers and container virtualization. Using Watchdog, we implemented default analysis workflows for typical types of large-scale biological experiments, such as RNA-seq and ChIP-seq. Although they can easily be applied to new datasets of the same type, at some point such standard workflows reach their limit and customized methods are required to resolve specific questions. Hence, the second part of this dissertation focuses on combining standard analysis workflows with the development of novel, application-specific bioinformatics approaches to address questions of interest to our biological collaboration partners. The first study concentrates on identifying the binding motif of the ZNF768 transcription factor, which consists of two anchor regions connected by a variable linker region. 
As standard motif finding methods detected only the anchors of the motif separately, a custom method was developed for determining the spaced motif including the linker region. The second study focuses on the effect of CDK12 inhibition on transcription. Results obtained from standard RNA-seq analysis indicated substantial transcript shortening upon CDK12 inhibition. We thus developed a new measure to quantify the degree of transcript shortening. In addition, a customized meta-gene analysis framework was developed to model RNA polymerase II progression using ChIP-seq data. This revealed that CDK12 inhibition causes an RNA polymerase II processivity defect, which results in the detected transcript shortening. In summary, the methods developed in this thesis are general contributions to large-scale sequencing data analysis and also served to resolve specific questions regarding transcription factor binding and the regulation of elongating RNA polymerase II.
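The abstract only names the custom method used to determine the spaced ZNF768 motif. As a rough illustration of the general problem, not of the thesis' actual method, the following Python sketch scans a sequence for a bipartite motif: two anchor position weight matrices (PWMs) separated by a linker of variable length. It returns the placement with the highest summed log-odds score over all anchor positions and linker lengths. The anchor sequences, the `consensus_pwm` helper, the uniform background and the linker bounds are all hypothetical.

```python
import math

BASES = "ACGT"


def consensus_pwm(consensus, p_match=0.85):
    """Hypothetical helper: one column per consensus base, p_match on that base,
    the remaining probability spread evenly over the other three bases."""
    other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else other) for b in BASES} for c in consensus]


def log_odds(pwm, background=0.25):
    """Convert probability columns to log2 odds against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pwm]


def window_score(lo, seq, start):
    """Sum of the log-odds scores of matrix `lo` placed at seq[start:]."""
    return sum(col[seq[start + i]] for i, col in enumerate(lo))


def best_bipartite_hit(seq, left_pwm, right_pwm, min_gap, max_gap):
    """Best (score, left_start, gap) for left anchor + linker of min_gap..max_gap
    bases + right anchor, or None if no placement fits (sequence must be ACGT)."""
    seq = seq.upper()
    left, right = log_odds(left_pwm), log_odds(right_pwm)
    best = None
    for start in range(len(seq) - len(left) + 1):
        s_left = window_score(left, seq, start)
        # Score the right anchor jointly for every allowed linker length.
        for gap in range(min_gap, max_gap + 1):
            r_start = start + len(left) + gap
            if r_start + len(right) > len(seq):
                break  # larger gaps would overrun the sequence as well
            total = s_left + window_score(right, seq, r_start)
            if best is None or total > best[0]:
                best = (total, start, gap)
    return best


if __name__ == "__main__":
    # Hypothetical anchors "GAT" and "TAC" with a linker of 2-6 bp.
    hit = best_bipartite_hit("CCGATAAAATACCC", consensus_pwm("GAT"), consensus_pwm("TAC"), 2, 6)
    print(hit[1:])  # prints (2, 4): left anchor at index 2, 4-bp linker
```

Scoring both anchors jointly over a range of linker lengths is what distinguishes this from searching for each anchor separately. Learning the anchor matrices and a plausible linker-length range from binding data, as a real analysis would require, is outside the scope of this sketch.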
Document type: | Dissertations (Dissertation, LMU München) |
---|---|
Keywords: | workflow management system, watchdog, watchdog-wms, next generation sequencing, ZNF768, bipartite binding motif, CDK12, RNAPII processivity defect |
Subject areas: | 000 General works, computer science, information science; 000 General works, computer science, information science > 004 Computer science |
Faculties: | Fakultät für Mathematik, Informatik und Statistik |
Language of the thesis: | English |
Date of oral examination: | 1 March 2021 |
First reviewer: | Friedel, Caroline |
MD5 checksum of the PDF file: | 1e346573af026feffffc5d9fcc9afee6 |
Shelfmark of the printed edition: | 0001/UMC 27773 |
ID Code: | 27589 |
Deposited on: | 15 Mar 2021 10:57 |
Last modified: | 15 Mar 2021 10:57 |