Combining automated processing and customized analysis for large-scale sequencing data

The extensive application of high-throughput methods in the life sciences has brought substantial new challenges for data analysis. Often, many different steps have to be applied to a large number of samples. Workflow management systems support scientists here by automatically executing the corresponding large analysis workflows. The first part of this cumulative dissertation concentrates on the development of Watchdog, a novel workflow management system for the automated analysis of large-scale experimental data. Watchdog's main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection, and manual intervention in workflow execution. A graphical user interface enables workflow construction from a predefined toolset without programming experience, and a community sharing platform allows scientists to share toolsets and workflows efficiently. Furthermore, we implemented methods for resuming the execution of interrupted or partially modified workflows and for the automated deployment of software using package managers and container virtualization. Using Watchdog, we implemented default analysis workflows for typical types of large-scale biological experiments, such as RNA-seq and ChIP-seq.

Although these workflows can easily be applied to new datasets of the same type, standard workflows eventually reach their limits, and customized methods are required to resolve specific questions. Hence, the second part of this dissertation focuses on combining standard analysis workflows with the development of novel, application-specific bioinformatics approaches to address questions of interest to our biological collaboration partners. The first study concentrates on identifying the binding motif of the ZNF768 transcription factor, which consists of two anchor regions connected by a variable linker region. As standard motif-finding methods detected only the two anchors of the motif separately, a custom method was developed for determining the spaced motif including the linker region. The second study focuses on the effect of CDK12 inhibition on transcription. Results obtained from standard RNA-seq analysis indicated substantial transcript shortening upon CDK12 inhibition. We thus developed a new measure to quantify the degree of transcript shortening. In addition, a customized meta-gene analysis framework was developed to model RNA polymerase II progression using ChIP-seq data. This revealed that CDK12 inhibition causes an RNA polymerase II processivity defect, which results in the detected transcript shortening.

In summary, the methods developed in this thesis both represent general contributions to large-scale sequencing data analysis and served to resolve specific questions regarding transcription factor binding and the regulation of elongating RNA polymerase II.
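To make the notion of a spaced (bipartite) motif concrete, the following minimal Python sketch scans a DNA sequence for two fixed anchors separated by a linker of variable length. It is only an illustration of the motif structure described in the abstract and not the method developed in the dissertation. The function name `find_spaced_motif`, the anchor sequences, and the gap limits are hypothetical placeholders and are not the actual ZNF768 motif.

```python
def find_spaced_motif(sequence, left_anchor, right_anchor, min_gap, max_gap):
    """Find occurrences of left_anchor + linker + right_anchor.

    The linker may have any length between min_gap and max_gap (inclusive).
    Returns a list of (start, linker_length, linker_sequence) tuples.
    All anchors and gap limits are illustrative assumptions.
    """
    seq = sequence.upper()
    hits = []
    start = seq.find(left_anchor)
    while start != -1:
        linker_start = start + len(left_anchor)
        # Try every allowed linker length after this left anchor.
        for gap in range(min_gap, max_gap + 1):
            right_start = linker_start + gap
            if seq.startswith(right_anchor, right_start):
                hits.append((start, gap, seq[linker_start:right_start]))
        # Continue with the next (possibly overlapping) left anchor.
        start = seq.find(left_anchor, start + 1)
    return hits


# Hypothetical anchors "GCTG" and "CCTC" with a linker of 0 to 5 bases.
hits = find_spaced_motif("AAGCTGTTTCCTCAA", "GCTG", "CCTC", 0, 5)
```

A search that only looks for each anchor on its own would report the two halves separately, which is the limitation of standard motif finders that the abstract mentions.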

Kluge, Michael

01. Mar. 2021

2021

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-275891

Kluge, Michael (2021): Combining automated processing and customized analysis for large-scale sequencing data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF: Kluge_Michael.pdf (20 MB)

DOI: 10.5282/edoc.27589

URN: urn:nbn:de:bvb:19-275891

Document type:	Theses (Dissertation, LMU München)
Keywords:	workflow management system, watchdog, watchdog-wms, next generation sequencing, ZNF768, bipartite binding motif, CDK12, RNAPII processivity defect
Subjects:	000 General works, computer science, information science; 000 General works, computer science, information science > 004 Computer science
Faculties:	Faculty of Mathematics, Computer Science and Statistics
Language of the thesis:	English
Date of oral examination:	1 March 2021
First referee:	Friedel, Caroline
MD5 checksum of the PDF file:	1e346573af026feffffc5d9fcc9afee6
Shelf mark of the printed edition:	0001/UMC 27773
ID Code:	27589
Deposited on:	15 Mar. 2021 10:57
Last modified:	15 Mar. 2021 10:57
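
The MD5 checksum listed in the record above can be used to check that a downloaded copy of the PDF is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_record` are illustrative, and the sketch assumes the file has been saved locally under the name given in the record.

```python
import hashlib

# MD5 checksum of the PDF file as listed in the record's metadata.
EXPECTED_MD5 = "1e346573af026feffffc5d9fcc9afee6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("Kluge_Michael.pdf")` should return `True` for an unmodified download, assuming the file sits in the current working directory. MD5 is suitable here for detecting accidental corruption, not for guarding against deliberate tampering.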
