Bayesian statistical approach for protein residue-residue contact prediction

Despite continuous efforts in automating experimental structure determination and systematic target selection in structural genomics projects, the gap between the number of known amino acid sequences and solved 3D structures for proteins is constantly widening. While DNA sequencing technologies are advancing at an extraordinary pace, steadily increasing throughput while reducing costs, protein structure determination is still labour-intensive, time-consuming and expensive. This trend underlines the importance of complementary computational approaches for bridging the so-called sequence-structure gap. About half of the protein families lack structural annotation and are therefore not amenable to techniques that infer protein structure from homologs. These protein families can be addressed by de novo structure prediction approaches, which in practice are often limited by the immense computational cost of searching the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been demonstrated to sufficiently constrain the overall protein fold and thereby extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. This leaves an echo of correlation signatures that can be traced in the evolutionary record. Despite the success of contact prediction methods, several challenges remain. The most evident limitation lies in the requirement of deep alignments, which excludes the majority of protein families without associated structural information, precisely those that are the focus of contact-guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they discard available coevolutionary information.
This work presents two different approaches for addressing the limitations of contact prediction methods. Instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved precision comparable to that of pseudo-likelihood methods, with minor improvements for protein families with few homologous sequences. In addition, a Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information of coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings that reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended to derive probability estimates for residue-residue distances in order to enhance the precision of predicted structures.
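
The difference between the full likelihood and the pseudo-likelihood mentioned above can be illustrated on a toy Potts model. The following is a minimal sketch and not the implementation used in the thesis: the alphabet size `Q`, the length `L`, the parameter layout (`v` for single-site fields, `w` for pairwise couplings) and the example sequences are all assumptions made for illustration. The full likelihood normalises over every possible sequence, which is only feasible here because the toy model has 2^3 states; the pseudo-likelihood replaces this with one cheap conditional normalisation per position.

```python
import itertools
import math


def score(x, v, w):
    """Unnormalised log-probability of sequence x under a Potts model."""
    n = len(x)
    s = sum(v[i][x[i]] for i in range(n))
    s += sum(w[(i, j)][x[i]][x[j]] for i in range(n) for j in range(i + 1, n))
    return s


def full_log_likelihood(seqs, v, w, q):
    """Sum of log P(x) with the partition function computed by brute force."""
    n = len(v)
    all_states = itertools.product(range(q), repeat=n)
    log_z = math.log(sum(math.exp(score(x, v, w)) for x in all_states))
    return sum(score(x, v, w) - log_z for x in seqs)


def pseudo_log_likelihood(seqs, v, w, q):
    """Sum over sequences and positions of log P(x_i | all other positions)."""
    total = 0.0
    for x in seqs:
        for i in range(len(x)):
            # Scores of the q sequences that differ from x only at position i.
            alts = [score(x[:i] + (c,) + x[i + 1:], v, w) for c in range(q)]
            log_norm = math.log(sum(math.exp(s) for s in alts))
            total += alts[x[i]] - log_norm
    return total


# Hypothetical toy setup: 3 positions, 2 states, one coupling between
# positions 0 and 1 that favours equal states (a stand-in for a contact).
Q, L = 2, 3
v = [[0.0] * Q for _ in range(L)]
w = {(i, j): [[0.0] * Q for _ in range(Q)]
     for i in range(L) for j in range(i + 1, L)}
w[(0, 1)] = [[1.0, 0.0], [0.0, 1.0]]
seqs = [(0, 0, 1), (1, 1, 0), (0, 0, 0)]

print("full log-likelihood:  ", full_log_likelihood(seqs, v, w, Q))
print("pseudo-log-likelihood:", pseudo_log_likelihood(seqs, v, w, Q))
```

For real protein families the partition function ranges over roughly 20^L sequences and cannot be enumerated, so maximizing the full likelihood requires an approximation, whereas the pseudo-likelihood stays tractable because each conditional only sums over the states of a single position.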

contact prediction, Bayesian statistical modelling, contrastive divergence, direct coupling analysis, evolutionary couplings, Potts model, random forest

Vorberg, Susann

11. Dec. 2017

2017

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-216353

Vorberg, Susann (2017): Bayesian statistical approach for protein residue-residue contact prediction. Dissertation, LMU München: Fakultät für Chemie und Pharmazie

PDF
Vorberg_Susann.pdf
11MB

DOI: 10.5282/edoc.21635

URN: urn:nbn:de:bvb:19-216353

Document type:	Dissertations (Dissertation, LMU München)
Keywords:	contact prediction, Bayesian statistical modelling, contrastive divergence, direct coupling analysis, evolutionary couplings, Potts model, random forest
Subjects:	500 Natural sciences and mathematics; 500 Natural sciences and mathematics > 540 Chemistry
Faculties:	Faculty of Chemistry and Pharmacy
Language of the thesis:	English
Date of oral examination:	11 December 2017
1st referee:	Söding, Johannes
MD5 checksum of the PDF file:	9a8df85ddcef9dc31f11d1f95a51c36b
Shelf mark of the printed copy:	0001/UMC 25156
ID Code:	21635
Deposited on:	22 Dec. 2017 13:04
Last modified:	23 Oct. 2020 18:06