Janßen, Philipp (2025): Improving the methodological basis of cross-species scRNA-seq analysis. Dissertation, LMU München: Fakultät für Biologie
PDF: Janssen_Philipp.pdf (72 MB)
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a powerful method to explore cell type diversity and gene expression at unprecedented resolution. Extending this approach across species not only enables the identification of conserved and species-specific cell types, but also provides insight into how cellular programs evolve. Comparative single-cell studies in primates are especially valuable for understanding the molecular changes that underlie human-specific traits within an evolutionary framework. However, meaningful cross-species comparisons rely not only on the availability of single-cell data from different organisms, but also on robust data quality, well-matched cellular systems and appropriate computational frameworks for integration. This thesis addresses key challenges in cross-species single-cell transcriptomics, with a focus on improving the methodological foundation for comparative studies in primates.
Ensuring good data quality is essential for all single-cell studies and becomes even more important when comparing data across species. Yet technical artifacts are common; they can obscure biological signal and complicate data interpretation. One such artifact is background noise, which originates from cell-free ambient RNA or barcode swapping events. To evaluate the extent and impact of background noise in 10x Genomics data, I established a benchmarking dataset generated from pooled kidney cells of two mouse subspecies. I used naturally occurring genetic variants to determine the origin of individual reads, identify transcripts that were incorrectly assigned to a cell barcode, and thereby quantify background noise. I found that background levels varied substantially between cells and replicates, with ambient RNA identified as the primary source. This noise particularly compromises the detection of marker genes, reducing their specificity. Furthermore, I evaluated several computational methods for noise correction and found that most approaches improved marker detection, with CellBender showing the strongest performance. These findings help characterize the nature of background noise and provide practical guidance for its mitigation in future single-cell studies.
Besides accurate measurements, cross-species single-cell studies also rely on access to comparable cellular material. For primates in particular, obtaining such material remains a challenge. In this context, induced pluripotent stem cells (iPSCs) and their derivatives offer a powerful resource for comparative studies. I contributed to the characterization of newly established iPSC lines from various non-human primates (NHP), including vervet monkeys, baboons, rhesus macaques, gorillas and orangutans. My contributions focused on validating the pluripotency and identity of these cell lines using bulk and single-cell RNA-seq data. On the one hand, I helped to classify primary cells, iPSCs and derived cell types based on their expression profiles. On the other hand, I called genetic variants from RNA-seq data for authentication of the cell lines.
Finally, I analysed a cross-species dataset of embryoid bodies (EBs) derived from human and NHP iPSCs to enable comparative analyses of early primate development. This dataset includes four species and spans a wide range of cell types. To identify orthologous cell types in this complex setting, I developed a semi-automated pipeline combining classification and manual annotation steps. Based on these annotations, I investigated cross-species conservation of gene expression, with a particular focus on the transferability of marker genes. The results showed that while broadly expressed genes are relatively well conserved, many cell type-specific marker genes are less transferable across species. These findings underscore the challenges of cell type annotation in cross-species settings and provide a curated dataset and computational approach to support future comparative analyses in primates.
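The abstract's idea of using subspecies-specific variants to quantify background noise can be illustrated with a minimal sketch. This is not the method used in the thesis; it is a simple moment estimate under stated assumptions: every background read is drawn from a shared pool in which a known fraction carries the other subspecies' allele, and a cell's own transcripts never carry that allele. The names `CellCounts`, `estimate_background_fraction` and `ambient_foreign_fraction` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CellCounts:
    """Allele-informative read counts for one cell barcode (hypothetical structure)."""
    own_allele_reads: int      # reads carrying the allele of the cell's assigned subspecies
    foreign_allele_reads: int  # reads carrying the allele of the other subspecies


def estimate_background_fraction(cell: CellCounts, ambient_foreign_fraction: float) -> float:
    """Moment estimate of the fraction of a cell's reads that are background.

    Assumption (not taken from the thesis): background reads come from a shared
    pool in which `ambient_foreign_fraction` of informative reads carry the
    foreign allele, and endogenous reads never carry it. Under that model,
        observed_foreign_fraction = background_fraction * ambient_foreign_fraction
    so dividing the observed fraction by the pool fraction recovers the
    background fraction, including background reads of the cell's own genotype.
    """
    if not 0.0 < ambient_foreign_fraction <= 1.0:
        raise ValueError("ambient_foreign_fraction must be in (0, 1]")
    total = cell.own_allele_reads + cell.foreign_allele_reads
    if total == 0:
        raise ValueError("cell has no allele-informative reads")
    observed_foreign_fraction = cell.foreign_allele_reads / total
    # Cap at 1.0: sampling noise can push the raw ratio above the valid range.
    return min(1.0, observed_foreign_fraction / ambient_foreign_fraction)


if __name__ == "__main__":
    cell = CellCounts(own_allele_reads=950, foreign_allele_reads=50)
    print(estimate_background_fraction(cell, ambient_foreign_fraction=0.5))
```

The division by the pool fraction matters because foreign-allele reads reveal only part of the contamination: background reads that happen to match the cell's own genotype are invisible at the variant level and must be inferred.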
| Document type: | Dissertations (Dissertation, LMU München) |
|---|---|
| Keywords: | scRNA-seq, transcriptomics, computational biology, genomics |
| Subject areas: | 500 Natural sciences and mathematics; 500 Natural sciences and mathematics > 570 Life sciences, biology |
| Faculties: | Faculty of Biology |
| Language of the thesis: | English |
| Date of oral examination: | 9 October 2025 |
| First referee: | Hellmann, Ines |
| MD5 checksum of the PDF file: | ccb2fc910230a42eadc6a7db230e79a8 |
| Shelf mark of the printed edition: | 0001/UMC 31554 |
| ID Code: | 35955 |
| Deposited on: | 12 Nov 2025 14:29 |
| Last modified: | 12 Nov 2025 14:30 |