Dimension reduction and Classification with High-Dimensional Microarray Data

Typical microarray data sets include only a handful of observations but several thousand predictor variables. Transforming the high-dimensional predictor space to make classification (for instance, cancer diagnosis) possible is a major challenge. This thesis deals with various dimension reduction approaches that can handle such data. Chapter 2 gives an introduction to classification with microarray data as well as an overview of a few specific problems such as variable selection and the comparison of classification methods. In Chapter 3, I discuss a particular class of interaction structures in the classification framework: "emerging patterns". I propose a new and more general definition referring to underlying probabilities and present a new, simple method based on the CART algorithm to find the corresponding empirical patterns in concrete data sets. In addition, the detected patterns can be used to define new variables for classification. Thus, I propose a simple scheme that uses the patterns to improve the performance of classification procedures. I implemented the search algorithm as well as the classification procedure in the language R. Some of these programs are publicly available from my homepage. Chapter 4 deals with classical linear dimension reduction methods. In the context of binary classification with continuous predictors, I prove two properties concerning the connections between Partial Least Squares (PLS) dimension reduction and between-group PCA, and between linear discriminant analysis and between-group PCA. PLS dimension reduction for classification is examined thoroughly in Chapter 5. The classification procedure consisting of PLS dimension reduction and linear discriminant analysis on the new components compares favorably with some of the best state-of-the-art classification methods on nine real microarray cancer data sets. Moreover, I apply a boosting algorithm to this classification method, which is a novel approach. In addition, I suggest a simple procedure to choose the number of PLS components. Finally, I examine the connection between PLS dimension reduction and variable selection and prove a property concerning the equivalence between a common univariate selection criterion and a variable selection approach based on the first PLS component.
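
As a rough illustration of the kind of procedure the abstract describes (PLS dimension reduction followed by linear discriminant analysis on the new component), the following is a minimal sketch in Python with NumPy. It is not the author's R implementation. The function names, the synthetic data, the use of a single PLS component, and the equal-prior assumption in the discriminant rule are all assumptions made for illustration. The sketch also checks one elementary fact: for a 0/1 response, the first PLS weight vector points along the difference of the two class mean vectors.

```python
import numpy as np


def first_pls_weight(X, y):
    """Unit-norm weight vector of the first PLS component for a 0/1 response.

    Hypothetical helper: the weight is proportional to Xc^T yc, where Xc and yc
    are the column-centered predictors and the centered response.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)


def fit_pls1_lda(X, y):
    """Fit a one-component PLS projection plus a one-dimensional LDA rule.

    Assumption: equal class priors, so the LDA threshold is the midpoint of
    the projected class means.
    """
    center = X.mean(axis=0)
    w = first_pls_weight(X, y)
    t = (X - center) @ w                      # scores on the first PLS component
    threshold = (t[y == 0].mean() + t[y == 1].mean()) / 2.0
    return center, w, threshold


def predict(model, X_new):
    """Assign class 1 when the PLS score exceeds the threshold."""
    center, w, threshold = model
    t = (X_new - center) @ w
    # w points from the class-0 mean towards the class-1 mean,
    # so larger scores indicate class 1.
    return (t > threshold).astype(int)


# Synthetic "microarray-like" data: few observations, many predictors.
rng = np.random.default_rng(0)
n_per_class, p = 10, 500
y = np.repeat([0, 1], n_per_class)
X = rng.normal(size=(2 * n_per_class, p))
X[y == 1, :10] += 2.0                         # ten informative predictors

w = first_pls_weight(X, y)

# For a binary response, Xc^T yc = (n0 * n1 / n) * (mean_1 - mean_0),
# so the normalized weight equals the normalized mean difference.
mean_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
same_direction = np.allclose(w, mean_diff / np.linalg.norm(mean_diff))

model = fit_pls1_lda(X, y)
train_pred = predict(model, X)
train_accuracy = (train_pred == y).mean()
print("weight parallel to mean difference:", same_direction)
print("training accuracy:", train_accuracy)
```

Training accuracy on such data is optimistic because the number of predictors far exceeds the number of observations; a fair comparison of methods would rely on held-out data or cross-validation.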

Boulesteix, Anne-Laure

22. Feb. 2005

2005

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-28017

Boulesteix, Anne-Laure (2005): Dimension reduction and Classification with High-Dimensional Microarray Data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF
Boulesteix_Anne-Laure.pdf
524kB

DOI: 10.5282/edoc.2801

URN: urn:nbn:de:bvb:19-28017

Document type:	Dissertations (Dissertation, LMU München)
Keywords:	Classification, supervised learning, discriminant analysis, dimension reduction, feature extraction, gene expression data, microarray data, partial least squares, emerging patterns
Subjects:	500 Natural sciences and mathematics; 500 Natural sciences and mathematics > 510 Mathematics
Faculty:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	22 February 2005
First referee:	Tutz, Gerhard
MD5 checksum of the PDF file:	46e88227531636c13ecddb1d8521a0bc
Shelf mark of the printed edition:	0001/UMC 14459
ID Code:	2801
Deposited on:	05 Apr. 2005
Last modified:	24 Oct. 2020 11:08
