Sequential Dimension Reduction and Prediction Methods with High-dimensional Microarray Data


In this thesis, a novel sequential gene selection and classification (k-SS) method is proposed. The method is analogous to the classical non-linear stepwise variable selection (SVS) methods but, unlike any of them, it uses the misclassification error rate (MER) as its search criterion for informative marker genes in a given microarray data set. The importance of a selected gene is determined by its marginal contribution to improving the prediction accuracy of the classification rule. Genes continue to be selected as long as the improvement each one brings to the decision model is judged significant by an established test criterion. Selection terminates when none of the remaining genes can improve the prediction accuracy (lower the MER) of the current model. Our approach therefore seeks only the best combination of k marker genes that are most predictive of the biological samples in a given microarray data set. An important feature of the k-SS method is that the size α of its test is not fixed arbitrarily by the user, as is common in some of the classical SVS methods. Rather, the value of α at which the best prediction accuracy is achieved (or the best combination of genes is selected) is determined by cross-validation. The k-SS classifier competes favourably with eight selected existing classification methods on eleven published microarray data sets. It is very simple to apply and does not require any rigid assumptions for its implementation. Another merit of the method lies in its ability to select only those genes that are of biological relevance to the existing cancer sub-groups in microarray data sets. Lastly, we propose a new preliminary feature selection procedure that employs the cross-validated area under the ROC curve (CVAUC) for gene selection. This procedure is capable of removing all the irrelevant genes at the preliminary selection stage, before any standard classifier such as the k-SS method is applied to the remaining data for final optimal gene selection and classification of mRNA samples. Unlike some other data pruning methods, it employs the sub-sampling technique of v-fold cross-validation to ensure consistency and efficiency of the selections made at the preliminary stage.
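
The sequential selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the nearest-centroid classifier, the function names, the synthetic data, and the fixed `min_gain` stopping threshold are all assumptions made for illustration. In particular, `min_gain` merely stands in for the significance test of size α, which the thesis tunes by cross-validation.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, test_X, genes):
    """Assign each test sample to the class with the nearest centroid on `genes`.

    Hypothetical stand-in classifier; the abstract does not name the rule used.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    preds = []
    for x in test_X:
        dist = {
            label: sum((x[g] - c) ** 2 for g, c in zip(genes, cen))
            for label, cen in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_error(X, y, genes, folds):
    """v-fold cross-validated misclassification error rate (MER) for a gene subset."""
    wrong = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = nearest_centroid_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], genes)
        wrong += sum(p != y[i] for p, i in zip(preds, test_idx))
    return wrong / len(X)


def forward_select(X, y, v=5, min_gain=0.0, seed=0):
    """Greedily add the gene that lowers the cross-validated MER the most.

    Stops when the best candidate lowers the MER by no more than `min_gain`
    (an assumed threshold, not the thesis's test of size alpha).
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    selected, best_err = [], 1.0  # start from the worst possible MER
    remaining = set(range(len(X[0])))
    while remaining:
        errs = {g: cv_error(X, y, selected + [g], folds) for g in sorted(remaining)}
        g_best = min(errs, key=errs.get)
        if best_err - errs[g_best] <= min_gain:
            break  # no remaining gene improves the current model enough
        selected.append(g_best)
        best_err = errs[g_best]
        remaining.remove(g_best)
    return selected, best_err


# Synthetic data (assumption): gene 0 separates the two classes, genes 1-4 are noise.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[(3.0 if label else 0.0) + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)]
     for label in y]
selected, err = forward_select(X, y)
print("selected genes:", selected, "cross-validated MER:", err)
```

In this sketch, a larger `min_gain` yields a smaller gene set. Choosing that threshold by an outer cross-validation loop would mirror, loosely, the way the abstract describes choosing α.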

Sequential selection, dimension reduction, k-SS procedure, Skew-normal density.

Yahya, Waheed Babatunde

24 June 2009

2009

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-102544

Yahya, Waheed Babatunde (2009): Sequential Dimension Reduction and Prediction Methods with High-dimensional Microarray Data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik


PDF
Yahya_Waheed_Babatunde.pdf
2MB

DOI: 10.5282/edoc.10254

URN: urn:nbn:de:bvb:19-102544

Document type:	Theses (Dissertation, LMU München)
Keywords:	Sequential selection, dimension reduction, k-SS procedure, Skew-normal density.
Subject areas:	500 Natural sciences and mathematics > 510 Mathematics; 500 Natural sciences and mathematics
Faculties:	Fakultät für Mathematik, Informatik und Statistik
Language of the thesis:	English
Date of oral examination:	24 June 2009
First referee:	Ulm, Kurt
MD5 checksum of the PDF file:	7a3585ac5315a1c7ea60ed3e00741143
Shelf mark of the printed edition:	0001/UMC 17870
ID Code:	10254
Deposited on:	30 Jun. 2009 12:37
Last modified:	24 Oct. 2020 06:04
