Boosting functional regression models
In functional data analysis, the data consist of functions that are defined on a continuous domain. In practice, functional variables are observed on some discrete grid. Regression models are important tools to capture the impact of explanatory variables on the response and are challenging in the case of functional data. In this thesis, a generic framework is proposed that includes scalar-on-function, function-on-scalar and function-on-function regression models. Within this framework, quantile regression models, generalized additive models and generalized additive models for location, scale and shape can be derived by optimizing the corresponding loss functions. The additive predictors can contain a variety of covariate effects, for example linear, smooth and interaction effects of scalar and functional covariates. In the first part, the functional linear array model is introduced. This model is suited for responses observed on a common grid and covariates that do not vary over the domain of the response. Array models achieve computational efficiency by taking advantage of the Kronecker product structure of the design matrix. In the second part, the focus is on models without array structure, which can capture situations with responses observed on irregular grids and/or time-varying covariates. This includes, in particular, models with historical functional effects. For situations in which the functional response and covariate are both observed over the same time domain, a historical functional effect induces an association between response and covariate such that only past values of the covariate influence the current value of the response. In this model class, effects with more general integration limits, such as lag and lead effects, can be specified. In the third part, the framework is extended to generalized additive models for location, scale and shape, in which all parameters of the conditional response distribution can depend on covariate effects.
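The generic framework above can be summarized in two display equations. The notation here is a hedged reconstruction from the abstract, not necessarily the exact notation of the thesis: $\xi$ denotes a transformation of the conditional response distribution (e.g., the expectation or a quantile), $Y(t)$ is the possibly functional response over domain $\mathcal{T}$, and $\rho$ is the loss corresponding to $\xi$.

```latex
% Generic structured additive model: the transformation \xi of the
% conditional response distribution is an additive predictor whose
% partial effects h_j can be linear, smooth, or interaction effects
% of scalar and functional covariates.
\[
  \xi\bigl(Y \mid X = x\bigr)(t)
  \;=\; h(x)(t)
  \;=\; \sum_{j=1}^{J} h_j(x)(t), \qquad t \in \mathcal{T}.
\]
% Estimation minimizes the expected loss, integrated over the domain
% of the response (for a scalar response the integral disappears):
\[
  \hat{h} \;=\; \operatorname*{arg\,min}_{h}\;
  \mathbb{E} \int_{\mathcal{T}} \rho\bigl(Y(t),\, h(X)(t)\bigr)\,\mathrm{d}t.
\]
```

Choosing $\rho$ as the check function yields quantile regression; choosing the negative log-likelihood yields generalized additive models for location, scale and shape.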
The conditional response distribution can be modeled very flexibly by linking each distribution parameter to a linear predictor via a link function. For all parts, estimation is conducted by a component-wise gradient boosting algorithm. Boosting is an ensemble method that pursues a divide-and-conquer strategy for optimizing an expected loss criterion. This provides great flexibility for the regression models: for example, minimizing the check function yields quantile regression, and minimizing the negative log-likelihood yields generalized additive models for location, scale and shape. The estimator is updated iteratively by gradient descent, following the direction of steepest descent of the loss criterion. The model is represented as a sum of simple (penalized) regression models, the so-called base-learners; in each step, the base-learners are separately fitted to the negative gradient and only the best-fitting base-learner is updated. Component-wise boosting allows for high-dimensional data settings and for automatic, data-driven variable selection. To adapt boosting to regression with functional data, the loss is integrated over the domain of the response and base-learners suited to functional effects are implemented. To make functional regression models more accessible to practitioners, a comprehensive implementation of the methods is provided in the R add-on package FDboost. The flexibility of the regression framework is highlighted by several applications from different fields. Some features of the functional linear array model are illustrated using data on curing resin for car production, heat values of fossil fuels and Canadian climate data, which require function-on-scalar, scalar-on-function and function-on-function regression models, respectively. The methodological developments for non-array models are motivated by biotechnological data on fermentations, where a key process variable is modeled by a historical functional model.
The motivating application for functional generalized additive models for location, scale and shape is a time series of stock returns, where the expectation and the standard deviation are modeled in dependence on scalar and functional covariates.
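The component-wise gradient boosting procedure summarized in the abstract can be sketched in a few lines. The following is a toy illustration with scalar covariates and the check loss (i.e., quantile regression), not the FDboost implementation; the function names, step length, number of iterations and simulated data are assumptions made for the example.

```python
# Minimal sketch of component-wise gradient boosting for quantile
# regression: in each step, every simple base-learner is fitted to the
# negative gradient of the check loss, and only the best-fitting one
# is updated (data-driven variable selection).
import numpy as np

def check_loss_neg_gradient(y, f, tau):
    # Negative gradient of the check (pinball) loss at the current fit f:
    # tau for positive residuals, tau - 1 for negative residuals.
    return np.where(y - f > 0, tau, tau - 1.0)

def boost(X, y, tau=0.5, n_iter=200, nu=0.1):
    n, p = X.shape
    f = np.full(n, np.quantile(y, tau))  # offset: unconditional tau-quantile
    coef = np.zeros(p)
    for _ in range(n_iter):
        u = check_loss_neg_gradient(y, f, tau)
        # Fit each linear (through-the-origin) base-learner to u and
        # keep the one with the smallest residual sum of squares.
        best_j, best_rss, best_b = 0, np.inf, 0.0
        for j in range(p):
            x = X[:, j]
            b = x @ u / (x @ x)
            rss = np.sum((u - b * x) ** 2)
            if rss < best_rss:
                best_j, best_rss, best_b = j, rss, b
        coef[best_j] += nu * best_b          # small step length nu
        f += nu * best_b * X[:, best_j]
    return coef

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))
y = 2.0 * X[:, 0] + rng.standard_normal(300)  # only the first covariate matters
coef = boost(X, y, tau=0.5)
print(np.round(coef, 2))  # the weight concentrates on the informative covariate
```

For functional responses, the loss (and hence the negative gradient) would additionally be integrated over the domain of the response, and the base-learners would be penalized regression models for functional effects, as implemented in FDboost.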
functional data analysis, functional regression, gradient boosting, variable selection
Brockhaus, Sarah
2016
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Brockhaus, Sarah (2016): Boosting functional regression models. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Brockhaus_Sarah.pdf (6MB)
