Boosting functional regression models
In functional data analysis, the data consist of functions that are defined on a continuous domain. In practice, functional variables are observed on some discrete grid. Regression models are important tools to capture the impact of explanatory variables on the response and are challenging in the case of functional data. In this thesis, a generic framework is proposed that includes scalar-on-function, function-on-scalar and function-on-function regression models. Within this framework, quantile regression models, generalized additive models and generalized additive models for location, scale and shape can be derived by optimizing the corresponding loss functions. The additive predictors can contain a variety of covariate effects, for example linear, smooth and interaction effects of scalar and functional covariates. In the first part, the functional linear array model is introduced. This model is suited for responses observed on a common grid and covariates that do not vary over the domain of the response. Array models achieve computational efficiency by taking advantage of the Kronecker product structure of the design matrix. In the second part, the focus is on models without array structure, which can capture situations with responses observed on irregular grids and/or time-varying covariates. This includes, in particular, models with historical functional effects. For situations in which the functional response and covariate are both observed over the same time domain, a historical functional effect induces an association between response and covariate such that only past values of the covariate influence the current value of the response. In this model class, effects with more general integration limits, such as lag and lead effects, can be specified. In the third part, the framework is extended to generalized additive models for location, scale and shape, in which all parameters of the conditional response distribution can depend on covariate effects.
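The generic framework above can be summarized in two display equations. The notation here is a hedged reconstruction from the abstract, not necessarily the exact notation of the thesis: $\xi$ denotes a transformation of the conditional response distribution (e.g., the expectation or a quantile), $Y(t)$ is the possibly functional response over domain $\mathcal{T}$, and $\rho$ is the loss corresponding to $\xi$.

```latex
% Generic structured additive model: the transformation \xi of the
% conditional response distribution is an additive predictor whose
% partial effects h_j can be linear, smooth, or interaction effects
% of scalar and functional covariates.
\[
  \xi\bigl(Y \mid X = x\bigr)(t)
  \;=\; h(x)(t)
  \;=\; \sum_{j=1}^{J} h_j(x)(t), \qquad t \in \mathcal{T}.
\]
% Estimation minimizes the expected loss, integrated over the domain
% of the response (for a scalar response the integral disappears):
\[
  \hat{h} \;=\; \operatorname*{arg\,min}_{h}\;
  \mathbb{E} \int_{\mathcal{T}} \rho\bigl(Y(t),\, h(X)(t)\bigr)\,\mathrm{d}t.
\]
```

Choosing $\rho$ as the check function yields quantile regression; choosing the negative log-likelihood yields generalized additive models for location, scale and shape.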
The conditional response distribution can be modeled very flexibly by linking each distribution parameter to a linear predictor via a link function. For all parts, estimation is conducted by a component-wise gradient boosting algorithm. Boosting is an ensemble method that pursues a divide-and-conquer strategy for optimizing an expected loss criterion. This provides great flexibility for the regression models: for example, minimizing the check function yields quantile regression, and minimizing the negative log-likelihood yields generalized additive models for location, scale and shape. The estimator is updated iteratively by gradient descent, following the direction of steepest descent of the loss criterion. The model is represented as a sum of simple (penalized) regression models, the so-called base-learners; in each step, the base-learners are separately fitted to the negative gradient and only the best-fitting base-learner is updated. Component-wise boosting allows for high-dimensional data settings and for automatic, data-driven variable selection. To adapt boosting to regression with functional data, the loss is integrated over the domain of the response and base-learners suited to functional effects are implemented. To make functional regression models more accessible to practitioners, a comprehensive implementation of the methods is provided in the R add-on package FDboost. The flexibility of the regression framework is highlighted by several applications from different fields. Some features of the functional linear array model are illustrated using data on curing resin for car production, heat values of fossil fuels and Canadian climate data, which require function-on-scalar, scalar-on-function and function-on-function regression models, respectively. The methodological developments for non-array models are motivated by biotechnological data on fermentations, where a key process variable is modeled by a historical functional model.
The motivating application for functional generalized additive models for location, scale and shape is a time series of stock returns, where the expectation and the standard deviation are modeled in dependence on scalar and functional covariates.
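The component-wise gradient boosting procedure summarized in the abstract can be sketched in a few lines. The following is a toy illustration with scalar covariates and the check loss (i.e., quantile regression), not the FDboost implementation; the function names, step length, number of iterations and simulated data are assumptions made for the example.

```python
# Minimal sketch of component-wise gradient boosting for quantile
# regression: in each step, every simple base-learner is fitted to the
# negative gradient of the check loss, and only the best-fitting one
# is updated (data-driven variable selection).
import numpy as np

def check_loss_neg_gradient(y, f, tau):
    # Negative gradient of the check (pinball) loss at the current fit f:
    # tau for positive residuals, tau - 1 for negative residuals.
    return np.where(y - f > 0, tau, tau - 1.0)

def boost(X, y, tau=0.5, n_iter=200, nu=0.1):
    n, p = X.shape
    f = np.full(n, np.quantile(y, tau))  # offset: unconditional tau-quantile
    coef = np.zeros(p)
    for _ in range(n_iter):
        u = check_loss_neg_gradient(y, f, tau)
        # Fit each linear (through-the-origin) base-learner to u and
        # keep the one with the smallest residual sum of squares.
        best_j, best_rss, best_b = 0, np.inf, 0.0
        for j in range(p):
            x = X[:, j]
            b = x @ u / (x @ x)
            rss = np.sum((u - b * x) ** 2)
            if rss < best_rss:
                best_j, best_rss, best_b = j, rss, b
        coef[best_j] += nu * best_b          # small step length nu
        f += nu * best_b * X[:, best_j]
    return coef

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))
y = 2.0 * X[:, 0] + rng.standard_normal(300)  # only the first covariate matters
coef = boost(X, y, tau=0.5)
print(np.round(coef, 2))  # the weight concentrates on the informative covariate
```

For functional responses, the loss (and hence the negative gradient) would additionally be integrated over the domain of the response, and the base-learners would be penalized regression models for functional effects, as implemented in FDboost.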
functional data analysis, functional regression, gradient boosting, variable selection
Brockhaus, Sarah
2016
English
Universitätsbibliothek der Ludwig-Maximilians-Universität München
Brockhaus, Sarah (2016): Boosting functional regression models. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik
PDF: Brockhaus_Sarah.pdf (6MB)
