Refinements, extensions and modern applications of conic multivariate adaptive regression splines

Yerlikaya Özkurt, Fatma
Conic Multivariate Adaptive Regression Splines (CMARS) was developed at the Institute of Applied Mathematics, METU, as an alternative to the well-known data mining tool Multivariate Adaptive Regression Splines (MARS). CMARS is based on the given data and a penalized residual sum of squares for MARS, interpreted as a Tikhonov regularization problem. CMARS treats this problem with a continuous optimization technique called Conic Quadratic Programming (CQP). This doctoral thesis places the CMARS model within a wide framework of advanced methods of statistics and applied mathematics. The first application is the use of CMARS in Generalized Partial Linear Models (GPLMs), a particular form of semiparametric model that extends Generalized Linear Models (GLMs) by augmenting the usual parametric terms with a single nonparametric component. We prefer GLMs because of their flexibility across a variety of statistical problems and the availability of software to fit them. There are different kinds of estimation methods for GPLMs. One of the great advantages of semiparametric models is that the input dimensions (or features) can be grouped (linear and nonlinear, or parametric and nonparametric) so that an appropriate submodel can be assigned to each group. In this thesis, we apply least-squares estimation to the parametric part of the model and use CMARS to estimate the smooth function in the nonparametric part. This new algorithm, called CGPLM, has the advantage of higher speed and lower complexity, as it makes use of interior point methods. Another extension is the use of the CMARS method for the outlier identification problem. For this purpose, we provide a new solution to the mean-shift outlier model, which is regarded as a parametric method, by using regularization and CQP techniques.
After that, the proposed method is improved by using CMARS to represent the nonlinear structure in the data. The second track of this doctoral study is the use of the CMARS method for the parameter identification of Stochastic Differential Equations (SDEs) driven by Brownian motions and fractional Brownian motions (fBms). Both systems of SDEs with standard multi-dimensional Brownian motions and systems of SDEs with correlated Brownian motions are covered in this thesis. Moreover, we introduce the CMARS method to estimate both the spline coefficients and, especially, the Hurst parameter of SDEs driven by fBms. The theoretical results of this study may lead to new implementations and applications in science, technology and finance. This PhD thesis ends with a conclusion and an outlook on future studies.
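
As a rough illustration of the penalized residual sum of squares mentioned in the abstract, the sketch below builds MARS-style hinge basis functions in one dimension and fits their coefficients by Tikhonov regularization. It is not the thesis's implementation. It makes several simplifying assumptions: the knots are fixed by hand instead of being chosen by the MARS forward step, the penalty matrix L is the identity, and the closed-form regularized least-squares solution stands in for the CQP solver that CMARS actually uses. The names hinge_pair, build_basis and tikhonov_fit are hypothetical.

```python
import numpy as np


def hinge_pair(x, knot):
    """Return the two reflected hinge functions max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


def build_basis(x, knots):
    """Stack an intercept column and one hinge pair per knot into a basis matrix."""
    cols = [np.ones_like(x)]
    for k in knots:
        cols.extend(hinge_pair(x, k))
    return np.column_stack(cols)


def tikhonov_fit(B, y, lam, L=None):
    """Minimize ||y - B theta||^2 + lam * ||L theta||^2 in closed form."""
    if L is None:
        L = np.eye(B.shape[1])  # assumption: identity penalty matrix
    A = B.T @ B + lam * (L.T @ L)
    return np.linalg.solve(A, B.T @ y)


# Synthetic data (illustrative only): a noisy absolute-value curve.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = np.abs(x) + 0.05 * rng.standard_normal(x.size)

# Hand-picked knots; a real MARS/CMARS run would select these from the data.
B = build_basis(x, knots=[-1.0, 0.0, 1.0])

theta_small = tikhonov_fit(B, y, lam=1e-3)
theta_large = tikhonov_fit(B, y, lam=1e2)

rss_small = float(np.sum((y - B @ theta_small) ** 2))
rss_large = float(np.sum((y - B @ theta_large) ** 2))

print("RSS (lam=1e-3):", rss_small, " ||theta||:", np.linalg.norm(theta_small))
print("RSS (lam=1e2): ", rss_large, " ||theta||:", np.linalg.norm(theta_large))
```

Increasing the penalty weight shrinks the coefficient vector at the cost of a larger residual sum of squares. This is the accuracy-versus-complexity trade-off that the conic formulation handles by bounding the penalty term instead of weighting it.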