Estimating Box-Cox power transformation parameter via goodness-of-fit tests

Asar, Ozgur
İlk Dağ, Özlem
The Box-Cox power transformation is a commonly used method for transforming data toward a normal distribution. It relies on a single transformation parameter, and this study focuses on estimating that parameter. For this purpose, we combine seven popular goodness-of-fit tests for normality (the Shapiro-Wilk, Anderson-Darling, Cramér-von Mises, Pearson chi-square, Shapiro-Francia, Lilliefors and Jarque-Bera tests) with a search algorithm. The algorithm selects the parameter value at which the test statistic reaches its maximum or minimum, depending on the test: the maximum for Shapiro-Wilk and Shapiro-Francia, and the minimum for the rest. The artificial covariate method of Dag et al. (2014) is also included for comparison. Simulation studies are carried out to compare the performance of the methods. The results show that the Shapiro-Wilk test and the artificial covariate method are more effective than the others, while the Pearson chi-square test performs worst. The methods are also applied to two real-life datasets. The R package AID is proposed for implementing the aforementioned methods.
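The estimation strategy above can be sketched in Python. This is a minimal illustration under stated assumptions, not the AID package or the authors' code. The function names (`box_cox`, `shapiro_francia`, `jarque_bera`, `estimate_lambda`) are hypothetical, and the search grid of lambda in [-3, 3] with step 0.01 is an assumption. Only two of the seven tests are shown, because they need nothing beyond the Python standard library. The Box-Cox transform is y(lambda) = (y^lambda - 1)/lambda for lambda != 0 and log y for lambda = 0, with y > 0. Shapiro-Francia W' is computed as the squared correlation between the ordered sample and approximate expected normal order statistics (Blom scores, Phi^-1((i - 3/8)/(n + 1/4))), so lambda is taken as its argmax. Jarque-Bera is minimized, matching the max/min rule in the abstract.

```python
import math
import random
from statistics import NormalDist, fmean


def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam for lam != 0, log(y) for lam == 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def shapiro_francia(x):
    """W' = squared correlation between the sorted sample and Blom normal scores.

    Larger values indicate data closer to normality.
    """
    n = len(x)
    xs = sorted(x)
    std_normal = NormalDist()
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mm = fmean(xs), fmean(m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(xs, m))
    sxx = sum((a - mx) ** 2 for a in xs)
    smm = sum((b - mm) ** 2 for b in m)
    return sxm * sxm / (sxx * smm)


def jarque_bera(x):
    """JB = n/6 * (S**2 + (K - 3)**2 / 4); smaller values indicate closer to normality."""
    n = len(x)
    mu = fmean(x)
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def estimate_lambda(y, statistic, maximize, grid=None):
    """Grid search for lambda: argmax of the statistic if maximize, else argmin."""
    if grid is None:
        grid = [round(-3.0 + 0.01 * k, 2) for k in range(601)]  # -3.00, ..., 3.00
    best_lam, best_val = None, None
    for lam in grid:
        val = statistic(box_cox(y, lam))
        if best_val is None or (val > best_val if maximize else val < best_val):
            best_lam, best_val = lam, val
    return best_lam, best_val


if __name__ == "__main__":
    random.seed(1)
    # Log-normal data: lambda = 0 (the log transform) makes them exactly normal.
    y = [math.exp(random.gauss(0.0, 1.0)) for _ in range(200)]
    print("Shapiro-Francia (argmax):", estimate_lambda(y, shapiro_francia, True))
    print("Jarque-Bera (argmin):", estimate_lambda(y, jarque_bera, False))
```

On log-normal data, both criteria should select a lambda near 0, i.e. the log transform. Shapiro-Wilk itself is left out because its coefficients require an additional approximation (such as Royston's algorithm) that would roughly double the size of the sketch.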


Extended lasso-type MARS (LMARS) model in the description of biological network
Agraz, Melih; Purutçuoğlu Gazi, Vilda (Informa UK Limited, 2019-01-02)
The multivariate adaptive regression splines (MARS) model is a well-known additive nonparametric model that can successfully handle highly correlated and nonlinear datasets. Our previous analyses showed that lasso-type MARS (LMARS) can be a strong alternative to the Gaussian graphical model (GGM), a well-known probabilistic method that describes the steady-state behaviour of complex biological systems via lasso regression. In this study, we extend our original LMARS m...
Diverse classifiers ensemble based on GMDH-type neural network algorithm for binary classification
Dağ, Osman; Kaşıkcı, Merve; Karabulut, Erdem; Alpar, Reha (Informa UK Limited, 2019-12-03)
The Group Method of Data Handling (GMDH)-type neural network algorithm is a heuristic self-organizing algorithm for modelling sophisticated systems. In this study, we propose a new algorithm that assembles different classifiers based on the GMDH algorithm for binary classification. A Monte Carlo simulation study is conducted to compare the diverse classifiers ensemble based on GMDH (dce-GMDH) algorithm with other well-known classifiers and to give recommendations for applied researchers on the selection of appropriate ...
Minimum variance quadratic unbiased estimation for the variance components in simple linear regression with onefold nested error
Gueven, Ilgehan (Informa UK Limited, 2006-01-01)
The explicit forms of the minimum variance quadratic unbiased estimators (MIVQUEs) of the variance components are given for simple linear regression with onefold nested error. The resulting estimators become more efficient as the ratio of the initial variance-component estimates increases, and they are asymptotically efficient as this ratio tends to infinity.
Effect of estimation in goodness-of-fit tests
Eren, Emrah; Sürücü, Barış; Department of Statistics (2009)
In statistical analysis, distributional assumptions are needed to apply parametric procedures. Assumptions about the underlying distribution must hold for statistical inferences to be accurate. Goodness-of-fit tests are used to check the validity of these distributional assumptions. To apply some goodness-of-fit tests, the unknown population parameters must be estimated. The null distributions of test statistics become complicated or depend on the unknown parameters if population parameters are replaced by ...
An evaluation of a novel approach for clustering genes with dissimilar replicates
Cinar, Ozan; İyigün, Cem; İlk Dağ, Özlem (Informa UK Limited, 2020-12-01)
Clustering genes is a step in microarray studies that demands several considerations. First, expression levels may be collected as time series, which should be accounted for appropriately. Furthermore, genes may behave differently across biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. The third concern stems from the existence of a large number of constant genes, which demands a hea...
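The estimation issue raised in the "Effect of estimation in goodness-of-fit tests" abstract can be illustrated with a small Monte Carlo sketch in Python. This is an illustration only, not the thesis's method, and `ks_statistic` and `simulate` are hypothetical names. The sketch draws N(0, 1) samples and computes the Kolmogorov-Smirnov statistic D twice: once against the true N(0, 1), and once against a normal distribution whose mean and standard deviation are estimated from the same sample. With estimated parameters the fitted CDF follows the data more closely, so D tends to be smaller. As a result, the usual KS critical value (here the asymptotic 1.358/sqrt(n), itself an approximation) rejects far less often than its nominal 5%. This is why separate null tables, such as Lilliefors', are needed for the estimated-parameter case.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(x, dist):
    """Kolmogorov-Smirnov D = max |F_n - F| against a fully specified distribution."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = dist.cdf(v)
        # Check the empirical CDF just after (i/n) and just before ((i-1)/n) each point.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def simulate(n=20, reps=2000, seed=0):
    """Compare D under known vs. estimated normal parameters for N(0, 1) samples."""
    rng = random.Random(seed)
    crit = 1.358 / math.sqrt(n)  # asymptotic 5% KS critical value (approximation)
    known, estimated = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        known.append(ks_statistic(x, NormalDist(0.0, 1.0)))
        estimated.append(ks_statistic(x, NormalDist(fmean(x), stdev(x))))
    return {
        "mean_D_known": fmean(known),
        "mean_D_estimated": fmean(estimated),
        "reject_known": sum(d > crit for d in known) / reps,
        "reject_estimated": sum(d > crit for d in estimated) / reps,
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.4f}")
```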
Citation Formats
O. Asar, Ö. İlk Dağ, and O. Dağ, “Estimating Box-Cox power transformation parameter via goodness-of-fit tests,” Communications in Statistics - Simulation and Computation, pp. 91–105, 2017.