Asymmetric Confidence Interval with Box-Cox Transformation in R

2017-12-08
Dağ, Osman
İlk Dağ, Özlem
The normal distribution is central to the statistical literature because most statistical methods, such as the t-test, analysis of variance and regression analysis, are based on it. However, real-life datasets rarely satisfy the normality assumption. The Box–Cox power transformation is the best-known and most commonly used remedy [2]. The algorithm relies on a single transformation parameter. The original article [2] proposed maximum likelihood estimation for this parameter; other estimation approaches include those of [1], [3] and [4]. The Box–Cox power transformation is given by

$$y_i^T = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log y_i, & \text{if } \lambda = 0 \end{cases}$$

Here, $\lambda$ is the power transformation parameter to be estimated, the $y_i$ are the observed data, and the $y_i^T$ are the transformed data. In this study, we focus on obtaining the mean of the data and a confidence interval for it when the Box–Cox transformation is applied. Because the transformation changes the scale of the data, reporting the mean and confidence interval on the transformed scale is not meaningful to researchers. Likewise, reporting the mean and a symmetric confidence interval computed from the original data is misleading, since the normality assumption is not satisfied. We therefore point out that the mean and an asymmetric confidence interval obtained from the back-transformed data should be reported. We have written a generic function that obtains the mean of the data and a confidence interval for it when the Box–Cox transformation is applied. It is available in the R package AID under the name “confInt”.
10th International Statistics Congress (6-8 December 2017)
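
The back-transformation idea in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library, and it is not the AID package's `confInt` implementation. The function names (`boxcox`, `inv_boxcox`, `back_transformed_ci`) are hypothetical, the transformation parameter is assumed to be supplied by the caller rather than estimated, and the critical value defaults to a normal quantile as a large-sample assumption (a t quantile can be passed in instead). The interval is built on the transformed scale and its centre and endpoints are mapped back with the inverse transformation, which is what makes it asymmetric on the original scale.

```python
import math
from statistics import NormalDist, mean, stdev


def boxcox(y, lam):
    """Box-Cox transform of strictly positive observations for a given lambda."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def inv_boxcox(z, lam):
    """Inverse Box-Cox transform of a single value on the transformed scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


def back_transformed_ci(y, lam, level=0.95, crit=None):
    """Return (estimate, lower, upper) on the original scale.

    Assumption: lam is already known (e.g. estimated elsewhere).
    Assumption: crit defaults to a normal quantile; pass a t quantile
    for small samples.
    """
    z = boxcox(y, lam)
    if crit is None:
        crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    centre = mean(z)
    half_width = crit * stdev(z) / math.sqrt(len(z))
    # The interval is symmetric on the transformed scale only.
    lower, upper = centre - half_width, centre + half_width
    return inv_boxcox(centre, lam), inv_boxcox(lower, lam), inv_boxcox(upper, lam)


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    sample = [1.2, 0.8, 2.5, 4.1, 0.9, 7.3, 1.6, 3.0]
    est, low, high = back_transformed_ci(sample, lam=0)
    print(f"estimate={est:.3f}, interval=({low:.3f}, {high:.3f})")
```

With `lam=0` the transformation is the logarithm, so the back-transformed centre is the geometric mean of the sample and the upper arm of the interval is longer than the lower one. With `lam=1` the transformation only shifts the data, and the sketch reduces to the usual symmetric interval around the arithmetic mean.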

Suggestions

Estimation and hypothesis testing in stochastic regression
Sazak, Hakan Savaş; Tiku, Moti Lal; İslam, Qamarul; Department of Statistics (2003)
Regression analysis is very popular among researchers in various fields but almost all the researchers use the classical methods which assume that X is nonstochastic and the error is normally distributed. However, in real life problems, X is generally stochastic and error can be nonnormal. Maximum likelihood (ML) estimation technique which is known to have optimal features, is very problematic in situations when the distribution of X (marginal part) or error (conditional part) is nonnormal. Modified maximum...
Parameter estimation in generalized partial linear models with conic quadratic programming
Çelik, Gül; Weber, Gerhard Wilhelm; Department of Scientific Computing (2010)
In statistics, regression analysis is a technique, used to understand and model the relationship between a dependent variable and one or more independent variables. Multiple Adaptive Regression Spline (MARS) is a form of regression analysis. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models non-linearities and interactions. MARS is very important in both classification and regression, with an increasing number of applications in many areas...
Non-normal bivariate distributions: estimation and hypothesis testing
Qunsiyeh, Sahar Botros; Tiku, Moti Lal; Department of Statistics (2007)
When using data for estimating the parameters in a bivariate distribution, the tradition is to assume that data comes from a bivariate normal distribution. If the distribution is not bivariate normal, which often is the case, the maximum likelihood (ML) estimators are intractable and the least square (LS) estimators are inefficient. Here, we consider two independent sets of bivariate data which come from non-normal populations. We consider two distinctive distributions: the marginal and the conditional dist...
Parameter estimation in generalized partial linear models with Tikhanov regularization
Kayhan, Belgin; Karasözen, Bülent; Department of Scientific Computing (2010)
Regression analysis refers to techniques for modeling and analyzing several variables in statistical learning. There are various types of regression models. In our study, we analyzed Generalized Partial Linear Models (GPLMs), which decomposes input variables into two sets, and additively combines classical linear models with nonlinear model part. By separating linear models from nonlinear ones, an inverse problem method Tikhonov regularization was applied for the nonlinear submodels separately, within the e...
Estimation and hypothesis testing in multivariate linear regression models under non normality
İslam, Muhammed Qamarul (Informa UK Limited, 2017-01-01)
This paper discusses the problem of statistical inference in multivariate linear regression models when the errors involved are non normally distributed. We consider multivariate t-distribution, a fat-tailed distribution, for the errors as alternative to normal distribution. Such non normality is commonly observed in working with many data sets, e.g., financial data that are usually having excess kurtosis. This distribution has a number of applications in many other areas of research as well. We use modifie...
Citation Formats
O. Dağ and Ö. İlk Dağ, “Asymmetric Confidence Interval with Box-Cox Transformation in R,” presented at the 10th International Statistics Congress (6-8 December 2017), 2017, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/85386.