The Coefficient of Determination Ratio (CDR) procedure for detecting outliers in a linear regression model was originally proposed for centred data and declares an observation an outlier if its CDR value deviates from unity. Although the method performs very well and identifies the relevant outliers more precisely than other well-known detection measures, its cut-off rule is a source of subjectivity, and the data structure for which it was designed is restrictive. This study therefore outlines a more rigorous cut-off rule for identifying influential observations, based on an updated version of the CDR that covers the more general case of non-centred data. The cut-off rule involves the ratio of quantile values of the Beta distribution. An automated implementation of the procedure is presented and applied to datasets from the literature and to data simulated under various conditions of sample size and of the number and distribution of explanatory variables. The updated method is more general in application, and more objective and reliable as a detection measure, than the initial proposal. Deleting the outliers it identifies yields an appreciable improvement in the explanatory power of linear regression models.
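The idea of a ratio that "deviates from unity" can be illustrated with a minimal sketch. It assumes, since the abstract does not state the formula, that the CDR of observation i is the coefficient of determination of the fit with observation i deleted, divided by that of the full fit. It is restricted to a single explanatory variable so that it needs only the standard library. The function names `r_squared` and `cdr_values` and the toy data are hypothetical, and the Beta-quantile cut-off is not reproduced because its parameters are not given here.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def cdr_values(x, y):
    """Assumed CDR: R^2 with observation i deleted, divided by the full-data R^2."""
    full = r_squared(x, y)
    ratios = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]  # drop observation i
        y_del = y[:i] + y[i + 1:]
        ratios.append(r_squared(x_del, y_del) / full)
    return ratios


# Toy data (hypothetical): roughly y = 2x, with the last response inflated.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 30.0]

ratios = cdr_values(x, y)
# Observation whose deletion moves the ratio furthest from unity.
most_influential = max(range(len(x)), key=lambda i: abs(ratios[i] - 1.0))
print(most_influential, [round(r, 3) for r in ratios])
```

In this sketch the ranking by distance from unity stands in for the cut-off; a formal rule such as the Beta-quantile ratio described above would replace the simple `max` with a threshold test.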
Published in | American Journal of Theoretical and Applied Statistics (Volume 11, Issue 1) |
DOI | 10.11648/j.ajtas.20221101.14 |
Page(s) | 27-35 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2022. Published by Science Publishing Group |
Keywords: Outliers, Coefficient of Determination Ratio, Linear Regression, Regression Diagnostics, Influential Observation
APA Style
Arimiyaw Zakaria, Benony Kwaku Gordor, Bismark Kwao Nkansah. (2022). On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis. American Journal of Theoretical and Applied Statistics, 11(1), 27-35. https://doi.org/10.11648/j.ajtas.20221101.14
@article{10.11648/j.ajtas.20221101.14,
  author = {Arimiyaw Zakaria and Benony Kwaku Gordor and Bismark Kwao Nkansah},
  title = {On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {11},
  number = {1},
  pages = {27-35},
  doi = {10.11648/j.ajtas.20221101.14},
  url = {https://doi.org/10.11648/j.ajtas.20221101.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20221101.14},
  year = {2022}
}
TY - JOUR
T1 - On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis
AU - Arimiyaw Zakaria
AU - Benony Kwaku Gordor
AU - Bismark Kwao Nkansah
Y1 - 2022/02/09
PY - 2022
N1 - https://doi.org/10.11648/j.ajtas.20221101.14
DO - 10.11648/j.ajtas.20221101.14
T2 - American Journal of Theoretical and Applied Statistics
JF - American Journal of Theoretical and Applied Statistics
JO - American Journal of Theoretical and Applied Statistics
SP - 27
EP - 35
PB - Science Publishing Group
SN - 2326-9006
UR - https://doi.org/10.11648/j.ajtas.20221101.14
VL - 11
IS - 1
ER -