| Peer-Reviewed

Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel

Received: 25 June 2021     Accepted: 21 August 2021     Published: 30 August 2021
Views:       Downloads:
Abstract

Regression analysis plays a central role in statistics and our understanding of the world. Linear regression models are the simplest type of regression and an understanding of them is an essential basis for more advanced models. In this article we will show how to use Excel to generate data from a simple linear regression model and illustrate how the statistical methods behave both when the fundamental assumptions of the model hold and when the fundamental assumptions are violated. The advantage of the using the program Excel is that when you press the recalculate button, under the Formulas menu, the data that is generated at random will be regenerated, statistical calculations will be recalculated and relevant graphs will be redrawn. Least squares is the statistical technique typically used when assumptions are satisfied. A statistical technique used when the normality assumption is violated is the non-parametric technique introduced by Kendall and Theil. The latter is useful when data are skewed or heteroskedastic, and is as powerful as least squares regression for Normally distributed data. Exercises are provided to illustrate both these procedures. In these exercises we generate samples of a Simple Linear Regression where the error term could follow a Normal distribution or the heavy tailed t-distribution.

Published in American Journal of Theoretical and Applied Statistics (Volume 10, Issue 4)
DOI 10.11648/j.ajtas.20211004.13
Page(s) 194-201
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Excel, Regression, Least Squares, Gauss-Markov, Non-parametric, Kendall-Theil

References
[1] Frost, J (2020) Regression analysis: an intuitive guide for using and interpreting linear methods. Jim Publishing.
[2] Johnson R. A. Wichern D. W. (2002), Applied Multivariate Statistical Analysis. Prentice hall (see chapter 7).
[3] Gelman, A., Hill, J., & Vehtari, A (2020) Regression and other stories. Cambridge University Press.
[4] Cantrell, J (2018) Simple linear regression in Power B1. https://www.bluegranite.com/blog/simple-linear-regression-in-power-bi
[5] Kassambara, A (2020) Linear regression essentials in R. http://www.sthda.com/english/articles/40-regression-analysis/165-linear-regression-essentials-in-r/
[6] Stojiljovic, M (2020) Linear regression in Python. https://realpython.com/linear-regression-in-python/#reader-comments.
[7] Fox, J., & Weisberg, S (2018) An R companion to applied regression (3rd edition). Sage Publications.
[8] Laverty WH, Miket MJ, Kelly IW., (2002) “Simulation of hidden Markov models with EXCEL” Journal of Royal Statistical Society: Series D. Vol 51 Issue 1, 31-40.
[9] Laverty WH, Kelly IW. (2018), “Using Excel to Simulate and Visualize Conditional Heteroskedastic Models” American Journal of Theoretical and Applied Statistics 2018; 7 (6): 242-246.
[10] Laverty WH, Kelly IW. (2019), “Using Excel to Visualize State Identification in Hidden Markov Models Using the Forward and Backward Algorithms” Applied Mathematical Sciences, Vol. 13, 2019, no. 4, 151 -162.
[11] Theil, H. (1950), "A rank-invariant method of linear and polynomial regression analysis. I, II, III", Nederl. Akad. Wetensch., Proc., 53: 386–392, 521–525, 1397–1412.
[12] Sen, P. K (1968), "Estimates of the regression coefficient based on Kendall's tau", Journal of the American Statistical Association, 63 (324): 1379–1389, doi: 10.2307/2285891.
[13] Gavrilova, Y (2020) Regression analysis overview: the hows and whys. https://serokell.io/blog/regression-analysis-overview
[14] Tsay, R S. (2010) Analysis of Financial Time Series. Third Edition. Wiley. Hoboken (NJ).
[15] Cook, J.D. (2018) Statistical Odds and Ends Blog. https://statisticaloddsandends.wordpress.com/2018/03/03/t-distribution-as-a-mixture-of-normals/
[16] Fishman, G.S. (1995) Monte Carlo, Concepts, Algorithms and Applications. Springer, Berlin.
[17] Farooqi, A. (2019), A Comparative Study of Kendall-Theil Sen, Siegel Vs Quantile Regression with Outliers” (2019). Wayne State Dissertations. 2352.
Cite This Article
  • APA Style

    William Henry Laverty, Ivan William Kelly. (2021). Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel. American Journal of Theoretical and Applied Statistics, 10(4), 194-201. https://doi.org/10.11648/j.ajtas.20211004.13

    Copy | Download

    ACS Style

    William Henry Laverty; Ivan William Kelly. Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel. Am. J. Theor. Appl. Stat. 2021, 10(4), 194-201. doi: 10.11648/j.ajtas.20211004.13

    Copy | Download

    AMA Style

    William Henry Laverty, Ivan William Kelly. Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel. Am J Theor Appl Stat. 2021;10(4):194-201. doi: 10.11648/j.ajtas.20211004.13

    Copy | Download

  • @article{10.11648/j.ajtas.20211004.13,
      author = {William Henry Laverty and Ivan William Kelly},
      title = {Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {10},
      number = {4},
      pages = {194-201},
      doi = {10.11648/j.ajtas.20211004.13},
      url = {https://doi.org/10.11648/j.ajtas.20211004.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20211004.13},
      abstract = {Regression analysis plays a central role in statistics and our understanding of the world. Linear regression models are the simplest type of regression and an understanding of them is an essential basis for more advanced models. In this article we will show how to use Excel to generate data from a simple linear regression model and illustrate how the statistical methods behave both when the fundamental assumptions of the model hold and when the fundamental assumptions are violated. The advantage of the using the program Excel is that when you press the recalculate button, under the Formulas menu, the data that is generated at random will be regenerated, statistical calculations will be recalculated and relevant graphs will be redrawn. Least squares is the statistical technique typically used when assumptions are satisfied. A statistical technique used when the normality assumption is violated is the non-parametric technique introduced by Kendall and Theil. The latter is useful when data are skewed or heteroskedastic, and is as powerful as least squares regression for Normally distributed data. Exercises are provided to illustrate both these procedures. In these exercises we generate samples of a Simple Linear Regression where the error term could follow a Normal distribution or the heavy tailed t-distribution.},
     year = {2021}
    }
    

    Copy | Download

  • TY  - JOUR
    T1  - Exploring the Effects of Assumption Violations on Simple Linear Regression and Correlation Using Excel
    AU  - William Henry Laverty
    AU  - Ivan William Kelly
    Y1  - 2021/08/30
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajtas.20211004.13
    DO  - 10.11648/j.ajtas.20211004.13
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 194
    EP  - 201
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20211004.13
    AB  - Regression analysis plays a central role in statistics and our understanding of the world. Linear regression models are the simplest type of regression and an understanding of them is an essential basis for more advanced models. In this article we will show how to use Excel to generate data from a simple linear regression model and illustrate how the statistical methods behave both when the fundamental assumptions of the model hold and when the fundamental assumptions are violated. The advantage of the using the program Excel is that when you press the recalculate button, under the Formulas menu, the data that is generated at random will be regenerated, statistical calculations will be recalculated and relevant graphs will be redrawn. Least squares is the statistical technique typically used when assumptions are satisfied. A statistical technique used when the normality assumption is violated is the non-parametric technique introduced by Kendall and Theil. The latter is useful when data are skewed or heteroskedastic, and is as powerful as least squares regression for Normally distributed data. Exercises are provided to illustrate both these procedures. In these exercises we generate samples of a Simple Linear Regression where the error term could follow a Normal distribution or the heavy tailed t-distribution.
    VL  - 10
    IS  - 4
    ER  - 

    Copy | Download

Author Information
  • Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, Canada

  • Retired from Department of Educational Psychology & Special Education, University of Saskatchewan, Saskatoon, Canada

  • Sections