This paper presents an outlier detection technique for univariate normal datasets. Outliers are observations that lie an abnormal distance from the mean. Outlier detection is useful in areas such as fraud detection, financial analysis, health monitoring and statistical modelling. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier. Methods such as the Gaussian method have been widely used to detect outliers in univariate datasets; however, they rely on measures of central tendency and dispersion that are themselves affected by outliers, which makes them less robust. The study aimed to provide an alternative method for univariate normal datasets that uses the measures of central tendency and variation least affected by outliers (the median and the geometric measure of variation). The study formulated an outlier detection formula from the median and the geometric measure of variation, applied it to randomly simulated normal datasets containing outliers, and recorded the number of outliers it detected in comparison with the two best existing methods of outlier detection. The study then compared the sensitivity of the three methods. The simulation was done in two ways: the first varied the mean while holding the standard deviation constant, and the second held the mean constant while varying the standard deviation. When the mean was varied, the formulated technique performed best, eliminating more of the required outliers than the other two Gaussian outlier detection techniques. The study also established that the formulated method was stricter when the standard deviation was varied, but it still stood out as the best because an outlier is defined relative to the mean and not the standard deviation. The study established that the formulated method is more sensitive than the Gaussian method of outlier detection and performed as well as the best existing outlier detection technique. In conclusion, the formulated method can be employed for outlier detection in univariate normal datasets, as it performed almost the same as the best existing method of outlier detection for univariate datasets.
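The contrast the abstract draws between a mean-based rule and a median-based rule can be illustrated with a minimal sketch. The abstract does not state the formula for the geometric measure of variation or the cutoff used, so the function `geometric_spread` (the geometric mean of the non-zero absolute deviations from the median), the cutoff `k = 3` and the sample values below are assumptions made purely for illustration and are not the authors' formulation.

```python
import math
import statistics


def gaussian_flags(data, k=3.0):
    """Mean-and-SD rule: flag x when |x - mean| > k * sample SD."""
    centre = statistics.fmean(data)
    spread = statistics.stdev(data)
    return [abs(x - centre) > k * spread for x in data]


def geometric_spread(data, centre):
    """ASSUMED stand-in for the geometric measure of variation:
    the geometric mean of the non-zero absolute deviations from `centre`.
    The abstract gives no formula, so this is illustrative only."""
    # Zero deviations are skipped because log(0) is undefined.
    logs = [math.log(abs(x - centre)) for x in data if x != centre]
    return math.exp(statistics.fmean(logs))


def median_flags(data, k=3.0):
    """Median-based rule: flag x when |x - median| > k * geometric_spread."""
    centre = statistics.median(data)
    spread = geometric_spread(data, centre)
    return [abs(x - centre) > k * spread for x in data]


# Hypothetical sample: ten values near 10 plus two planted outliers.
sample = [8.2, 9.1, 9.4, 9.8, 10.1, 10.3, 10.6, 10.9, 11.5, 12.0, 30.0, 35.0]

gaussian_hits = [x for x, f in zip(sample, gaussian_flags(sample)) if f]
median_hits = [x for x, f in zip(sample, median_flags(sample)) if f]
print("mean/SD rule flags:", gaussian_hits)
print("median-based rule flags:", median_hits)
```

On this sample the mean-and-SD rule flags nothing, because the two planted values inflate both the mean and the standard deviation, while the median-based rule flags 30.0 and 35.0. With the assumed spread, a cutoff of 3 is considerably stricter than three standard deviations on clean normal data, so the choice of `k` would need tuning in practice.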
Published in | American Journal of Theoretical and Applied Statistics (Volume 11, Issue 1) |
DOI | 10.11648/j.ajtas.20221101.11 |
Page(s) | 1-12 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2022. Published by Science Publishing Group |
Keywords | Outlier, Anomaly, Outlier Detection, Gaussian |
APA Style
Ooko Silas Owuor, Troon John Benedict, Otieno Okumu Kevin. (2022). Outlier Detection Technique for Univariate Normal Datasets. American Journal of Theoretical and Applied Statistics, 11(1), 1-12. https://doi.org/10.11648/j.ajtas.20221101.11
ACS Style
Ooko Silas Owuor; Troon John Benedict; Otieno Okumu Kevin. Outlier Detection Technique for Univariate Normal Datasets. Am. J. Theor. Appl. Stat. 2022, 11(1), 1-12. doi: 10.11648/j.ajtas.20221101.11
AMA Style
Ooko Silas Owuor, Troon John Benedict, Otieno Okumu Kevin. Outlier Detection Technique for Univariate Normal Datasets. Am J Theor Appl Stat. 2022;11(1):1-12. doi: 10.11648/j.ajtas.20221101.11
@article{10.11648/j.ajtas.20221101.11,
  author = {Ooko Silas Owuor and Troon John Benedict and Otieno Okumu Kevin},
  title = {Outlier Detection Technique for Univariate Normal Datasets},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {11},
  number = {1},
  pages = {1-12},
  doi = {10.11648/j.ajtas.20221101.11},
  url = {https://doi.org/10.11648/j.ajtas.20221101.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20221101.11},
  year = {2022}
}
TY - JOUR
T1 - Outlier Detection Technique for Univariate Normal Datasets
AU - Ooko Silas Owuor
AU - Troon John Benedict
AU - Otieno Okumu Kevin
Y1 - 2022/01/21
PY - 2022
N1 - https://doi.org/10.11648/j.ajtas.20221101.11
DO - 10.11648/j.ajtas.20221101.11
T2 - American Journal of Theoretical and Applied Statistics
JF - American Journal of Theoretical and Applied Statistics
JO - American Journal of Theoretical and Applied Statistics
SP - 1
EP - 12
PB - Science Publishing Group
SN - 2326-9006
UR - https://doi.org/10.11648/j.ajtas.20221101.11
VL - 11
IS - 1
ER -