Outliers Identification Model in Point-of-Sales Data Using Enhanced Normal Distribution Method

A4 Conference publications

Internal authors/editors

Authors of the publication: Fahed Yoseph, Markku Heikkilä, Daniel Howard
Editors: Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard and Md Rezaul Bashar
Place of publication: Taipei, Taiwan
Year of publication: 2019
Publisher: IEEE
Title of host publication: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
First page of article: 72
Last page of article: 78
ISBN: 978-1-7281-0404-1


Data mining extrapolates patterns and draws conclusions from data.
Outlier detection identifies objects that fall several standard
deviations away from the mean and is an important tool in commercial
data mining. Characterizing the nature of outliers can lead to new
knowledge, such as the nature of fraudulent transactions. However,
outliers may represent meaningless aberrations and hence there is no
rigid mathematical or statistical definition of what constitutes an
outlier, and, in many scenarios, determination of the outlier is
ultimately a subjective exercise. The standard deviation is central to
outlier detection, yet it is sensitive to extreme values and can be
inflated by one or a few borderline or extreme observations. This
inflation can mask less extreme outliers or anomalies, which then go
undetected because of the presence of the most extreme outliers. This
study proposes a novel outlier
identification model using an enhanced normal distribution method. The
model can explore different types of outliers, giving an end user the
ability to fully or partially eliminate outliers found in a retail
point-of-sale (POS) dataset. Experiments revealed that the enhanced
normal distribution method appeared more accurate than the standard
normal distribution method. Results were also evaluated subjectively by
the client, who judged most of the flagged records to be true outliers
and some to represent potentially fraudulent transactions.
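
The masking effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' enhanced normal distribution method, which this record does not specify. The iterative re-estimation shown here is a simple stand-in chosen only for illustration, and the sale amounts, the function names and the threshold k = 3 are all assumptions.

```python
import statistics


def zscore_outliers(values, k=3.0):
    """Return values lying more than k population standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) > k * sd]


def iterative_zscore_outliers(values, k=3.0):
    """Apply the k-sigma rule repeatedly (illustrative stand-in, not the paper's method).

    After each pass the flagged values are removed and the mean and standard
    deviation are re-estimated, so outliers hidden by more extreme ones can surface.
    """
    remaining = list(values)
    flagged = []
    while True:
        found = zscore_outliers(remaining, k)
        if not found:
            return flagged
        flagged.extend(found)
        remaining = [v for v in remaining if v not in found]


# Hypothetical POS sale amounts: 18 ordinary sales near 10, plus 60 and 1000.
amounts = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11, 10, 10, 9, 11, 10, 10, 9, 11, 60, 1000]

print(zscore_outliers(amounts))            # [1000]: the value 60 is masked
print(iterative_zscore_outliers(amounts))  # [1000, 60]
```

In a single pass the value 1000 inflates the standard deviation to roughly 215, so 60 sits well inside three standard deviations of the mean. Once 1000 is removed and the statistics are recomputed, 60 stands out clearly against the remaining sales.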


Keywords: Noise, Normal distribution, Outlier detection, Point-of-sales analysis

Last updated 2020-26-05 at 05:53