A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

A4 Conference publications

Internal authors/editors

Authors of the publication: Fahed Yoseph, Markku Heikkilä
Editors: Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard and Md Rezaul Bashar
Place of publication: Taipei, Taiwan
Publication year: 2019
Publisher: IEEE
Name of host publication: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
First page of the article, page number: 65
Last page of the article, page number: 71
ISBN: 978-1-7281-0404-1


Finding outliers, rare events in a collection of patterns, has become
an emerging issue in machine learning, which is concerned with
detecting and eventually removing anomalous objects in data. A key
challenge in outlier/anomaly detection is that it is not a
well-formulated problem. Outliers are defined as extreme values that
deviate from the overall patterns in data; they may indicate
experimental errors, variability in measurement, or a novelty. Detecting
outliers in large databases can lead to the discovery of hidden
knowledge. Moreover, identifying and removing outliers often helps to
ensure that the observations represent the problem correctly. Although
there are several techniques for detecting outliers/anomalies in a given
database, no single technique has proven to be the standard
universal choice. Depending on the nature of the target application,
different implementations require different outlier detection
methods. Clustering is a very powerful method in the field
of machine learning and defines outliers in terms of their distance to
the cluster centers. In this study, we propose a clustering-based
approach to identifying outliers in a retail point-of-sales dataset. To
select the best clustering algorithm for the purpose, two algorithms are
applied: K-means for hard (crisp) clustering and Fuzzy C-means (FCM)
for soft clustering. The experimental results show that the K-means
algorithm outperforms the Fuzzy C-means (FCM) algorithm in terms of
outlier detection efficiency, and it is an effective outlier detection
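
The distance-to-center idea mentioned in the abstract can be sketched
roughly as follows. This is a minimal illustration and not the authors'
implementation: the toy (quantity, amount) pairs, the function names
`kmeans` and `flag_outliers`, the caller-supplied initial centers, and the
cutoff of mean plus two standard deviations are all assumptions made for
the example.

```python
import math
import statistics


def kmeans(points, init_centers, iters=100):
    """Plain Lloyd's K-means; initial centers are supplied by the caller."""
    centers = [tuple(c) for c in init_centers]
    k = len(centers)

    def nearest(p):
        # Index of the center closest to point p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [nearest(p) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, [nearest(p) for p in points]


def flag_outliers(points, centers, labels, z=2.0):
    """Flag points whose distance to their own center is unusually large.

    The cutoff (mean + z * population std of all distances) is an
    assumed rule of thumb, not a threshold taken from the paper.
    """
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    cutoff = statistics.mean(dists) + z * statistics.pstdev(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


# Hypothetical (quantity, amount) pairs: two dense groups and one far point.
sales = [(1, 1), (1, 2), (2, 1), (2, 2),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (30, 30)]
centers, labels = kmeans(sales, init_centers=[(1, 1), (10, 10)])
flagged = flag_outliers(sales, centers, labels)
print(flagged)  # [8]
```

With so few points a stricter cutoff such as z=3 would flag nothing, and a
different initialization could give the far point its own cluster, in
which case its distance to the center is zero and it goes unflagged. Both
choices therefore need care on real data.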


Clustering, Noise, Outlier detection, Point-of-sales analysis

Last updated 2020-10-04 at 08:08