A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

A4 Conference proceedings


Publication Details

List of Authors: Fahed Yoseph, Markku Heikkilä
Editors: Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard and Md Rezaul Bashar
Place: Taipei, Taiwan
Publication year: 2019
Publisher: IEEE
Book title: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
Start page: 65
End page: 71
ISBN: 978-1-7281-0404-1


Abstract

Finding outliers, that is, rare events in a collection of patterns, is
an emerging issue in the area of machine learning that is concerned with
detecting and eventually removing anomalous objects in data. A key
challenge in outlier/anomaly detection is that the problem is not well
formulated. Outliers are defined as extreme values that deviate from the
overall patterns in the data; they may indicate experimental errors,
variability in measurement, or a novelty. Detecting outliers in large
databases can lead to the discovery of hidden knowledge. At the same
time, identifying and removing outliers often helps to ensure that the
observations represent the problem correctly. Although several
techniques exist for detecting outliers/anomalies in a given database,
no single technique has proven to be the standard universal choice.
Depending on the nature of the target application, different
implementations require different outlier detection methods. Clustering
is a powerful technique in machine learning, and clustering-based
detection defines outliers in terms of their distance to the cluster
centers. In this study, we propose a clustering-based approach to
identifying outliers in a retail point-of-sales dataset. To select the
most suitable clustering algorithm for this purpose, two algorithms are
applied: K-means for hard (crisp) clustering and Fuzzy C-means (FCM) for
soft clustering. The experimental results show that the K-means
algorithm outperforms FCM in terms of outlier detection efficiency and
is an effective outlier detection solution.
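The abstract describes flagging outliers by their distance to cluster centers after either hard (K-means) or soft (FCM) clustering. The sketch below illustrates that general idea only and is not the authors' implementation. Assumptions: both algorithms are written from scratch in NumPy; the functions `kmeans`, `fuzzy_cmeans` and `distance_outliers` and their parameters are hypothetical; and the cut-off "distance more than z standard deviations above the mean distance" (z = 3) is an assumed rule, since the abstract does not state how the threshold is chosen. For FCM, each point is scored against the center in which it has the highest membership.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Hard (crisp) clustering with Lloyd's K-means.

    Runs n_init random initializations and keeps the run with the lowest
    within-cluster sum of squared distances. Returns (centers, labels).
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Initialize centers with k distinct, randomly chosen data points.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign each point to its nearest center (squared Euclidean).
            sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = sq.argmin(axis=1)
            # Move each center to the mean of its points (keep it if empty).
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        inertia = sq.min(axis=1).sum()
        if best is None or inertia < best[0]:
            best = (inertia, centers, labels)
    return best[1], best[2]


def fuzzy_cmeans(X, c, m=2.0, n_iter=500, tol=1e-6, seed=0):
    """Soft clustering with Fuzzy C-means.

    Returns (centers, U), where U[i, j] is the membership of point i in
    cluster j and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means with weights u_ij ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def distance_outliers(X, centers, assign, z=3.0):
    """Flag points whose distance to their assigned center exceeds
    mean + z * std of all such distances (an assumed cut-off rule)."""
    dist = np.linalg.norm(X - centers[assign], axis=1)
    threshold = dist.mean() + z * dist.std()
    return np.flatnonzero(dist > threshold), dist
```

Both clustering functions take the same input matrix (one row per observation, for example hypothetical per-transaction features such as basket value and item count), so the same scoring rule can be applied to either result for a side-by-side comparison: `distance_outliers(X, *kmeans(X, k))` for K-means, or `distance_outliers(X, centers, U.argmax(axis=1))` for FCM. K-means is restarted several times and the lowest-cost run is kept, because a single random initialization can settle in a poor local optimum.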


Keywords

Clustering, Noise, Outlier detection, Point-of-sales analysis

Last updated on 2020-06-04 at 09:27