Abstract
Outlier detection, the task of finding rare events in a collection of patterns, has become an emerging issue in machine learning, where the goal is to detect and eventually remove anomalous objects from data. A key challenge in outlier (anomaly) detection is that the problem itself is not well formulated. Outliers are defined as extreme values that deviate from the overall patterns in the data; they may indicate experimental errors, variability in measurement, or a novelty. Detecting outliers in large databases can lead to the discovery of hidden knowledge. In addition, identifying and removing outliers often helps to ensure that the observations represent the problem correctly. Although several techniques exist for detecting outliers in a given database, no single technique has been proven to be the standard universal choice. Depending on the nature of the target application, different implementations require different outlier detection methods. Clustering is a powerful machine learning method that defines outliers in terms of their distance to the cluster centers. In this study, we propose a clustering-based approach to identifying outliers in a retail point-of-sales dataset. To select the best clustering algorithm for this purpose, two algorithms are applied: K-means for hard (crisp) clustering and Fuzzy C-means (FCM) for soft clustering. The experimental results show that the K-means algorithm outperforms the FCM algorithm in terms of outlier detection efficiency and that it is an effective outlier detection solution.
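The idea of defining outliers by their distance to the cluster centers can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmeans` and `flag_outliers`, the caller-supplied initial centers, the cutoff rule (mean distance plus `n_std` standard deviations), and the sample points are all assumptions made for illustration, and the data is invented rather than taken from the retail point-of-sales dataset.

```python
import math
import statistics


def kmeans(points, centers, iterations=100):
    """Plain Lloyd's k-means; `centers` holds the caller's initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def flag_outliers(points, centers, n_std=2.0):
    """Flag points that lie unusually far from their nearest center.

    The cutoff (mean + n_std * sample standard deviation of the
    nearest-center distances) is an illustrative assumption.
    """
    distances = [min(math.dist(p, c) for c in centers) for p in points]
    cutoff = statistics.mean(distances) + n_std * statistics.stdev(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]


# Invented sample: two tight groups plus one point far from both.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3), (0.8, 0.9),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.3), (7.8, 7.7),
    (4.0, 15.0),
]
centers = kmeans(points, centers=[(1.0, 1.0), (8.0, 8.0)])
outliers = flag_outliers(points, centers)
print(outliers)
```

Because K-means assigns every point to exactly one cluster, each point has a single nearest-center distance to threshold. A soft method such as FCM would instead give each point a degree of membership in every cluster, so the flagging rule would have to be defined on those memberships.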
Original language | English |
---|---|
Title of host publication | 2019 International Conference on Machine Learning and Data Engineering (iCMLDE) |
Editors | Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard, Md Rezaul Bashar |
Publisher | IEEE |
Pages | 65–71 |
ISBN (Print) | 978-1-7281-0404-1 |
Publication status | Published - 2019 |
MoE publication type | A4 Article in a conference publication |
Event | 2019 International Conference on Machine Learning and Data Engineering (iCMLDE), 2 Dec 2019 → 4 Dec 2019 |
Conference
Conference | International Conference on Machine Learning and Data Engineering (iCMLDE) |
---|---|
Period | 02/12/19 → 04/12/19 |
Keywords
- Clustering
- Noise
- Outlier detection
- Point-of-sales analysis