A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

Fahed Yoseph; Markku Heikkilä

doi:10.1109/iCMLDE49015.2019.00023

Research output: Chapter in Book/Report/Conference proceeding › Conference article › Scientific › peer-reviewed

3 Citations (Scopus)

64 Downloads (Pure)

Abstract

Finding outliers, i.e., rare events in a collection of patterns, has become an emerging issue in machine learning, concerned with detecting and eventually removing anomalous objects in data. A key challenge in outlier/anomaly detection is that it is not a well-formulated problem. Outliers are defined as extreme values that deviate from the overall patterns in the data; they may indicate experimental errors, variability in measurement, or novelty. Detecting outliers in large databases can lead to the discovery of hidden knowledge; at the same time, identifying and removing outliers often helps to ensure that the observations represent the problem correctly. Although several techniques exist for detecting outliers/anomalies in a given database, no single technique has proven to be the standard universal choice, and depending on the nature of the target application, different implementations require different outlier detection methods. Clustering is a powerful machine learning technique, and clustering-based methods define outliers in terms of their distance to the cluster centers. In this study, we propose a clustering-based approach to identifying outliers in a retail point-of-sales dataset. To select the best clustering algorithm for this purpose, two algorithms are applied: K-means for hard (crisp) clustering and Fuzzy C-means (FCM) for soft clustering. The experimental results show that the K-means algorithm outperforms FCM in terms of outlier detection efficiency and is an effective outlier detection solution.
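The abstract describes clustering-based outlier detection, in which a point's outlier score is its distance to the cluster centers, and compares K-means (hard assignments) with Fuzzy C-means (soft memberships). The Python sketch below illustrates that general idea on a toy 2-D dataset; it is not the authors' implementation, and the point-of-sales features they used are not reproduced. Assumptions: the helper names (`kmeans`, `fuzzy_cmeans`, `nearest_center_distance`, `flag_outliers`) are hypothetical, K-means is initialised from the first k points, FCM uses the common fuzzifier m = 2, and the "mean + 2 standard deviations" cut-off is an assumed rule, since the abstract does not say how the distance threshold is chosen.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[Point]:
    """Lloyd's K-means with hard (crisp) assignments; returns the cluster centers.

    Assumption: the first k points are used as initial centers (deterministic, illustrative only).
    """
    centers = [tuple(map(float, p)) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # each point belongs to exactly one cluster: the nearest center
            j = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # new center = mean of its members; an empty cluster keeps its old center
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
               for i, cl in enumerate(clusters)]
        if new == centers:  # assignments no longer change -> converged
            break
        centers = new
    return centers


def fuzzy_cmeans(points: List[Point], c: int, m: float = 2.0, iters: int = 100,
                 seed: int = 0, eps: float = 1e-12) -> Tuple[List[Point], List[List[float]]]:
    """Fuzzy C-means (soft clustering); returns the centers and membership matrix u."""
    rng = random.Random(seed)
    dim = len(points[0])
    # random initial memberships, each row normalised to sum to 1
    u = []
    for _ in points:
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers: List[Point] = []
    for _ in range(iters):
        # centers: means weighted by membership^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            tw = sum(w)
            centers.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / tw
                                 for d in range(dim)))
        # memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        for i, p in enumerate(points):
            d = [max(dist(p, cj), eps) for cj in centers]
            u[i] = [1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                    for j in range(c)]
    return centers, u


def nearest_center_distance(points: List[Point], centers: List[Point]) -> List[float]:
    """Outlier score: distance from each point to its closest cluster center."""
    return [min(dist(p, ctr) for ctr in centers) for p in points]


def flag_outliers(scores: List[float], z: float = 2.0) -> List[int]:
    """Assumed rule: flag scores more than z population std devs above the mean."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s > mu + z * sd]


if __name__ == "__main__":
    # toy data: two tight groups plus one distant point at index 8
    pts = [(0, 0), (20, 20), (0, 1), (1, 0), (1, 1), (20, 21), (21, 20), (21, 21), (0, 12)]
    print(flag_outliers(nearest_center_distance(pts, kmeans(pts, 2))))  # [8]
    fcm_centers, _ = fuzzy_cmeans(pts, 2)
    print(flag_outliers(nearest_center_distance(pts, fcm_centers)))
```

Both algorithms feed the same distance-based score, so the only difference between the two runs is how the centers are estimated: K-means gives each point a single cluster, while FCM lets every point pull on every center in proportion to its membership. On real data, the number of clusters and the threshold would need tuning.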

Original language	English
Title of host publication	2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
Editors	Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard, Md Rezaul Bashar
Publisher	IEEE
Pages	65–71
ISBN (Print)	978-1-7281-0404-1
DOI - permanent links	https://doi.org/10.1109/iCMLDE49015.2019.00023
Status	Published - 2019
OKM publication type	A4 Article in conference proceedings
Event	International Conference on Machine Learning and Data Engineering (iCMLDE) - 2019 International Conference on Machine Learning and Data Engineering (iCMLDE) Duration: 2 Dec 2019 → 4 Dec 2019

Conference

Conference	International Conference on Machine Learning and Data Engineering (iCMLDE)
Period	02/12/19 → 04/12/19

Keywords

Clustering
Noise
Outlier detection
Point-of-sales analysis

Access to document

10.1109/iCMLDE49015.2019.00023

A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database.pdf
Accepted author manuscript, 897 KB
License: Publisher rights policy

http://urn.fi/URN:NBN:fi-fe2020100883280

Citation formats

@inproceedings{8da0e13ab0214f5894a2cdc40441ccb9,

title = "A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database",

keywords = "Clustering, Noise, Outlier detection, Point-of-sales analysis",

author = "Fahed Yoseph and Markku Heikkil{\"a}",

note = "Full text requested 3.6.2020, embargo period 24 months/EH; International Conference on Machine Learning and Data Engineering (iCMLDE); Conference date: 02-12-2019 through 04-12-2019",

year = "2019",

doi = "10.1109/iCMLDE49015.2019.00023",

language = "English",

isbn = "978-1-7281-0404-1",

pages = "65--71",

editor = "{Kyu Rhee}, Phill and Kuo-Yuan Hwa and Tun-Wen Pai and Daniel Howard and {Rezaul Bashar}, Md",

booktitle = "2019 International Conference on Machine Learning and Data Engineering (iCMLDE)",

publisher = "IEEE",

}

Yoseph, F & Heikkilä, M 2019, A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database. in P Kyu Rhee, K-Y Hwa, T-W Pai, D Howard & M Rezaul Bashar (eds), 2019 International Conference on Machine Learning and Data Engineering (iCMLDE). IEEE, pp. 65–71, International Conference on Machine Learning and Data Engineering (iCMLDE), 02/12/19. https://doi.org/10.1109/iCMLDE49015.2019.00023

A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database. / Yoseph, Fahed; Heikkilä, Markku.
2019 International Conference on Machine Learning and Data Engineering (iCMLDE). ed. / Phill Kyu Rhee; Kuo-Yuan Hwa; Tun-Wen Pai; Daniel Howard; Md Rezaul Bashar. IEEE, 2019. p. 65–71.

TY - GEN

T1 - A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

AU - Yoseph, Fahed

AU - Heikkilä, Markku

N1 - Full text requested 3.6.2020, embargo period 24 months/EH

PY - 2019

Y1 - 2019

KW - Clustering

KW - Noise

KW - Outlier detection

KW - Point-of-sales analysis

U2 - 10.1109/iCMLDE49015.2019.00023

DO - 10.1109/iCMLDE49015.2019.00023

M3 - Conference contribution

SN - 978-1-7281-0404-1

SP - 65

EP - 71

BT - 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)

A2 - Kyu Rhee, Phill

A2 - Hwa, Kuo-Yuan

A2 - Pai, Tun-Wen

A2 - Howard, Daniel

A2 - Rezaul Bashar, Md

PB - IEEE

T2 - International Conference on Machine Learning and Data Engineering (iCMLDE)

Y2 - 2 December 2019 through 4 December 2019

ER -
