A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

Fahed Yoseph, Markku Heikkilä

    Research output: Article in book/report/conference proceedings; Conference article; Scientific; peer-reviewed

    3 Citations (Scopus)
    64 Downloads (Pure)

    Abstract

    Finding outliers, that is, rare events in a collection of patterns, has become an emerging issue in machine learning, where the goal is to detect and eventually remove anomalous objects in data. A key challenge in outlier (anomaly) detection is that the problem itself is not well formulated. Outliers are defined as extreme values that deviate from the overall pattern in the data; they may indicate experimental error, variability in measurement, or novelty. Detecting outliers in large databases can lead to the discovery of hidden knowledge, and identifying and removing them often helps to ensure that the observations represent the problem correctly. Although several techniques exist for detecting outliers in a given database, no single technique has proven to be the universal standard. Depending on the nature of the target application, different implementations require different outlier detection methods. Clustering is a powerful machine learning method that defines outliers in terms of their distance to the cluster centers. In this study, we propose a clustering-based approach to identifying outliers in a retail point-of-sales dataset. To select the best clustering algorithm for this purpose, two algorithms are applied: K-means for hard (crisp) clustering and Fuzzy C-means (FCM) for soft clustering. The experimental results show that the K-means algorithm outperforms the FCM algorithm in terms of outlier detection efficiency and that it is an effective outlier detection solution.
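
The idea of defining outliers by their distance to the cluster centers can be illustrated with a minimal sketch. It is not the authors' implementation: the toy (quantity, amount) records, the caller-supplied initial centers, and the cutoff rule (mean distance plus `z` population standard deviations) are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from statistics import mean, pstdev


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points, init_centers, iterations=50):
    """Plain Lloyd-style K-means (hard clustering).

    `init_centers` is supplied by the caller so the run is deterministic;
    this is an assumption of the sketch, not something the abstract states.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(mean(axis) for axis in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def flag_outliers(points, centers, labels, z=2.0):
    """Flag points unusually far from their own cluster center.

    The cutoff (mean + z * population std of all distances) is a
    hypothetical rule for this sketch.
    """
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    cutoff = mean(dists) + z * pstdev(dists)
    return [d > cutoff for d in dists]


# Hypothetical point-of-sales records as (quantity, amount) pairs.
sales = [(1, 1), (1, 2), (2, 1), (2, 2),
         (8, 8), (8, 9), (9, 8), (9, 9),
         (9, 20)]
centers, labels = kmeans(sales, init_centers=[(1, 1), (8, 8)])
flags = flag_outliers(sales, centers, labels)
print([p for p, is_out in zip(sales, flags) if is_out])
```

On this toy data only the record (9, 20) lies beyond the cutoff. The sketch also shows a known sensitivity of the approach: the outlier pulls its cluster center toward itself, so the choice of initial centers and cutoff affects what gets flagged.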

    Original language: English
    Title of host publication: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
    Editors: Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard, Md Rezaul Bashar
    Publisher: IEEE
    Pages: 65–71
    ISBN (print): 978-1-7281-0404-1
    Status: Published - 2019
    OKM publication type: A4 Article in conference proceedings
    Event: International Conference on Machine Learning and Data Engineering (iCMLDE) - 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
    Duration: 2 Dec 2019 to 4 Dec 2019

    Conference

    Conference: International Conference on Machine Learning and Data Engineering (iCMLDE)
    Period: 02/12/19 to 04/12/19

    Keywords

    • Clustering
    • Noise
    • Outlier detection
    • Point-of-sales analysis
