Outliers Identification Model in Point-of-Sales Data Using Enhanced Normal Distribution Method

    Research output: Chapter in book/report/conference proceedings › Conference article › Scientific › Peer-reviewed

    1 Citation (Scopus)
    12 Downloads (Pure)

    Abstract

    Data mining extracts patterns from data and draws conclusions from them. Outlier detection identifies objects that fall some number of standard deviations away from the mean, and it is an important tool in commercial data mining. Characterizing the nature of outliers can lead to new knowledge, such as the nature of fraudulent transactions. However, outliers may also represent meaningless aberrations; hence there is no rigid mathematical or statistical definition of what constitutes an outlier, and in many scenarios deciding whether an observation is an outlier is ultimately a subjective exercise. The standard deviation is central to outlier detection, yet it is sensitive to extreme values and can be inflated by a single observation, or a few observations, with borderline or extreme values. This can mask less extreme outliers or anomalies, which go undetected because of the presence of the most extreme outliers. This study proposes a novel outlier identification model using an enhanced normal distribution method. The model can explore different types of outliers, giving an end user the ability to fully or partially eliminate outliers found in a retail point-of-sale (POS) dataset. Experiments revealed that the enhanced normal distribution method appeared more accurate than the standard normal distribution method. The results were also evaluated subjectively by the client, who found most of the flagged observations to be true outliers, some of them representing potentially fraudulent transactions.
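
The masking effect described in the abstract can be illustrated with a minimal sketch. This is not the paper's enhanced method, which the abstract does not specify. It is only the standard "mean ± k standard deviations" rule, applied twice. The sale amounts, the threshold `k = 2.0`, and the function name `zscore_outliers` are all assumptions made for illustration.

```python
import statistics


def zscore_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


# Hypothetical POS sale amounts: ten ordinary sales, one moderate
# anomaly (60) and one extreme anomaly (1000).
sales = [20, 21, 19, 22, 20, 18, 21, 20, 19, 22, 60, 1000]

# First pass: the extreme value inflates the standard deviation so much
# that the moderate anomaly stays inside the mean +/- k*sd band.
first_pass = zscore_outliers(sales)

# Second pass (plain re-application of the same rule, an assumption, not the
# paper's model): once the extreme value is removed, the standard deviation
# shrinks and the moderate anomaly becomes visible.
remaining = [v for v in sales if v not in first_pass]
second_pass = zscore_outliers(remaining)

print("first pass:", first_pass)
print("second pass:", second_pass)
```

With these made-up amounts, the first pass flags only the extreme sale. The moderate one is flagged only after the extreme sale has been removed, which is the masking situation the abstract describes.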

    Original language: Unknown
    Title of host publication: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
    Editors: Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard, Md Rezaul Bashar
    Publisher: IEEE
    Pages: 72–78
    ISBN (print): 978-1-7281-0404-1
    Status: Published - 2019
    OKM (Finnish Ministry of Education and Culture) publication type: A4 Article in conference proceedings
    Event: International Conference on Machine Learning and Data Engineering - 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
    Duration: 2 Dec 2019 – 4 Dec 2019

    Conference

    Conference: International Conference on Machine Learning and Data Engineering
    Period: 02/12/19 – 04/12/19

    Keywords

    • Noise
    • Normal distribution
    • Outlier detection
    • Point-of-sales analysis
