A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type

Paul Blomstedt, J Tang, J Xiong, C Granlund, Jukka Corander

    Research output: Contribution to journal, Article, Scientific, peer-reviewed

    15 Citations (Scopus)

    Abstract

    Advantages of model-based clustering methods over heuristic alternatives have been widely demonstrated in the literature. Most model-based clustering algorithms assume that the data are either discrete or continuous, possibly allowing both types to be present in separate features. In this paper, we introduce a model-based approach for clustering feature vectors of mixed type, allowing each feature to simultaneously take on both categorical and real values. Such data may be encountered, for instance, in chemical and biological analyses, in the analysis of survey data, as well as in image analysis. Our model is formulated within a Bayesian predictive framework, where clustering solutions correspond to random partitions of the data. Using conjugate analysis, the posterior probability for each possible partition can be determined analytically, enabling the utilization of efficient computational search strategies for finding the posterior optimal partition. The derived model is illustrated using several synthetic and real datasets.
    Original language: Unknown
    Pages: 489–498
    Number of pages: 10
    Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
    Volume: 37
    Issue number: 3
    Publication status: Published - 2015
    MoE publication type: A1 Journal article, refereed

    Keywords

    • Bayes methods
    • mixed distributions
    • predictive models
    • unsupervised learning
