A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type

Paul Blomstedt; J Tang; J Xiong; C Granlund; Jukka Corander

doi:10.1109/TPAMI.2014.2359431

Research output: Contribution to journal › Article › Scientific › peer-review

16 Citations (Scopus)

Abstract

Advantages of model-based clustering methods over heuristic alternatives have been widely demonstrated in the literature. Most model-based clustering algorithms assume that the data are either discrete or continuous, possibly allowing both types to be present in separate features. In this paper, we introduce a model-based approach for clustering feature vectors of mixed type, allowing each feature to simultaneously take on both categorical and real values. Such data may be encountered, for instance, in chemical and biological analyses, in the analysis of survey data, as well as in image analysis. Our model is formulated within a Bayesian predictive framework, where clustering solutions correspond to random partitions of the data. Using conjugate analysis, the posterior probability for each possible partition can be determined analytically, enabling the utilization of efficient computational search strategies for finding the posterior optimal partition. The derived model is illustrated using several synthetic and real datasets.
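The remark that conjugate analysis yields an analytic posterior for each partition can be illustrated with a deliberately reduced sketch. It is not the authors' mixed-type model: it assumes a single categorical feature, a symmetric Dirichlet prior within each cluster, and a uniform prior over partitions, so that the posterior of a partition is proportional to the product of closed-form cluster marginal likelihoods. The function names `cluster_log_marginal` and `partition_log_score` and the parameter `alpha` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import lgamma


def cluster_log_marginal(values, categories, alpha=1.0):
    """Log marginal likelihood of the categorical `values` in one cluster,
    with the category probabilities integrated out under a symmetric
    Dirichlet(alpha) prior (an assumption of this sketch)."""
    counts = Counter(values)
    k = len(categories)
    n = len(values)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in categories:
        out += lgamma(alpha + counts[c]) - lgamma(alpha)
    return out


def partition_log_score(data, labels, categories, alpha=1.0):
    """Unnormalised log posterior of a partition under a uniform prior
    over partitions: the sum of the cluster log marginal likelihoods."""
    clusters = {}
    for x, z in zip(data, labels):
        clusters.setdefault(z, []).append(x)
    return sum(cluster_log_marginal(v, categories, alpha)
               for v in clusters.values())


# Toy data: three observations of "a" followed by three of "b".
data = ["a", "a", "a", "b", "b", "b"]
cats = ["a", "b"]

split = partition_log_score(data, [0, 0, 0, 1, 1, 1], cats)
merged = partition_log_score(data, [0, 0, 0, 0, 0, 0], cats)
print(split, merged)
```

On this toy data the two-cluster partition scores higher than the single-cluster one. Because every candidate partition can be scored in closed form like this, a search procedure (greedy, stochastic, or otherwise) only needs to compare scores rather than estimate them by sampling.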

Original language	Undefined/Unknown
Pages (from-to)	489–498
Number of pages	10
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	37
Issue number	3
DOIs	https://doi.org/10.1109/TPAMI.2014.2359431
Publication status	Published - 2015
MoE publication type	A1 Journal article-refereed

Keywords

Bayes methods
mixed distributions
predictive models
unsupervised learning

Cite this

@article{7071258a078c40b88eeecbd183452ad5,

title = "A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type",

abstract = "Advantages of model-based clustering methods over heuristic alternatives have been widely demonstrated in the literature. Most model-based clustering algorithms assume that the data are either discrete or continuous, possibly allowing both types to be present in separate features. In this paper, we introduce a model-based approach for clustering feature vectors of mixed type, allowing each feature to simultaneously take on both categorical and real values. Such data may be encountered, for instance, in chemical and biological analyses, in the analysis of survey data, as well as in image analysis. Our model is formulated within a Bayesian predictive framework, where clustering solutions correspond to random partitions of the data. Using conjugate analysis, the posterior probability for each possible partition can be determined analytically, enabling the utilization of efficient computational search strategies for finding the posterior optimal partition. The derived model is illustrated using several synthetic and real datasets.",

keywords = "Bayes methods, mixed distributions, predictive models, unsupervised learning",

author = "Paul Blomstedt and J Tang and J Xiong and C Granlund and Jukka Corander",

year = "2015",

doi = "10.1109/TPAMI.2014.2359431",

language = "Undefined/Unknown",

volume = "37",

pages = "489--498",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "Institute of Electrical and Electronics Engineers",

number = "3",

}

TY - JOUR

T1 - A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type

AU - Blomstedt, Paul

AU - Tang, J

AU - Xiong, J

AU - Granlund, C

AU - Corander, Jukka

PY - 2015

Y1 - 2015

N2 - Advantages of model-based clustering methods over heuristic alternatives have been widely demonstrated in the literature. Most model-based clustering algorithms assume that the data are either discrete or continuous, possibly allowing both types to be present in separate features. In this paper, we introduce a model-based approach for clustering feature vectors of mixed type, allowing each feature to simultaneously take on both categorical and real values. Such data may be encountered, for instance, in chemical and biological analyses, in the analysis of survey data, as well as in image analysis. Our model is formulated within a Bayesian predictive framework, where clustering solutions correspond to random partitions of the data. Using conjugate analysis, the posterior probability for each possible partition can be determined analytically, enabling the utilization of efficient computational search strategies for finding the posterior optimal partition. The derived model is illustrated using several synthetic and real datasets.

KW - Bayes methods

KW - mixed distributions

KW - predictive models

KW - unsupervised learning

U2 - 10.1109/TPAMI.2014.2359431

DO - 10.1109/TPAMI.2014.2359431

M3 - Article

SN - 0162-8828

VL - 37

SP - 489

EP - 498

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 3

ER -