TY - GEN
T1 - Pairwise exemplar clustering
AU - Yang, Yingzhen
AU - Chu, Xinqi
AU - Liang, Feng
AU - Huang, Thomas S.
PY - 2012
Y1 - 2012
N2 - Exemplar-based clustering methods have been extensively shown to be effective in many clustering problems. They adaptively determine the number of clusters and hold the appealing advantage of not requiring the estimation of latent parameters, which is otherwise difficult in case of complicated parametric model and high dimensionality of the data. However, modeling arbitrary underlying distribution of the data is still difficult for existing exemplar-based clustering methods. We present Pairwise Exemplar Clustering (PEC) to alleviate this problem by modeling the underlying cluster distributions more accurately with non-parametric kernel density estimation. Interpreting the clusters as classes from a supervised learning perspective, we search for an optimal partition of the data that balances two quantities: 1 the misclassification rate of the data partition for separating the clusters; 2 the sum of within-cluster dissimilarities for controlling the cluster size. The broadly used kernel form of cut turns out to be a special case of our formulation. Moreover, we optimize the corresponding objective function by a new efficient algorithm for message computation in a pairwise MRF. Experimental results on synthetic and real data demonstrate the effectiveness of our method.
AB - Exemplar-based clustering methods have been extensively shown to be effective in many clustering problems. They adaptively determine the number of clusters and hold the appealing advantage of not requiring the estimation of latent parameters, which is otherwise difficult in case of complicated parametric model and high dimensionality of the data. However, modeling arbitrary underlying distribution of the data is still difficult for existing exemplar-based clustering methods. We present Pairwise Exemplar Clustering (PEC) to alleviate this problem by modeling the underlying cluster distributions more accurately with non-parametric kernel density estimation. Interpreting the clusters as classes from a supervised learning perspective, we search for an optimal partition of the data that balances two quantities: 1 the misclassification rate of the data partition for separating the clusters; 2 the sum of within-cluster dissimilarities for controlling the cluster size. The broadly used kernel form of cut turns out to be a special case of our formulation. Moreover, we optimize the corresponding objective function by a new efficient algorithm for message computation in a pairwise MRF. Experimental results on synthetic and real data demonstrate the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=84868293520&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84868293520&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781577355687
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 1204
EP - 1211
BT - AAAI-12 / IAAI-12 - Proceedings of the 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference
T2 - 26th AAAI Conference on Artificial Intelligence and the 24th Innovative Applications of Artificial Intelligence Conference, AAAI-12 / IAAI-12
Y2 - 22 July 2012 through 26 July 2012
ER -