Efficient yet accurate clustering

Manoranjan Dash, Kian Lee Tan, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

In this paper we show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. We propose two algorithms, 2-phase and nested, based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, we propose a tree structure particularly suitable for POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
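The 90-10 rule is a statement about the sequence of merge dissimilarities, so it can be checked on one's own data. The sketch below is a minimal illustration, not the authors' code, and it does not implement POP or the 2-phase and nested algorithms. It makes two assumptions: it uses naive single-linkage HAC as one representative HAC variant, and it takes the "maximum dissimilarity" to be the largest pairwise Euclidean distance between input points. The function names `single_linkage_merge_distances` and `fraction_below` are hypothetical.

```python
import math
import random
from itertools import combinations


def single_linkage_merge_distances(points):
    """Run naive single-linkage HAC to completion and return the
    dissimilarity of each merge, in merge order (n - 1 values).
    Illustrative sketch only; O(n^2) memory, roughly O(n^3) time."""
    # Active clusters are identified by integer ids; dist holds the current
    # inter-cluster dissimilarity for every unordered pair (i, j) with i < j.
    active = set(range(len(points)))
    dist = {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}
    merges = []
    while len(active) > 1:
        # Find the closest pair of active clusters (a < b by construction).
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        merges.append(d)
        # Merge cluster b into cluster a. Single linkage keeps the smaller
        # of the two distances to every other cluster c.
        active.remove(b)
        for c in active:
            if c == a:
                continue
            key_ac = (min(a, c), max(a, c))
            key_bc = (min(b, c), max(b, c))
            dist[key_ac] = min(dist[key_ac], dist.pop(key_bc))
        del dist[(a, b)]
    return merges


def fraction_below(points, merges, ratio=0.1):
    """Fraction of merges whose dissimilarity is below ratio * max dissimilarity.
    Assumption: max dissimilarity = largest pairwise distance between input points."""
    max_d = max(math.dist(p, q) for p, q in combinations(points, 2))
    return sum(d < ratio * max_d for d in merges) / len(merges)


if __name__ == "__main__":
    # Hypothetical demo data: three well-separated 2-D Gaussian blobs.
    random.seed(0)
    centers = [(0.0, 0.0), (50.0, 0.0), (25.0, 40.0)]
    pts = [(cx + random.gauss(0, 1), cy + random.gauss(0, 1))
           for cx, cy in centers for _ in range(40)]
    merges = single_linkage_merge_distances(pts)
    share = fraction_below(pts, merges)
    print(f"{share:.1%} of merges fall below 10% of the max dissimilarity")
```

On the toy one-dimensional input `[(0.0,), (1.0,), (2.0,), (100.0,)]`, the merges occur at dissimilarities 1, 1, and 98; with a maximum dissimilarity of 100 the threshold is 10, so two of the three merges fall below it. The all-pairs dictionary also makes the quadratic memory cost of a naive HAC visible, which is the kind of time and memory requirement the abstract refers to.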

Original language: English (US)
Title of host publication: Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01
Pages: 99-106
Number of pages: 8
State: Published - 2001
Event: 1st IEEE International Conference on Data Mining, ICDM'01 - San Jose, CA, United States
Duration: Nov 29 2001 to Dec 2 2001

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM

Other

Other: 1st IEEE International Conference on Data Mining, ICDM'01
Country/Territory: United States
City: San Jose, CA
Period: 11/29/01 to 12/2/01

ASJC Scopus subject areas

  • General Engineering
