Dimensionality reduction of unsupervised data

M. Dash, Huan Liu, J. Yao

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

150 Scopus citations

Abstract

Dimensionality reduction is an important problem for the efficient handling of large databases. Many feature selection methods exist for supervised data, which carry class information, but little work has been done on dimensionality reduction for unsupervised data, where class information is not available. Principal Component Analysis (PCA) is often used; however, PCA creates new features, and it is difficult to gain an intuitive understanding of the data from those new features alone. In this paper we are concerned with the problem of determining and choosing the important original features of unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, whereas removing a relevant feature does. We propose an entropy measure for ranking features and conduct extensive experiments showing that our method is able to find the important features. It also compares well with a similar feature ranking method, Relief, which, unlike our method, requires class information.
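The abstract does not state the entropy measure itself, so the following is only a minimal sketch of the leave-one-feature-out idea, assuming a similarity-based entropy: for each pair of records, similarity S = exp(-αD) with α = ln 2 / (mean pairwise distance), and entropy E = -Σ [S log S + (1 - S) log(1 - S)] over all pairs. Data with clear cluster structure has similarities near 0 or 1 and therefore low entropy. The function names (`normalize`, `similarity_entropy`, `rank_features`) and the scoring rule (rank a feature by how much entropy rises when it is dropped) are hypothetical illustrations, not the paper's published procedure.

```python
import math
from typing import List, Sequence


def normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def pairwise_distances(rows: Sequence[Sequence[float]], features: Sequence[int]) -> List[float]:
    """Euclidean distance for every pair i < j, using only the listed feature indices."""
    n = len(rows)
    return [
        math.sqrt(sum((rows[i][k] - rows[j][k]) ** 2 for k in features))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def similarity_entropy(rows: Sequence[Sequence[float]], features: Sequence[int]) -> float:
    # ASSUMPTION: similarity-based entropy; the abstract does not give the formula.
    dists = pairwise_distances(rows, features)
    if not dists:
        return 0.0
    mean_d = sum(dists) / len(dists)
    if mean_d == 0.0:
        return 0.0  # all points coincide: no structure to measure
    alpha = math.log(2) / mean_d  # makes similarity exactly 0.5 at the mean distance
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        for p in (s, 1.0 - s):
            if 0.0 < p < 1.0:  # convention: 0 * log(0) = 0, and log(1) = 0
                total -= p * math.log(p)
    return total


def rank_features(rows: Sequence[Sequence[float]]) -> List[int]:
    """Return feature indices, most important first (hypothetical scoring rule)."""
    data = normalize(rows)
    all_feats = list(range(len(data[0])))
    base = similarity_entropy(data, all_feats)
    # Score = rise in entropy when the feature is left out; dropping a feature
    # that carries the cluster structure should blur it and raise entropy.
    scores = {
        f: similarity_entropy(data, [k for k in all_feats if k != f]) - base
        for f in all_feats
    }
    return sorted(all_feats, key=lambda f: scores[f], reverse=True)


rows = [
    [0.0, 0.0], [0.1, 1.0], [0.0, 0.5], [0.1, 0.2],  # feature 0 near 0
    [1.0, 0.1], [0.9, 0.9], [1.0, 0.6], [0.9, 0.3],  # feature 0 near 1
]
print(rank_features(rows))  # [0, 1]
```

In this toy data, feature 0 splits the records into two tight groups while feature 1 is spread evenly, so keeping only feature 0 yields the lower entropy and feature 0 ranks first. No class labels are used anywhere, which is the property the abstract contrasts with Relief.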

Original language: English (US)
Title of host publication: Proceedings of the International Conference on Tools with Artificial Intelligence
Editors: Anon
Publisher: IEEE
Pages: 532-539
Number of pages: 8
State: Published, 1997
Externally published: Yes
Event: Proceedings of the 1997 IEEE 9th IEEE International Conference on Tools with Artificial Intelligence, Newport Beach, CA, USA
Duration: Nov 3, 1997 to Nov 8, 1997


ASJC Scopus subject areas

  • Software
