Abstract

Feature selection has proven effective in preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms mainly assume that data instances are independent and identically distributed. However, this assumption does not hold for networked data, since instances are not only associated with high-dimensional features but are also inherently interconnected. In addition, obtaining label information for networked data is time-consuming and labor-intensive, and without labels to direct feature selection, it is difficult to assess feature relevance. In contrast to scarce label information, link information in networks is abundant and can help select relevant features. However, most networked data contains many noisy links, which makes feature selection algorithms less effective. To address these issues, we propose NetFS, a robust unsupervised feature selection framework for networked data that embeds latent representation learning into feature selection. In this way, content information helps mitigate the negative effects of noisy links when learning latent representations, while good latent representations in turn help extract more meaningful features. In other words, the two phases cooperate and reinforce each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.
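The abstract does not state the objective function, so the following is only a minimal sketch of the general idea of coupling latent representation learning with feature selection, not the NetFS algorithm itself. It assumes a loss of the form ||XW - U||_F^2 + alpha * ||W||_{2,1} + (beta / 2) * ||A - UU^T||_F^2 with U >= 0, where X is the content matrix, A a symmetric adjacency matrix, U the latent representation, and W the feature weights. The function name, the parameters `alpha` and `beta`, the update rules, and the step-size heuristic are all assumptions made for illustration.

```python
import numpy as np


def joint_feature_selection(X, A, k=2, alpha=1.0, beta=1.0, n_iter=50, seed=0):
    """Alternate between feature weights W and a nonnegative latent matrix U.

    Assumed (hypothetical) objective:
        ||X W - U||_F^2 + alpha * ||W||_{2,1} + (beta / 2) * ||A - U U^T||_F^2
    X: (n, d) content features; A: (n, n) symmetric adjacency matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.1 * rng.random((n, k))      # small nonnegative initialization
    D = np.eye(d)                     # reweighting matrix for the l2,1 term
    a_norm = np.linalg.norm(A, 2)     # spectral norm, used in the step size

    for _ in range(n_iter):
        # W step: reweighted ridge regression of U on X (IRLS for l2,1).
        W = np.linalg.solve(X.T @ X + alpha * D, X.T @ U)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, 1e-8)))

        # U step: one projected gradient step on the U-dependent terms.
        grad = 2.0 * (U - X @ W) + 2.0 * beta * (U @ (U.T @ U) - A @ U)
        # Heuristic step size from a rough local curvature bound.
        step = 1.0 / (2.0 + 2.0 * beta * (a_norm + 3.0 * np.linalg.norm(U, 2) ** 2))
        U = np.maximum(U - step * grad, 0.0)   # project onto U >= 0

    scores = np.linalg.norm(W, axis=1)  # larger row norm = more relevant feature
    return scores, U, W
```

Under these assumptions, features would be ranked with `np.argsort(-scores)`. The content term ties U to the features, so a latent representation distorted by noisy links is pulled back toward what the content supports, while the link term shapes which features receive weight.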

Original language: English (US)
Title of host publication: 16th SIAM International Conference on Data Mining 2016, SDM 2016
Editors: Sanjay Chawla Venkatasubramanian, Wagner Meira
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 387-395
Number of pages: 9
ISBN (Electronic): 9781510828117
State: Published - 2016
Event: 16th SIAM International Conference on Data Mining 2016, SDM 2016 - Miami, United States
Duration: May 5 2016 to May 7 2016

Publication series

Name: 16th SIAM International Conference on Data Mining 2016, SDM 2016

Conference

Conference: 16th SIAM International Conference on Data Mining 2016, SDM 2016
Country/Territory: United States
City: Miami
Period: 5/5/16 to 5/7/16

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Title: Robust unsupervised feature selection on networked data
