TY - GEN
T1 - Robust unsupervised feature selection on networked data
AU - Li, Jundong
AU - Hu, Xia
AU - Wu, Liang
AU - Liu, Huan
N1 - Funding Information: The authors wish to thank Harsh Dani who assisted in the proofreading of the manuscript. This material is, in part, supported by National Science Foundation (NSF) under grant number IIS-1217466. Publisher Copyright: Copyright © by SIAM.
PY - 2016
Y1 - 2016
AB - Feature selection has proven effective in preparing high-dimensional data for many data mining and machine learning tasks. Traditional feature selection algorithms mainly assume that data instances are independent and identically distributed. However, this assumption does not hold for networked data, since instances are not only associated with high-dimensional features but also inherently interconnected. In addition, obtaining label information for networked data is time-consuming and labor-intensive. Without label information to direct feature selection, it is difficult to assess feature relevance. In contrast to scarce label information, link information in networks is abundant and could help select relevant features. However, most networked data contains many noisy links, which makes feature selection algorithms less effective. To address these issues, we propose NetFS, a robust unsupervised feature selection framework for networked data that embeds latent representation learning into feature selection. Content information can thus help mitigate the negative effects of noisy links when learning latent representations, while good latent representations can in turn contribute to extracting more meaningful features. In other words, the two phases cooperate and reinforce each other. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework.
UR - http://www.scopus.com/inward/record.url?scp=84991628944&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991628944&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974348.44
DO - 10.1137/1.9781611974348.44
M3 - Conference contribution
T3 - 16th SIAM International Conference on Data Mining 2016, SDM 2016
SP - 387
EP - 395
BT - 16th SIAM International Conference on Data Mining 2016, SDM 2016
A2 - Venkatasubramanian, Sanjay Chawla
A2 - Meira, Wagner
PB - Society for Industrial and Applied Mathematics Publications
T2 - 16th SIAM International Conference on Data Mining 2016, SDM 2016
Y2 - 5 May 2016 through 7 May 2016
ER -