TY - GEN
T1 - Learning homophily couplings from non-IID data for joint feature selection and noise-resilient outlier detection
AU - Pang, Guansong
AU - Cao, Longbing
AU - Chen, Ling
AU - Liu, Huan
N1 - Funding Information: This work is partially supported by the ARC Discovery Grants DP130102691 and DP140100545.
PY - 2017
Y1 - 2017
N2 - This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-κ outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent - they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five stateof-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
AB - This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-κ outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent - they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five stateof-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
UR - http://www.scopus.com/inward/record.url?scp=85031931372&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031931372&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2017/360
DO - 10.24963/ijcai.2017/360
M3 - Conference contribution
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 2585
EP - 2591
BT - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
A2 - Sierra, Carles
PB - International Joint Conferences on Artificial Intelligence
T2 - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Y2 - 19 August 2017 through 25 August 2017
ER -