TY - GEN
T1 - Representation learning for imbalanced cross-domain classification
AU - Cheng, Lu
AU - Guo, Ruocheng
AU - Candan, K. Selçuk
AU - Liu, Huan
N1 - Funding Information: This material is based upon work supported by the National Science Foundation (NSF) Grants #1610282, #1633381, and #1909555. Publisher Copyright: Copyright © 2020 by SIAM.
PY - 2020
Y1 - 2020
N2 - Deep architectures are trained on massive amounts of labeled data to guarantee classification performance. In the absence of labeled data, domain adaptation often provides an attractive option, given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain-invariant representations but overlooked the issues of label imbalance within a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of positive to negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain-invariant representations are learned. To address this challenge, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain-invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations on two important real-world applications corroborate the effectiveness of the proposed framework.
AB - Deep architectures are trained on massive amounts of labeled data to guarantee classification performance. In the absence of labeled data, domain adaptation often provides an attractive option, given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain-invariant representations but overlooked the issues of label imbalance within a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of positive to negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain-invariant representations are learned. To address this challenge, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain-invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations on two important real-world applications corroborate the effectiveness of the proposed framework.
KW - Data Imbalance
KW - Domain Imbalance
KW - Representation Learning
KW - Unsupervised Domain Adaptation
UR - http://www.scopus.com/inward/record.url?scp=85089183457&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089183457&partnerID=8YFLogxK
U2 - 10.1137/1.9781611976236.54
DO - 10.1137/1.9781611976236.54
M3 - Conference contribution
T3 - Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
SP - 478
EP - 486
BT - Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
A2 - Demeniconi, Carlotta
A2 - Chawla, Nitesh
PB - Society for Industrial and Applied Mathematics Publications
T2 - 2020 SIAM International Conference on Data Mining, SDM 2020
Y2 - 7 May 2020 through 9 May 2020
ER -