TY - JOUR
T1 - Efficient active learning of sparse halfspaces with arbitrary bounded noise
AU - Zhang, Chicheng
AU - Shen, Jie
AU - Awasthi, Pranjal
N1 - Funding Information: Jie Shen is supported by NSF-IIS-1948133 and the startup funding of Stevens Institute of Technology. Chicheng Zhang would like to thank Ning Hao and Helen Hao Zhang for helpful discussions on marginal screening for variable selection, which inspired the averaging-based initialization procedure in this paper. Publisher Copyright: © 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - We study active learning of homogeneous s-sparse halfspaces in R^d under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most η for a parameter η ∈ [0, 1/2), known as the bounded noise. Even in the presence of mild label noise, i.e. when η is a small constant, this is a challenging problem and only recently have label complexity bounds of the form Õ(s · polylog(d, 1/ε)) been established in [Zhang, 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result [Awasthi et al., 2016] provides a computationally efficient algorithm with label complexity Õ((s ln(d/ε))² · poly(1/(1−2η))), which is label-efficient only when the noise rate η is a fixed constant. In this work, we substantially improve on it by designing a polynomial-time algorithm for active learning of s-sparse halfspaces, with a label complexity of Õ(s/(1−2η)⁴ · polylog(d, 1/ε)). This is the first efficient algorithm with label complexity polynomial in 1/(1−2η) in this setting, which is label-efficient even for η arbitrarily close to 1/2. Our active learning algorithm and its theoretical guarantees also immediately translate to new state-of-the-art label and sample complexity results for full-dimensional active and passive halfspace learning under arbitrary bounded noise.
UR - http://www.scopus.com/inward/record.url?scp=85107890809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107890809&partnerID=8YFLogxK
M3 - Conference article
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -