TY - GEN
T1 - Unsupervised Cyberbullying Detection via Time-Informed Gaussian Mixture Model
AU - Cheng, Lu
AU - Shu, Kai
AU - Wu, Siqi
AU - Silva, Yasin N.
AU - Hall, Deborah L.
AU - Liu, Huan
N1 - Funding Information: This work was in part supported by the National Science Foundation (NSF) Grants 1719722 and 1614576. Publisher Copyright: © 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
AB - Social media is a vital means for information-sharing due to its easy access, low cost, and fast dissemination characteristics. However, increases in social media usage have corresponded with a rise in the prevalence of cyberbullying. Most existing cyberbullying detection methods are supervised and, thus, have two key drawbacks: (1) The data labeling process is often time-consuming and labor-intensive; (2) Current labeling guidelines may not be generalized to future instances because of different language usage and evolving social networks. To address these limitations, this work introduces a principled approach for unsupervised cyberbullying detection. The proposed model consists of two main components: (1) A representation learning network that encodes the social media session by exploiting multi-modal features, e.g., text, network, and time. (2) A multi-task learning network that simultaneously fits the comment inter-arrival times and estimates the bullying likelihood based on a Gaussian Mixture Model. The proposed model jointly optimizes the parameters of both components to overcome the shortcomings of decoupled training. Our core contribution is an unsupervised cyberbullying detection model that not only experimentally outperforms the state-of-the-art unsupervised models, but also achieves competitive performance compared to supervised models.
KW - Gaussian mixture model
KW - cyberbullying detection
KW - representation learning
KW - social media
UR - http://www.scopus.com/inward/record.url?scp=85095864189&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095864189&partnerID=8YFLogxK
U2 - 10.1145/3340531.3411934
DO - 10.1145/3340531.3411934
M3 - Conference contribution
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 185
EP - 194
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Y2 - 19 October 2020 through 23 October 2020
ER -