TY - GEN
T1 - Deep Headline Generation for Clickbait Detection
AU - Shu, Kai
AU - Wang, Suhang
AU - Le, Thai
AU - Lee, Dongwon
AU - Liu, Huan
N1 - Funding Information: This material is based upon work supported by, or in part by, the National Science Foundation (NSF) under grant #1614576 and Office of Naval Research (ONR) under grant N00014-17-1-2605. Thai Le and Dongwon Lee are supported by NSF awards #1422215, #1663343, #1742702, and #1820609. Publisher Copyright: © 2018 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
AB - Clickbaits are catchy social posts or sensational headlines that attempt to lure readers to click. They are pervasive on social media and can have significant negative impacts on both users and media ecosystems. For example, users may be misled into receiving inaccurate information or fall victim to click-jacking attacks. Similarly, media platforms could lose readers' trust and revenue due to the prevalence of clickbaits. A major obstacle to detecting clickbaits on social media with a supervised learning framework is the lack of large-scale labeled training data, due to the high cost of labeling. To address this challenge, and building on recent advances in deep generative models, we propose to generate synthetic headlines with specific styles and explore their utility for improving clickbait detection. In particular, we propose to generate stylized headlines from original documents with style transfer. Generating stylized headlines is non-trivial because of challenges such as the discrete nature of text and the requirement to preserve the semantic meaning of the document while achieving style transfer. We therefore propose a novel solution, named Stylized Headline Generation (SHG), that can not only generate readable and realistic headlines to enlarge the original training data, but also help improve the classification capacity of supervised learning. Experimental results on real-world datasets demonstrate the effectiveness of SHG in generating high-quality and high-utility headlines for clickbait detection.
KW - Clickbait detection
KW - Data augmentation
KW - Deep generative model
UR - http://www.scopus.com/inward/record.url?scp=85061372085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061372085&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2018.00062
DO - 10.1109/ICDM.2018.00062
M3 - Conference contribution
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 467
EP - 476
BT - 2018 IEEE International Conference on Data Mining, ICDM 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Conference on Data Mining, ICDM 2018
Y2 - 17 November 2018 through 20 November 2018
ER -