TY - GEN
T1 - Leveraging knowledge across media for spammer detection in microblogging
AU - Hu, Xia
AU - Tang, Jiliang
AU - Liu, Huan
PY - 2014
Y1 - 2014
N2 - While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.
AB - While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.
KW - Cross-media mining
KW - Emails
KW - SMS
KW - Social media
KW - Spammer detection
KW - Twitter
KW - Web
UR - http://www.scopus.com/inward/record.url?scp=84904552431&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904552431&partnerID=8YFLogxK
U2 - 10.1145/2600428.2609632
DO - 10.1145/2600428.2609632
M3 - Conference contribution
SN - 9781450322591
T3 - SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 547
EP - 556
BT - SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery
T2 - 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
Y2 - 6 July 2014 through 11 July 2014
ER -