6 million spam tweets: a large ground truth for timely Twitter spam detection

Chen, Chao, Zhang, Jun, Chen, Xiao, Xiang, Yang, and Zhou, Wanlei (2015) 6 million spam tweets: a large ground truth for timely Twitter spam detection. In: Proceedings of the IEEE International Conference on Communications. pp. 7065-7070. From: ICC 2015: IEEE International Conference on Communications, 8-12 June 2015, London, UK.


View at Publisher Website: https://doi.org/10.1109/ICC.2015.7249453


In recent years, Twitter has changed the way people communicate and get news in their daily lives. Meanwhile, because of its popularity, Twitter has also become a main target for spamming activities. To stop spammers, Twitter uses Google SafeBrowsing to detect and block spam links. Although blacklists can block malicious URLs embedded in tweets, their lag time hinders their ability to protect users in real time. Researchers have therefore begun to apply different machine learning algorithms to detect Twitter spam. However, due to the lack of a large ground truth, there has been no comprehensive evaluation of each algorithm's performance for real-time Twitter spam detection. To carry out a thorough evaluation, we collected a large dataset of over 600 million public tweets. We further labelled around 6.5 million spam tweets and extracted 12 light-weight features, which can be used for online detection. In addition, we conducted a number of experiments on six machine learning algorithms under various conditions to better understand their effectiveness and weaknesses for timely Twitter spam detection. We will make our labelled dataset available to researchers who are interested in validating or extending our work.
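
The abstract stresses that the 12 features are "light-weight", meaning they can be computed from a single tweet without waiting on a blacklist. The sketch below illustrates that idea only. The exact feature list is not given on this page, so the feature names here (account age, follower/following counts, hashtag/mention/URL counts, character and digit counts) are hypothetical, and the input is assumed to be a tweet object in the Twitter API v1.1 JSON layout (`user`, `entities`, `retweet_count`, `text`). It is not the authors' extraction code.

```python
from datetime import datetime, timezone

# Timestamp format of Twitter API v1.1 "created_at" fields (assumed input layout),
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_features(tweet: dict, now: datetime) -> dict:
    """Return 12 hypothetical light-weight features for one tweet.

    Every value is read from fields already present in the tweet JSON,
    so no URL resolution or blacklist lookup is needed.
    """
    user = tweet["user"]
    entities = tweet.get("entities", {})
    text = tweet.get("text", "")
    created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
    return {
        # Account-based features (illustrative names)
        "account_age_days": (now - created).days,
        "no_followers": user.get("followers_count", 0),
        "no_following": user.get("friends_count", 0),
        "no_user_favourites": user.get("favourites_count", 0),
        "no_lists": user.get("listed_count", 0),
        "no_tweets": user.get("statuses_count", 0),
        # Tweet-based features (illustrative names)
        "no_retweets": tweet.get("retweet_count", 0),
        "no_hashtags": len(entities.get("hashtags", [])),
        "no_user_mentions": len(entities.get("user_mentions", [])),
        "no_urls": len(entities.get("urls", [])),
        "no_chars": len(text),
        "no_digits": sum(ch.isdigit() for ch in text),
    }


# Usage with a made-up tweet (not taken from the dataset).
sample = {
    "text": "Win $1000 now!! http://t.co/abc #free",
    "retweet_count": 3,
    "entities": {
        "hashtags": [{"text": "free"}],
        "user_mentions": [],
        "urls": [{"url": "http://t.co/abc"}],
    },
    "user": {
        "created_at": "Wed Oct 10 20:19:24 +0000 2018",
        "followers_count": 5,
        "friends_count": 2000,
        "favourites_count": 0,
        "listed_count": 0,
        "statuses_count": 15000,
    },
}
now = datetime(2018, 10, 20, 20, 19, 24, tzinfo=timezone.utc)
features = extract_features(sample, now)
print(features)
```

Because each feature is a cheap lookup or count, a vector like this can be fed to a classifier as soon as a tweet arrives, which is the property that makes such features suitable for timely detection.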

Item ID: 64417
Item Type: Conference Item (Research - E1)
ISBN: 978-1-4673-6432-4
Copyright Information: Copyright © 2015 IEEE.
Date Deposited: 08 Oct 2020 03:05
FoR Codes: 08 INFORMATION AND COMPUTING SCIENCES > 0803 Computer Software > 080303 Computer System Security @ 100%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890299 Computer Software and Services not elsewhere classified @ 100%