Asymmetric self-learning for tackling Twitter spam drift

Chen, Chao, Zhang, Jun, Xiang, Yang, and Zhou, Wanlei (2015) Asymmetric self-learning for tackling Twitter spam drift. In: Proceedings of the IEEE Conference on Computer Communications Workshops. pp. 208-213. From: INFOCOM WKSHPS 2015: IEEE Conference on Computer Communications Workshops, 26 April - 1 May 2015, Hong Kong.

PDF (Published Version)
Restricted to Repository staff only

View at Publisher Website: https://doi.org/10.1109/INFCOMW.2015.717...
 


Abstract

Spam has become a critical problem on Twitter. To stop spammers, security companies apply blacklisting services to filter spam links. However, over 90% of victims visit a new malicious link before it is blocked by blacklists. To overcome this limitation of blacklists, researchers have proposed a number of mechanisms based on statistical features and have applied machine learning (ML) techniques to detect Twitter spam. In our large labelled dataset, we observe that the statistical properties of spam tweets vary over time, and thus the performance of existing ML-based classifiers is poor. This phenomenon is referred to as "Twitter Spam Drift". To tackle this problem, we carry out a deep analysis of 1 million spam tweets and 1 million non-spam tweets, and propose an asymmetric self-learning (ASL) approach. The proposed ASL approach can discover new information about changed Twitter spam and incorporate it into the classifier training process. A number of experiments are performed to evaluate the ASL approach. The results show that it can significantly improve the spam detection accuracy of traditional ML algorithms.
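
The abstract does not spell out the ASL algorithm, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "asymmetric" means only unlabelled tweets classified as spam are fed back into the training set, while the non-spam training set stays fixed. The nearest-centroid classifier, the function names, and the single numeric feature per tweet are all hypothetical choices made for illustration.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(spam, ham):
    """Toy stand-in for an ML classifier: one centroid per class."""
    return centroid(spam), centroid(ham)


def is_spam(model, x):
    """Label x as spam if it lies closer to the spam centroid."""
    spam_c, ham_c = model
    return sq_dist(x, spam_c) < sq_dist(x, ham_c)


def asymmetric_self_learning(spam, ham, unlabelled, rounds=3):
    """Assumed ASL loop: only tweets detected as spam are added back.

    The non-spam set `ham` is never extended; that is the assumed
    asymmetry, not a detail confirmed by the abstract.
    """
    spam = list(spam)
    pool = list(unlabelled)
    model = train(spam, ham)
    for _ in range(rounds):
        detected = [x for x in pool if is_spam(model, x)]
        if not detected:
            break
        pool = [x for x in pool if not is_spam(model, x)]
        spam.extend(detected)          # incorporate newly found spam
        model = train(spam, ham)       # retrain on the enlarged spam set
    return model


# Hypothetical one-feature tweets: old spam near 1-2, drifted spam near 4-5.5.
old_spam = [[1.0], [2.0]]
non_spam = [[10.0], [11.0]]
drifted = [[4.0], [5.0], [5.5]]

baseline = train(old_spam, non_spam)
adapted = asymmetric_self_learning(old_spam, non_spam, drifted)
```

In this toy setup, a drifted tweet with feature value 6.5 falls on the non-spam side of the baseline classifier but on the spam side of the adapted one, which illustrates how feeding detected spam back into training could follow the drift.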

Item ID: 64416
Item Type: Conference Item (Research - E1)
ISBN: 978-1-4673-7131-5
Copyright Information: © 2015 IEEE
Date Deposited: 08 Oct 2020 01:54
FoR Codes: 08 INFORMATION AND COMPUTING SCIENCES > 0803 Computer Software > 080303 Computer System Security @ 100%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890299 Computer Software and Services not elsewhere classified @ 100%