Statistical features-based real-time detection of drifted Twitter spam

Chen, Chao, Wang, Yu, Zhang, Jun, Xiang, Yang, Zhou, Wanlei, and Min, Geyong (2017) Statistical features-based real-time detection of drifted Twitter spam. IEEE Transactions on Information Forensics and Security, 12 (4). pp. 914-925.


View at Publisher Website: https://doi-org.elibrary.jcu.edu.au/10.1...
 


Abstract

Twitter spam has become a critical problem nowadays. Recent works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. In our labeled tweets data set, however, we observe that the statistical properties of spam tweets vary over time, and thus, the performance of existing machine learning-based classifiers decreases. This issue is referred to as “Twitter Spam Drift”. In order to tackle this problem, we first carry out a deep analysis of the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover “changed” spam tweets from unlabeled tweets and incorporate them into the classifier's training process. A number of experiments are performed to evaluate the proposed scheme. The results show that our proposed Lfun scheme can significantly improve the spam detection accuracy in real-world scenarios.
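
The abstract does not spell out how Lfun works internally, so the following is only a rough illustration of the general idea it describes: detect spam in unlabeled tweets and feed the detections back into training. It is a generic self-training loop around a toy nearest-centroid classifier, not the authors' algorithm. The two-number feature vectors, the `min_margin` threshold, and all function names are assumptions made up for this sketch.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def train(spam, ham):
    """Toy 'training': keep one centroid per class (assumed stand-in classifier)."""
    return {"spam": centroid(spam), "ham": centroid(ham)}


def predict(model, x):
    """Return (label, margin); margin is the gap between the two class distances."""
    d_spam = dist(x, model["spam"])
    d_ham = dist(x, model["ham"])
    label = "spam" if d_spam < d_ham else "ham"
    return label, abs(d_spam - d_ham)


def retrain_with_unlabeled(spam, ham, unlabeled, min_margin=1.0):
    """Add confidently detected spam from the unlabeled pool, then retrain."""
    model = train(spam, ham)
    found = []
    for x in unlabeled:
        label, margin = predict(model, x)
        if label == "spam" and margin >= min_margin:
            found.append(x)
    return train(spam + found, ham), found


# Hypothetical statistical features per tweet, e.g. [urls, hashtags].
old_spam = [[4.0, 0.0], [6.0, 0.0]]
ham = [[0.0, 0.0], [2.0, 0.0]]
# Unlabeled pool in which spam has drifted toward the non-spam region.
unlabeled = [[4.0, 0.0], [3.2, 0.0], [0.5, 0.0]]

before = train(old_spam, ham)
after, found = retrain_with_unlabeled(old_spam, ham, unlabeled)

borderline = [3.2, 0.0]
margin_before = predict(before, borderline)[1]
margin_after = predict(after, borderline)[1]
print("newly added spam samples:", found)
print("margin on borderline tweet:", margin_before, "->", margin_after)
```

In this toy run the spam centroid moves toward the drifted samples after retraining, so the borderline tweet is separated from non-spam by a wider margin. A real system would use a proper classifier and the statistical features analysed in the paper.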

Item ID: 64424
Item Type: Article (Research - C1)
ISSN: 1556-6021
Keywords: social network security, twitter spam detection, machine learning
Copyright Information: © 2016 IEEE.
Additional Information:

This article is available Open Access via the publisher's website.

Funders: Australian Research Council (ARC), National Natural Science Foundation of China (NNSF)
Projects and Grants: ARC Linkage grant LP120200266, NNSF grant 61401371
Date Deposited: 27 Sep 2020 21:07
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4604 Cybersecurity and privacy > 460499 Cybersecurity and privacy not elsewhere classified @ 100%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8903 Information Services > 890399 Information Services not elsewhere classified @ 100%
