An ensemble learning approach for addressing the class imbalance problem in Twitter spam detection

Liu, Shigang, Wang, Yu, Chen, Chao, and Xiang, Yang (2016) An ensemble learning approach for addressing the class imbalance problem in Twitter spam detection. In: Lecture Notes in Computer Science (9722) pp. 215-228. From: ACISP 2016: 21st Australasian Conference on Information Security and Privacy, 4-6 July 2016, Melbourne, VIC, Australia.


View at Publisher Website: https://doi.org/10.1007/978-3-319-40253-...
 

Abstract

Twitter has become an important source of real-time information in recent years, which inevitably makes it a prime target for spammers. It has been shown that the damage caused by Twitter spam can reach far beyond the social media platform itself. To mitigate the threat, many recent studies use machine learning techniques to classify Twitter spam and report very satisfactory results. However, most of these studies overlook a fundamental issue that is widely seen in real-world Twitter data: the class imbalance problem. In this paper, we show that the unequal distribution between spam and non-spam classes in the data has a great impact on the spam detection rate. To address the problem, we propose an ensemble learning approach that involves three steps. In the first step, we adjust the class distribution in the imbalanced data set using various strategies, including random oversampling, random undersampling, and fuzzy-based oversampling. In the second step, a classification model is built on each of the redistributed data sets. In the final step, a majority voting scheme is introduced to combine all the classification models. Experimental results obtained using real-world Twitter data indicate that the proposed approach can significantly improve the spam detection rate in data sets with imbalanced class distribution.
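
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base classifier or define fuzzy-based oversampling, so the sketch assumes a nearest-centroid classifier as a placeholder and swaps in simple interpolation between minority samples where the paper uses fuzzy-based oversampling. All function names and the toy two-feature data are hypothetical.

```python
import random
from collections import Counter


def _split(X, y):
    """Group feature rows by class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(X, y, rng):
    """Step 1a: duplicate random rows until every class matches the largest."""
    groups = _split(X, y)
    target = max(len(rows) for rows in groups.values())
    out = []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out += [(r, label) for r in rows + extra]
    return out


def random_undersample(X, y, rng):
    """Step 1b: keep a random subset so every class matches the smallest."""
    groups = _split(X, y)
    target = min(len(rows) for rows in groups.values())
    out = []
    for label, rows in groups.items():
        out += [(r, label) for r in rng.sample(rows, target)]
    return out


def interpolate_oversample(X, y, rng):
    """Step 1c (ASSUMED stand-in for fuzzy-based oversampling): create
    synthetic rows on the segment between two random same-class rows."""
    groups = _split(X, y)
    target = max(len(rows) for rows in groups.values())
    out = []
    for label, rows in groups.items():
        synth = []
        for _ in range(target - len(rows)):
            a, b, t = rng.choice(rows), rng.choice(rows), rng.random()
            synth.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        out += [(r, label) for r in rows + synth]
    return out


def train_centroids(data):
    """Step 2 (placeholder classifier): one mean vector per class."""
    sums, counts = {}, Counter()
    for row, label in data:
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
    )


def ensemble_predict(models, row):
    """Step 3: majority vote over the per-data-set models."""
    votes = Counter(predict(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (hypothetical): 95 non-spam (0) rows, 5 spam (1) rows.
rng = random.Random(0)
X = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(95)]
X += [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(5)]
y = [0] * 95 + [1] * 5

resamplers = [random_oversample, random_undersample, interpolate_oversample]
models = [train_centroids(resample(X, y, rng)) for resample in resamplers]
```

Calling `ensemble_predict(models, row)` on a new feature vector returns the label chosen by most of the three models. Using an odd number of models avoids ties in a two-class vote.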

Item ID: 64421
Item Type: Conference Item (Research - E1)
ISBN: 978-3-319-40252-9
Keywords: Online social networks, Twitter spam, Machine learning, Class imbalance
Copyright Information: © Springer International Publishing Switzerland 2016
Date Deposited: 30 Sep 2020 23:33
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4604 Cybersecurity and privacy > 460499 Cybersecurity and privacy not elsewhere classified @ 100%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890299 Computer Software and Services not elsewhere classified @ 100%