A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict

Thapa, Surendrabikram, Shah, Aditya, Jafri, Farhan Ahmad, Naseem, Usman, and Razzak, Imran (2022) A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict. In: Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text. From: CASE 2022: the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, 7-8 December 2022, Abu Dhabi, United Arab Emirates.

Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.18653/v1/2022.case-1....
 


Abstract

Hate speech consists of content (e.g. text, audio, image) that expresses derogatory sentiments and hate against certain people or groups of individuals. The internet, particularly social media and microblogging sites, has become an increasingly popular platform for expressing ideas and opinions. Hate speech is prevalent in both offline and online media. A substantial proportion of this kind of content is presented in different modalities (e.g. text, image, video). Taking into account that hate speech spreads quickly during political events, we present a novel multimodal dataset composed of 5680 text-image pairs from tweets related to the Russia-Ukraine war, each annotated with a binary class: "hate" or "no-hate". The baseline results show that multimodal resources are relevant for leveraging hateful information from different types of data. The baselines and dataset provided in this paper may support researchers working on multimodal hate speech detection, particularly during serious conflicts such as wars.

Item ID: 79258
Item Type: Conference Item (Research - E1)
ISBN: 9781959429050
Copyright Information: Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
Date Deposited: 09 Aug 2023 02:02
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460307 Multimodal analysis and synthesis @ 50%
46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing @ 50%
SEO Codes: 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100%
Downloads: Total: 575
Last 12 Months: 51
