Sarcasm Detection in Multilingual Text through Embedding-Enhanced Language Models: BERT Variants

Islam, Muhammad, and Azhar, Muhammad (2024) Sarcasm Detection in Multilingual Text through Embedding-Enhanced Language Models: BERT Variants. In: Proceedings of the IEEE International Multi Topic Conference. From: INMIC 2024: IEEE 26th International Multi-Topic Conference, 30-31 December 2024, Karachi, Pakistan.


View at Publisher Website: https://doi.org/10.1109/INMIC64792.2024....


Abstract

The rise of social media has amplified sarcastic content, in which the literal and intended meanings often diverge. Sarcasm poses significant challenges for sentiment analysis because of its subtle linguistic nuances. Traditional models struggle with sarcasm detection because of their limited contextual understanding and reliance on handcrafted features. This study curates a large dataset of English news headlines and translates them into Urdu to create a multilingual dataset. BERT variants, such as mBERT and UrduBERT, are employed to capture context and learn relevant features using deep learning architectures. The study compares the performance of these BERT variants on the dataset and also evaluates traditional algorithms for classifying sarcastic and non-sarcastic headlines. Model performance is assessed using precision, recall, and F1-score. Results indicate that fine-tuned mBERT outperforms the other BERT variants and the traditional models in sarcasm detection by effectively capturing complex semantic patterns. The dataset and code are publicly available on GitHub: https://github.com/MislamSatti/Sarcasam-Detection-with-code
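
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1-score can be computed for a binary sarcastic/non-sarcastic classifier. It is not taken from the authors' code: the function name `precision_recall_f1`, the label convention (1 = sarcastic, 0 = non-sarcastic), and the toy labels are all assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class.

    Assumed convention (not from the paper): 1 = sarcastic, 0 = non-sarcastic.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: sarcastic headlines predicted as sarcastic.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-sarcastic headlines predicted as sarcastic.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: sarcastic headlines predicted as non-sarcastic.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    p, r, f = precision_recall_f1(y_true, y_pred)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high, which makes it a more informative single number than accuracy when the two classes are imbalanced.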

Item ID: 87380
Item Type: Conference Item (Research - E1)
ISBN: 9798331507213
ISSN: 2835-8864
Keywords: BERT variants, DL, Embedding, ML, Multi-Head Attention, NLP, Pre-Trained
Date Deposited: 25 Nov 2025 02:28
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing @ 100%
SEO Codes: 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100%