Identification of Disease or Symptom terms in Reddit to Improve Health Mention Classification

Naseem, Usman, Kim, Jinman, Khushi, Matloob, and Dunn, Adam G. (2022) Identification of Disease or Symptom terms in Reddit to Improve Health Mention Classification. In: Proceedings of the ACM Web Conference 2022. pp. 2573-2581. From: WWW '22: the ACM Web Conference 2022, 25-29 April 2022, Lyon, France.


View at Publisher Website: https://doi.org/10.1145/3485447.3512129


Abstract

In user-generated text, such as posts on social media platforms and online forums, people often use disease or symptom terms for purposes other than describing their health. In data-driven public health surveillance, the health mention classification (HMC) task aims to identify posts in which users are discussing health conditions rather than using disease or symptom terms for other reasons. Existing computational research typically studies health mentions only on Twitter, covers a limited set of disease or symptom terms, and ignores both user behavior information and the other ways people use these terms. To advance HMC research, we present the Reddit health mention dataset (RHMD), a new multi-domain dataset of Reddit posts for HMC. RHMD consists of 10,015 manually labeled Reddit posts that mention 15 common disease or symptom terms, each annotated with one of four labels: personal health mention, non-personal health mention, figurative health mention, or hyperbolic health mention. Using RHMD, we propose HMCNET, which hierarchically combines target keyword (disease or symptom term) identification with user behavior to improve HMC. Experimental results show that the proposed approach outperforms state-of-the-art methods, achieving an F1-score of 0.75 (an 11% improvement over the previous state of the art), and that our new dataset poses a strong challenge to existing HMC methods.
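The four-way labeling scheme and the idea of first locating the target disease or symptom term can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the authors' HMCNET: the `HealthMention` enum, the `locate_target_keyword` helper, and the `EXAMPLE_TERMS` list are hypothetical and are not taken from RHMD, and the keyword search is a naive case-insensitive whole-word match rather than a learned identification step.

```python
import re
from enum import Enum
from typing import Optional, Sequence, Tuple


class HealthMention(Enum):
    """The four RHMD annotation labels named in the abstract."""
    PERSONAL = "personal health mention"
    NON_PERSONAL = "non-personal health mention"
    FIGURATIVE = "figurative health mention"
    HYPERBOLIC = "hyperbolic health mention"


# Hypothetical example terms for illustration only; NOT the 15 RHMD terms.
EXAMPLE_TERMS = ("headache", "cough", "depression")


def locate_target_keyword(
    post: str, terms: Sequence[str] = EXAMPLE_TERMS
) -> Optional[Tuple[str, int, int]]:
    """Return (term, start, end) for the earliest listed term found in the post.

    Naive stand-in for target keyword identification: case-insensitive
    whole-word matching only, so inflections ("headaches") and synonyms
    are not detected. Returns None if no listed term occurs.
    """
    best = None
    for term in terms:
        match = re.search(rf"\b{re.escape(term)}\b", post, flags=re.IGNORECASE)
        if match and (best is None or match.start() < best[1]):
            best = (term, match.start(), match.end())
    return best


if __name__ == "__main__":
    # Figurative use: the term is present, but the post is not about health.
    print(locate_target_keyword("This meeting is giving me a headache."))
```

Locating the term is only a first step: in the usage example the mention is figurative, so a classifier still needs the surrounding context (and, in HMCNET's case, user behavior) to choose among the four labels.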

Item ID: 79234
Item Type: Conference Item (Research - E1)
ISBN: 978-1-4503-9096-5
Copyright Information: Copyright © 2022 by the Association for Computing Machinery, Inc.
Date Deposited: 11 Jul 2023 00:53
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing @ 100%
SEO Codes: 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100%