Prompting Large Language Models for Topic Modeling

Wang, Han, Prakash, Nirmalendu, Hoang, Nguyen Khoi, Hee, Ming Shan, Naseem, Usman, and Lee, Roy Ka Wei (2023) Prompting Large Language Models for Topic Modeling. In: Proceedings of the 2023 IEEE International Conference on Big Data. pp. 1236-1241. From: BigData 2023: 2023 IEEE International Conference on Big Data, 15-18 December 2023, Sorrento, Italy.

PDF (Published Version): restricted to repository staff only

View at Publisher Website: https://doi.org/10.1109/BigData59044.202...


Abstract

Topic modeling is a widely used technique for revealing the underlying thematic structure of textual data. However, existing models have limitations, particularly on short-text datasets that lack co-occurring words. Moreover, these models often neglect sentence-level semantics and focus primarily on token-level semantics. In this paper, we propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs) to address these challenges. PromptTopic extracts topics at the sentence level from individual documents, then aggregates and condenses them into a predefined number of topics, ultimately providing coherent topics for texts of varying lengths. This approach eliminates the need for manual parameter tuning and improves the quality of the extracted topics. We benchmark PromptTopic against state-of-the-art baselines on three highly diverse datasets, establishing its ability to discover meaningful topics. Furthermore, qualitative analysis demonstrates PromptTopic's ability to uncover relevant topics across multiple datasets.
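The two-stage pipeline summarized in the abstract (extract topics from each document with an LLM, then condense them to a predefined number) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `llm` callable, and the helper names (`extract_topics`, `condense_topics`, `prompt_topic_sketch`, `fake_llm`) are all hypothetical. The condensing step simply keeps the k topics mentioned by the most documents, a stand-in chosen for illustration rather than the aggregation procedure PromptTopic uses, and `fake_llm` is a keyword-matching stub so the example runs without a model.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical type: any function that sends a prompt to an LLM and returns its text reply.
LLMFn = Callable[[str], str]


def extract_topics(document: str, llm: LLMFn) -> List[str]:
    """Stage 1 (sketch): ask the LLM for the topics of a single document."""
    prompt = (
        "List the main topics of the following text as a comma-separated list.\n"
        f"Text: {document}\nTopics:"
    )
    reply = llm(prompt)
    # Normalize the reply: split on commas, trim whitespace, lowercase, drop empty entries.
    return [t.strip().lower() for t in reply.split(",") if t.strip()]


def condense_topics(per_doc: List[List[str]], k: int) -> List[str]:
    """Stage 2 (sketch): keep the k topics mentioned by the most documents.

    Hypothetical frequency-based stand-in, not PromptTopic's aggregation step.
    """
    # dict.fromkeys removes duplicates within a document while keeping order,
    # so each document counts at most once per topic.
    counts = Counter(topic for topics in per_doc for topic in dict.fromkeys(topics))
    return [topic for topic, _ in counts.most_common(k)]


def prompt_topic_sketch(documents: List[str], llm: LLMFn, k: int) -> List[str]:
    """Run per-document extraction, then condense to k topics overall."""
    per_doc = [extract_topics(doc, llm) for doc in documents]
    return condense_topics(per_doc, k)


def fake_llm(prompt: str) -> str:
    """Offline stub standing in for a real LLM call (keyword matching only)."""
    text = prompt.lower()
    topics = []
    if "election" in text or "vote" in text:
        topics.append("Politics")
    if "match" in text or "goal" in text:
        topics.append("Sports")
    if "vaccine" in text:
        topics.append("Health")
    return ", ".join(topics)


docs = [
    "Voters head to the polls as the election nears.",
    "A late goal decided the match.",
    "The new vaccine rollout and the election debate.",
    "Another election poll shows a tight race.",
    "The team scored a goal in the final minute.",
]
print(prompt_topic_sketch(docs, fake_llm, k=2))  # prints ['politics', 'sports']
```

Swapping `fake_llm` for a function that calls a real model leaves the rest of the sketch unchanged, since both stages only depend on the `LLMFn` signature.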

Item ID: 82402
Item Type: Conference Item (Research - E1)
ISBN: 9798350324457
Keywords: large language models, prompt engineering, topic modeling
Copyright Information: © 2023 IEEE
Date Deposited: 12 Mar 2024 23:42
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing @ 100%
SEO Codes: 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100%