Document-level multi-topic sentiment classification of email data with BiLSTM and data augmentation
Liu, Sisi, Lee, Kyungmi, and Lee, Ickjai (2020) Document-level multi-topic sentiment classification of email data with BiLSTM and data augmentation. Knowledge-Based Systems, 197. 105918.
PDF (Published Version): restricted to Repository staff only
Abstract
Email data has unique characteristics that distinguish it from other social media data: multiple topics, lengthy replies, formal language, high variance in length, high duplication, anomalies, and indirect relationships. To better model Email documents and capture complex sentiment structures in their content, we develop a framework for document-level multi-topic sentiment classification of Email data. Because a large volume of labeled Email data is rarely publicly available, we introduce an optional data augmentation process that enlarges datasets with synthetically labeled data, reducing the probability of overfitting and underfitting during training. To generate segments with topic embeddings and topic weighting vectors as inputs for our proposed model, we apply both latent Dirichlet allocation topic modeling and semantic text segmentation to post-process Email documents. We analyze empirical results from multiple sets of experiments, including performance comparisons against various state-of-the-art algorithms with and without data augmentation and under diverse parameter settings, to demonstrate the effectiveness of the proposed framework.
Item ID: | 63010 |
---|---|
Item Type: | Article (Research - C1) |
ISSN: | 1872-7409 |
Related URLs: | |
Copyright Information: | © 2020 Elsevier B.V. All rights reserved |
Additional Information: | A version of this publication was included as Chapter 6 of the following PhD thesis: Liu, Sisi (2020) Document-level sentiment analysis of email data. PhD thesis, James Cook University, which is available Open Access in ResearchOnline@JCU. Please see the Related URLs for access. |
Date Deposited: | 22 Jul 2020 02:24 |
FoR Codes: | 46 INFORMATION AND COMPUTING SCIENCES > 4605 Data management and data science > 460502 Data mining and knowledge discovery @ 100% |
SEO Codes: | 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890299 Computer Software and Services not elsewhere classified @ 100% |
Downloads: |
Total: 3 |