Biomedical named entity recognition using natural language processing
Raza, Shaina, Bashir, Syed Raza, Thakkar, Vidhi, and Naseem, Usman (2023) Biomedical named entity recognition using natural language processing. In: Gupta, Akshansh, Verma, Hanuman, Prasad, Mukesh, Singh Kirar, Jyoti, and Lin, C.T., (eds.) Computational Intelligence Aided Systems for Healthcare Domain. CRC Press, Boca Raton, FL, USA, pp. 191-211.
Abstract
Motivation: Clinical entities are a type of entity in biomedical research that can be used in a named entity recognition (NER) task to extract biomedical information. However, even the most recent state-of-the-art biomedical NER methods are trained on a limited set of clinical entity types (genes, proteins, diseases, chemicals) and do not take many other biomedical entity types into account.
Methodology: In this chapter, we propose a medical text mining pipeline that improves on previous efforts and recognizes a large number of biomedical entity types, such as those related to medical risk factors and vital signs, as well as detailed biomedical entity types (biological functions, organs, and so on). At a high level, this pipeline consists of the following phases: pre-processing, tokenization, data transformation, embedding lookup, and machine learning modelling for the biomedical NER task, with hyperparameter optimization and evaluation. The pipeline is easily configurable and reusable, and it can scale up for training and inference. Through this pipeline, we also de-identify patients’ personal information in accordance with the Protected Health Information privacy rule.
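The first four phases named above can be illustrated with a minimal sketch. It is not the chapter's implementation: the regex tokenizer, the BIO tagging scheme used for the data transformation step, the randomly initialised embedding table, and the labels `RISK_FACTOR` and `SYMPTOM` are all assumptions chosen for illustration, and the modelling, hyperparameter optimization, and de-identification steps are left out.

```python
import random
import re


def preprocess(text):
    """Pre-processing (assumed): collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text):
    """Tokenization: word and punctuation tokens with character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def find_span(text, phrase, label):
    """Build a (start, end, label) annotation for the first match of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def to_bio(tokens, spans):
    """Data transformation (assumed BIO scheme): one tag per token."""
    tags = []
    for _, start, end in tokens:
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tags.append(tag)
    return tags


def build_embeddings(vocab, dim=4, seed=0):
    """Embedding lookup table; random vectors stand in for trained ones."""
    rng = random.Random(seed)
    table = {word: [rng.uniform(-1, 1) for _ in range(dim)] for word in vocab}
    table["<unk>"] = [0.0] * dim
    return table


def embed(tokens, table):
    """Map each token to its vector, falling back to the <unk> vector."""
    return [table.get(tok, table["<unk>"]) for tok, _, _ in tokens]


text = preprocess("Patient has  high blood pressure and fever.")
tokens = tokenize(text)
# Hypothetical annotations; the label names are illustrative only.
spans = [find_span(text, "high blood pressure", "RISK_FACTOR"),
         find_span(text, "fever", "SYMPTOM")]
tags = to_bio(tokens, spans)
table = build_embeddings(sorted({tok for tok, _, _ in tokens}))
vectors = embed(tokens, table)

for (tok, _, _), tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

The resulting per-token vectors and tags are the kind of input a sequence-labelling model would be trained on in the modelling phase.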
Results: Experimental results show that our pipeline outperforms state-of-the-art methods on two benchmark datasets, as well as on our own COVID-19 publications dataset, with a micro average F1 score of 91.34 and a macro average F1 score of 92.78.
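For readers unfamiliar with the two averages reported above, the following sketch shows how micro and macro F1 are computed from per-type counts. The entity types and the counts are hypothetical and are not taken from the chapter; they only show that the macro average can exceed the micro average when a smaller class scores higher than a larger one.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts maps an entity type to a (tp, fp, fn) tuple."""
    # Macro: average the per-type F1 scores, so each type weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so each prediction weighs the same.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Hypothetical counts for two entity types (illustration only).
counts = {"DISEASE": (80, 20, 20), "VITAL_SIGN": (10, 0, 0)}
micro, macro = micro_macro_f1(counts)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```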
Item ID: | 79248 |
---|---|
Item Type: | Book Chapter (Research - B1) |
ISBN: | 9781003368342 |
Copyright Information: | © 2023 selection and editorial matter, Akshansh Gupta, Hanuman Verma, Mukesh Prasad, Jyoti Singh Kirar and C.T Lin; individual chapters, the contributors. |
Date Deposited: | 30 Jan 2024 02:22 |
FoR Codes: | 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing @ 100% |
SEO Codes: | 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences @ 100% |