K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering
Naseem, Usman, Khushi, Matloob, Dunn, Adam G., and Kim, Jinman (2024) K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering. IEEE Journal of Biomedical and Health Informatics, 28 (4). pp. 1886-1895.
Abstract
Pathology imaging is routinely used to detect the underlying effects and causes of diseases or injuries. Pathology visual question answering (PathVQA) aims to enable computers to answer questions about clinical visual findings from pathology images. Prior work on PathVQA has focused on directly analyzing the image content with conventional pretrained encoders, without drawing on relevant external information when the image content alone is inadequate. In this paper, we present a knowledge-driven PathVQA (K-PathVQA), which uses a medical knowledge graph (KG) from a complementary external structured knowledge base to infer answers for the PathVQA task. K-PathVQA improves the question representation with external medical knowledge and then aggregates vision, language, and knowledge embeddings to learn a joint knowledge-image-question representation. Our experiments on a publicly available PathVQA dataset showed that K-PathVQA outperformed the best baseline method, with an increase of 4.15% in accuracy on the overall task, an increase of 4.40% on open-ended questions, and an absolute increase of 1.03% on closed-ended questions. Ablation testing shows the impact of each contribution. The generalizability of the method is demonstrated on a separate medical VQA dataset.
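The fusion idea in the abstract (enrich the question with external knowledge, then combine vision, language, and knowledge embeddings) can be illustrated with a toy sketch. This is not the authors' architecture: the entity lookup by exact token match, the averaging, the element-wise addition, the concatenation, and every name and value below (`KG_EMBEDDINGS`, `knowledge_embedding`, `fuse`) are assumptions chosen only to keep the example small and runnable.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy knowledge-graph entity embeddings (illustrative values only).
KG_EMBEDDINGS: Dict[str, Vector] = {
    "necrosis": [1.0, 0.0],
    "liver": [0.0, 1.0],
}


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Average a list of vectors; return zeros when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def knowledge_embedding(question: str, kg: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of KG entities mentioned in the question."""
    tokens = question.lower().replace("?", " ").split()
    matched = [kg[t] for t in tokens if t in kg]
    return mean_vector(matched, dim)


def fuse(image_emb: Vector, question_emb: Vector, knowledge_emb: Vector) -> Vector:
    """Add knowledge to the question embedding, then concatenate all three parts."""
    enriched_question = [q + k for q, k in zip(question_emb, knowledge_emb)]
    return image_emb + enriched_question + knowledge_emb


# Assumed stand-ins for the outputs of an image encoder and a question encoder.
image_emb = [0.5, 0.5]
question_emb = [0.25, 0.5]

k_emb = knowledge_embedding("Is necrosis present in the liver?", KG_EMBEDDINGS, dim=2)
joint = fuse(image_emb, question_emb, k_emb)
print(joint)
```

In a real system the joint vector would feed an answer classifier or decoder, and the simple addition and concatenation used here would typically be replaced by learned fusion layers.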
Item ID: | 79528 |
---|---|
Item Type: | Article (Research - C1) |
ISSN: | 2168-2208 |
Keywords: | Bioinformatics, Medical diagnostic imaging, Medical Visual Question Answering, Multimodal Representation, Pathology, Pathology Images, Task analysis, Training, Transformers, Visualization |
Date Deposited: | 27 Jul 2023 00:01 |
FoR Codes: | 46 INFORMATION AND COMPUTING SCIENCES > 4602 Artificial intelligence > 460208 Natural language processing @ 75%; 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460307 Multimodal analysis and synthesis @ 25% |
SEO Codes: | 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 100% |