A multi-resolution self-supervised learning framework for semantic segmentation in histopathology

Wang, Hao, Ahn, Euijoon, and Kim, Jinman (2024) A multi-resolution self-supervised learning framework for semantic segmentation in histopathology. Pattern Recognition. 110621. (In Press)

PDF (Accepted Author Manuscript) - Accepted Version
Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.1016/j.patcog.2024.11...
Abstract

Modern whole slide imaging techniques, together with supervised deep learning approaches, have been advancing the field of histopathology, enabling accurate analysis of tissues. These approaches use whole slide images (WSIs) at various resolutions, utilising low-resolution WSIs to identify regions of interest in the tissue and high-resolution WSIs for detailed analysis of cellular structures. Due to the labour-intensive process of annotating gigapixel WSIs, accurate analysis of WSIs remains challenging for supervised approaches. Self-supervised learning (SSL) has emerged as an approach to building efficient and robust models using unlabelled data. It has been successfully used to pre-train models to learn meaningful image features, which are then fine-tuned on downstream tasks for improved performance compared to training models from scratch. Yet, existing SSL methods optimised for WSIs are unable to leverage multiple resolutions and instead work at an individual resolution, neglecting the hierarchical structure of multi-resolution inputs. This limitation prevents the effective utilisation of complementary information between different resolutions, hampering discriminative WSI representation learning. In this paper, we propose a Multi-resolution SSL Framework for WSI semantic segmentation (MSF-WSI) that effectively learns histopathological features. Our MSF-WSI learns complementary information from multiple WSI resolutions during the pre-training stage; this contrasts with existing works that only learn between resolutions at the fine-tuning stage. Our pre-training initialises the model with a comprehensive understanding of multi-resolution features, which can lead to improved performance in subsequent tasks. To achieve this, we introduce a novel Context-Target Fusion Module (CTFM) and a masked jigsaw pretext task to facilitate the learning of multi-resolution features. Additionally, we design a Dense SimSiam Learning (DSL) strategy that maximises the similarity of image features from early model layers to enable discriminative learned representations. We evaluated our method on breast and liver cancer segmentation tasks using three public datasets. Our experimental results demonstrate that MSF-WSI surpasses the accuracy of other state-of-the-art methods in downstream fine-tuning and semi-supervised settings.
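To make the ideas in the abstract more concrete, the sketch below illustrates, under assumed conventions, a SimSiam-style dense similarity loss with stop-gradient and a simple fusion of low-resolution (context) and high-resolution (target) feature maps. This is not the authors' implementation: the names (dense_neg_cosine, ContextTargetFusion, predictor) and the PyTorch setup with convolutional feature maps are hypothetical illustrations of the concepts named in the abstract (DSL, CTFM).

```python
# Minimal sketch (not the authors' code) of a SimSiam-style dense similarity
# loss and a hypothetical context-target feature fusion for two WSI resolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def dense_neg_cosine(p, z):
    """Dense negative cosine similarity between a predicted feature map `p`
    and a stop-gradient target feature map `z`, both of shape (B, C, H, W).
    The similarity is computed per spatial location and averaged."""
    z = z.detach()                      # stop-gradient, as in SimSiam
    p = F.normalize(p, dim=1)           # L2-normalise along the channel axis
    z = F.normalize(z, dim=1)
    return -(p * z).sum(dim=1).mean()   # mean over batch and spatial positions


class ContextTargetFusion(nn.Module):
    """Hypothetical fusion of a low-resolution (context) feature map with a
    high-resolution (target) feature map: upsample the context features to the
    target's spatial size, concatenate, and mix with a 1x1 convolution."""

    def __init__(self, context_channels, target_channels):
        super().__init__()
        self.mix = nn.Conv2d(context_channels + target_channels,
                             target_channels, kernel_size=1)

    def forward(self, context_feat, target_feat):
        context_up = F.interpolate(context_feat, size=target_feat.shape[-2:],
                                   mode="bilinear", align_corners=False)
        return self.mix(torch.cat([context_up, target_feat], dim=1))


if __name__ == "__main__":
    # Toy feature maps standing in for encoder outputs of two augmented views.
    feats_view1 = torch.randn(2, 64, 16, 16)
    feats_view2 = torch.randn(2, 64, 16, 16)

    # A 1x1-conv "predictor" head, analogous to SimSiam's prediction MLP.
    predictor = nn.Conv2d(64, 64, kernel_size=1)

    # Symmetrised dense SimSiam-style loss between the two views.
    loss = 0.5 * (dense_neg_cosine(predictor(feats_view1), feats_view2) +
                  dense_neg_cosine(predictor(feats_view2), feats_view1))
    print(loss.item())

    # Fuse low-resolution context features with high-resolution target features.
    fusion = ContextTargetFusion(context_channels=32, target_channels=64)
    fused = fusion(torch.randn(2, 32, 8, 8), torch.randn(2, 64, 16, 16))
    print(fused.shape)  # torch.Size([2, 64, 16, 16])
```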

Item ID: 82946
Item Type: Article (Research - C1)
ISSN: 1873-5142
Copyright Information: Accepted Version: © 2024. This manuscript version is made available under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Date Deposited: 11 Jun 2024 00:35
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4603 Computer vision and multimedia computation > 460306 Image processing @ 90%
42 HEALTH SCIENCES > 4299 Other health sciences > 429999 Other health sciences not elsewhere classified @ 10%
SEO Codes: 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences @ 80%
28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280103 Expanding knowledge in the biomedical and clinical sciences @ 20%
Downloads: Total: 7
Last 12 Months: 7
