How Much Data Do You Need? An Analysis of Pelvic Multi-Organ Segmentation in a Limited Data Context
Lunardo, Febrio, Baker, Laura, Tan, Alex, Baines, John, Squire, Timothy, Dowling, Jason A., Rahimi Azghadi, Mostafa, and Gillman, Ashley G. (2025) How Much Data Do You Need? An Analysis of Pelvic Multi-Organ Segmentation in a Limited Data Context. Physical and Engineering Sciences in Medicine. (In Press)
PDF (Accepted Publisher Version), available under License Creative Commons Attribution.
Abstract
Training deep learning models generally requires large, costly datasets, which can limit their application to in-house segmentation tasks. This study investigates the effect of dataset size in pelvic multi-organ MR segmentation, evaluating the performance of nnU-Net, a well-known segmentation model, under conditions of limited domain and data availability. Twelve participants undergoing treatment on an Elekta Unity were recruited, yielding 58 MR images, with 4 participants (12 images) withheld for testing. The prostate, seminal vesicles (SV), bladder and rectum were contoured in each image by a radiation oncologist. Seven models were trained on progressively smaller subsets of the training dataset, simulating a limited dataset setting. To investigate the efficacy of data augmentation, another set of identical models was trained without augmentation. Network performance was evaluated using the Dice Similarity Coefficient, mean surface distance, and 95% Hausdorff distance. When trained with the entire training dataset (46 images), the model achieved mean Dice coefficients of 0.903 (prostate), 0.851 (SV), 0.884 (rectum) and 0.967 (bladder). Segmentation performance remained stable when the training subset contained more than 12 images from 4 participants, but dropped rapidly for smaller subsets. Data augmentation was found to be influential across all dataset sizes, but especially in very small datasets. This study demonstrated nnU-Net's proficiency in performing male pelvic multi-organ segmentation under a limited domain, a single scanner, and limited data constraints. Performance degradation was often modest until a threshold was reached (12 images), below which performance dropped significantly, and data augmentation improved performance across all data sizes, especially for very small datasets.
We conclude that nnU-Net’s low data requirement can be advantageous for in-house cases with a consistent protocol and scarce data availability.
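For readers unfamiliar with the Dice Similarity Coefficient reported in the abstract, the following is a minimal sketch of how a per-organ Dice score could be computed from a predicted and a reference label map. It is not the authors' evaluation code: the label ids in `LABELS`, the helper names, and the toy data are all hypothetical, and the label maps are assumed to be flattened into plain sequences of integers for simplicity.

```python
from typing import Dict, Sequence

# Hypothetical label ids; the record does not state how organs were encoded.
LABELS: Dict[str, int] = {"prostate": 1, "sv": 2, "bladder": 3, "rectum": 4}


def dice(pred: Sequence[int], ref: Sequence[int], label: int) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for one label over flattened voxel labels."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    ref_count = sum(1 for r in ref if r == label)
    overlap = sum(1 for p, r in zip(pred, ref) if p == label and r == label)
    if pred_count + ref_count == 0:
        return float("nan")  # label absent from both maps: Dice is undefined
    return 2.0 * overlap / (pred_count + ref_count)


def dice_per_organ(pred: Sequence[int], ref: Sequence[int]) -> Dict[str, float]:
    """Compute one Dice score per organ listed in LABELS."""
    return {name: dice(pred, ref, idx) for name, idx in LABELS.items()}


# Toy 10-voxel example (0 = background); values are illustrative only.
ref = [0, 1, 1, 1, 3, 3, 4, 4, 0, 2]
pred = [0, 1, 1, 0, 3, 3, 4, 0, 0, 2]
scores = dice_per_organ(pred, ref)
```

In this toy case the prediction misses one prostate voxel and one rectum voxel, so those two scores fall below 1 while the other organs match exactly. On real volumes the same counts would normally be taken over array masks rather than Python lists.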
Item ID: | 84525 |
---|---|
Item Type: | Article (Research - C1) |
ISSN: | 2662-4737 |
Copyright Information: | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. © The Author(s) 2025 |
Research Data: | https://link.springer.com/article/10.1007/s13246-024-01514-w |
Date Deposited: | 25 Mar 2025 21:36 |
FoR Codes: | 42 HEALTH SCIENCES > 4203 Health services and systems > 420308 Health informatics and information systems @ 100% |
SEO Codes: | 20 HEALTH > 2001 Clinical health > 200199 Clinical health not elsewhere classified @ 100% |