Advancing automated cell type annotation with large language models and single-cell isoform sequencing

Wijewardena, Hettiarachchige, Bhatia, Saloni, Bhattacharya, Namrata, Sengupta, Debarka, Wu, Siyuan, and Schmitz, Ulf (2025) Advancing automated cell type annotation with large language models and single-cell isoform sequencing. Computational and Structural Biotechnology Journal, 27. pp. 4952-4962.

PDF (Published Version)
Available under License Creative Commons Attribution.

View at Publisher Website: https://www.sciencedirect.com/science/ar...


Abstract

Accurate cell type identification is critical for interpreting single-cell transcriptomic data and understanding complex biological systems. In this review, we discuss how natural language processing and large language models can enhance the accuracy and scalability of cell type annotation. We also highlight how emerging single-cell long-read sequencing technologies enable isoform-level transcriptomic profiling, offering higher resolution than conventional gene expression-based methods and providing opportunities to redefine cell types. By integrating the insights of key technical and algorithmic advances across sequencing and computational approaches, we provide a unified overview of recent developments that are reshaping automated cell type annotation and improving the precision of biological interpretation.

Item ID: 89620
Item Type: Article (Research - C1)
ISSN: 2001-0370
Copyright Information: © 2025 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Funders: National Health and Medical Research Council (NHMRC)
Projects and Grants: NHMRC Grant #1196405
Date Deposited: 16 Nov 2025 22:23
FoR Codes: 31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310201 Bioinformatic methods development @ 10%
31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310204 Genomics and transcriptomics @ 40%
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461199 Machine learning not elsewhere classified @ 50%
SEO Codes: 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280102 Expanding knowledge in the biological sciences @ 40%
28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences @ 60%