Machine Learning-Based Genome-Wide Salivary DNA Methylation Analysis for Identification of Noninvasive Biomarkers in Oral Cancer Diagnosis

Adeoye, John, Wan, Chi Ching Joan, Zheng, Li Wu, Thomson, Peter, Choi, Siu Wai, and Su, Yu Xiong (2022) Machine Learning-Based Genome-Wide Salivary DNA Methylation Analysis for Identification of Noninvasive Biomarkers in Oral Cancer Diagnosis. Cancers, 14. 4935.

Available under License Creative Commons Attribution.



This study aims to examine the feasibility of machine learning (ML)-assisted salivary liquid-biopsy platforms that use genome-wide methylation analysis at base-pair and regional resolution to distinguish oral squamous cell carcinoma (OSCC) from oral potentially malignant disorders (OPMDs). A nested cohort of patients with OSCC and OPMDs was randomly selected from among patients with oral mucosal diseases. Saliva samples were collected, and DNA extracted from the cell pellets was processed for reduced-representation bisulfite sequencing (RRBS). Reads with a minimum of 10× coverage were used to identify differentially methylated CpG sites (DMCs) and differentially methylated 100 bp regions (DMRs). The performance of eight ML models and three feature-selection methods, namely ANOVA, minimum redundancy maximum relevance (MRMR), and least absolute shrinkage and selection operator (LASSO), was then compared to determine the optimal DMC- and DMR-based biomarker models. A total of 1745 DMCs and 105 DMRs were identified for detecting OSCC. The proportions of hypomethylated and hypermethylated DMCs were similar (51% vs. 49%), whereas most DMRs were hypermethylated (62.9%). Furthermore, a larger share of DMRs than DMCs was annotated to promoter regions (36% vs. 16%), and a larger share of DMCs than DMRs was annotated to intergenic regions (50% vs. 36%). Of all the ML models compared, the linear support vector machine (SVM) model based on 11 optimal DMRs selected by LASSO achieved perfect AUC, recall, specificity, and calibration (1.00) for OSCC detection. Overall, genome-wide DNA methylation techniques can be applied directly to saliva samples for biomarker discovery, and ML-based platforms may be useful for stratifying OSCC during disease screening and monitoring.
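The region-level step summarized in the abstract (CpG calls filtered at 10× coverage, pooled into 100 bp regions, then labeled hyper- or hypomethylated) can be illustrated with a minimal sketch. It assumes non-overlapping 100 bp tiles anchored at multiples of 100, per-CpG read counts already extracted from the bisulfite alignments, and a plain mean beta-value difference in place of whatever statistical test the authors used. The function names (`tile_sample`, `region_differences`, `auc`) and the toy read counts are hypothetical and are not data from the study. A rank-based AUC is included to show how a single region's ability to separate the two groups could be scored.

```python
from collections import defaultdict
from statistics import mean

MIN_COVERAGE = 10  # 10x read-depth threshold stated in the abstract
TILE_SIZE = 100    # 100 bp region size stated in the abstract


def tile_sample(cpg_calls, min_cov=MIN_COVERAGE, tile=TILE_SIZE):
    """Collapse per-CpG read counts into fixed, non-overlapping tiles.

    cpg_calls maps (chrom, pos) -> (methylated_reads, total_reads).
    CpGs below min_cov are dropped before pooling (assumed order of steps).
    Returns (chrom, tile_start) -> pooled methylation level (beta value).
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for (chrom, pos), (m, t) in cpg_calls.items():
        if t < min_cov:
            continue  # fails the coverage filter
        key = (chrom, (pos // tile) * tile)
        meth[key] += m
        total[key] += t
    return {key: meth[key] / total[key] for key in total}


def region_differences(cases, controls):
    """Mean beta difference (cases minus controls) for tiles covered in every sample."""
    case_tiles = [tile_sample(s) for s in cases]
    ctrl_tiles = [tile_sample(s) for s in controls]
    shared = set.intersection(*(set(t) for t in case_tiles + ctrl_tiles))
    return {
        key: mean(t[key] for t in case_tiles) - mean(t[key] for t in ctrl_tiles)
        for key in sorted(shared)
    }


def auc(case_scores, control_scores):
    """Rank-based (Mann-Whitney) AUC: share of case/control pairs ordered correctly."""
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            wins += 1.0 if c > n else 0.5 if c == n else 0.0
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy read counts (NOT study data): two "OSCC" and two "OPMD" samples.
oscc = [
    {("chr1", 1010): (18, 20), ("chr1", 1050): (9, 10), ("chr1", 2300): (2, 20)},
    {("chr1", 1012): (15, 20), ("chr1", 1060): (12, 12), ("chr1", 2310): (3, 15)},
]
opmd = [
    {("chr1", 1010): (4, 20), ("chr1", 1050): (2, 10), ("chr1", 2300): (10, 20)},
    {("chr1", 1020): (3, 15), ("chr1", 1070): (1, 11),
     ("chr1", 2305): (12, 18), ("chr1", 2399): (1, 5)},  # last CpG fails the 10x filter
]

diffs = region_differences(oscc, opmd)
for key, d in diffs.items():
    direction = "hypermethylated" if d > 0 else "hypomethylated"
    sign = 1 if d > 0 else -1  # orient scores so higher means "more case-like"
    case_scores = [sign * tile_sample(s)[key] for s in oscc]
    ctrl_scores = [sign * tile_sample(s)[key] for s in opmd]
    print(key, f"diff={d:+.3f}", direction, f"AUC={auc(case_scores, ctrl_scores):.2f}")
```

The LASSO feature selection and linear SVM stages reported in the abstract are not reproduced here; a real analysis would typically replace the raw mean difference with a statistical test and multiple-testing correction before passing candidate regions to feature selection.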

Item ID: 77635
Item Type: Article (Research - C1)
ISSN: 2072-6694
Keywords: biomarkers, diagnosis, DNA methylation, epigenomics, oral cancer, oral potentially malignant disorders
Copyright Information: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Date Deposited: 16 Mar 2023 01:28
FoR Codes: 32 BIOMEDICAL AND CLINICAL SCIENCES > 3211 Oncology and carcinogenesis > 321109 Predictive and prognostic markers @ 100%
SEO Codes: 20 HEALTH > 2001 Clinical health > 200101 Diagnosis of human diseases and conditions @ 100%