Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data

Sidorczuk, Katarzyna, Gagat, Przemyslaw, Pietluch, Filip, Kala, Jakub, Rafacz, Dominik, Bakala, Laura, Slowik, Jadwiga, Kolenda, Rafal, Rodiger, Stefan, Fingerhut, Legana C.H.W., Cooke, Ira R., Mackiewicz, Pawel, and Burdukiewicz, Michal (2022) Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data. Briefings in Bioinformatics, 23 (5). bbac343.

PDF (Published Version)
Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.1093/bib/bbac343


Abstract

Antimicrobial peptides (AMPs) are a heterogeneous group of short polypeptides that target not only microorganisms but also viruses and cancer cells. Due to their lower selection for resistance compared with traditional antibiotics, AMPs have been attracting ever-growing attention from researchers, including bioinformaticians. Machine learning represents the most cost-effective method for novel AMP discovery, and consequently many computational tools for AMP prediction have recently been developed. In this article, we investigate the impact of negative data sampling on model performance and benchmarking. We generated 660 predictive models using 12 machine learning architectures, a single positive data set and 11 negative data sampling methods; the architectures and methods were defined on the basis of published AMP prediction software. Our results clearly indicate that similar training and benchmark data sets, i.e. those produced by the same or a similar negative data sampling method, positively affect model performance. Consequently, all the benchmark analyses that have been performed for AMP prediction models are significantly biased and, moreover, we do not know which model is the most accurate. To provide researchers with reliable information about the performance of AMP predictors, we also created a web server, AMPBenchmark, for fair model benchmarking. AMPBenchmark is available at http://BioGenies.info/AMPBenchmark.
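
The experimental grid described in the abstract can be illustrated with a minimal sketch. The abstract gives 12 architectures, 11 negative data sampling methods and 660 models, but does not say how 660 arises; since 12 × 11 = 132, the sketch assumes 5 replicates per combination. The names `arch_1`, `sampling_1` and the function `evaluation_pairs` are hypothetical placeholders, not identifiers from the article or from AMPBenchmark.

```python
from itertools import product

N_ARCHITECTURES = 12      # stated in the abstract
N_SAMPLING_METHODS = 11   # stated in the abstract
N_REPLICATES = 5          # assumption: 660 / (12 * 11) = 5

# Placeholder labels; the real architecture and method names are not given here.
architectures = [f"arch_{i}" for i in range(1, N_ARCHITECTURES + 1)]
sampling_methods = [f"sampling_{j}" for j in range(1, N_SAMPLING_METHODS + 1)]
replicates = range(1, N_REPLICATES + 1)

# One model per (architecture, training sampling method, replicate).
models = list(product(architectures, sampling_methods, replicates))


def evaluation_pairs(models, sampling_methods):
    """Pair every model with every benchmark sampling method.

    The boolean flag marks pairs where the training and benchmark negative
    sets come from the same sampling method, the situation the abstract
    reports as inflating measured performance.
    """
    for arch, train_method, rep in models:
        for bench_method in sampling_methods:
            same_method = train_method == bench_method
            yield (arch, train_method, rep), bench_method, same_method


pairs = list(evaluation_pairs(models, sampling_methods))
n_matched = sum(same for _, _, same in pairs)

print(len(models))   # number of models in the grid
print(len(pairs))    # model-by-benchmark evaluations
print(n_matched)     # evaluations where training and benchmark methods match
```

Under these assumptions, only one in eleven evaluations uses a benchmark built with the same sampling method as the training set; comparing scores inside and outside that subset is one way to expose the bias the abstract describes.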

Item ID: 75949
Item Type: Article (Research - C1)
ISSN: 1477-4054
Keywords: antimicrobial peptides, benchmarks, machine learning, negative sampling, prediction, reproducibility
Copyright Information: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Date Deposited: 07 Sep 2022 09:03
FoR Codes: 31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310206 Sequence analysis @ 25%
31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310201 Bioinformatic methods development @ 40%
49 MATHEMATICAL SCIENCES > 4905 Statistics > 490508 Statistical data science @ 35%
SEO Codes: 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280102 Expanding knowledge in the biological sciences @ 50%
28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences @ 50%