Efficient COI barcoding using high throughput single-end 400 bp sequencing

Yang, Chentao, Zheng, Yuxuan, Tan, Shangjin, Meng, Guanliang, Rao, Wei, Yang, Caiqing, Bourne, David G., O'Brien, Paul A., Xu, Junqiang, Liao, Sha, Chen, Ao, Chen, Xiaowei, Jia, Xinrui, Zhang, Ai-bing, and Liu, Shanlin (2020) Efficient COI barcoding using high throughput single-end 400 bp sequencing. BMC Genomics, 21. 862.

PDF (Published Version) - Published Version
Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.1186/s12864-020-07255...
 


Abstract

Background

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina’s MiSeq system), or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio’s SEQUEL II system).

Results

Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5′ and 3′ ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and comprises four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and obtained a total of 747 fully assembled COI barcodes as well as 70 sequences from Wolbachia and fungal symbionts. Compared with their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with ca. 99%.
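
The idea of joining one read from each end of the amplicon can be illustrated with a minimal Python sketch. This is not the HIFI-SE implementation: the function names, the 20 bp minimum overlap, the 5% mismatch tolerance and the 658 bp amplicon length used for the toy data are all assumptions made for illustration only. It reverse-complements the 3′-end read and looks for the longest suffix of the 5′-end read that matches its start.

```python
import random


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_by_overlap(forward, reverse, min_overlap=20, max_mismatch_rate=0.05):
    """Join a 5'-end read and a 3'-end read through their overlap.

    Hypothetical helper, not part of HIFI-SE. `reverse` is given as
    sequenced (opposite strand), so it is reverse-complemented first.
    Overlaps are tried from longest to shortest; returns None when no
    overlap of at least `min_overlap` bases passes the mismatch limit.
    """
    rev_rc = reverse_complement(reverse)
    longest = min(len(forward), len(rev_rc))
    for size in range(longest, min_overlap - 1, -1):
        tail = forward[-size:]   # end of the 5'-end read
        head = rev_rc[:size]     # start of the reoriented 3'-end read
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * size:
            return forward + rev_rc[size:]
    return None


# Toy data: a random 658 bp "barcode" (assumed length) read 400 bp from each end.
random.seed(0)
barcode = "".join(random.choice("ACGT") for _ in range(658))
fwd = barcode[:400]
rev = reverse_complement(barcode[-400:])
merged = merge_by_overlap(fwd, rev)
print(len(merged), merged == barcode)
```

With two 400 bp reads and an amplicon of the assumed length, the reads share 142 bases, which is what lets single-end reads from both ends span the whole region. A real pipeline would also need to handle sequencing errors, quality scores and mixed amplicons within a sample, which this sketch ignores.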

Conclusions

The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, and the reasonable cost and high sensitivity of our method allow considerably more DNA barcodes to be produced under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.

Item ID: 66609
Item Type: Article (Research - C1)
ISSN: 1471-2164
Keywords: Biodiversity, COI, DNA barcode, High-throughput sequencing, MGISEQ-2000, SE400
Copyright Information: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Funders: Chinese Postdoctoral Science Foundation (CPSF), Shenzhen Municipal Government of China (SMGC), Shenzhen Peacock Plan (SPP), China National Funds for Distinguished Young Scientists (CNFDYS)
Projects and Grants: CPSF 2019M660051, SMGC No. JCYJ20170817150755701, SPP No. KQTD2015033017150531, CNFDYS grant no. 31425023
Date Deposited: 21 May 2021 06:10
FoR Codes: 31 BIOLOGICAL SCIENCES > 3105 Genetics > 310509 Genomics @ 60%
41 ENVIRONMENTAL SCIENCES > 4104 Environmental management > 410401 Conservation and biodiversity @ 40%
