Canfam GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C

Field, Matt A., Rosen, Benjamin D., Dudchenko, Olga, Chan, Eva K. F., Minoche, Andre E., Edwards, Richard J., Barton, Kirston, Lyons, Ruth J., Tuipulotu, Daniel Enosi, Hayes, Vanessa M., Omer, Arina D., Colaric, Zane, Keilwagen, Jens, Skvortsova, Ksenia, Bogdanovic, Ozren, Smith, Martin A., Aiden, Erez Lieberman, Smith, Timothy P. L., Zammit, Robert A., and Ballard, J. William O. (2020) Canfam GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C. GigaScience, 9 (4). pp. 1-12.

PDF (Published version) - Published Version
Available under License Creative Commons Attribution.

View at Publisher Website: https://doi.org/10.1093/gigascience/giaa027
 


Abstract

Background: The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet, GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, and fully trained animals are not able to continue their duties.

Findings: Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam GSD) utilizing a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is ∼80 times as contiguous as the current canid reference genome, CanFam v3.1 (20.9 vs 0.267 Mb contig N50), and contains far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. BUSCO analyses show that 93.0% of the conserved single-copy genes are complete in the GSD assembly compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to ∼99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely 7 copies of the gene, indicative of a duplication of 4 ancestral copies and the disruption of 1 copy.

Conclusions: GSD genome assembly and annotation were produced with major improvement in completeness, continuity, and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology.
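
Note on the contiguity figures: the Findings compare assemblies by contig N50 and by gap and scaffold counts. The sketch below is illustrative only and is not the authors' pipeline. It shows one common way to compute N50 from a list of sequence lengths, then reproduces the ratios implied by the numbers quoted in the abstract. The function name `n50` and the `toy_contigs` list are hypothetical and introduced here purely for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Ratios derived from the figures quoted in the abstract.
contig_n50_fold = 20.9 / 0.267   # GSD vs CanFam v3.1 contig N50, in Mb
gap_fold = 23876 / 306           # CanFam v3.1 gaps vs GSD gaps
scaffold_fold = 3310 / 429       # CanFam v3.1 scaffolds vs GSD scaffolds

print(f"toy N50: {toy_n50}")
print(f"contig N50 fold change: {contig_n50_fold:.1f}")
print(f"gap fold change: {gap_fold:.1f}")
print(f"scaffold fold change: {scaffold_fold:.1f}")
```

The contig N50 ratio works out to roughly 78, consistent with the "∼80 times" stated in the Findings.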

Item ID: 63612
Item Type: Article (Research - C1)
ISSN: 2047-217X
Keywords: Hi-C; long-read sequencing; optical mapping; de novo genome assembly; canine hip dysplasia; DNA Zoo
Copyright Information: © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Funders: Australian Health Foundation, University of New South Wales (UNSW), Vanessa M. Hayes, DNA Zoo Consortium, National Science Foundation (NSF), Welch Foundation (WF), USDA, USA, National Institutes of Health (NIH), USA, Australian Research Council (ARC), Australian Government National Collaborative Research Infrastructure Strategy, New South Wales Government
Projects and Grants: Hip2Fit Crowdfunding initiative, NSF Physics Frontiers Center award PHY1427654, WF Q-1866, USDA Agriculture and Food Research Initiative Grant (2017–05741), NIH 4D Nucleome Grant (U01HL130010), NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375), ARC LE150100031
Research Data: http://dx.doi.org/10.5524/100712.
Date Deposited: 13 Aug 2020 03:08
FoR Codes: 31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310204 Genomics and transcriptomics @ 50%
31 BIOLOGICAL SCIENCES > 3105 Genetics > 310509 Genomics @ 50%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970106 Expanding Knowledge in the Biological Sciences @ 100%
Downloads: Total: 981
Last 12 Months: 93
