A practical, bioinformatic workflow system for large datasets generated by next-generation sequencing

Cantacessi, Cinzia, Jex, Aaron R., Hall, Ross S., Young, Neil D., Campbell, Bronwyn E., Joachim, Anja, Nolan, Matthew J., Abubucker, Sahar, Sternberg, Paul W., Ranganathan, Shoba, Mitreva, Makedonka, and Gasser, Robin B. (2010) A practical, bioinformatic workflow system for large datasets generated by next-generation sequencing. Nucleic Acids Research, 38 (17). e171. pp. 1-12.

View at Publisher Website: http://dx.doi.org/10.1093/nar/gkq667
 
Abstract

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.

Item ID: 26488
Item Type: Article (Research - C1)
ISSN: 0305-1048
Keywords: parasitic nematodes; bioinformatics; high-throughput sequencing; transcriptomics; genomics; computational biology; software development
Funders: Australian Research Council (ARC)
Date Deposited: 26 Apr 2013 02:33
FoR Codes: 08 INFORMATION AND COMPUTING SCIENCES > 0803 Computer Software > 080301 Bioinformatics Software @ 70%
06 BIOLOGICAL SCIENCES > 0604 Genetics > 060408 Genomics @ 30%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890299 Computer Software and Services not elsewhere classified @ 70%
92 HEALTH > 9201 Clinical Health (Organs, Diseases and Abnormal Conditions) > 920109 Infectious Diseases @ 20%
97 EXPANDING KNOWLEDGE > 970106 Expanding Knowledge in the Biological Sciences @ 10%