Hadooping the genome: The impact of big data tools on biology

Stevens, Hallam (2016) Hadooping the genome: The impact of big data tools on biology. BioSocieties, 11. pp. 352-371.

PDF (Author Accepted Version) - Accepted Version
Available under License Creative Commons Attribution Non-commercial Share Alike.

View at Publisher Website: https://doi.org/10.1057/s41292-016-0003-...


This essay examines the consequences of so-called ‘big data’ technologies in biomedicine. Analyzing the algorithms and data structures used by biologists can provide insight into how biologists perceive and understand their objects of study. As such, I examine some of the most widely used algorithms in genomics: those used for sequence comparison or sequence mapping. These algorithms are derived from the powerful tools for text searching and indexing that have been developed since the 1950s and now play an important role in online search. In biology, sequence comparison algorithms have been used to assemble genomes, to process next-generation sequence data, and, most recently, for ‘precision medicine.’ I argue that the predominance of a specific set of text-matching and pattern-finding tools has influenced problem choice in genomics. This predominance has allowed genomics to continue to think of genomes as textual objects and has increasingly locked the field into ‘big data’-driven text-searching methods. Many ‘big data’ methods are designed for finding patterns in human-written texts. However, genomes and other ’omic data are not human-written and are unlikely to be meaningful in the same way.

Item ID: 73185
Item Type: Article (Research - C1)
ISSN: 1745-8560
Keywords: Big data, DNA sequence, Genomics, Google, Hadoop
Copyright Information: © 2022 Springer Nature Switzerland AG. This is a post-peer-review, pre-copyedit version of an article published in BioSocieties. The definitive publisher-authenticated version Stevens, H. Hadooping the genome: The impact of big data tools on biology. BioSocieties 11, 352–371 (2016). https://doi.org/10.1057/s41292-016-0003-6 is available online at: https://link.springer.com/article/10.1057/s41292-016-0003-6
Date Deposited: 27 Jun 2022 00:39
FoR Codes: 44 HUMAN SOCIETY > 4410 Sociology > 441007 Sociology and social studies of science and technology @ 80%
31 BIOLOGICAL SCIENCES > 3102 Bioinformatics and computational biology > 310204 Genomics and transcriptomics @ 20%
SEO Codes: 13 CULTURE AND SOCIETY > 1399 Other culture and society > 139999 Other culture and society not elsewhere classified @ 100%
Downloads: Total: 563
Last 12 Months: 127