A systematic comparative evaluation of biclustering techniques

Padilha, Victor A., and Campello, Ricardo (2017) A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics, 18:55. pp. 1-25.

PDF (Published Version)
Available under License Creative Commons Attribution.

View at Publisher Website: http://dx.doi.org/10.1186/s12859-017-148...
 


Abstract

Background: Biclustering techniques are capable of simultaneously clustering rows and columns of a data matrix. These techniques became very popular for the analysis of gene expression data, since a gene can take part in multiple biological pathways, which in turn can be active only under specific experimental conditions. Several biclustering algorithms have been developed in recent years. In order to provide guidance regarding their choice, a few comparative studies were conducted and reported in the literature. In these studies, however, the performances of the methods were evaluated through external measures that have more recently been shown to have undesirable properties. Furthermore, they considered a limited number of algorithms and datasets.
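
As a minimal illustration of what "simultaneously clustering rows and columns" means, the sketch below treats a bicluster as a subset of rows together with a subset of columns of a toy matrix. The matrix values, the helper names `submatrix` and `mean_squared_residue`, and the chosen row and column indices are all assumptions made for this example. The coherence score used here is the mean squared residue of Cheng and Church, one classic biclustering criterion; it is shown only to make the idea concrete and is not presented as a measure used in this study.

```python
# Toy data matrix: rows stand for genes, columns for experimental conditions.
# All values are invented for illustration.
data = [
    [5.0, 1.2, 5.0, 0.3],
    [0.7, 2.9, 1.1, 4.4],
    [5.0, 3.8, 5.0, 2.1],
    [1.9, 0.4, 3.3, 0.8],
]


def submatrix(matrix, rows, cols):
    """Return the block of `matrix` restricted to the given rows and columns."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(block):
    """Mean squared residue of a block: 0 means perfectly coherent rows/columns."""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = block[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical bicluster: genes 0 and 2 behave alike under conditions 0 and 2 only.
bicluster = submatrix(data, rows=[0, 2], cols=[0, 2])

print("bicluster:", bicluster)
print("residue of bicluster:", mean_squared_residue(bicluster))
print("residue of full matrix:", mean_squared_residue(data))
```

The selected block is perfectly coherent while the full matrix is not, which is the kind of local pattern that clustering rows alone or columns alone would miss.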

Results: We conducted a broader comparative study involving seventeen algorithms, which were run on three synthetic data collections and two real data collections with a more representative number of datasets. For the experiments with synthetic data, five different experimental scenarios were studied: different levels of noise, different numbers of implanted biclusters, different levels of symmetric bicluster overlap, different levels of asymmetric bicluster overlap and different bicluster sizes, for which the results were assessed with more suitable external measures. For the experiments with real datasets, the results were assessed by gene set enrichment and clustering accuracy.

Conclusions: We observed that each algorithm achieved satisfactory results in some of the biclustering tasks in which it was investigated. The choice of the best algorithm for a given application thus depends on the task at hand and the types of patterns that one wants to detect.

Item ID: 47672
Item Type: Article (Research - C1)
ISSN: 1471-2105
Keywords: clustering; biclustering; gene expression
Additional Information:

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Funders: São Paulo Research Foundation (FAPESP), Brazilian National Research Foundation (CNPq)
Projects and Grants: FAPESP grant #2014/08840-0, FAPESP grant #2013/18698-4, CNPq grant #30413/2013-8
Date Deposited: 12 Mar 2017 23:58
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
