Evolving clusters in gene-expression data

Hruschka, Eduardo R., Campello, Ricardo J.G.B., and de Castro, Leandro N. (2006) Evolving clusters in gene-expression data. Information Sciences, 176 (13). pp. 1898-1927.

View at Publisher Website: http://dx.doi.org/10.1016/j.ins.2005.07....

Abstract

Clustering is a useful exploratory tool for gene-expression data. Although successful applications of clustering techniques have been reported in the literature, there is no method of choice in the gene-expression analysis community. Moreover, only a few works deal with the problem of automatically estimating the number of clusters in bioinformatics datasets. Most clustering methods require the number k of clusters to be either specified in advance or selected a posteriori from a set of clustering solutions over a range of k. In both cases, the user has to select the number of clusters. This paper proposes improvements to a clustering genetic algorithm that is capable of automatically discovering an optimal number of clusters and its corresponding optimal partition based upon numeric criteria. The proposed improvements are mainly designed to enhance the efficiency of the original clustering genetic algorithm, resulting in two new clustering genetic algorithms and an evolutionary algorithm for clustering (EAC). The original clustering genetic algorithm and its modified versions are evaluated in several runs using six gene-expression datasets in which the right clusters are known a priori. The results illustrate that all the proposed algorithms perform well on gene-expression data, although statistical comparisons of the computational efficiency of each algorithm indicate that EAC outperforms the others. Statistical evidence also shows that EAC is able to outperform a traditional method based on multiple runs of k-means over a range of k.
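
The traditional baseline mentioned at the end of the abstract (multiple runs of k-means over a range of k, with the best solution chosen by a numeric criterion) can be illustrated with a minimal sketch. This is not the paper's implementation. The abstract does not name the criterion, so the mean silhouette width is used here as an assumption. The function names (`kmeans`, `silhouette`, `best_k`, `make_blobs`) and the synthetic two-dimensional data are hypothetical and stand in for real gene-expression profiles.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)  # random initial prototypes
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width (assumed criterion); higher is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_range, runs=30, seed=0):
    """Run k-means several times for each k and keep the best partition."""
    rng = random.Random(seed)
    best = (-2.0, None, None)  # (score, k, labels)
    for k in k_range:
        for _ in range(runs):
            labels = kmeans(points, k, rng)
            score = silhouette(points, labels)
            if score > best[0]:
                best = (score, k, labels)
    return best


def make_blobs(centers, n_per_center, spread, seed=1):
    """Hypothetical toy data: Gaussian blobs around the given centers."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(n_per_center)]


points = make_blobs([(0, 0), (10, 10), (20, 0)], 15, 0.5)
score, k, labels = best_k(points, range(2, 7))
print(f"selected k = {k}, mean silhouette = {score:.3f}")
```

Every candidate k costs a full set of restarts in this scheme, and the user still has to supply the range of k. That is the overhead the abstract contrasts with algorithms that search over the number of clusters and the partition together.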

Item ID: 47074
Item Type: Article (Research - C1)
ISSN: 1872-6291
Keywords: clustering, evolutionary computation, bioinformatics, gene-expression data
Funders: São Paulo Research Foundation (FAPESP), Brazilian National Council for Scientific and Technological Development (CNPq)
Date Deposited: 04 Jan 2017 08:04
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%