Comparison of distributed evolutionary k-means clustering algorithms

Naldi, M.C., and Campello, Ricardo R.J.G.B. (2015) Comparison of distributed evolutionary k-means clustering algorithms. Neurocomputing, 163. pp. 78-93.

View at Publisher Website: http://dx.doi.org/10.1016/j.neucom.2014....
 

Abstract

Dealing with distributed data is one of the challenges for clustering, as most clustering techniques require the data to be centralized. One of them, k-means, has been named one of the most influential data mining algorithms for being simple, scalable, and easily modifiable to a variety of contexts and application domains. However, exact distributed versions of k-means are still sensitive to the selection of the initial cluster prototypes and require the number of clusters to be specified in advance. Additionally, preserving data privacy among repositories may be a complicating factor. To overcome these limitations of k-means, this paper adopts two different approaches: the first obtains a final model identical to that of the centralized version of the clustering algorithm, and the second generates and selects clusters for each distributed data subset and combines them afterwards. The paper also describes how to apply the compared algorithms while preserving data privacy. The algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses, and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The results obtained indicate which algorithm is more suitable for each application scenario.
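
The idea of an exact distributed k-means, whose final model matches the centralized one, can be illustrated with a minimal Python sketch. This is not the authors' implementation and omits the evolutionary search over the number of clusters; it only assumes that each data site sends per-cluster coordinate sums and counts to a coordinator, so raw objects never leave their repository. All function names (local_stats, merge_and_update, distributed_kmeans) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_stats(points: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Runs at one data site: assign local points to the nearest centroid
    and return only aggregate sums and counts (no raw data)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_and_update(stats: Sequence[Stats], centroids: Sequence[Point]) -> List[List[float]]:
    """Runs at the coordinator: combine site statistics into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for j in range(k):
        n = sum(counts[j] for _, counts in stats)
        if n == 0:
            # Assumption: an empty cluster keeps its previous prototype.
            updated.append(list(centroids[j]))
        else:
            updated.append([sum(sums[j][d] for sums, _ in stats) / n for d in range(dim)])
    return updated


def distributed_kmeans(sites: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       iterations: int = 10) -> List[List[float]]:
    """Fixed number of k-means iterations over data split across sites."""
    current = [list(c) for c in centroids]
    for _ in range(iterations):
        stats = [local_stats(site, current) for site in sites]
        current = merge_and_update(stats, current)
    return current
```

Because the global mean of each cluster is the total of the site sums divided by the total of the site counts, running this on several sites gives the same centroids (up to floating-point rounding) as running it on a single site holding all the data, provided both start from the same initial prototypes.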

Item ID: 47613
Item Type: Article (Research - C1)
ISSN: 1872-8286
Keywords: distributed clustering, evolutionary k-means, privacy preservation, low data transmission
Funders: CNPq-Brazil, São Paulo Research Foundation (FAPESP), FAPEMIG
Projects and Grants: CNPq #304137/2013-8, FAPESP #2013/18698-4, FAPEMIG #APQ-00156-14
Date Deposited: 08 Mar 2017 07:40
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
