Distributed k-means clustering with low transmission cost

Naldi, Murilo Coelho, and Campello, Ricardo José Gabrielle Barreto (2013) Distributed k-means clustering with low transmission cost. In: Proceedings of the 2013 Brazilian Conference on Intelligent Systems, pp. 70-75. From: BRACIS 2013: Brazilian Conference on Intelligent Systems, 20-24 October 2013, Fortaleza, Brazil.

PDF (Published Version): restricted to repository staff only.

View at Publisher Website: http://dx.doi.org/10.1109/BRACIS.2013.20


Abstract

Dealing with large amounts of data is one of the challenges for clustering, and it often makes it necessary to distribute large data sets across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been elected one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown and data transmission is slow or costly. A collection of algorithms is proposed that combine k-means clustering of each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: theoretically, through asymptotic complexity analysis, and experimentally, through a comparative evaluation of results obtained from experiments and statistical tests.
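The general idea of clustering each subset locally and communicating only once can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a fixed number of clusters per site (`k_local`) and a fixed global number (`k_global`), whereas the paper targets the case where the number of clusters is unknown. It also uses a deterministic farthest-first seeding chosen here only for reproducibility. Each site runs k-means on its own data and sends only its prototypes and cluster sizes; a coordinator then runs a weighted k-means over the received prototypes. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Assumed seeding (not from the paper): start at the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    iters: int = 100) -> Tuple[List[Point], List[float]]:
    """Lloyd's k-means where each point carries a weight.
    Returns (centers, total weight assigned to each center)."""
    centers = farthest_first(points, k)
    dim = len(points[0])
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers: List[Point] = []
        for j in range(k):
            total = sum(w for w, l in zip(weights, labels) if l == j)
            if total == 0:
                new_centers.append(centers[j])  # empty cluster keeps its old center
                continue
            new_centers.append(tuple(
                sum(w * p[d] for p, w, l in zip(points, weights, labels) if l == j) / total
                for d in range(dim)))
        if new_centers == centers:  # converged: assignments no longer change centers
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    sizes = [sum(w for w, l in zip(weights, labels) if l == j) for j in range(k)]
    return centers, sizes


def local_summary(site_points: List[Point], k_local: int):
    """Hypothetical per-site step: cluster local data with unit weights and
    return only (prototypes, cluster sizes), which is all that gets transmitted."""
    return weighted_kmeans(site_points, [1.0] * len(site_points), k_local)


def combine(summaries, k_global: int):
    """Hypothetical coordinator step after the single round of communication:
    cluster all received prototypes, weighted by their local cluster sizes."""
    protos = [c for centers, _ in summaries for c in centers]
    weights = [s for _, sizes in summaries for s in sizes]
    return weighted_kmeans(protos, weights, k_global)


# Usage: two sites holding points from two well-separated groups.
site_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
site_b = [(1.0, 1.0), (10.0, 11.0), (11.0, 11.0), (0.5, 0.5)]
summaries = [local_summary(site_a, 2), local_summary(site_b, 2)]
centers, sizes = combine(summaries, 2)
print(centers, sizes)
```

In this sketch only `k_local` prototypes and counts per site cross the network instead of the raw points, which is what keeps communication to a single round; the price is that the global model is an approximation of what centralized k-means would produce.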

Item ID: 47650
Item Type: Conference Item (Research - E1)
ISBN: 978-0-7695-5092-3
Keywords: clustering, distributed data sets, k-means, low data transfer
Funders: CNPq, FAPESP
Date Deposited: 15 Jun 2017 01:22
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%