Efficient hierarchical clustering for single-dimensional data using CUDA

Rehn, Adam, Possemiers, Aidan, and Holdsworth, Jason (2018) Efficient hierarchical clustering for single-dimensional data using CUDA. In: Proceedings of ACSW 2018 Australiasian Computer Science Week Multiconference. 14. From: ACSW 2018 Australian Computer Science Week 2018, 30 January - 2 February 2018, Brisbane, QLD, Australia.

[img] PDF (Published Version) - Published Version
Restricted to Repository staff only

View at Publisher Website: https://doi.org/10.1145/3167918.3167929
 
2


Abstract

Hierarchical clustering is a widely-used and well-researched clustering technique. The classical algorithm for agglomerative hierarchical clustering is prohibitively expensive for use with large datasets. Numerous algorithms exist to improve the efficiency of hierarchical clustering for various linkage metrics, and for large datasets. Recent research has proposed approaches for improving the efficiency of hierarchical clustering through parallelism. The newest approaches utilise GPGPU technologies, which facilitate massive parallelism on commodity consumer hardware. Existing GPGPU implementations fail to maximise the number of merges that can be performed in parallel, and feature high use of memory. These limitations prevent existing implementations from achieving the full performance offered by GPGPU. In this paper, we propose a novel GPGPU algorithm for hierarchical clustering of single-dimensional data. Our proposed algorithm exploits the unique characteristics of one-dimensional data to maximise merge parallelism and significantly reduce memory requirements. Validation demonstrates that our proposed algorithm produces equivalent results to the classical algorithm for both the single-linkage and complete-linkage metrics. Benchmarking results show that our algorithm scales well to large datasets, and offers a substantial speed-up over the classical algorithm. Future work will look to extend our proposed approach to larger datasets with higher dimensions.

Item ID: 52776
Item Type: Conference Item (Research - E1)
ISBN: 978-1-4503-5436-3
Keywords: agglomerative clustering, GPU acceleration, parallel
Date Deposited: 13 Jul 2018 00:12
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4606 Distributed computing and systems software > 460699 Distributed computing and systems software not elsewhere classified @ 90%
46 INFORMATION AND COMPUTING SCIENCES > 4612 Software engineering > 461299 Software engineering not elsewhere classified @ 10%
SEO Codes: 89 INFORMATION AND COMMUNICATION SERVICES > 8902 Computer Software and Services > 890205 Information Processing Services (incl. Data Entry and Capture) @ 100%
Downloads: Total: 2
More Statistics

Actions (Repository Staff Only)

Item Control Page Item Control Page