Efficient computation of multiple density-based clustering hierarchies

Araujo Neto, Antonio Cavalcante, Sander, Jörg, Campello, Ricardo J.G.B., and Nascimento, Mario A. (2017) Efficient computation of multiple density-based clustering hierarchies. In: Proceedings of the 17th IEEE International Conference on Data Mining, pp. 991-996. From: ICDM 2017: 17th IEEE International Conference on Data Mining, 18-21 November 2017, New Orleans, LA, USA.


View at Publisher Website: http://dx.doi.org/10.1109/ICDM.2017.12
 


Abstract

HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts, one has to run HDBSCAN* for each value in the range independently, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of mpts by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN* about 2 times. In fact, this speedup tends to increase with the number of hierarchies to be computed.
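To make the cost described in the abstract concrete, the following is a minimal sketch of the naive baseline only, not the authors' method: for every value of mpts it recomputes core distances and builds a minimum spanning tree over the complete mutual reachability graph, from which an HDBSCAN* hierarchy could then be extracted. It assumes the standard HDBSCAN* definitions (core distance as the distance to the mpts-th nearest neighbour, counting the point itself; mutual reachability distance as the maximum of the two core distances and the original distance). The function names and the tiny dataset are hypothetical and chosen for illustration.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix (O(n^2) memory, illustration only)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def core_distances(dist, mpts):
    """Distance from each point to its mpts-th nearest neighbour.

    Assumption: the point itself counts as its own first neighbour,
    so mpts = 1 gives a core distance of 0 for every point.
    """
    return [sorted(row)[mpts - 1] for row in dist]


def mutual_reachability_mst(dist, core):
    """Prim's algorithm on the complete mutual reachability graph.

    Returns a list of (u, v, weight) edges with n - 1 entries.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = max(core[u], core[v], dist[u][v])  # mutual reachability
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def naive_msts_for_range(points, mpts_values):
    """One independent MST per mpts value: the repeated work the paper targets."""
    dist = pairwise_distances(points)
    return {
        m: mutual_reachability_mst(dist, core_distances(dist, m))
        for m in mpts_values
    }


if __name__ == "__main__":
    # Hypothetical toy data: two small groups far apart.
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
    for m, tree in naive_msts_for_range(data, [1, 2, 3]).items():
        print(m, round(sum(w for _, _, w in tree), 4))
```

Each MST here is built over all n(n-1)/2 edges, once per mpts value. The approach summarised in the abstract avoids this by substituting a much smaller graph that still contains the information needed for every value in the range.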

Item ID: 51876
Item Type: Conference Item (Research - E1)
ISBN: 978-1-5386-3835-4
Funders: Natural Sciences and Engineering Research Council of Canada (NSERC), CNPq, Brazil
Projects and Grants: CNPq Science without Borders Program
Date Deposited: 08 Jan 2018 01:03
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
