Efficient computation of multiple density-based clustering hierarchies
Araujo Neto, Antonio Cavalcante, Sander, Jörg, Campello, Ricardo J.G.B., and Nascimento, Mario A. (2017) Efficient computation of multiple density-based clustering hierarchies. In: Proceedings of the 17th IEEE International Conference on Data Mining. pp. 991-996. From: ICDM 2017: 17th IEEE International Conference on Data Mining, 18-21 November 2017, New Orleans, LA, USA.
Abstract
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts, one has to run HDBSCAN* for each value in the range independently, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN* hierarchies for a range of mpts by replacing the graph used by HDBSCAN* with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN* about 2 times. In fact, this speedup tends to increase with the number of hierarchies to be computed.
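To make the cost of the baseline concrete, the following is a minimal sketch of the naive strategy the abstract describes: one independent pass per value of mpts. It is not the authors' method and does not build the smaller graph proposed in the paper. It assumes the usual HDBSCAN* convention that the core distance of a point is the distance to its mpts-th nearest neighbor, counting the point itself, and it stops at the minimum spanning tree of the mutual reachability graph, from which a hierarchy would then be extracted. All function names (`core_distances`, `mutual_reachability_mst`, `naive_multi_hierarchy`) are hypothetical and chosen only for illustration.

```python
import math


def core_distances(points, m_pts):
    """Distance from each point to its m_pts-th nearest neighbor.

    Assumed convention: the point itself counts as its own first neighbor.
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[m_pts - 1])
    return cores


def mutual_reachability_mst(points, m_pts):
    """Prim's algorithm on the complete mutual reachability graph.

    Returns a list of (parent, child, weight) edges. Quadratic in the
    number of points; intended only as an illustration.
    """
    n = len(points)
    core = core_distances(points, m_pts)

    def mrd(i, j):
        # Mutual reachability distance between points i and j.
        return max(core[i], core[j], math.dist(points[i], points[j]))

    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into the tree, per point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = mrd(u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def naive_multi_hierarchy(points, m_pts_values):
    """Run the full computation once per m_pts value, sharing nothing."""
    return {m: mutual_reachability_mst(points, m) for m in m_pts_values}
```

Calling `naive_multi_hierarchy(points, range(1, 101))` repeats the entire spanning tree computation one hundred times over the complete graph, which is the redundancy the paper sets out to avoid.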
Item ID: | 51876 |
---|---|
Item Type: | Conference Item (Research - E1) |
ISBN: | 978-1-5386-3835-4 |
Funders: | Natural Sciences and Engineering Research Council of Canada (NSERC), CNPq, Brazil |
Projects and Grants: | CNPq Science without Borders Program |
Date Deposited: | 08 Jan 2018 01:03 |
FoR Codes: | 49 MATHEMATICAL SCIENCES > 4905 Statistics > 490502 Biostatistics @ 100% |
SEO Codes: | 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100% |