On the combination of relative clustering validity criteria

Vendramin, Lucas, Jaskowiak, Pablo A., and Campello, Ricardo J.G.B. (2013) On the combination of relative clustering validity criteria. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management. pp. 1-12. From: SSDBM: International Conference on Scientific and Statistical Database Management, 29-31 July 2013, Baltimore, MD, USA.


View at Publisher Website: http://dx.doi.org/10.1145/2484838.248484...


Many different relative clustering validity criteria exist, and they are very useful as quantitative measures for assessing the quality of data partitions. Each criterion has particular features that may make it more suitable for specific classes of problems. However, the user usually does not know a priori how well a given criterion will perform. Hence, choosing a specific criterion is not a trivial task. One possible way around this drawback is to combine different relative criteria in order to obtain more robust evaluations. So far, however, this approach has been applied only in an ad hoc fashion, and its real potential is not well understood. In this paper, we present an extensive study on the combination of relative criteria, considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.
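A small sketch can illustrate the committee idea. The abstract does not specify the four combination strategies, so the code below assumes one simple scheme, mean-rank aggregation. This scheme is not necessarily one of the strategies evaluated in the paper. Each criterion scores every candidate partition. The scores are converted to per-criterion ranks (rank 1 = best, respecting whether that criterion is maximized or minimized), and the committee picks the partition with the lowest mean rank. The function names (`ranks`, `committee_mean_rank`, `committee_choice`) and the toy scores are hypothetical placeholders, not values or code from the study.

```python
from statistics import mean


def ranks(scores, higher_is_better):
    """Rank candidate partitions for one criterion (rank 1 = best).

    Tied scores receive the average of the rank positions they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of candidates tied with order[pos].
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def committee_mean_rank(criteria):
    """Hypothetical mean-rank committee.

    criteria: list of (scores, higher_is_better) pairs, one per validity
    criterion, where scores[i] is that criterion's value for partition i.
    Returns the mean rank of each candidate partition across all criteria.
    """
    per_criterion = [ranks(scores, hib) for scores, hib in criteria]
    n = len(per_criterion[0])
    return [mean(r[i] for r in per_criterion) for i in range(n)]


def committee_choice(criteria):
    """Index of the partition with the lowest (best) mean rank."""
    combined = committee_mean_rank(criteria)
    return min(range(len(combined)), key=combined.__getitem__)


if __name__ == "__main__":
    # Toy, made-up scores for three candidate partitions (not data from the paper).
    criteria = [
        ([0.42, 0.61, 0.55], True),     # silhouette-style index: higher is better
        ([1.10, 0.80, 0.95], False),    # Davies-Bouldin-style index: lower is better
        ([150.0, 160.0, 180.0], True),  # Calinski-Harabasz-style index: higher is better
    ]
    print(committee_choice(criteria))  # → 1
```

Ranking the candidates instead of averaging raw scores avoids mixing criteria that live on different scales. For example, a silhouette-style index lies in [-1, 1], while a Calinski-Harabasz-style index is unbounded above. The trade-off is that rank aggregation discards how large the gaps between candidates are.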

Item ID: 46785
Item Type: Conference Item (Research - E1)
ISBN: 978-1-4503-1921-8
Funders: CNPq, Brazil, FAPESP
Date Deposited: 16 May 2017 03:24
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
