Relative clustering validity criteria: a comparative overview

Vendramin, Lucas, Campello, Ricardo J.G.B., and Hruschka, Eduardo R. (2010) Relative clustering validity criteria: a comparative overview. Statistical Analysis and Data Mining, 3. pp. 209-235.

[img] PDF (Published Version) - Published Version
Restricted to Repository staff only

View at Publisher Website: http://dx.doi.org/10.1002/sam.10080
 
4


Abstract

Many different relative clustering validity criteria exist that are very useful in practice as quantitative measures for evaluating the quality of data partitions, and new criteria have still been proposed from time to time. These criteria are endowed with particular features that may make each of them able to outperform others in specific classes of problems. In addition, they may have completely different computational requirements. Then, it is a hard task for the user to choose a specific criterion when he or she faces such a variety of possibilities. For this reason, a relevant issue within the field of clustering analysis consists of comparing the performances of existing validity criteria and, eventually, that of a new criterion to be proposed. In spite of this, the comparison paradigm traditionally adopted in the literature is subject to some conceptual limitations. The present paper describes an alternative, possibly complementary methodology for comparing clustering validity criteria and uses it to make an extensive comparison of the performances of 40 criteria over a collection of 962,928 partitions derived from five well-known clustering algorithms and 1080 different data sets of a given class of interest. A detailed review of the relative criteria under investigation is also provided that includes an original comparative asymptotic analysis of their computational complexities. This work is intended to be a complement of the classic study reported in 1985 by Milligan and Cooper as well as a thorough extension of a preliminary paper by the authors themselves.

Item ID: 46794
Item Type: Article (Research - C1)
ISSN: 1932-1872
Keywords: clustering; validation; relative criteria
Funders: Brazilian National Council for Scientific and Technological Development (CNPq), São Paulo Research Foundation (FAPESP)
Date Deposited: 10 Mar 2017 04:43
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
Downloads: Total: 4
More Statistics

Actions (Repository Staff Only)

Item Control Page Item Control Page