Communities validity: methodical evaluation of community mining algorithms

Rabbany, Reihaneh, Takaffoli, Mansoureh, Fagnan, Justin, Zaïane, Osmar R., and Campello, Ricardo J.G.B. (2013) Communities validity: methodical evaluation of community mining algorithms. Social Network Analysis and Mining, 3 (4). pp. 1039-1062.


View at Publisher Website: http://dx.doi.org/10.1007/s13278-013-013...
 


Abstract

Grouping data points is one of the fundamental tasks in data mining, and is commonly known as clustering when the data points are described by attributes. For interrelated data, which is represented in the form of a graph wherein a link between two nodes indicates a relationship between them, a considerable number of approaches have been proposed in recent years for mining communities in a given network. However, little work has been done on how to evaluate community mining algorithms. The common practice is to evaluate the algorithms based on their performance on standard benchmarks for which the ground truth is known. This technique is similar to external evaluation of attribute-based clustering methods. The other two well-studied clustering evaluation approaches are less explored in the community mining context: internal evaluation, which statistically validates the clustering result, and relative evaluation, which compares alternative clustering results. These two approaches enable us to validate communities discovered in a real-world application, where the true community structure is hidden in the data. In this article, we investigate different clustering quality criteria applied for relative and internal evaluation of clustering data points with attributes, as well as different clustering agreement measures used for external evaluation, and incorporate proper adaptations to make them applicable in the context of interrelated data. We further compare the performance of the proposed adapted criteria in evaluating community mining results in different settings through an extensive set of experiments.
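
As a rough illustration of the distinction drawn in the abstract, the sketch below contrasts an agreement measure, which needs a ground-truth partition (external evaluation), with a quality criterion, which needs only the graph and can therefore rank alternative partitions (relative evaluation). The abstract does not name specific measures; the Rand index and Newman modularity are assumptions chosen here for familiarity, and the toy graph, the function names, and both partitions are hypothetical. The adapted criteria proposed in the article are not reproduced.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two partitions agree.

    Each argument maps node -> community label. Illustrative choice of
    agreement measure; not taken from the abstract.
    """
    agree = 0
    total = 0
    for u, v in combinations(sorted(labels_a), 2):
        same_a = labels_a[u] == labels_a[v]
        same_b = labels_b[u] == labels_b[v]
        agree += same_a == same_b
        total += 1
    return agree / total


def modularity(edges, labels):
    """Newman modularity for an undirected, unweighted graph.

    Illustrative choice of quality criterion; not taken from the abstract.
    """
    m = len(edges)
    intra = {}    # number of edges inside each community
    deg_sum = {}  # sum of node degrees in each community
    for u, v in edges:
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
        deg_sum[labels[u]] = deg_sum.get(labels[u], 0) + 1
        deg_sum[labels[v]] = deg_sum.get(labels[v], 0) + 1
    return sum(
        intra.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum
    )


# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
truth = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
found = {0: "X", 1: "X", 2: "Y", 3: "Y", 4: "Y", 5: "Y"}

# External evaluation: compare a result against the known ground truth.
external_score = rand_index(truth, found)

# Relative evaluation: rank two candidate partitions without ground truth.
quality_truth = modularity(edges, truth)
quality_found = modularity(edges, found)
```

On this toy graph the partition that matches the two triangles receives the higher modularity, so the quality criterion and the agreement measure point the same way; whether that holds in general is the kind of question the article's experiments address.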

Item ID: 46786
Item Type: Article (Research - C1)
ISSN: 1869-5469
Keywords: evaluation approaches, quality measures, clustering evaluation, clustering objective function, community mining
Funders: Alberta Innovates Centre for Machine Learning (AICML), Natural Sciences and Engineering Research Council of Canada (NSERC), São Paulo Research Foundation (FAPESP), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Date Deposited: 09 Mar 2017 23:38
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%