Relative validity criteria for community mining algorithms

Rabbany, Reihaneh, Takaffoli, Mansoreh, Fagnan, Justin, Zaïane, Osmar R., and Campello, Ricardo (2014) Relative validity criteria for community mining algorithms. In: Alhajj, Reda, and Rokne, Jon, (eds.) Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY, USA, pp. 1562-1576.

PDF (Published Version) - Published Version
Restricted to Repository staff only

DOI: 10.1007/978-1-4614-6170-8_356

View at Publisher Website: http://dx.doi.org/10.1007/978-1-4614-617...

Abstract

[Extract] The recent growing trend in the Data Mining field is the analysis of structured/interrelated data, motivated by the natural presence of relationships between data points in a variety of the present-day applications. The structures in these interrelated data are usually represented using networks, known as complex networks or information networks; examples are the hyperlink networks of web pages, citation or collaboration networks of scholars, biological networks of genes or proteins, trust and social networks of humans, and much more.

All these networks exhibit common statistical properties, such as power law degree distribution, small-world phenomenon, relatively high transitivity, shrinking diameter, and densification power laws (Newman 2010; Leskovec et al. 2005). Network clustering, a.k.a. community mining, is one of the principal tasks in the analysis of complex networks. Many community mining algorithms have been proposed in recent years (for a recent survey, refer to Fortunato 2010). These algorithms evolved very quickly from simple heuristic approaches to more sophisticated optimization-based methods that are explicitly or implicitly trying to maximize the goodness of the discovered communities. The broadly used explicit maximization objective is the modularity, first introduced by Newman and Girvan 2004.

Although there have been many methods presented for detecting communities, very little work has been done on how to evaluate the results and validate these methods. The difficulties of evaluation are due to the fact that the interesting communities that have to be discovered are hidden in the structure of the network; thus, the true results are not known for comparison. Furthermore, there are no other means to measure the goodness of the discovered communities in a real network. We also do not have any large enough dataset with known communities, often called ground truth, to use as a benchmark to generally test and validate the algorithms. The common practice is to use synthetic benchmark networks and compare the discovered communities with the built-in ground truth. However, it is shown that the networks generated with the current benchmarks disagree with some of the characteristics of real networks. These facts motivate investigating a proper objective for evaluation of community mining results.


Item ID:	47683
Item Type:	Book Chapter (Reference)
ISBN:	978-1-4614-6169-2
Keywords:	clustering evaluation; clustering objective function; community mining; evaluation approaches; graph clustering; graph partitioning; quality measures
Funders:	Alberta Innovates Centre for Machine Learning (AICML), Natural Sciences and Engineering Research Council of Canada (NSERC), São Paulo Research Foundation (FAPESP), Brazilian National Council for Scientific and Technological Development (CNPq)
Date Deposited:	11 May 2017 03:17
FoR Codes:	01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes:	97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
Downloads:	Total: 7
	More Statistics

Actions (Repository Staff Only)

Item Control Page