Model selection for semi-supervised clustering

Pourrajabi, Mojgan, Moulavi, Davoud, Campello, Ricardo J.G.B., Zimek, Arthur, Sander, Jörg, and Goebel, Randy (2014) Model selection for semi-supervised clustering. In: Proceedings of the 17th International Conference on Extending Database Technology. pp. 331-342. From: EDBT 2014: International Conference on Extending Database Technology, 24-28 March 2014, Athens, Greece.

PDF (Published Version)
Available under License Creative Commons Attribution Non-commercial No Derivatives.

View at Publisher Website: http://dx.doi.org/10.5441/002/edbt.2014....
 


Abstract

Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints such as "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the associated evaluation problems have not been addressed. Here we summarize these problems and provide a solution. Furthermore, to demonstrate the practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density parameters) for a given problem.
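The abstract does not say why cross-validation is hard with constraints. One well-known complication, offered here as general background and not as a summary of the paper's procedure, is that must-link constraints are transitive and combine with cannot-link constraints. A constraint held out for testing may therefore already be implied by the training constraints. The minimal sketch below assumes constraints are given as pairs of integer object IDs and a candidate clustering as a dict from object ID to cluster label. It uses a union-find structure to detect such "leaked" test constraints, and it scores a candidate partition by the fraction of constraints it satisfies. The functions `leaked_constraints` and `satisfaction` are hypothetical names, and the score is a simple stand-in criterion, not the authors' method.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


class UnionFind:
    """Disjoint-set forest used to form the transitive closure of must-link pairs."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)  # unseen objects start as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def leaked_constraints(train_ml: List[Pair], train_cl: List[Pair],
                       test_ml: List[Pair], test_cl: List[Pair]) -> List[Pair]:
    """Return held-out constraints that the training constraints already imply."""
    uf = UnionFind()
    for a, b in train_ml:
        uf.union(a, b)
    # A cannot-link between two must-link components implies a cannot-link
    # between every pair of objects drawn from those two components.
    cl_components = {frozenset((uf.find(a), uf.find(b))) for a, b in train_cl}
    leaked = [(a, b) for a, b in test_ml if uf.find(a) == uf.find(b)]
    leaked += [(a, b) for a, b in test_cl
               if frozenset((uf.find(a), uf.find(b))) in cl_components]
    return leaked


def satisfaction(labels: Dict[int, int], ml: List[Pair], cl: List[Pair]) -> float:
    """Fraction of must-link and cannot-link constraints a partition satisfies."""
    ok = sum(labels[a] == labels[b] for a, b in ml)
    ok += sum(labels[a] != labels[b] for a, b in cl)
    total = len(ml) + len(cl)
    return ok / total if total else 0.0


if __name__ == "__main__":
    # Training must-links 0-1 and 1-2 imply 0-2; with cannot-link 2-3 they also imply 0-3.
    print(leaked_constraints([(0, 1), (1, 2)], [(2, 3)], [(0, 2)], [(0, 3), (4, 5)]))
    # prints [(0, 2), (0, 3)]
```

In a cross-validation loop, one would presumably discard leaked constraints from each test fold before comparing `satisfaction` scores across candidate models (e.g., different numbers of clusters). Whether and how the paper handles this is not stated in the abstract.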

Item ID: 47736
Item Type: Conference Item (Research - E1)
ISBN: 978-3-89318065-3
Additional Information:

© 2014, Copyright is with the authors. Published in Proc. 17th International Conference on Extending Database Technology (EDBT), March 24-28, 2014, Athens, Greece: ISBN 978-3-89318065-3, on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

Date Deposited: 14 Mar 2017 00:28
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
Downloads: Total: 123
Last 12 Months: 6
