On the internal evaluation of unsupervised outlier detection

Marques, Henrique O., Campello, Ricardo J.G.B., Zimek, Arthur, and Sander, Jörg (2015) On the internal evaluation of unsupervised outlier detection. In: Proceedings of the 27th International Conference on Scientific and Statistical Database Management. 7. From: SSDBM 2015: 27th International Conference on Scientific and Statistical Database Management, 29 June - 1 July 2015, La Jolla, CA, USA.

View at Publisher Website: http://dx.doi.org/10.1145/2791347.279135...


Although there is a large and growing literature that tackles the unsupervised outlier detection problem, the unsupervised evaluation of outlier detection results is still virtually untouched in the literature. The so-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or just compare (in relative terms) the solutions provided by different algorithms or by different parameterizations of a given algorithm in the absence of labeled data. However, in contrast to unsupervised cluster analysis, where indexes for internal evaluation and validation of clustering solutions have been conceived and shown to be very useful, in the outlier detection domain this problem has been notably overlooked. Here we discuss this problem and provide a solution for the internal evaluation of top-n (binary) outlier detection results. Specifically, we propose an index called IREOS (Internal, Relative Evaluation of Outlier Solutions) that can evaluate and compare different candidate labelings of a collection of multivariate observations in terms of outliers and inliers. We also statistically adjust IREOS for chance and extensively evaluate it in several experiments involving different collections of synthetic and real data sets.

Item ID: 47059
Item Type: Conference Item (Research - E1)
ISBN: 978-1-4503-3709-0
Keywords: outlier detection, unsupervised evaluation, validation
Funders: FAPESP Brazil, Natural Sciences and Engineering Research Council of Canada
Projects and Grants: FAPESP grant #304137/2013-8, FAPESP grant #400772/2014-0
Date Deposited: 04 Jan 2017 08:04
FoR Codes: 01 MATHEMATICAL SCIENCES > 0104 Statistics > 010401 Applied Statistics @ 100%
SEO Codes: 97 EXPANDING KNOWLEDGE > 970101 Expanding Knowledge in the Mathematical Sciences @ 100%
