Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms

Forest, Elizabeth, Swinbourne, Anne, Myers, Trina, and Scovell, Mitchell (2021) Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms. In: [Presented at the 36th IEEE/ACM International Conference on Automated Software Engineering]. From: ASE 2021: 36th IEEE/ACM International Conference on Automated Software Engineering, 14-20 November 2021, Online.

PDF (ASE LBR Short Paper) - Accepted Version
Restricted to Repository staff only

PDF (Conference Presentation) - Presentation
View at Publisher Website: https://doi.org/10.48550/arXiv.2108.1105...
 

Abstract

When approaching a clustering problem, choosing the right clustering algorithm and parameters is essential, as each clustering algorithm is proficient at finding clusters of a particular nature. Because clustering is unsupervised, no ground-truth values are available for empirical evaluation, which makes it difficult to automate parameter selection through hyperparameter tuning. Previous approaches to hyperparameter tuning for clustering algorithms have relied either on internal metrics, which are often biased towards certain algorithms, or on having some ground-truth labels available, which moves the problem into the semi-supervised space. This preliminary study proposes a framework for semi-automated hyperparameter tuning of clustering problems, using a grid search to produce a series of graphs and easy-to-interpret metrics that can then be used for more efficient domain-specific evaluation. Preliminary results show that internal metrics are unable to capture the semantic quality of the resulting clusters, and that approaches driven by internal metrics would reach different conclusions from those driven by manual evaluation.
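
The grid-search idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of k-means as the algorithm, the silhouette coefficient as the internal metric, the synthetic data, and all function names (`kmeans`, `silhouette`, `grid_search`) are assumptions made for illustration only. The sketch runs every parameter combination and collects a small summary row per run, which a domain expert could then review alongside the internal metric instead of trusting the metric alone.

```python
import math
import random
from itertools import product


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, an internal metric in the range [-1, 1]."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        total += (b - a) / (max(a, b) or 1.0)
    return total / len(points)


def grid_search(points, grid):
    """Run every parameter combination and collect a reviewable summary row for each."""
    rows = []
    for k, seed in product(grid["k"], grid["seed"]):
        labels = kmeans(points, k, seed)
        sizes = sorted((labels.count(c) for c in set(labels)), reverse=True)
        rows.append({"k": k, "seed": seed,
                     "silhouette": round(silhouette(points, labels), 3),
                     "cluster_sizes": sizes})
    return rows


# Toy data: three 2-D blobs (synthetic, for illustration only).
rng = random.Random(0)
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]

results = grid_search(points, {"k": [2, 3, 4, 5], "seed": [0, 1]})
for row in sorted(results, key=lambda r: -r["silhouette"]):
    print(row)
```

Sorting by silhouette gives only one view of the runs. In line with the abstract's finding, the cluster sizes (and, in practice, plots or samples from each cluster) are kept in every row so that a reviewer can judge semantic quality directly, since the highest-scoring configuration under an internal metric may not be the one a manual evaluation would choose.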

Item ID: 74259
Item Type: Conference Item (Poster)
Keywords: Machine Learning, Clustering Algorithms, Hyperparameter Tuning
Date Deposited: 26 May 2022 02:49
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4612 Software engineering > 461201 Automated software engineering @ 30%
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461106 Semi- and unsupervised learning @ 70%
SEO Codes: 22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220403 Artificial intelligence @ 70%
28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280115 Expanding knowledge in the information and computing sciences @ 30%