How does dimensionality influence outlier detection effectiveness in multivariate geochemical data? insights from LOF and IF methods
Shahrestani, Shahed, and Sanislav, Ioan (2025) How does dimensionality influence outlier detection effectiveness in multivariate geochemical data? insights from LOF and IF methods. Earth Science Informatics, 18. 27.
Abstract
This paper examines the impact of the curse of dimensionality on the performance of isolation forest (IF) and local outlier factor (LOF) in detecting mineralization-related geochemical anomalies in a high-dimensional geochemical dataset. IF and LOF were applied to variable subsets of varying dimension, selected by both random and supervised methods, and tested against known mineral deposit locations to assess their effectiveness. The study evaluates the percentage of mineral occurrences classified as anomalies and the area under the ROC curve across different dimensionalities. It also explores the influence of the dimension reduction techniques PCA and ISOMAP on IF and LOF performance. IF performs consistently across the tested dimensions, proving robust and particularly suited to high-dimensional datasets. In contrast, LOF is sensitive to dimensionality, performing best in lower dimensions (5 to 10 variables) and losing effectiveness beyond this range. This sensitivity highlights the importance of judicious input variable selection for LOF to achieve effective anomaly detection in geochemical datasets. The study further shows that the performance of IF remains stable with both PCA and ISOMAP, whereas LOF benefits more from PCA, whose variance-maximizing projection may retain sufficient structural integrity for effective anomaly detection. Conversely, the performance of LOF declines with ISOMAP, because ISOMAP has a more significant impact on local densities. This variation underscores the need for careful selection of both the dimension reduction method and the number of components used as input for outlier detection methods.
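The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes each sample already carries an anomaly score (higher meaning more anomalous) from a detector such as IF or LOF, plus a flag marking known mineral occurrences. The function names, the threshold and the score values are hypothetical and are not taken from the paper, whose exact thresholding procedure is not given in the abstract.

```python
from typing import Sequence


def capture_rate(scores: Sequence[float], is_deposit: Sequence[bool],
                 threshold: float) -> float:
    """Percentage of known occurrences scoring at or above the threshold."""
    deposit_scores = [s for s, d in zip(scores, is_deposit) if d]
    if not deposit_scores:
        raise ValueError("at least one known occurrence is required")
    hits = sum(1 for s in deposit_scores if s >= threshold)
    return 100.0 * hits / len(deposit_scores)


def roc_auc(scores: Sequence[float], is_deposit: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen occurrence outscores a randomly chosen non-occurrence
    (ties count as half a win)."""
    pos = [s for s, d in zip(scores, is_deposit) if d]
    neg = [s for s, d in zip(scores, is_deposit) if not d]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up anomaly scores for eight samples (assumption: higher = more anomalous).
scores = [0.91, 0.15, 0.78, 0.40, 0.22, 0.66, 0.35, 0.10]
# Made-up flags marking which samples coincide with known mineral occurrences.
is_deposit = [True, False, True, False, False, False, True, False]

rate = capture_rate(scores, is_deposit, threshold=0.6)
auc = roc_auc(scores, is_deposit)
print(f"occurrences flagged as anomalies: {rate:.1f}%")
print(f"ROC AUC: {auc:.3f}")
```

With these made-up values, two of the three occurrences exceed the threshold and 13 of the 15 occurrence/non-occurrence pairs are ranked correctly. Repeating the calculation for scores obtained at each input dimensionality would give the kind of comparison the abstract describes.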
| Item ID: | 84318 |
|---|---|
| Item Type: | Article (Research - C1) |
| ISSN: | 1865-0473 |
| Copyright Information: | © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. |
| Date Deposited: | 17 Dec 2024 22:14 |
| FoR Codes: | 37 EARTH SCIENCES > 3703 Geochemistry > 370301 Exploration geochemistry @ 50%; 37 EARTH SCIENCES > 3705 Geology > 370508 Resource geoscience @ 40%; 37 EARTH SCIENCES > 3703 Geochemistry > 370399 Geochemistry not elsewhere classified @ 10% |
| SEO Codes: | 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280107 Expanding knowledge in the earth sciences @ 100% |