Determining criteria for choosing an anomaly detection algorithm
Pantechovskis, Aleksas
Today’s world produces far more data than anyone can analyze manually, so extracting useful information from it requires automated processing. One such kind of processing is anomaly detection: detecting failures, high traffic, dangerous states and so on. However, it often requires the developer or user of such analysis systems to have extensive knowledge of the subject, which makes it less accessible. One difficulty is the choice of a suitable algorithm and its parameters. The main goal of this work is to start creating guidelines or a decision tree that simplify choosing the most suitable anomaly detection algorithm depending on the dataset characteristics and other requirements. This project was proposed by SAP and inspired by the work of the Dawn research team at Stanford and their MacroBase system. In this work we review the MacroBase architecture and functionality; describe commonly used real datasets for anomaly detection benchmarking, synthetic dataset generation methods, and anomaly detection quality metrics; develop a benchmarking platform; and evaluate anomaly detection algorithms of different types: distance-based (MCOD), density-based (LOF), statistical (MAD, FastMCD, Percentile), and Isolation Forest.