Determining criteria for choosing anomaly detection algorithm

Pantechovskis, Aleksas

Use this url to cite ETD: https://hdl.handle.net/20.500.12259/92994

Type of publication (PDB)

Master Thesis

Type of publication

type::text::thesis::master thesis

Title

Kriterijų nustatymas anomalijų aptikimo algoritmo parinkimui

Other Title

Determining criteria for choosing anomaly detection algorithm

Author

Pantechovskis, Aleksas

Advisor

Krilavičius, Tomas

Extent

61 p.

Thesis Defence Date

2019-05-21

Abstract (en)

Today there is a large amount of data that requires automated processing, because nobody can analyze it and extract useful information from it manually. One such processing task is anomaly detection: identifying failures, unusually high traffic, dangerous states, and so on. However, it often requires the developers or users of such analysis systems to have extensive knowledge of the subject, which makes it less accessible. One aspect of this is the choice of a suitable algorithm and its parameters. The main goal of this work is to start creating guidelines, or a decision tree, that simplify choosing the most suitable anomaly detection algorithm based on the dataset characteristics and other requirements. This project was proposed by SAP and inspired by the work of the DAWN research team at Stanford and their MacroBase system. In this work we review the MacroBase architecture and functionality; describe commonly used real datasets for anomaly detection benchmarking, synthetic dataset generation methods, and anomaly detection quality metrics; develop a benchmarking platform; and evaluate anomaly detection algorithms of different types: distance-based (MCOD), density-based (LOF), statistical (MAD, FastMCD, Percentile), and Isolation Forest.