Use this URL to cite this publication: https://hdl.handle.net/20.500.12259/57508
Application of machine learning for MWE identification
Type of publication
Conference theses in non-peer-reviewed publication (T2)
Author(s)
Baltijos pažangiųjų technologijų institutas (LT)
Baltijos pažangiųjų technologijų institutas (LT)
Baltijos pažangių technologijų institutas, Vilnius (LT)
Title
Application of machine learning for MWE identification
Is part of
Data analysis methods for software systems – DAMSS: 9th International Workshop, Druskininkai, Lithuania, November 30-December 2, 2017 / editor Jolita Bernatavičienė. Vilnius : Vilnius University Institute of Data Science and Digital Technologies, 2017
Date Issued
2017
Publisher
Vilnius : Vilnius University Institute of Data Science and Digital Technologies, 2017
Publisher (trusted)
Index Copernicus
Extent
p. 10-10
Field of Science
Abstract
Identification of multiword expressions (MWEs) is an important problem in natural language processing, especially for machine translation and other semantic analysis tasks. Lexical association measures (LAMs), such as pointwise mutual information (PMI), the log-likelihood ratio (LLR), and the Dice coefficient, are often used to identify MWEs. However, LAMs alone are insufficient for MWE detection, especially for the Lithuanian language, but they can be very useful as additional features for machine learning (ML) algorithms. Early experiments with Lithuanian and Latvian show that a Random Forest classifier with the Resample filter achieves almost 99% precision, 58% recall, and a 73% F-score. We discuss experiments with corpora based on delfi.lt, with different features, including LAMs, and with different ML methods, including Naive Bayes, Random Forests, Support Vector Machines, and Artificial Neural Networks, among others.
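The association measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bigram-count interface, and the choice of base-2 logarithm for PMI are assumptions made for illustration only. The second helper shows that the reported F-score is consistent with the harmonic mean of the reported precision and recall.

```python
import math


def association_measures(c_xy, c_x, c_y, n):
    """Return PMI, Dice and LLR for a candidate bigram (x, y).

    c_xy: count of the bigram (x, y)
    c_x:  count of bigrams with x as the first word
    c_y:  count of bigrams with y as the second word
    n:    total number of bigrams in the corpus
    (Hypothetical interface; the abstract does not specify one.)
    """
    # PMI: log of the observed joint probability over the
    # probability expected if x and y were independent.
    pmi = math.log2((c_xy * n) / (c_x * c_y))

    # Dice coefficient: overlap relative to the two marginal counts.
    dice = 2 * c_xy / (c_x + c_y)

    # Log-likelihood ratio over the 2x2 contingency table.
    observed = [c_xy, c_x - c_xy, c_y - c_xy, n - c_x - c_y + c_xy]
    expected = [
        c_x * c_y / n,
        c_x * (n - c_y) / n,
        (n - c_x) * c_y / n,
        (n - c_x) * (n - c_y) / n,
    ]
    # Cells with a zero observed count contribute nothing to the sum.
    llr = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    return {"pmi": pmi, "dice": dice, "llr": llr}


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    print(association_measures(c_xy=10, c_x=20, c_y=20, n=1000))
    # Precision and recall as reported in the abstract.
    print(round(f_score(0.99, 0.58), 2))
```

In a setup like the one the abstract describes, values of this kind would be supplied as feature columns to a classifier rather than thresholded on their own.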
Type of document
type::text::conference output::conference proceedings::conference paper
Language
English (en)
Coverage Spatial
Lithuania (LT)