Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/56839
Type of publication: Article in Clarivate Analytics Web of Science or Scopus DB conference proceedings (P1a)
Field of Science: Informatics (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita; Krilavičius, Tomas
Title: Topic classification problem solving for morphologically complex languages
Is part of: ICIST 2016: 22nd international conference on Information and Software Technologies (ICIST), Kaunas University of Technology, October 13-15, 2016, Lithuania : proceedings / editors Giedre Dregvaite, Robertas Damasevicius. Cham : Springer, 2016
Extent: p. 511-524
Date: 2016
Note: This research is funded by ESFA (DADA, VP1-3.1-ŠMM-10-V-02-025)
Keywords: Topic classification; Supervised machine learning; Lexical and morpho-syntactic feature types; Lithuanian and Russian languages
ISBN: 9783319462530
Abstract: This paper addresses topic classification for the morphologically complex Lithuanian and Russian languages using popular supervised machine learning techniques. We experimentally investigated two text classification methods and a wide variety of feature types covering different levels of abstraction: character, lexical, and morpho-syntactic. To obtain comparable results for both languages, we kept the experimental conditions as similar as possible: the datasets were composed of normative texts taken from news portals, contained similar topics, and had the same number of texts in each topic. The best results (approximately 0.86 accuracy) were achieved with the Support Vector Machine method using token lemmas as the feature representation type. The character feature type, which captures relevant patterns of the complex inflectional morphology without any external morphological tools, was the second best. Since these findings hold for both Lithuanian and Russian, we assume they should hold for the entire group of Baltic and Slavic languages.
Internet: https://hdl.handle.net/20.500.12259/56839
Affiliation(s): Faculty of Informatics
Department of Applied Informatics
Vytautas Magnus University
Appears in Collections: University Research Publications
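
The abstract notes that character-level features can capture inflectional patterns without external morphological tools. The following is a minimal sketch of that idea only, not the authors' pipeline: it counts character trigrams and compares two inflected forms of one Lithuanian word with an unrelated word. The helper names `char_ngrams` and `cosine`, the trigram size, and the space padding at word boundaries are all assumptions made for illustration; the paper's actual feature settings and its Support Vector Machine classifier are not reproduced here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (hypothetical helper).

    The text is lowercased and padded with one space on each side so that
    word-initial and word-final n-grams (e.g. inflectional endings) are kept.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (hypothetical helper)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two inflected forms of the Lithuanian word for "university"
# (nominative and genitive singular) and an unrelated word ("basketball").
nominative = char_ngrams("universitetas")
genitive = char_ngrams("universiteto")
unrelated = char_ngrams("krepšinis")

# Inflected forms share most of their trigrams; only the endings differ.
same_lemma = cosine(nominative, genitive)
different_lemma = cosine(nominative, unrelated)
print(f"same lemma: {same_lemma:.2f}, different lemma: {different_lemma:.2f}")
```

Because the two inflected forms differ only in their endings, most of their trigrams coincide, which is one plausible reason a character-based representation can approach a lemma-based one without a lemmatizer.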

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.