Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/53495
Type of publication: Article in peer-reviewed foreign international conference proceedings (P1d)
Field of Science: Informatics (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita; Šarkutė, Ligita; Utka, Andrius
Title: The Effect of author set size in authorship attribution for Lithuanian
Is part of: NODALIDA 2015 : proceedings of the 20th Nordic conference of computational linguistics, May 11–13, 2015, Institute of the Lithuanian language, Vilnius / editor Beata Megyesi. Linköping : Linköping University Electronic Press, 2015
Extent: p. 87-96
Date: 2015
Series/Report no.: (NEALT Proceedings, Vol. 23 1650-3740)
Note: ISSN (print): 1650-3638
Keywords: Authorship attribution; Parliamentary transcripts
ISBN: 9789175190983
Abstract: This paper reports the first results on the effect of author set size in authorship attribution using automatic computational methods for the Lithuanian language. The aim is to determine how quickly authorship attribution results deteriorate as the number of candidate authors gradually increases: starting from 3 and going up to 5, 10, 20, 50, and 100. Using supervised machine learning techniques, we also investigated the influence of different feature types (lexical, character, morphological, etc.) and language types (normative parliamentary speeches and non-normative forum posts). The experiments revealed that the effectiveness of the method and feature types depends more on the language type than on the number of candidate authors. Content features based on word lemmas are the most useful type for normative texts, because Lithuanian is a highly inflective language with rich morphology and vocabulary. Character features are the most accurate type for forum posts, where the texts are too complicated to be processed effectively with external morphological tools.
Internet: http://aclweb.org/anthology/W/W15/W15-1813.pdf
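Affiliation(s): Faculty of Informatics (Informatikos fakultetas)
Kaunas University of Technology (Kauno technologijos universitetas)
Department of Applied Informatics (Taikomosios informatikos katedra)
Vytautas Magnus University (Vytauto Didžiojo universitetas)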
Appears in Collections: Universiteto mokslo publikacijos / University Research Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.