Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/56634
Type of publication: Article in conference proceedings in other databases (P1c)
Field of Science: Informatics (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita; Utka, Andrius; Šarkutė, Ligita
Title: Feature exploration for authorship attribution of Lithuanian parliamentary speeches
Is part of: Text, speech and dialogue : 17th international conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014 : proceedings. New York : Springer, 2014
Extent: p. 93-100
Date: 2014
Series/Report no.: Lecture notes in computer science, vol. 8655 (ISSN 0302-9743)
Keywords: Authorship attribution; Supervised ML; Lithuania
ISBN: 9783319108155
Abstract: This paper reports the first authorship attribution results for the Lithuanian language obtained with automatic computational methods. Using supervised machine learning techniques, we experimentally investigated the influence of different feature types (lexical, character, and syntactic), focusing on a few authors within three datasets containing transcripts of parliamentary speeches and debates. To keep interfering factors to a minimum, all datasets were composed by selecting candidates with the same political views (avoiding ideology-based classification) from overlapping parliamentary terms (avoiding topic classification). Experiments revealed that content-based features are more useful than function words or part-of-speech tags; moreover, lemma n-grams (sometimes used in concatenation with morphological information) outperform word or document-level character n-grams. Because Lithuanian is highly inflective and morphologically and lexically rich, and because we were dealing with normative language, morphological tools were maximally helpful.
Internet: https://hdl.handle.net/20.500.12259/56634
Affiliation(s): Faculty of Informatics
Kaunas University of Technology
Department of Applied Informatics
Vytautas Magnus University
Appears in Collections: University Research Publications
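
Illustrative note: the abstract compares lemma or word n-grams with document-level character n-grams as features. The sketch below shows one simple way such n-gram counts could be extracted. It is not the authors' implementation; the function names and the toy token list are hypothetical, and in practice the lemmas would come from a morphological analyzer, which is not shown here.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return contiguous n-grams over a sequence of tokens (or lemmas)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams over a whole document string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def feature_counts(items):
    """Bag-of-features representation: feature -> frequency."""
    return Counter(items)


# Hypothetical toy input (assumption, not data from the paper).
tokens = ["gerbiami", "kolegos", "gerbiami", "kolegos", "siūlau"]
bigram_counts = feature_counts(word_ngrams(tokens, 2))
char_trigrams = char_ngrams("kolegos", 3)
```

Count vectors of this kind could then be passed to any supervised classifier. The record does not say which learner, n-gram orders, or feature weighting the authors used, so those choices are left open here.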

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.