Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/56632
Type of publication: Article in Clarivate Analytics Web of Science or Scopus DB conference proceedings (P1a)
Field of Science: Informatics (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita;Šarkutė, Ligita;Utka, Andrius
Title: Automatic author profiling of Lithuanian parliamentary speeches : exploring the influence of features and dataset sizes
Is part of: Human language technologies - the Baltic perspective : proceedings of the 6th international conference, Baltic HLT 2014. Amsterdam : IOS Press, 2014
Extent: p. 99-106
Date: 2014
Series/Report no.: (Frontiers in artificial intelligence and applications. Vol. 268. ISSN 0922-6389)
Keywords: Parliamentary speeches;Political view;Lithuania
ISBN: 9781614994411
Abstract: Extracting demographic characteristics, cultural background or psychometric traits of an author from an anonymous text has a number of potential applications in fields such as forensics, security and user-targeted services. Despite significant advances in automatic author profiling, most of the research has been done on Germanic languages and much less on morphologically rich languages. Consequently, this work is the first attempt at finding a good method for automatic author profiling in three dimensions for Lithuanian: age (6 categories), gender (2 categories) and political view (3 categories). To tackle this task we used a dataset containing text transcripts of Lithuanian parliamentary speeches and debates, which represents formal spoken but normative Lithuanian. We explored different feature types (ultimate style markers, lexical, morphological, character, and aggregated) and dataset sizes (100, 200, 500, 1,000, 2,000 and 5,000 instances in each category). The best results were obtained with the Support Vector Machine method, the largest tested dataset and lemmas as features: 44.6% accuracy for age with interpolation up to trigrams, 74.6% for gender and 58.7% for political view with interpolation up to bigrams.
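The abstract's "interpolation up to bigrams" and "up to trigrams" can be illustrated with a minimal sketch. It assumes the phrase means that n-grams of every order from 1 up to the stated maximum are pooled into one feature set; the paper's exact procedure is not given in this record. The helper name `ngram_features` and the sample lemma sequence are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def ngram_features(tokens, max_n):
    """Count every token n-gram of order 1..max_n (hypothetical helper).

    Assumption: "interpolation up to n-grams" is read here as pooling
    unigrams, bigrams, ... up to order max_n into a single feature set.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of length n across the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Illustrative lemma sequence, not taken from the parliamentary dataset.
lemmas = ["the", "committee", "approve", "the", "budget"]

bigram_level = ngram_features(lemmas, max_n=2)   # unigrams + bigrams
trigram_level = ngram_features(lemmas, max_n=3)  # adds trigrams on top
```

Count vectors of this kind could then be passed to a Support Vector Machine classifier; the classifier is left out here because the record does not say which implementation or settings were used.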
Internet: http://ebooks.iospress.nl/volumearticle/38011
Affiliation(s): Faculty of Informatics
Kaunas University of Technology
Department of Applied Informatics
Vytautas Magnus University
Appears in Collections: University Research Publications
