Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12259/47858
Type of publication: Article in conference proceedings indexed in Clarivate Analytics Web of Science and/or Scopus (P1a)
Field of Science: Informatics (N009)
Author(s): Kapočiūtė-Dzikienė, Jurgita; Utka, Andrius; Šarkutė, Ligita
Title: Authorship attribution of internet comments with thousand candidate authors
Is part of: ICIST 2015 : Information and software technologies : 21st international conference, Druskininkai, Lithuania, October 15-16, 2015 : proceedings / editors Dregvaite, Giedre, Damasevicius, Robertas. Berlin : Springer International Publishing, 2015
Extent: p. 433-448
Date: 2015
Series/Report no.: (Communications in Computer and Information Science, Vol. 538, ISSN 1865-0929)
Note: Online ISBN 978-3-319-24770-0
Keywords: Similarity-based paradigm; Internet comments; Randomized feature set; Lithuanian language
ISBN: 9783319247694
Abstract: In this paper we report the first authorship attribution results for the Lithuanian language using Internet comments with a thousand candidate authors. The task is complicated by the large number of candidate authors, extremely short non-normative texts, and the problems associated with a morphologically rich, vocabulary-rich language. The effectiveness of the proposed similarity-based method was investigated using lexical, morphological, and character features, as well as several dimensionality reduction techniques. The best results, by a small margin, were obtained with word-level character tetra-grams and the entire feature set. However, the technique based on randomized feature sets achieved very similar performance even with only a few thousand features, and it outperformed implementations of the method based on sophisticated feature ranking. The best f-score and accuracy values exceeded the random and majority baselines by more than 10.9 percentage points.
Internet: https://hdl.handle.net/20.500.12259/47858
Affiliation(s): Informatikos fakultetas
Kauno technologijos universitetas
Taikomosios informatikos katedra
Vytauto Didžiojo universitetas
Appears in Collections: University Research Publications
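
As an illustration of the feature type named in the abstract, the following minimal Python sketch extracts word-level character tetra-grams (4-character substrings taken within single words) and attributes a comment to the candidate author whose profile is most similar. Cosine similarity, the whitespace tokenizer, the toy data, and the names `word_char_ngrams`, `cosine`, and `attribute` are assumptions made for illustration only; this record does not specify the similarity measure or the implementation used in the paper.

```python
from collections import Counter
from math import sqrt


def word_char_ngrams(text, n=4):
    """Count character n-grams that lie entirely within single words."""
    counts = Counter()
    for word in text.lower().split():  # naive whitespace tokenizer (assumption)
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(comment, profiles, n=4):
    """Return the candidate author whose profile is most similar to the comment."""
    features = word_char_ngrams(comment, n)
    return max(profiles, key=lambda author: cosine(features, profiles[author]))


# Toy data, invented for illustration only (not from the paper's corpus).
training = {
    "author_a": "labai geras straipsnis, labai patiko",
    "author_b": "visiska nesamone, nieko gero",
}
profiles = {author: word_char_ngrams(text) for author, text in training.items()}
print(attribute("geras straipsnis", profiles))
```

With a thousand candidate authors, `profiles` would simply hold one n-gram profile per author; the nearest-profile lookup stays the same.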

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.