Use this url to cite publication: https://hdl.handle.net/20.500.12259/36090
An overview of Lithuanian internet media n-gram corpus
Type of publication
Article in conference proceedings in Scopus database (P1a2)
Author(s)
Author | Affiliation | |||
---|---|---|---|---|
LT | Baltijos pažangiųjų technologijų institutas | LT | ||
Baltijos pažangiųjų technologijų institutas | LT | Vilniaus universitetas | ||
Title [en]
An overview of Lithuanian internet media n-gram corpus
Is part of
CEUR workshop proceedings [electronic resource]: SYSTEM 2017: proceedings of the symposium for Young Scientists in Technology, Engineering and Mathematics, Kaunas, Lithuania, April 28, 2017. Aachen : CEUR-WS, 2017, Vol. 1853
Date Issued
2017
Publisher
Aachen : CEUR-WS
Extent
p. 24-28
Abstract (en)
This paper describes the construction and properties of an open, 70-million-word n-gram corpus of Lithuanian Internet media. Copyright limitations often restrict the availability of resources based on contemporary media, whereas n-gram corpora (e.g., the Google N-gram viewer/corpus) avoid this problem. Lithuanian is an under-resourced language, so the n-gram corpus of Lithuanian media is designed to add to the publicly available, ready-to-use lexical resources. In this paper we report the corpus construction procedure, preprocessing, corpus statistics, and possible areas of application.
Type of document
type::text::journal::journal article::research article
Language
English (en)
Coverage Spatial
Germany (DE)
ISSN (of the container)
1613-0073
Other Identifier(s)
VDU02-000022098
Access Rights
Open Access
Journal | CiteScore | SNIP | SJR | Year | Quartile |
---|---|---|---|---|---|
CEUR Workshop Proceedings | 0.6 | 0.346 | 0.167 | 2017 | Q4 |