Use this URL to cite the dataset: https://hdl.handle.net/20.500.12259/274272
Lithuanian morphologically annotated corpus - MATAS v3.0
Type of document
dataset::survey data
Author(s)
Title [en]
Lithuanian morphologically annotated corpus - MATAS v3.0
Art Work Nature
3.0
Publisher
Vytauto Didžiojo universitetas / Vytautas Magnus University
Date Issued
2024-12-20
Keywords (lt)
Keywords (en)
Abstract (en)
MATAS corpus (version 3.0). Description: an updated, manually checked, morphologically annotated corpus. Language: Lithuanian. Previous versions: 1. MATAS v0.2 (http://hdl.handle.net/20.500.11821/9); 2. MATAS v1.0 (http://hdl.handle.net/20.500.11821/33). Formats and standards: 1. CoNLL-U (https://universaldependencies.org/format.html); 2. JABLONSKIS tagset v2 (https://sitti.vdu.lt/jablonskis-en/); 3. MULTEXT-East tagset (http://nl.ijs.si/ME/V4/msd/html/index.html); 4. UTF-8. Size: 2,137,287 tokens (including punctuation); 1,694,819 words; 144,047 sentences; 1,234 documents. Genres: the corpus covers 5 genres: documents (14%), fiction (19%), periodicals (36%), scientific texts (24%), transcripts (7%).
Is Referenced by
CLARIN-LT
Language
Lietuvių / Lithuanian (lt)
URI
URI | Access Rights |
---|---|
https://hdl.handle.net/20.500.12259/274272 | |
Lithuanian morphologically annotated corpus - MATAS v1.0 | |
https://doi.org/10.7220/20.500.12259/274272 | |
Data on the CLARIN-LT platform | Dataset (Only Metadata) |
Lithuanian morphologically annotated corpus - MATAS v0.2 | |
Affiliation(s)