RNNs for Lithuanian multiword expressions identification
Author | Affiliation |
---|---|
Baltijos pažangiųjų technologijų institutas | LT |
Baltijos pažangiųjų technologijų institutas | LT |
Date |
---|
2018 |
We discuss an experiment on the automatic identification of multiword expressions (MWEs) in a Lithuanian corpus. Our training dataset was annotated morphologically with a POS tagger and manually annotated with MWEs by four linguists. We also used word embeddings in our feature set. Deep learning methods are widely used in many NLP tasks and applications, including MWE identification. Our experimental setup therefore used deep learning methods (Recurrent Neural Networks, RNNs) to automatically identify contiguous and non-contiguous MWEs of different lengths. The best result (44.9% F1-score) was achieved with RNNs using Stochastic Gradient Descent as the optimizer and Categorical Cross Entropy as the loss function.
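The F1-score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed for MWE identification, not the authors' evaluation code. It assumes exact-match scoring and a hypothetical representation in which each MWE is a pair of a sentence id and a tuple of token indices, so that non-contiguous expressions are simply tuples with gaps. The abstract does not state the matching criterion, and the function name `mwe_f1` and the sample data are illustrative only.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (sentence_id, token_indices).
# A non-contiguous MWE has gaps in its indices, e.g. (4, 7).
MWE = Tuple[int, Tuple[int, ...]]


def mwe_f1(gold: Iterable[MWE], predicted: Iterable[MWE]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring (an assumption)."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: three gold MWEs, two predictions, one exact match.
gold = [(0, (1, 2)), (0, (4, 7)), (1, (0, 1, 2))]
predicted = [(0, (1, 2)), (1, (0, 1))]

precision, recall, f1 = mwe_f1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case the prediction `(1, (0, 1))` covers only part of the gold expression `(1, (0, 1, 2))` and therefore counts as an error under exact matching; a partial-match scheme would score it differently.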
eISSN 2071-2987. This research was funded by the Research Council of Lithuania (No. LIP-027/2016). This volume comprises research papers from the International Conference on Recent Advancements in Computing, Internet of Things (IoT) and Computer Engineering Technology (CICET), October 29-31, 2018, Taipei, Taiwan. CICET 2018 is hosted and organized by Tamkang University.