ALKSNIS v3.0
ALKSNIS v3.0 consists of 3,643 syntactically annotated sentences in the PML (Prague Mark-up Language) and CoNLL-U formats.
The PML format allows researchers to visualise and edit syntactic trees with the TrEd editor (https://ufal.mff.cuni.cz/tred/).
Each node of a tree corresponds to a word, a punctuation mark or other text element (symbol, digit etc.) within a sentence.
The following information is presented for each node:
- a word form (as used in the text);
- a lemma;
- a morphology tag, and
- a syntactic function (subject, object, etc.).
Dependencies are shown by links between words.
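In the CoNLL-U files, the node information listed above maps onto the standard tab-separated CoNLL-U columns: FORM (column 2), LEMMA (column 3), the tag columns UPOS/XPOS (columns 4 and 5), HEAD (column 7) and DEPREL (column 8). The following Python sketch shows one way to read these fields and list the dependency links. It is only an illustration: the sample sentence, its tags and its relation labels are invented for this example and are not taken from ALKSNIS, the names Node, parse_conllu and dependencies are hypothetical helpers, and the actual labels used in the corpus files may differ.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: int       # position of the token in the sentence (column 1)
    form: str     # word form as used in the text (column 2)
    lemma: str    # lemma (column 3)
    tag: str      # language-specific morphology tag, XPOS (column 5)
    head: int     # id of the governing node, 0 for the root (column 7)
    deprel: str   # syntactic function / relation label (column 8)


def parse_conllu(text):
    """Return a list of sentences, each a list of Node objects."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment line
            continue
        cols = line.split("\t")
        # Skip anything that is not a plain token line
        # (e.g. ranges such as "1-2" or empty nodes such as "1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        current.append(
            Node(int(cols[0]), cols[1], cols[2], cols[4], int(cols[6]), cols[7])
        )
    if current:
        sentences.append(current)
    return sentences


def dependencies(sentence):
    """Return (dependent form, relation, head form) triples for one sentence."""
    by_id = {node.id: node for node in sentence}
    return [
        (n.form, n.deprel, by_id[n.head].form if n.head else "ROOT")
        for n in sentence
    ]


# Invented sample sentence; tags and labels are placeholders (assumption).
SAMPLE = "\n".join([
    "# text = Jonas skaito knygą.",
    "\t".join(["1", "Jonas", "Jonas", "PROPN", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "skaito", "skaityti", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "knygą", "knyga", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for dependent, relation, head in dependencies(sentence):
            print(f"{dependent} --{relation}--> {head}")
```

To apply the sketch to a real file, read the file contents as UTF-8 text and pass them to parse_conllu in place of SAMPLE.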
Syntactically annotated sentences are corrected according to guidelines created by researchers at VMU CCL, following the rules of the Prague Dependency Treebank. All sentences are manually checked and corrected by a group of linguists. The TrEd editor and a style file (“antisDplus_schema”) are needed to view files with the .pml extension.
ALKSNIS v3.0 was developed from v2 during the Vytautas Magnus University project “Semantika2” (Nr. 02.3.1-CPVA-V-527-01-0002).
Modifications from v2 to v3.0 (2019-07-08):
- The older version underwent a full review of its syntactic information, based on improved guidelines, to enhance annotation quality.
- New layer added: non-compositional multiword expressions (light verbs and idioms).
- Added new data: scientific abstracts and reviews, additional administrative texts.
- Schema version changed to 3.0.
- The human-friendly Jablonskis tagset is used instead of the MULTEXT-East tagset.
- Some syntactic relations were corrected or modified (details to be published in the improved guidelines).
- CoNLL-U files are provided alongside the PML files (the CoNLL-U files do not keep the mwe field).
ALKSNIS v2.1
ALKSNIS v2.1 consists of 2,355 syntactically annotated sentences in the PML (Prague Mark-up Language) format. The format allows researchers to visualise and edit syntactic trees with the TrEd editor (https://ufal.mff.cuni.cz/tred/).
Each node of a tree corresponds to a word, a punctuation mark or other text element (symbol, digit etc.) within a sentence.
The following information is presented for each node:
- a word form (as used in the text);
- a lemma;
- a morphology tag, and
- a syntactic function (subject, object, etc.).
Dependencies are shown by links between words. The morphology tag set of the corpus is based on the MULTEXT-East format (http://nl.ijs.si/ME/V4/msd/html/index.html). Syntactically annotated sentences are corrected according to guidelines created by researchers at VMU CCL, following the rules of the Prague Dependency Treebank. All sentences are manually checked and corrected by a group of linguists. The TrEd editor and a style file (“antisDplus_schema”) are needed to view files with the .pml extension.
Reference: Bielinskienė A., Boizou L., Kovalevskaitė J., Rimkutė E. 2016: Lithuanian Dependency Treebank ALKSNIS. Proceedings of the Seventh International Conference Baltic HLT 2016. Amsterdam: IOS Press, 107–114. http://ebooks.iospress.nl/volumearticle/45523
