Resources

During the project, the following corpora were built:

  • English-Lithuanian Parallel Cybersecurity Corpus (approx. 1.4 million words)
  • English-Lithuanian Comparable Cybersecurity Corpus (approx. 4 million words)

The corpora are available in the CLARIN-LT repository:
https://clarin.vdu.lt/xmlui/handle/20.500.11821/46
https://clarin.vdu.lt/xmlui/handle/20.500.11821/47

Gold standard corpora with manually annotated cybersecurity terms have been created for training neural networks. The terms were annotated with QuickTag, software designed specifically for this project.