This page pertains to UD version 2.

UD Maltese MUDT

Language: Maltese (code: mt)
Family: Afro-Asiatic, Semitic

This treebank has been part of Universal Dependencies since the UD v2.3 release.

The following people have contributed to making this treebank part of UD: Slavomír Čéplö, Daniel Zeman.

Repository: UD_Maltese-MUDT
License: CC BY-SA 4.0

Genre: news, legal, nonfiction, fiction, wiki

General annotation questions (either Maltese-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub.

Annotation  Source
Lemmas      not available
UPOS        annotated manually in non-UD style, automatically converted to UD
XPOS        annotated manually
Features    not available
Relations   annotated manually, natively in UD style


MUDT (Maltese Universal Dependencies Treebank) is a manually annotated treebank of Maltese, a Semitic language of Malta descended from North African Arabic with a significant amount of Italo-Romance influence. MUDT was designed as a balanced corpus with four major genres (see Splitting below) represented roughly equally.


This treebank is the product of the PhD thesis "Constituent order in Maltese: A quantitative analysis" by Slavomír Čéplö. The text (see References) contains a detailed description of the annotation decisions and the composition of the treebank. The treebank was originally produced in accordance with UD v1; this version has been brought up to the UD v2.3 standard.


MUDT contains 2074 sentences and 44,162 tokens (both defined orthographically) in the following text types:

Text type     Subtype                                  Sentence count
newspaper     news                                     239
              op-eds                                   240
              Subtotal                                 479
quasi-spoken  newspaper interviews                     280
              parliament: debates and Q&A              294
              Subtotal                                 574
fiction       short stories                            246
              novel chapters                           251
              Subtotal                                 497
non-fiction   humanities                               249
              science, encyclopedic and instructional  275
              Subtotal                                 524
              Total                                    2074

The annotated sentences have been manually split into train, test and dev sets as follows:

File                     Sentence count  Token count
mt_mudt-ud-train.conllu  1123            22880
mt_mudt-ud-test.conllu   518             11073
mt_mudt-ud-dev.conllu    433             10209
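
The sentence and token counts of the split files listed above can be checked with a short script. The following is a minimal sketch, not part of the treebank distribution: the function name count_conllu is hypothetical, and it assumes the standard CoNLL-U layout (tab-separated columns, comment lines starting with "#", sentences separated by blank lines). It counts only lines with a plain integer ID, so multiword-token ranges such as "1-2" and empty nodes such as "1.1" are skipped. Whether that matches the orthographic token definition used for the table is an assumption.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, word_count) for CoNLL-U formatted lines.

    Hypothetical helper: assumes blank-line-separated sentences and
    counts only token lines whose ID column is a plain integer.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...): not a token.
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            # Skips ranges like "1-2" and empty nodes like "1.1".
            words += 1
    if in_sentence:
        # The file may end without a trailing blank line.
        sentences += 1
    return sentences, words


if __name__ == "__main__":
    import sys

    # Usage: python count_conllu.py mt_mudt-ud-train.conllu ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            n_sent, n_words = count_conllu(handle)
        print(f"{path}\t{n_sent}\t{n_words}")
```

If the assumptions hold, running the script on the three split files should give per-file counts that can be compared with the table above. A mismatch in the token column would suggest that the table counts tokens differently from this sketch.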


