
This page pertains to UD version 2.

UD Maltese MUDT

Language: Maltese (code: mt)
Family: Afro-Asiatic, Semitic

This treebank has been part of Universal Dependencies since the UD v2.3 release.

The following people have contributed to making this treebank part of UD: Slavomír Čéplö, Daniel Zeman.

Repository: UD_Maltese-MUDT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-SA 4.0

Genre: news, legal, nonfiction, fiction, wiki

Questions or comments? General annotation questions (either Maltese-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [bulbul (æt) bulbul • sk]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation   Source
Lemmas       not available
UPOS         annotated manually in non-UD style, automatically converted to UD
XPOS         annotated manually
Features     not available
Relations    annotated manually, natively in UD style
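
In CoNLL-U files, a field with no annotation is conventionally written as an underscore, so the availability summarised above can be checked by measuring how often each column is filled. The following is a minimal Python sketch under that assumption. The function name `column_coverage`, the `COLUMNS` mapping and the placeholder sample rows are illustrative only and are not part of the treebank or its tooling.

```python
from typing import Dict, Iterable

# Zero-based positions of selected CoNLL-U columns.
COLUMNS = {"LEMMA": 2, "UPOS": 3, "XPOS": 4, "FEATS": 5, "DEPREL": 7}


def column_coverage(lines: Iterable[str]) -> Dict[str, float]:
    """Return, per column, the share of word lines whose value is not '_'."""
    filled = {name: 0 for name in COLUMNS}
    total = 0
    for raw in lines:
        cols = raw.rstrip("\n").split("\t")
        # Keep only regular word lines: ten fields and a plain integer ID.
        # Comments, blank lines, ranges ("1-2") and empty nodes are skipped.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        total += 1
        for name, idx in COLUMNS.items():
            if cols[idx] != "_":
                filled[name] += 1
    return {name: (filled[name] / total if total else 0.0) for name in COLUMNS}


# Placeholder rows (not real treebank data): LEMMA and FEATS are left empty.
SAMPLE = [
    "# sent_id = demo-1",
    "1\tA\t_\tDET\tTAG\t_\t2\tdet\t_\t_",
    "2\tB\t_\tNOUN\tTAG\t_\t0\troot\t_\t_",
    "",
]

if __name__ == "__main__":
    for name, share in column_coverage(SAMPLE).items():
        print(f"{name}: {share:.0%}")
```

If the summary above holds for a given file, the LEMMA and FEATS shares should come out at or near zero, while UPOS, XPOS and DEPREL should be fully populated.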

Description

MUDT (Maltese Universal Dependencies Treebank) is a manually annotated treebank of Maltese, a Semitic language of Malta descended from North African Arabic with a significant amount of Italo-Romance influence. MUDT was designed as a balanced corpus with four major genres (see Splitting below) represented roughly equally.

Origin

This treebank is the product of the PhD thesis "Constituent order in Maltese: A quantitative analysis" by Slavomír Čéplö. The text (see References) contains a detailed description of the annotation decisions and the composition of the treebank. The treebank was originally produced in accordance with UD v1; this version has been brought up to the UD v2.3 standard.

Splitting

MUDT contains 2074 sentences and 44,162 tokens (both defined orthographically) in the following text types:

Text type       Subtype                                    Sentence count
newspaper       news                                       239
                op-eds                                     240
                Subtotal                                   479
quasi-spoken    newspaper interviews                       280
                parliament: debates and Q&A                294
                Subtotal                                   574
fiction         short stories                              246
                novel chapters                             251
                Subtotal                                   497
non-fiction     humanities                                 249
                science, encyclopedic and instructional    275
                Subtotal                                   524
                Total                                      2074

The annotated sentences have been manually split into train, test and dev sets as follows:

File                       Sentence count    Token count
mt_mudt-ud-train.conllu    1123              22880
mt_mudt-ud-test.conllu     518               11073
mt_mudt-ud-dev.conllu      433               10209
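
Counts like those in the table can in principle be recomputed from the CoNLL-U files. The sketch below is a minimal illustration. It treats each blank-line-delimited block as one sentence and counts only lines with a plain integer ID, skipping multiword-token ranges and empty nodes. This page does not confirm whether that matches the orthographic definition of tokens used for the figures above. The function name `count_conllu` and the placeholder sample are hypothetical.

```python
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, token_count) for an iterable of CoNLL-U lines."""
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        token_id = line.split("\t", 1)[0]
        # Count only plain integer IDs; skip ranges ("1-2") and empty nodes ("1.1").
        if not token_id.isdigit():
            continue
        tokens += 1
        in_sentence = True
    # Handle a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, tokens


# Placeholder data (not taken from the treebank): two sentences, three words.
SAMPLE = [
    "# sent_id = demo-1",
    "1\tA\t_\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tB\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "",
    "# sent_id = demo-2",
    "1\tC\t_\tVERB\t_\t_\t0\troot\t_\t_",
]

if __name__ == "__main__":
    print(count_conllu(SAMPLE))
```

To apply it to one of the splits, pass an open file handle instead of the sample list, for example the handle returned by `open("mt_mudt-ud-train.conllu", encoding="utf-8")`.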

Acknowledgments

Statistics of UD Maltese MUDT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Relations

acl, advcl, advmod, amod, appos, aux, aux:neg, aux:part, aux:pass, case, case:det, cc, ccomp, compound, conj, cop, csubj, dep, det, discourse, dislocated, expl, fixed, flat, flat:name, goeswith, iobj, list, mark, nmod, nmod:poss, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, obl:arg, orphan, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
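
The restriction to verb parents and nominal children can be sketched as a filter over CoNLL-U word lines. The Python below is an illustration only, not the script used to generate the statistics on this page. It assumes that "nouns or pronouns" corresponds to the UPOS tags NOUN, PROPN and PRON (including PROPN is a guess) and that "verbs" means UPOS VERB. The names `verb_nominal_relations` and `NOMINAL_UPOS` and the placeholder sample are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Assumption: which UPOS tags count as "nouns or pronouns".
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}


def verb_nominal_relations(lines: Iterable[str]) -> Counter:
    """Count DEPREL labels on edges from a VERB parent to a nominal child."""
    counts: Counter = Counter()
    sentence: List[List[str]] = []

    def flush() -> None:
        # Map each word ID to its UPOS so the parent's tag can be looked up.
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            upos, head, deprel = cols[3], cols[6], cols[7]
            if upos in NOMINAL_UPOS and upos_by_id.get(head) == "VERB":
                counts[deprel] += 1
        sentence.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            flush()  # blank line: end of sentence
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only regular word lines with ten fields and an integer ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        sentence.append(cols)
    flush()  # last sentence, if the input lacks a trailing blank line
    return counts


# Placeholder sentence (not treebank data): the amod edge has a NOUN parent
# and is therefore excluded, as is the root, whose head is 0.
SAMPLE = [
    "1\tX\t_\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tY\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tZ\t_\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\tW\t_\tADJ\t_\t_\t3\tamod\t_\t_",
]

if __name__ == "__main__":
    for deprel, n in sorted(verb_nominal_relations(SAMPLE).items()):
        print(deprel, n)
```

Grouping by sentence before filtering matters because HEAD values are only meaningful within a single sentence.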

Relations Overview