
This page pertains to UD version 2.

UD Danish DDT

Language: Danish (code: da)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v1.1 release.

The following people have contributed to making this treebank part of UD: Anders Johannsen, Héctor Martínez Alonso, Barbara Plank.

Repository: UD_Danish-DDT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-SA 4.0

Genre: news, fiction, spoken, nonfiction

Questions, comments? General annotation questions (either Danish-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [zeman (æt) ufal • mff • cuni • cz]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation: Source
Lemmas: annotated manually in non-UD style, automatically converted to UD
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: not available
Features: annotated manually in non-UD style, automatically converted to UD
Relations: annotated manually in non-UD style, automatically converted to UD

Description

The Danish UD treebank is a conversion of the Danish Dependency Treebank (DDT; Buch-Kromann, 2003) into Universal Dependencies (UD). It consists of 5,512 sentences (100k words). The Danish source texts and the Danish part-of-speech tags were created by the PAROLE-DK project (Keson, 1998) of the Danish Society for Language and Literature.

In the DDT formalism, determiners head nouns, and auxiliaries head verbs. To promote content words to heads, we applied a cascade of graph transformations that turn function words (determiners, auxiliaries, conjunctions, etc.) into leaves of the dependency tree, rather than leaving them as intermediate nodes between content heads.
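The head-swapping idea can be illustrated with a minimal sketch. It is not the converter actually used for this treebank: the token representation, the `promote_content_heads` function, the `FUNCTION_TAGS` set, and the example sentence are all assumptions made for illustration. The sketch covers only the simplest pattern (a function word that heads exactly one content word) and ignores relation labels.

```python
# Illustrative only: a toy head-swap, not the actual DDT-to-UD converter.
FUNCTION_TAGS = {"DET", "AUX"}  # assumed subset of function-word categories


def promote_content_heads(tokens):
    """Make function words leaves by swapping them with their content child.

    `tokens` is a list of dicts with keys "id", "form", "upos", "head"
    (head 0 marks the root). A new list is returned; the input is untouched.
    """
    out = [dict(t) for t in tokens]
    changed = True
    while changed:  # repeat so that chains of function words are resolved
        changed = False
        for func in out:
            if func["upos"] not in FUNCTION_TAGS:
                continue
            content_children = [
                t for t in out
                if t["head"] == func["id"] and t["upos"] not in FUNCTION_TAGS
            ]
            if len(content_children) != 1:
                continue  # the toy rule handles only the unambiguous case
            content = content_children[0]
            old_head = func["head"]
            # Every dependent of the function word moves to the content word.
            for t in out:
                if t["head"] == func["id"]:
                    t["head"] = content["id"]
            # The content word takes over the function word's attachment,
            # and the function word becomes a leaf below it.
            content["head"] = old_head
            func["head"] = content["id"]
            changed = True
    return out


# Made-up example: "en hund sover" with a determiner-headed subject.
ddt_style = [
    {"id": 1, "form": "en", "upos": "DET", "head": 3},
    {"id": 2, "form": "hund", "upos": "NOUN", "head": 1},
    {"id": 3, "form": "sover", "upos": "VERB", "head": 0},
]
ud_style = promote_content_heads(ddt_style)
```

After the call, the noun attaches to the verb and the determiner attaches to the noun. A real conversion would also need to relabel the affected edges and handle coordination and multi-child function words, which this sketch deliberately leaves out.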

The part-of-speech tags and dependency labels of the original treebank were converted partly through direct mappings and partly by using the newly computed tree structure as a reference for assigning labels.
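A minimal sketch of this two-stage approach follows, under stated assumptions. The source-side label names, the `DIRECT_LABELS` table, and the `assign_relation` function are hypothetical and do not reproduce the actual conversion rules. The structural rule relies on the UD v2 convention that a nominal modifier is `nmod` under a nominal head and `obl` otherwise.

```python
# Hypothetical source labels; the real DDT inventory and rules differ.
DIRECT_LABELS = {"subj": "nsubj", "dobj": "obj", "iobj": "iobj"}
NOMINAL = {"NOUN", "PROPN", "PRON"}


def assign_relation(source_label, child_upos, head_upos):
    """Pick a UD relation from a source label plus the converted structure."""
    # Stage 1: labels with a one-to-one counterpart are looked up directly.
    if source_label in DIRECT_LABELS:
        return DIRECT_LABELS[source_label]
    # Stage 2: an underspecified modifier label is resolved by inspecting
    # the head that the dependent has in the converted tree.
    if source_label == "mod" and child_upos in NOMINAL:
        return "nmod" if head_upos in NOMINAL else "obl"
    # Anything not covered falls back to UD's unspecified relation.
    return "dep"
```

For instance, `assign_relation("mod", "NOUN", "VERB")` yields `obl`, whereas the same source label under a noun head yields `nmod`. This is why the labelling step has to run after the tree has been restructured.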

The Danish Dependency Treebank was released under the GNU GPL, so that license can be used for UD_Danish as well. However, since the GPL is better suited to programs than to data (see https://github.com/UniversalDependencies/docs/issues/296 for a discussion), we asked for permission to use a Creative Commons license as an alternative, and Matthias Buch-Kromann was kind enough to grant it.

Acknowledgments

Contributors (in order of last names)

References

Statistics of UD Danish DDT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Abbr, AdpType, Case, Definite, Degree, Foreign, Gender, Mood, Number, Number[psor], NumType, PartType, Person, Polite, Poss, PronType, Reflex, Style, Tense, VerbForm, Voice

Relations

acl:relcl, advcl, advmod, amod, appos, aux, case, cc, ccomp, compound:prt, conj, cop, dep, det, discourse, expl, fixed, flat, goeswith, iobj, list, mark, nmod, nmod:poss, nsubj, nummod, obj, obl, obl:loc, obl:tmod, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

Verbs with Reflexive Core Objects

Relations Overview