This page pertains to UD version 2.

UD Czech FicTree

Language: Czech (code: cs)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v2.1 release.

The following people have contributed to making this treebank part of UD: Tomáš Jelínek, Daniel Zeman.

Repository: UD_Czech-FicTree
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-NC-SA 4.0

Genre: fiction

Questions, comments? General annotation questions (either Czech-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [tomas • jelinek (æt) ff • cuni • cz, zeman (æt) ufal • mff • cuni • cz]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

FicTree is a treebank of Czech fiction, automatically converted into the UD format. The treebank was built at Charles University in Prague.

The treebank consists of 12,760 sentences (166,432 tokens). The texts come from eight literary works published in the Czech Republic between 1991 and 2007. The text data was manually annotated according to the Prague Dependency Treebank guidelines, then converted into the UD format. To comply with agreements concluded with the copyright holders, the texts are shuffled into random chunks of at most 100 words. The treebank is licensed under the terms of CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/).
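Sentence and token counts like those above can be checked against a downloaded copy of the data. The following is a minimal sketch, assuming the data is in the standard CoNLL-U format (ten tab-separated columns, `#` comment lines, blank lines between sentences). The function name `count_conllu` and the toy sentence are illustrative only; the sentence is hand-made and not taken from FicTree.

```python
def count_conllu(text):
    """Return (sentences, tokens, words) for CoNLL-U formatted text.

    Tokens are surface tokens: a multiword token range such as "1-2"
    counts once, and the syntactic words it covers are not counted again.
    """
    sentences = tokens = words = 0
    in_sentence = False
    mwt_end = 0  # last word ID covered by the most recent multiword token
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if in_sentence:
                sentences += 1
            in_sentence = False
            mwt_end = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        in_sentence = True
        word_id = line.split("\t")[0]
        if "-" in word_id:
            tokens += 1
            mwt_end = int(word_id.split("-")[1])
        elif "." in word_id:
            continue  # empty node, neither a token nor a word
        else:
            words += 1
            if int(word_id) > mwt_end:
                tokens += 1
    if in_sentence:
        sentences += 1  # file did not end with a blank line
    return sentences, tokens, words


# Hand-made toy sentence (an assumption for illustration, not FicTree data).
ROWS = [
    ("1-2", "Abys", "_", "_", "_", "_", "_", "_", "_", "_"),
    ("1", "aby", "aby", "SCONJ", "_", "_", "3", "mark", "_", "_"),
    ("2", "bys", "být", "AUX", "_", "_", "3", "aux", "_", "_"),
    ("3", "přišel", "přijít", "VERB", "_", "_", "0", "root", "_", "_"),
    ("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
]
SAMPLE = "# text = Abys přišel.\n" + "\n".join("\t".join(r) for r in ROWS) + "\n\n"

print(count_conllu(SAMPLE))
```

Whether the published token count refers to surface tokens or to syntactic words is not stated here, so the sketch reports both.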

Acknowledgments

We wish to thank the participants in the annotation effort, including Milena Hnátková, Tomáš Jelínek, Ivana Klímová, Alena Kropíková, Hana Skoumalová and Olga Zitová; as well as Dan Zeman for the data conversion.

References

Statistics of UD Czech FicTree

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Abbr – AdpType – Animacy – Aspect – Case – ConjType – Degree – Gender – Gender[psor] – Hyph – Mood – NameType – Number – Number[psor] – NumForm – NumType – NumValue – Person – Polarity – Poss – PrepCase – PronType – Reflex – Style – Tense – Variant – VerbForm – Voice

Relations

acl – advcl – advmod – advmod:emph – amod – appos – aux – aux:pass – case – cc – ccomp – compound – conj – cop – csubj – csubj:pass – dep – det – det:numgov – det:nummod – discourse – expl:pass – expl:pv – fixed – flat – iobj – mark – nmod – nsubj – nsubj:pass – nummod – nummod:gov – obj – obl – obl:agent – obl:arg – orphan – parataxis – punct – root – vocative – xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
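The restriction above can be reproduced on CoNLL-U data roughly as follows. This is a sketch under stated assumptions, not the procedure used to generate the statistics: it treats a parent as a verb when its UPOS is VERB, and a child as a noun or pronoun when its UPOS is NOUN, PROPN or PRON (including PROPN is an assumption). The function name and the toy sentence are illustrative.

```python
import re
from collections import Counter

# Assumption: "nouns or pronouns" is read as these three UPOS tags.
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}


def verb_nominal_relations(text):
    """Count relation labels where the parent is a VERB and the child is nominal."""
    counts = Counter()
    # Sentences are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = [
            line.split("\t")
            for line in block.splitlines()
            if line and not line.startswith("#")
        ]
        # Keep only syntactic words: skip ranges ("1-2") and empty nodes ("1.1").
        words = {row[0]: row for row in rows if row[0].isdigit()}
        for row in words.values():
            upos, head, deprel = row[3], row[6], row[7]
            parent = words.get(head)  # None for the root, whose HEAD is "0"
            if parent is not None and parent[3] == "VERB" and upos in NOMINAL_UPOS:
                counts[deprel] += 1
    return counts


# Hand-made toy sentence (an assumption for illustration, not FicTree data).
ROWS = [
    ("1", "Pes", "pes", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "viděl", "vidět", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "kočku", "kočka", "NOUN", "_", "_", "2", "obj", "_", "_"),
    ("4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
]
SAMPLE = "\n".join("\t".join(r) for r in ROWS) + "\n"

print(verb_nominal_relations(SAMPLE))
```

The punctuation child is skipped because its UPOS is not nominal, and the root is skipped because it has no parent word.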

Reflexive Verbs

Reflexive Passive

Verbs with Reflexive Core Objects

Relations Overview