
This page pertains to UD version 2.

UD Finnish TDT

Language: Finnish (code: fi)
Family: Uralic, Finnic

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Filip Ginter, Jenna Kanerva, Veronika Laippala, Niko Miekka, Anna Missilä, Stina Ojala, Sampo Pyysalo.

Repository: UD_Finnish-TDT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-SA 4.0

Genre: news, wiki, blog, legal, fiction, grammar-examples

Questions or comments? General annotation questions (either Finnish-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [figint (æt) utu • fi, jmnybl (æt) utu • fi]. Development of the treebank happens outside the UD repository, so bugs must be fixed in either the original data source or the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: annotated manually, natively in UD style
UPOS: annotated manually in non-UD style, automatically converted to UD
XPOS: annotated manually in non-UD style, automatically converted to UD
Features: annotated manually in non-UD style, automatically converted to UD
Relations: annotated manually, natively in UD style

Description

UD_Finnish-TDT is based on the Turku Dependency Treebank (TDT), a broad-coverage dependency treebank of general Finnish covering numerous genres. The conversion to UD was followed by extensive manual checks and corrections, and the treebank closely adheres to the UD guidelines.

The treebank contains texts drawn from 674 individual documents: Wikipedia articles, Wikinews articles, university online news, blog entries, student magazine articles, grammar examples, Europarl speeches, JRC-Acquis legislation, financial news, and fiction. The treebank was originally annotated in the Stanford Dependencies scheme, including secondary dependencies, and its morphological annotation was fully manually checked. It is also accompanied by a PropBank annotation (http://turkunlp.github.io/Finnish_PropBank/) and a dependency parser pipeline that substantially outperforms the baseline UDPipe model (http://turkunlp.github.io/Finnish-dep-parser/).

Acknowledgments

The team behind the Turku Dependency Treebank: Katri Haverinen, Jenna Kanerva (Nyblom), Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, Filip Ginter.

We are grateful for the funding received from:

We thank all the authors who kindly allowed us to include their texts in the treebank, either by giving explicit permission or by releasing their texts under an open license in the first place.

Statistics of UD Finnish TDT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Abbr, AdpType, Case, Clitic, Connegative, Degree, Derivation, Foreign, InfForm, Mood, Number, Number[psor], NumType, PartForm, Person, Person[psor], Polarity, PronType, Reflex, Style, Tense, Typo, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, cc:preconj, ccomp, compound, compound:nn, compound:prt, conj, cop, cop:own, csubj, csubj:cop, dep, det, discourse, fixed, flat, flat:foreign, flat:name, goeswith, mark, nmod, nmod:gobj, nmod:gsubj, nmod:poss, nsubj, nsubj:cop, nummod, obj, obl, orphan, parataxis, punct, root, vocative, xcomp, xcomp:ds
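
The POS tag, feature, and relation inventories listed above can be tallied directly from CoNLL-U data. The following is a minimal sketch, not the script used to produce the official statistics. It assumes the standard ten-column, tab-separated CoNLL-U layout; the sample sentence is hand-written for illustration and is not taken from the treebank, and the function names `iter_tokens` and `tally` are hypothetical.

```python
from collections import Counter

# Hand-written illustrative sentence (NOT taken from the treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "# text = Minä näin koiran.",
    "\t".join(["1", "Minä", "minä", "PRON", "_",
               "Case=Nom|Number=Sing|Person=1|PronType=Prs", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "näin", "nähdä", "VERB", "_",
               "Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin|Voice=Act",
               "0", "root", "_", "_"]),
    "\t".join(["3", "koiran", "koira", "NOUN", "_",
               "Case=Gen|Number=Sing", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])


def iter_tokens(conllu_text):
    """Yield the ten columns of every ordinary token line."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # malformed line, skipped in this sketch
        if not cols[0].isdigit():
            continue  # multiword-token range (1-2) or empty node (1.1)
        yield cols


def tally(conllu_text):
    """Count UPOS tags, feature names, and relation labels."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for cols in iter_tokens(conllu_text):
        upos[cols[3]] += 1
        if cols[5] != "_":
            for pair in cols[5].split("|"):
                feats[pair.split("=", 1)[0]] += 1
        rels[cols[7]] += 1
    return upos, feats, rels


upos_counts, feat_counts, rel_counts = tally(SAMPLE)
```

To run this on real data, read a `.conllu` file from the UD_Finnish-TDT repository into a string and pass it to `tally`; the sorted keys of the three counters should then correspond to the inventories above.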

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
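
The restriction stated above can be sketched as a filter over CoNLL-U tokens. This is an illustration under stated assumptions, not the procedure used to generate the statistics: "nouns or pronouns" is taken to mean the UPOS tags NOUN and PRON only (whether PROPN is included is not stated here), the sample sentence is hand-written rather than taken from the treebank, and the function name `verb_nominal_relations` is hypothetical.

```python
# Assumption: "nouns or pronouns" means exactly these UPOS tags.
NOMINAL_UPOS = {"NOUN", "PRON"}

# Hand-written illustrative sentence (NOT taken from the treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "# text = Minä näin koiran.",
    "\t".join(["1", "Minä", "minä", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "näin", "nähdä", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "koiran", "koira", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])


def verb_nominal_relations(conllu_text):
    """Return (verb lemma, relation, child lemma) for verb -> noun/pronoun arcs."""
    triples = []
    # Sentences are separated by blank lines in CoNLL-U.
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}
        for line in block.splitlines():
            if line.startswith("#"):
                continue  # sentence-level comment
            cols = line.split("\t")
            if len(cols) != 10 or not cols[0].isdigit():
                continue  # skip malformed lines, ranges, and empty nodes
            tokens[cols[0]] = cols
        for cols in tokens.values():
            parent = tokens.get(cols[6])  # None for the root (HEAD = 0)
            if parent is None:
                continue
            if parent[3] == "VERB" and cols[3] in NOMINAL_UPOS:
                triples.append((parent[2], cols[7], cols[2]))
    return triples


relations = verb_nominal_relations(SAMPLE)
```

For the sample sentence, `relations` contains the subject and object arcs of the verb; the punctuation arc is dropped because its child is neither a noun nor a pronoun.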

Verbs with Reflexive Core Objects

Relations Overview