
This page pertains to UD version 2.

UD French FTB

Language: French (code: fr)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.0 release.

The following people have contributed to making this treebank part of UD: Marie Candito, Bruno Guillaume, Teresa Lynn, Héctor Martínez Alonso, Benoît Sagot, Djamé Seddah, Eric Villemonte de la Clergerie.

Repository: UD_French-FTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: LGPL-LR. The underlying text is not included; the user must obtain it separately and then merge it with the UD annotation using a script distributed with UD.

Genre: news

Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [djame • seddah (æt) paris-sorbonne • fr, marie • candito (æt) linguist • univ-paris-diderot • fr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS not available
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

The Universal Dependency version of the French Treebank (Abeillé et al., 2003), hereafter UD_French-FTB, is a treebank of sentences from the newspaper Le Monde, initially manually annotated with morphological information and phrase-structure and then converted to the Universal Dependencies annotation scheme.

UD_French-FTB 2.3 is an automatic conversion of the French Treebank. The French Treebank constituency trees were first converted to dependency trees following Candito et al. (2010); the dependency trees were then converted to the UD scheme using B. Guillaume's Sequoia treebank UD conversion rules. Finally, a data-driven cross-treebank annotation transfer process (Seddah et al., 2017, forthcoming) was applied.

An evaluation against a gold standard yields a labeled attachment score (LAS) of 94.75% and an unlabeled attachment score (UAS) of 99.40% on the test set, on par with other high-quality UD treebanks such as UD_English.
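For readers unfamiliar with the two metrics, the following is a minimal sketch of how UAS and LAS can be computed from two CoNLL-U files with identical tokenization. It is not the evaluation procedure used for the figures above; the helper names (`row`, `read_conllu`, `attachment_scores`) and the toy sentence are hypothetical, and the sketch assumes that every token (punctuation included) is scored and that full relation labels, subtypes included, must match.

```python
def row(idx, form, upos, head, deprel):
    """Build one 10-column CoNLL-U token line (hypothetical helper for the toy data)."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


def read_conllu(text):
    """Return a list of sentences, each a list of (id, head, deprel) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1").
        if not cols[0].isdigit():
            continue
        current.append((int(cols[0]), int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold_text, pred_text):
    """Return (UAS, LAS) as percentages over all tokens."""
    gold, pred = read_conllu(gold_text), read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("sentence counts differ")
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("token counts differ within a sentence")
        for (_, g_head, g_rel), (_, p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1          # correct head: counts for UAS
                if g_rel == p_rel:
                    las_hits += 1      # correct head and label: counts for LAS
    return 100 * uas_hits / total, 100 * las_hits / total


# Toy gold and predicted analyses of "Le chat dort ." (illustrative only).
GOLD = "\n".join([
    row(1, "Le", "DET", 2, "det"),
    row(2, "chat", "NOUN", 3, "nsubj"),
    row(3, "dort", "VERB", 0, "root"),
    row(4, ".", "PUNCT", 3, "punct"),
]) + "\n"
PRED = "\n".join([
    row(1, "Le", "DET", 2, "det"),
    row(2, "chat", "NOUN", 3, "obj"),     # right head, wrong label
    row(3, "dort", "VERB", 0, "root"),
    row(4, ".", "PUNCT", 2, "punct"),     # wrong head
]) + "\n"

uas, las = attachment_scores(GOLD, PRED)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")  # UAS = 75.00%, LAS = 50.00%
```

In the toy example three of four heads are correct (UAS 75%) and two of four tokens have both the correct head and the correct label (LAS 50%). LAS can never exceed UAS under this definition.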

Acknowledgments

Contributors: Marie Candito, Bruno Guillaume, Teresa Lynn, Héctor Martínez Alonso, Benoît Sagot, Djamé Seddah, Eric Villemonte de la Clergerie

Contact: Djamé Seddah (djame.seddah@paris-sorbonne.fr), Marie Candito (marie.candito@linguist.univ-paris-diderot.fr)

Statistics of UD French FTB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Definite, Gender, Mood, Number, NumType, Person, Polarity, Poss, PronType, Reflex, Tense, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:caus, aux:pass, case, cc, ccomp, conj, cop, csubj, dep, det, dislocated, expl, fixed, flat, flat:name, iobj, mark, nmod, nsubj, nsubj:caus, nummod, obj, obl, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
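The restriction above can be illustrated with a short sketch that counts relation labels only where the parent is tagged VERB and the child is a noun or pronoun. This is not the script that generates the treebank statistics; the function name `verb_argument_relations`, the `child_upos` parameter, and the toy sentence are hypothetical, and whether proper nouns (PROPN) count as nouns is left as a caller's choice.

```python
from collections import Counter


def verb_argument_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count deprels on edges whose parent is a VERB and whose child has a UPOS in child_upos."""
    counts = Counter()
    sentence = {}  # token id -> (upos, head, deprel)

    def flush():
        # Inspect every edge of the finished sentence, then reset the buffer.
        for upos, head, deprel in sentence.values():
            parent = sentence.get(head)  # None for the root (head 0)
            if parent is not None and parent[0] == "VERB" and upos in child_upos:
                counts[deprel] += 1
        sentence.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        sentence[int(cols[0])] = (cols[3], int(cols[6]), cols[7])
    flush()  # the last sentence may lack a trailing blank line
    return counts


# Toy sentence "Elle mange une pomme ." (illustrative only).
SAMPLE = "\n".join("\t".join(cols) for cols in [
    ["1", "Elle", "_", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "mange", "_", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "une", "_", "DET", "_", "_", "4", "det", "_", "_"],
    ["4", "pomme", "_", "NOUN", "_", "_", "2", "obj", "_", "_"],
    ["5", ".", "_", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]) + "\n"

counts = verb_argument_relations(SAMPLE)
print(dict(counts))
```

In the toy sentence only the `nsubj` and `obj` edges qualify: the determiner attaches to a noun, and the punctuation mark is neither a noun nor a pronoun.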

Verbs with Reflexive Core Objects

Relations Overview