
This page pertains to UD version 2.

UD French GSD

Language: French (code: fr)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Marie-Catherine de Marneffe, Bruno Guillaume, Ryan McDonald, Alane Suhr, Joakim Nivre, Matias Grioni, Carly Dickerson, Guy Perrier.

Repository: UD_French-GSD
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-NC-SA 3.0 US

Genre: blog, news, reviews, wiki

Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [demarneffe • 1 (æt) osu • edu, bruno • guillaume (æt) inria • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas: assigned by a program, with some manual corrections, but not a full manual verification
UPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS: not available
Features: assigned by a program, with some manual corrections, but not a full manual verification
Relations: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion

Description

The French UD was converted in 2015 from the content-head version of the universal dependency treebank v2.0 (https://github.com/ryanmcd/uni-dep-tb). Since 2015 it has been updated independently of that source. The README for the original project is available below.

Version 2.2 of the French data consists of 402,426 words (16,448 sentences). No sentence ids were available in the original resource, so new sent_id values were automatically introduced in the converted corpus. Each one consists of the prefix fr-ud-train, fr-ud-dev or fr-ud-test, according to the corresponding original file, followed by a 5-digit number reflecting the order of the sentences.
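The identifier scheme described above can be sketched as follows. This is a minimal illustration, not the conversion program actually used: the underscore separator and the 1-based counter are assumptions (the description gives only the prefixes and the 5-digit width), and the helper names `make_sent_id` and `parse_sent_id` are hypothetical.

```python
import re
from typing import Tuple

# Prefixes named in the description above.
PREFIXES = ("fr-ud-train", "fr-ud-dev", "fr-ud-test")

# ASSUMPTION: the separator between prefix and number is not stated in the text.
SEPARATOR = "_"

_SENT_ID = re.compile(
    r"^(fr-ud-(?:train|dev|test))" + re.escape(SEPARATOR) + r"(\d{5})$"
)


def make_sent_id(prefix: str, index: int) -> str:
    """Build an identifier from a prefix and a sentence position.

    ASSUMPTION: positions are counted from 1.
    """
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not 1 <= index <= 99999:
        raise ValueError("index must fit in 5 digits")
    return f"{prefix}{SEPARATOR}{index:05d}"


def parse_sent_id(sent_id: str) -> Tuple[str, int]:
    """Split an identifier back into (prefix, position)."""
    match = _SENT_ID.match(sent_id)
    if match is None:
        raise ValueError(f"not a recognised sent_id: {sent_id!r}")
    return match.group(1), int(match.group(2))


example_id = make_sent_id("fr-ud-train", 1)
example_parts = parse_sent_id("fr-ud-dev" + SEPARATOR + "00042")
```

Zero-padding the number to a fixed width keeps identifiers sortable as plain strings, which is one plausible reason for choosing a 5-digit counter.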

Warning: to meet the 10K-word size requirement for test data, part of the original dev file was moved to the test file. Since version 2.0, the splitting of data is:

Sentences are shuffled, so the genre of a given sentence cannot be determined.

Probably because of a bug in a conversion program, version 1.2 contains many truncated sentences (with a date missing, for instance). Almost every truncated sentence comes from Wikipedia, so the original text could be recovered. Most of the truncated sentences were completed in version 1.3, and some were completed later. Some truncated sentences probably remain.

Acknowledgments

The latest version of the corpus was produced by Marie-Catherine de Marneffe, Bruno Guillaume, Matias Grioni, Carly Dickerson and Guy Perrier. Automatic modifications and consistency checking were partly done using the Grew software.

See below for references and acknowledgments concerning the original corpus.

Statistics of UD French GSD

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Case, Definite, Degree, Gender, Mood, Number, NumType, Person, Polarity, Poss, PronType, Reflex, Tense, VerbForm

Relations

acl, acl:relcl, advcl, advcl:cleft, advmod, amod, appos, aux, aux:caus, aux:pass, case, cc, ccomp, compound, conj, cop, csubj, csubj:pass, dep, det, discourse, dislocated, expl, expl:pass, fixed, flat, flat:foreign, flat:name, goeswith, iobj, iobj:agent, mark, nmod, nsubj, nsubj:caus, nsubj:pass, nummod, obj, obj:agent, obj:lvc, obl, obl:agent, obl:arg, obl:mod, orphan, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
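The restriction stated above can be illustrated with a short sketch that counts relation labels in CoNLL-U text, keeping only dependencies whose parent is tagged VERB and whose child is a noun or pronoun. It is not the script that produced the statistics on this page. The function name `verb_nominal_relations` is hypothetical, counting PROPN among the nouns is an assumption, and the sample sentence is hand-written for illustration rather than taken from the treebank.

```python
from collections import Counter

# ASSUMPTION: "nouns or pronouns" is read here as NOUN, PROPN and PRON.
CHILD_TAGS = ("NOUN", "PROPN", "PRON")


def verb_nominal_relations(conllu_text, child_tags=CHILD_TAGS):
    """Count DEPREL labels where the head is a VERB and the child is nominal."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        upos_by_id = {}
        rows = []
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip comment lines such as "# sent_id = ..."
            cols = line.split("\t")
            # Keep only plain word lines; this skips multiword-token
            # ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            if len(cols) != 10 or not cols[0].isdigit():
                continue
            word_id, upos, head, deprel = cols[0], cols[3], cols[6], cols[7]
            upos_by_id[word_id] = upos
            rows.append((upos, head, deprel))
        for upos, head, deprel in rows:
            if upos in child_tags and upos_by_id.get(head) == "VERB":
                counts[deprel] += 1
    return counts


# Hand-written illustration (not a sentence from the treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ("1", "Marie", "Marie", "PROPN", "_", "_", "2", "nsubj", "_", "_"),
        ("2", "mange", "manger", "VERB", "_", "_", "0", "root", "_", "_"),
        ("3", "une", "un", "DET", "_", "_", "4", "det", "_", "_"),
        ("4", "pomme", "pomme", "NOUN", "_", "_", "2", "obj", "_", "_"),
        ("5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
    ]
)

sample_counts = verb_nominal_relations(SAMPLE)
```

In the sample, the determiner and the punctuation mark are excluded because of their own tags, and the root verb is excluded because its parent is not a word.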

Reflexive Passive

Verbs with Reflexive Core Objects

Relations Overview