UD French Spoken

Language: French (code: fr)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.2 release.

The following people have contributed to making this treebank part of UD: Kim Gerdes, Sylvain Kahane, Chunxiao Yan, Aline Etienne, Marine Courtin.

Repository: UD_French-Spoken
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [kim (æt) gerdes • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation	Source
Lemmas	annotated manually in non-UD style, automatically converted to UD
UPOS	annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS	not available
Features	not available
Relations	annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion

Description

A Universal Dependencies corpus for spoken French.

The corpus was converted automatically from the Rhapsodie treebank, and the conversion was then corrected manually.

XPOS and features are not available in v2.2 of UD_French-Spoken. Since they are already encoded in the Rhapsodie treebank, they will be added in future versions of this treebank.

Acknowledgments

Statistics of UD French Spoken

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Relations

acl – acl:relcl – advcl – advcl:cleft – advcl:periph – advmod – advmod:periph – amod – appos:conj – appos:nmod – aux – aux:caus – aux:pass – case – cc – ccomp – compound – conj:coord – conj:dicto – cop – csubj – csubj:pass – dep – dep:iobj – dep:obj – det – discourse – dislocated – expl – fixed – flat – iobj – mark – nmod – nsubj – nsubj:caus – nsubj:expl – nsubj:pass – nummod – obj – obl:comp – obl:mod – obl:periph – orphan – parataxis:discourse – parataxis:insert – parataxis:obj – parataxis:parenth – punct – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 2786 sentences and 34972 tokens.

All tokens in this corpus are followed by a space.

This corpus does not contain words with spaces.

This corpus contains 75 types of words that contain both letters and punctuation. Examples: c', l', d', j', qu', n', s', m', -là, -ce, jusqu', peut-être, aujourd', dix-huit, quelqu', -vous, -il, -même, t', dix-neuvième, vingt-cinq, vingt-deux, -on, -tu, dix-huitième, dix-neuf, lorsqu', quatre-vingt-six, soixante-dix, vingt-et-unième, vingt-neuf, -d', -ils, -moi, -nous, cinquante-six, dix-sept, puisqu', quarante-cinq, quarante-huit, quarante-neuf, quelqu'un, soixante-cinq, trente-neuf, vingt-six, -toi, Royaume-Uni, Sainte-Claire, aujourd'hui, c'est-à-dire
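
As a rough illustration of how counts like these could be reproduced from a CoNLL-U file, here is a minimal sketch. The sample sentences are invented for the example, the function names are hypothetical, and the definition of "letters and punctuation" (at least one alphabetic character plus at least one character in a Unicode punctuation category) is an assumption, not necessarily the rule used to generate the statistics above.

```python
import unicodedata

# Invented two-sentence sample in CoNLL-U format (10 tab-separated columns).
SAMPLE = """\
# text = c' est peut-être vrai
1\tc'\tce\tPRON\t_\t_\t4\tnsubj\t_\t_
2\test\têtre\tAUX\t_\t_\t4\tcop\t_\t_
3\tpeut-être\tpeut-être\tADV\t_\t_\t4\tadvmod\t_\t_
4\tvrai\tvrai\tADJ\t_\t_\t0\troot\t_\t_

# text = oui
1\toui\toui\tINTJ\t_\t_\t0\troot\t_\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of column lists.

    Comment lines, multiword-token ranges (IDs like 1-2) and empty
    nodes (IDs like 1.1) are skipped.
    """
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def has_letters_and_punct(form):
    """True if the form mixes alphabetic and punctuation characters (assumed definition)."""
    has_letter = any(ch.isalpha() for ch in form)
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
    return has_letter and has_punct


sentences = list(read_sentences(SAMPLE))
n_sentences = len(sentences)
n_words = sum(len(s) for s in sentences)
mixed_types = sorted({cols[1] for s in sentences for cols in s
                      if has_letters_and_punct(cols[1])})

print(n_sentences, n_words, mixed_types)
```

To run this on the real treebank, the sample string would be replaced by the contents of one of the treebank's .conllu files.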

Morphology

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

This corpus uses 3 lemmas as copulas (cop). Examples: être, voici, voilà.

This corpus uses 15 lemmas as auxiliaries (aux). Examples: avoir, pouvoir, vouloir, être, devoir, aller, faire, falloir, changer, emmener, miser, méfier, revenir, réhabiliter, voir.
This corpus uses 1 lemma as a passive auxiliary (aux:pass): être.
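
A lemma inventory of this kind can be approximated by grouping the LEMMA column by the DEPREL column. The sketch below assumes standard CoNLL-U input; the sample rows and the function name `lemmas_by_relation` are made up for illustration and are not part of any UD tool.

```python
from collections import defaultdict

# Invented sample: three short sentences in CoNLL-U format.
SAMPLE = """\
1\tc'\tce\tPRON\t_\t_\t3\tnsubj\t_\t_
2\test\têtre\tAUX\t_\t_\t3\tcop\t_\t_
3\tvrai\tvrai\tADJ\t_\t_\t0\troot\t_\t_

1\tje\tje\tPRON\t_\t_\t3\tnsubj\t_\t_
2\tpeux\tpouvoir\tAUX\t_\t_\t3\taux\t_\t_
3\tvenir\tvenir\tVERB\t_\t_\t0\troot\t_\t_

1\til\til\tPRON\t_\t_\t3\tnsubj:pass\t_\t_
2\test\têtre\tAUX\t_\t_\t3\taux:pass\t_\t_
3\tinvité\tinviter\tVERB\t_\t_\t0\troot\t_\t_
"""


def lemmas_by_relation(text, relations=("cop", "aux", "aux:pass")):
    """Collect the set of lemmas attached with each of the given relations."""
    found = defaultdict(set)
    for line in text.splitlines():
        cols = line.split("\t")
        # Keep only regular word lines (10 columns, plain integer ID).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        lemma, deprel = cols[2], cols[7]
        if deprel in relations:
            found[deprel].add(lemma)
    return {rel: sorted(found[rel]) for rel in relations}


result = lemmas_by_relation(SAMPLE)
print(result)
```

Listing the lemmas per relation in this way also makes it easy to inspect rare entries by hand.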

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (191)
- VERB--NOUN-ADP(de+le) (5)
- VERB--PRON (2190)
- VERB--PRON-ADP(entre) (1)

obj
- VERB--NOUN (913)
- VERB--NOUN-ADP(de) (1)
- VERB--NOUN-ADP(de+le) (3)
- VERB--PRON (535)

iobj
- VERB--PRON (356)
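
The pattern counts above can be approximated with a short script. This is only a sketch: the sample sentences are invented, the helper names are hypothetical, and the way the `-ADP(...)` suffix is built (from the lemmas of the child's ADP dependents) is an assumption about how the official statistics are derived.

```python
from collections import Counter

# Invented sample: three sentences in CoNLL-U format.
SAMPLE = """\
1\tje\tje\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tmange\tmanger\tVERB\t_\t_\t0\troot\t_\t_
3\tune\tun\tDET\t_\t_\t4\tdet\t_\t_
4\tpomme\tpomme\tNOUN\t_\t_\t2\tobj\t_\t_

1\til\til\tPRON\t_\t_\t3\tnsubj\t_\t_
2\tlui\tlui\tPRON\t_\t_\t3\tiobj\t_\t_
3\tparle\tparler\tVERB\t_\t_\t0\troot\t_\t_

1\telle\telle\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tboit\tboire\tVERB\t_\t_\t0\troot\t_\t_
3\tde\tde\tADP\t_\t_\t5\tcase\t_\t_
4\tl'\tle\tDET\t_\t_\t5\tdet\t_\t_
5\teau\teau\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def parse(text):
    """Split CoNLL-U text into sentences of simple token dicts."""
    sentences, current = [], []
    for line in text.splitlines() + [""]:
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            current.append({"id": cols[0], "lemma": cols[2], "upos": cols[3],
                            "head": cols[6], "deprel": cols[7]})
        elif not line.strip() and current:
            sentences.append(current)
            current = []
    return sentences


def argument_patterns(sentences, relations=("nsubj", "obj", "iobj")):
    """Count VERB parent / NOUN-or-PRON child patterns per relation."""
    counts = Counter()
    for sent in sentences:
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            parent = by_id.get(tok["head"])
            if tok["deprel"] not in relations or parent is None:
                continue
            if parent["upos"] != "VERB" or tok["upos"] not in ("NOUN", "PRON"):
                continue
            # Assumed rule: append the lemmas of the child's ADP dependents.
            adps = [c["lemma"] for c in sent
                    if c["head"] == tok["id"] and c["upos"] == "ADP"]
            pattern = "VERB--" + tok["upos"]
            if adps:
                pattern += "-ADP(" + ",".join(adps) + ")"
            counts[(tok["deprel"], pattern)] += 1
    return counts


patterns = argument_patterns(parse(SAMPLE))
for (rel, pattern), n in sorted(patterns.items()):
    print(rel, pattern, n)
```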

Relations Overview

This corpus uses 23 relation subtypes: acl:relcl, advcl:cleft, advcl:periph, advmod:periph, appos:conj, appos:nmod, aux:caus, aux:pass, conj:coord, conj:dicto, csubj:pass, dep:iobj, dep:obj, nsubj:caus, nsubj:expl, nsubj:pass, obl:comp, obl:mod, obl:periph, parataxis:discourse, parataxis:insert, parataxis:obj, parataxis:parenth
The following 4 main types are never used alone; they are always subtyped: appos, conj, obl, parataxis
The following 4 relation types are not used in this corpus at all: clf, list, goeswith, reparandum
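
The three statements above follow mechanically from the list of relation labels in the Relations section. The sketch below recomputes them from that list, checked against the 37 universal relations of UD v2; the function name `summarize` is hypothetical.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = """
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split()

# Relation labels listed for this treebank in the Relations section.
USED = """
acl acl:relcl advcl advcl:cleft advcl:periph advmod advmod:periph amod
appos:conj appos:nmod aux aux:caus aux:pass case cc ccomp compound
conj:coord conj:dicto cop csubj csubj:pass dep dep:iobj dep:obj det
discourse dislocated expl fixed flat iobj mark nmod nsubj nsubj:caus
nsubj:expl nsubj:pass nummod obj obl:comp obl:mod obl:periph orphan
parataxis:discourse parataxis:insert parataxis:obj parataxis:parenth
punct root vocative xcomp
""".split()


def summarize(used, universal=UNIVERSAL):
    """Return (subtypes, main types only seen subtyped, unused universal types)."""
    used = set(used)
    subtypes = sorted(r for r in used if ":" in r)
    bare = {r for r in used if ":" not in r}
    subtyped_mains = {r.split(":", 1)[0] for r in subtypes}
    always_subtyped = sorted(subtyped_mains - bare)
    unused = sorted(set(universal) - bare - subtyped_mains)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = summarize(USED)
print(len(subtypes), always_subtyped, unused)
```

On the label list from this page, the sketch yields 23 subtypes, the four always-subtyped main types appos, conj, obl and parataxis, and the four unused types clf, goeswith, list and reparandum, in agreement with the overview above.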