UD French Spoken
Language: French (code: fr)
Family: Indo-European, Romance
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Kim Gerdes, Sylvain Kahane, Chunxiao Yan, Aline Etienne, Marine Courtin.
Repository: UD_French-Spoken
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either French-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [kim (æt) gerdes • fr]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | not available |
Features | not available |
Relations | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
A Universal Dependencies corpus for spoken French.
The corpus was converted automatically from the Rhapsodie treebank with manual corrections.
XPOS tags and features, which are not available in v2.2 of UD_French-Spoken, will be added in future versions of this treebank, since they are already encoded in the Rhapsodie treebank.
Statistics of UD French Spoken
Relations
acl – acl:relcl – advcl – advcl:cleft – advcl:periph – advmod – advmod:periph – amod – appos:conj – appos:nmod – aux – aux:caus – aux:pass – case – cc – ccomp – compound – conj:coord – conj:dicto – cop – csubj – csubj:pass – dep – dep:iobj – dep:obj – det – discourse – dislocated – expl – fixed – flat – iobj – mark – nmod – nsubj – nsubj:caus – nsubj:expl – nsubj:pass – nummod – obj – obl:comp – obl:mod – obl:periph – orphan – parataxis:discourse – parataxis:insert – parataxis:obj – parataxis:parenth – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 2786 sentences and 34972 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 75 types of words that contain both letters and punctuation. Examples: c', l', d', j', qu', n', s', m', -là, -ce, jusqu', peut-être, aujourd', dix-huit, quelqu', -vous, -il, -même, t', dix-neuvième, vingt-cinq, vingt-deux, -on, -tu, dix-huitième, dix-neuf, lorsqu', quatre-vingt-six, soixante-dix, vingt-et-unième, vingt-neuf, -d', -ils, -moi, -nous, cinquante-six, dix-sept, puisqu', quarante-cinq, quarante-huit, quarante-neuf, quelqu'un, soixante-cinq, trente-neuf, vingt-six, -toi, Royaume-Uni, Sainte-Claire, aujourd'hui, c'est-à-dire
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 10 word types tagged as particles (PART): jamais, même, n', ne, non, n~, pas, plus, que, t
- This corpus contains 58 lemmas tagged as pronouns (PRON): aucun, autre, beaucoup, c', ce, ceci, cela, celui, certain, certains, chacun, chose, combien, comment, dont, elle, en, grand, il, je, j~, le, lequel, lui, me, mien, moi, nous, on, où, personne, plupart, plusieurs, pourquoi, qu', quand, que, quel, quelqu'un, qui, quoi, qu~, rien, se, sien, soi, te, tel, toi, tout, tu, un, uns, vivre, vous, y, à+lequel, ça
- This corpus contains 24 lemmas tagged as determiners (DET): Des, aucun, ce, certain, cet, chaque, de+le, du, l', le, les, l~, mon, plusieurs, quel, quelque, son, tel, tout, toute, un, une, un~, u~
- Out of the above, 9 lemmas occurred sometimes as PRON and sometimes as DET: aucun, ce, certain, le, plusieurs, quel, tel, tout, un
- This corpus contains 1 lemma tagged as an auxiliary (AUX): être
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: être
- This corpus does not use the VerbForm feature.
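Counts like the ones above can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: `read_conllu`, `tag_statistics` and `conllu_line` are hypothetical helper names, the sample sentences are invented rather than taken from the corpus, and the sketch counts syntactic words, which may differ from how the token figure above was computed.

```python
from collections import defaultdict

# The 17 universal POS tags defined by UD v2.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def read_conllu(text):
    """Yield sentences as lists of column lists.

    Comment lines, multiword-token ranges (IDs like 1-2) and empty nodes
    (IDs like 1.1) are skipped, so only syntactic words remain.
    """
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def tag_statistics(text):
    """Count sentences and words, and summarise UPOS usage."""
    sentences = list(read_conllu(text))
    words = [cols for sent in sentences for cols in sent]
    tags_by_lemma = defaultdict(set)
    for cols in words:
        tags_by_lemma[cols[2]].add(cols[3])  # LEMMA -> set of UPOS tags
    used = {cols[3] for cols in words}
    return {
        "sentences": len(sentences),
        "words": len(words),
        "upos_used": sorted(used),
        "upos_unused": sorted(UPOS_TAGS - used),
        # Lemmas seen at least once as PRON and at least once as DET.
        "pron_and_det": sorted(
            lemma for lemma, tags in tags_by_lemma.items()
            if {"PRON", "DET"} <= tags
        ),
    }


def conllu_line(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join(
        [str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Invented toy input, NOT taken from the treebank.
SAMPLE = "\n".join([
    "# text = le chat dort",
    conllu_line(1, "le", "le", "DET", 2, "det"),
    conllu_line(2, "chat", "chat", "NOUN", 3, "nsubj"),
    conllu_line(3, "dort", "dormir", "VERB", 0, "root"),
    "",
    "# text = je le vois",
    conllu_line(1, "je", "je", "PRON", 3, "nsubj"),
    conllu_line(2, "le", "le", "PRON", 3, "obj"),
    conllu_line(3, "vois", "voir", "VERB", 0, "root"),
    "",
])

stats = tag_statistics(SAMPLE)
print(stats["sentences"], stats["words"], stats["pron_and_det"])
```

On the toy input, the lemma `le` is reported as both PRON and DET, mirroring the kind of overlap listed above.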
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Auxiliary Verbs and Copula
- This corpus uses 3 lemmas as copulas (cop). Examples: être, voici, voilà.
- This corpus uses 15 lemmas as auxiliaries (aux). Examples: avoir, pouvoir, vouloir, être, devoir, aller, faire, falloir, changer, emmener, miser, méfier, revenir, réhabiliter, voir.
- This corpus uses 1 lemma as a passive auxiliary (aux:pass): être.
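The lemma inventories per relation can be gathered by grouping the LEMMA column by the DEPREL column. This is a rough sketch under two assumptions: relation labels are matched exactly (so `aux` does not include `aux:pass` or `aux:caus`), and the input is invented for illustration. `lemmas_by_relation` and `conllu_line` are hypothetical names.

```python
def lemmas_by_relation(text, relations):
    """Map each requested relation label to the sorted lemmas bearing it.

    Labels are matched exactly, so 'aux' and 'aux:pass' are kept apart.
    """
    found = {rel: set() for rel in relations}
    for line in text.splitlines():
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and an integer ID.
        if len(cols) == 10 and cols[0].isdigit() and cols[7] in found:
            found[cols[7]].add(cols[2])
    return {rel: sorted(lemmas) for rel, lemmas in found.items()}


def conllu_line(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join(
        [str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Invented toy input, NOT taken from the treebank.
SAMPLE = "\n".join([
    conllu_line(1, "il", "il", "PRON", 3, "nsubj"),
    conllu_line(2, "a", "avoir", "AUX", 3, "aux"),
    conllu_line(3, "dormi", "dormir", "VERB", 0, "root"),
    "",
    conllu_line(1, "il", "il", "PRON", 3, "nsubj"),
    conllu_line(2, "est", "être", "AUX", 3, "aux"),
    conllu_line(3, "parti", "partir", "VERB", 0, "root"),
    "",
    conllu_line(1, "c'", "ce", "PRON", 3, "nsubj"),
    conllu_line(2, "est", "être", "AUX", 3, "cop"),
    conllu_line(3, "beau", "beau", "ADJ", 0, "root"),
    "",
    conllu_line(1, "il", "il", "PRON", 3, "nsubj:pass"),
    conllu_line(2, "est", "être", "AUX", 3, "aux:pass"),
    conllu_line(3, "vu", "voir", "VERB", 0, "root"),
    "",
])

result = lemmas_by_relation(SAMPLE, ["cop", "aux", "aux:pass"])
for rel, lemmas in result.items():
    print(rel, len(lemmas), ", ".join(lemmas))
```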
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (191)
  - VERB--NOUN-ADP(de+le) (5)
  - VERB--PRON (2190)
  - VERB--PRON-ADP(entre) (1)
- obj
  - VERB--NOUN (913)
  - VERB--NOUN-ADP(de) (1)
  - VERB--NOUN-ADP(de+le) (3)
  - VERB--PRON (535)
- iobj
  - VERB--PRON (356)
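Patterns of this shape can be tallied by pairing each nominal dependent with its verbal parent. The sketch below assumes that the `-ADP(...)` suffix records the lemma of an adposition attached to the child via `case`; that reading is an assumption, not something this page states. The function names and the sample sentences are hypothetical.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def read_sentences(text):
    """Yield sentences as lists of 10-column word rows (integer IDs only)."""
    sentence = []
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            sentence.append(cols)
        elif not line.strip() and sentence:
            yield sentence
            sentence = []
    if sentence:
        yield sentence


def argument_patterns(text):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    counts = Counter()
    for sent in read_sentences(text):
        by_id = {cols[0]: cols for cols in sent}
        # Assumption: the adposition is a `case` dependent of the child.
        case_of = {}
        for cols in sent:
            if cols[7] == "case" and cols[3] == "ADP":
                case_of.setdefault(cols[6], cols[2])  # head ID -> ADP lemma
        for cols in sent:
            parent = by_id.get(cols[6])  # None for the root (head 0)
            if (cols[7] in CORE_RELATIONS and parent is not None
                    and parent[3] == "VERB" and cols[3] in {"NOUN", "PRON"}):
                pattern = f"VERB--{cols[3]}"
                if cols[0] in case_of:
                    pattern += f"-ADP({case_of[cols[0]]})"
                counts[(cols[7], pattern)] += 1
    return counts


def conllu_line(idx, form, lemma, upos, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join(
        [str(idx), form, lemma, upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Invented toy input, NOT taken from the treebank.
SAMPLE = "\n".join([
    conllu_line(1, "je", "je", "PRON", 2, "nsubj"),
    conllu_line(2, "mange", "manger", "VERB", 0, "root"),
    conllu_line(3, "de", "de", "ADP", 4, "case"),
    conllu_line(4, "soupe", "soupe", "NOUN", 2, "obj"),
    "",
    conllu_line(1, "elle", "elle", "PRON", 3, "nsubj"),
    conllu_line(2, "lui", "lui", "PRON", 3, "iobj"),
    conllu_line(3, "parle", "parler", "VERB", 0, "root"),
    "",
])

counts = argument_patterns(SAMPLE)
for (relation, pattern), n in sorted(counts.items()):
    print(f"{relation}: {pattern} ({n})")
```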
Relations Overview
- This corpus uses 23 relation subtypes: acl:relcl, advcl:cleft, advcl:periph, advmod:periph, appos:conj, appos:nmod, aux:caus, aux:pass, conj:coord, conj:dicto, csubj:pass, dep:iobj, dep:obj, nsubj:caus, nsubj:expl, nsubj:pass, obl:comp, obl:mod, obl:periph, parataxis:discourse, parataxis:insert, parataxis:obj, parataxis:parenth
- The following 4 main types are never used alone; they are always subtyped: appos, conj, obl, parataxis
- The following 4 relation types are not used in this corpus at all: clf, list, goeswith, reparandum
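The three statements above follow directly from the relation list given earlier on this page. As an illustrative check, the sketch below takes that list verbatim and compares it with the 37 universal relations of UD v2; `relation_overview` is a hypothetical helper name.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relation labels used in this corpus, copied from the list on this page.
RELATIONS = (
    "acl acl:relcl advcl advcl:cleft advcl:periph advmod advmod:periph amod "
    "appos:conj appos:nmod aux aux:caus aux:pass case cc ccomp compound "
    "conj:coord conj:dicto cop csubj csubj:pass dep dep:iobj dep:obj det "
    "discourse dislocated expl fixed flat iobj mark nmod nsubj nsubj:caus "
    "nsubj:expl nsubj:pass nummod obj obl:comp obl:mod obl:periph orphan "
    "parataxis:discourse parataxis:insert parataxis:obj parataxis:parenth "
    "punct root vocative xcomp"
).split()


def relation_overview(relations):
    """Return (subtypes, main types never used bare, unused universal types)."""
    used = set(relations)
    subtypes = sorted(r for r in used if ":" in r)
    # Main types that occur in any form, bare or subtyped.
    main_used = {r.split(":")[0] for r in used}
    # Main types that only ever appear with a subtype.
    always_subtyped = sorted(
        {s.split(":")[0] for s in subtypes} - used
    )
    unused = sorted(UNIVERSAL - main_used)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(RELATIONS)
print(len(subtypes), always_subtyped, unused)
```

Run on the list above, this yields 23 subtypes, the four always-subtyped main types appos, conj, obl and parataxis, and the four unused types clf, goeswith, list and reparandum, matching the figures stated here.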