UD for French

Tokenization and Word Segmentation

In general, words are delimited by whitespace characters, and punctuation marks are treated as separate words.
Only numbers may contain spaces; they match the regular expression [0-9 ,]+.
There are several closed classes of contractions that are treated as multi-word tokens and segmented into individual syntactic words. For instance, au -> à + le, auquel -> à + lequel. Note that du and des are ambiguous and may or may not be split, depending on their usage.
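The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tokenizer used to build any treebank: the names NUMBER, TOKEN_RE, CONTRACTIONS, tokenize and expand are hypothetical, the contraction table is a small illustrative subset, and the number pattern is anchored on digits at both ends (an added assumption, so that a number never starts or ends with a space or comma). The sketch ignores elision and hyphenated forms, and it deliberately leaves du and des unsplit because resolving them requires context.

```python
import re

# Number pattern based on the [0-9 ,]+ rule from the text.
# Assumption: a number must start and end with a digit.
NUMBER = r"[0-9](?:[0-9 ,]*[0-9])?"

# A token is a number, a run of word characters, or one punctuation mark.
TOKEN_RE = re.compile(rf"{NUMBER}|\w+|[^\w\s]")

# Illustrative subset of unambiguous contractions (hypothetical table).
# "du" and "des" are omitted on purpose: they are ambiguous.
CONTRACTIONS = {
    "au": ("à", "le"),
    "aux": ("à", "les"),
    "auquel": ("à", "lequel"),
    "duquel": ("de", "lequel"),
}


def tokenize(text):
    """Split text into surface tokens; numbers may contain spaces."""
    return TOKEN_RE.findall(text)


def expand(token):
    """Return the syntactic words of a token (one word unless contracted)."""
    parts = CONTRACTIONS.get(token.lower())
    return list(parts) if parts else [token]


def syntactic_words(text):
    """Tokenize, then expand each multi-word token into its parts."""
    return [word for token in tokenize(text) for word in expand(token)]


if __name__ == "__main__":
    print(tokenize("Il va au marché, avec 10 000 euros."))
    print(syntactic_words("Il va au marché, avec 10 000 euros."))
```

Here tokenize keeps au as a single surface token, while syntactic_words replaces it with à and le; the number 10 000 stays one token in both outputs.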

For more details, see tokenization.

Morphology

Features

TODO (see French features).

Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.

Syntax

TODO (see French relations).

Instruction: Give criteria for identifying core arguments (subjects and objects), and describe the range of copula constructions in nonverbal clauses. List all subtype relations used. Include links to language-specific relation definitions if any.

Treebanks