UD Old French SRCMF
Language: Old French (code: fro)
Family: Indo-European, Romance
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Sophie Prévost, Aurélie Collomb, Kim Gerdes, Isabelle Tellier, Marine Courtin, Alexei Lavrentiev, Céline Guillot-Barbance.
Repository: UD_Old_French-SRCMF
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-NC-SA 3.0
Genre: nonfiction, legal, poetry
Questions, comments?
General annotation questions (either Old French-specific or cross-linguistic) can be raised in the main UD issue tracker.
You can report bugs in this treebank in the treebank-specific issue tracker on Github.
If you want to collaborate, please contact [sophie • prevost (æt) ens • fr].
Development of the treebank happens in the UD repository but not directly in the final CoNLL-U files.
You may submit bug fixes as pull requests against the dev branch, but you have to go to the folder called not-to-release and locate the source files there.
Contact the treebank maintainers if in doubt.
Annotation | Source |
---|---|
Lemmas | not available |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | annotated manually |
Features | assigned by a program, not checked manually |
Relations | assigned by a program, with some manual corrections, but not a full manual verification |
Description
UD_Old_French-SRCMF is a conversion of (part of) the SRCMF corpus (Syntactic Reference Corpus of Medieval French, srcmf.org).
UD_Old_French-SRCMF consists of 10 texts spanning the 9th to the 13th century. It includes 17,678 sentences and 170,741 tokens.
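The sentence and token counts can be re-derived from the released CoNLL-U files. The following is a minimal Python sketch, assuming the standard CoNLL-U layout (tab-separated word lines, `#` comment lines, a blank line after each sentence). The function name `count_sentences_and_tokens` and the file name in the usage comment are hypothetical. It counts syntactic words and skips multiword-token ranges (`1-2`) and empty nodes (`1.1`), which may not match exactly how the official statistics were produced.

```python
def count_sentences_and_tokens(conllu_text: str) -> tuple[int, int]:
    """Hypothetical helper: count sentences and syntactic words in CoNLL-U text.

    Assumes standard CoNLL-U layout: '#' comment lines, one tab-separated
    line per word, and a blank line after each sentence.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level metadata such as sent_id or text
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        tokens += 1
    if in_sentence:
        sentences += 1  # the text may not end with a blank line
    return sentences, tokens


# Usage (file name is hypothetical):
# with open("fro_srcmf-ud-train.conllu", encoding="utf-8") as f:
#     print(count_sentences_and_tokens(f.read()))
```

Because UD releases typically split a treebank into several files, the per-file results would need to be summed to compare with the totals above.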
Sentences are annotated with the following metadata:
- sent_id: a unique id for each sentence in the treebank
- text: the sentence
- newdoc id: a unique id for each of the texts. This id can be split on underscores to get back:
  - name of the text
  - date
  - form: verse and/or prose
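A minimal sketch of reading this metadata in Python, assuming `# key = value` comment lines as in standard CoNLL-U and exactly three underscore-separated parts in every newdoc id (which holds for all ten ids in the table below). `TextInfo`, `parse_newdoc_id` and `iter_sentence_metadata` are hypothetical names, and the sent_id values in any test data are illustrative only.

```python
from typing import Iterator, NamedTuple, Optional


class TextInfo(NamedTuple):
    name: str  # e.g. "Roland"
    date: str  # e.g. "1100" or "late12"
    form: str  # "verse", "prose" or "verse-prose"


def parse_newdoc_id(doc_id: str) -> TextInfo:
    """Split a newdoc id such as 'Roland_1100_verse' on underscores."""
    name, date, form = doc_id.split("_")  # assumes exactly three parts
    return TextInfo(name, date, form)


def iter_sentence_metadata(
    conllu_text: str,
) -> Iterator[tuple[Optional[TextInfo], Optional[str], Optional[str]]]:
    """Yield (text info, sent_id, text) for each sentence in CoNLL-U text."""
    doc: Optional[TextInfo] = None
    meta: dict[str, str] = {}
    for line in conllu_text.splitlines():
        if line.startswith("#") and "=" in line:
            key, value = (part.strip() for part in line[1:].split("=", 1))
            if key == "newdoc id":
                doc = parse_newdoc_id(value)  # applies to all following sentences
            else:
                meta[key] = value
        elif not line.strip() and meta:
            # A blank line ends the sentence whose comments were collected.
            yield doc, meta.get("sent_id"), meta.get("text")
            meta = {}
    if meta:
        yield doc, meta.get("sent_id"), meta.get("text")
```

For example, `parse_newdoc_id("Aucassin_early13_verse-prose")` returns `TextInfo(name='Aucassin', date='early13', form='verse-prose')`; the mixed form stays a single field because it uses a hyphen, not an underscore.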
The following table lists the texts used in this treebank:
ID | Author | Name of the text | Number of tokens |
---|---|---|---|
Strasbourg_842_prose | anonymous | Serments de Strasbourg | 115 |
StLegier_1000_verse | anonymous | Vie de saint Léger | 1,388 |
StAlexis_1050_verse | anonymous | Vie de saint Alexis | 4,750 |
Roland_1100_verse | anonymous | Chanson de Roland | 28,752 |
Lapidaire_mid12_prose | anonymous | Lapidaire en prose | 4,708 |
QuatreLivresReis_late12_prose | anonymous | Quatre livres des reis | 12,949 |
BeroulTristan_late12_verse | Beroul | Tristan de Beroul | 26,766 |
TroyesYvain_1180_verse | Chrestien de Troyes | Yvain de Chretien de Troyes | 41,256 |
Aucassin_early13_verse-prose | anonymous | Aucassin et Nicolet | 9,838 |
Graal_1225_prose | anonymous | Queste del Saint Graal | 40,219 |
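As a quick consistency check, the per-text counts in this table sum to the treebank total of 170,741 tokens. The plain-Python sketch below does that sum and also groups the counts by the form part of each id; `TEXT_TOKENS` and `tokens_by_form` are illustrative names, and the numbers are copied from the table.

```python
# Token counts copied from the table above, keyed by newdoc id.
TEXT_TOKENS = {
    "Strasbourg_842_prose": 115,
    "StLegier_1000_verse": 1_388,
    "StAlexis_1050_verse": 4_750,
    "Roland_1100_verse": 28_752,
    "Lapidaire_mid12_prose": 4_708,
    "QuatreLivresReis_late12_prose": 12_949,
    "BeroulTristan_late12_verse": 26_766,
    "TroyesYvain_1180_verse": 41_256,
    "Aucassin_early13_verse-prose": 9_838,
    "Graal_1225_prose": 40_219,
}


def tokens_by_form(counts: dict[str, int]) -> dict[str, int]:
    """Sum token counts by the form part (third field) of each newdoc id."""
    totals: dict[str, int] = {}
    for doc_id, n_tokens in counts.items():
        form = doc_id.split("_")[2]
        totals[form] = totals.get(form, 0) + n_tokens
    return totals


print(sum(TEXT_TOKENS.values()))    # 170741
print(tokens_by_form(TEXT_TOKENS))  # {'prose': 57991, 'verse': 102912, 'verse-prose': 9838}
```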
Acknowledgments
UD_Old_French-SRCMF results from the conversion of (part of) the SRCMF corpus (Syntactic Reference Corpus of Medieval French, srcmf.org).
This conversion was carried out by Aurélie Collomb as part of an internship funded by the Lattice lab (Paris, CNRS, ENS & Université Sorbonne Nouvelle Paris 3, PSL & USPC), and supervised by Sophie Prévost, Isabelle Tellier and Kim Gerdes. Marine Courtin handled the deposit of the files and, in particular, took charge of validating the corpus through the successive steps of the process.
The SRCMF corpus is the outcome of the SRCMF project, which ran from 2008 to 2012, was funded by the ANR (France) and the DFG (Germany), and was supervised by Sophie Prévost and Achim Stein.
The SRCMF project consisted of the manual syntactic annotation of 15 texts (251,000 tokens) from the 9th to the 13th century. For most of these texts, part-of-speech tags were taken from their existing tagging (from the Base de Français Médiéval, Lyon, ENS de Lyon, IHRIM Laboratory, http://txm.bfm-corpus.org, and from the Nouveau Corpus d'Amsterdam, http://www.uni-stuttgart.de/lingrom/stein/corpus#nca).
The contributors to the SRCMF project were: Stein, Achim; Prévost, Sophie; Rainsford, Tom; Mazziotta, Nicolas; Bischoff, Béatrice; Glikman, Julie; Lavrentiev, Alexei; Heiden, Serge; Guillot-Barbance, Céline; Marchello-Nizia, Christiane.
The conversion from the original SRCMF annotation to the SRCMF-UD annotation was done automatically, for both the POS tags and the syntactic relations, using a set of elaborate rules. Some 1,200 syntactic relations left unlabelled were then annotated manually (Sophie Prévost), and significant spot-checking was carried out, focusing on potential difficulties (e.g. the conj relation).
The whole SRCMF corpus (251,000 tokens) was in fact automatically converted into UD dependencies, but only 172,000 tokens have so far undergone significant checking; the remaining 80,000 tokens will be added to UD_Old_French-SRCMF in the next release.
References
- Stein, A. and Prévost, S. 2013. Syntactic annotation of medieval texts: the Syntactic Reference Corpus of Medieval French (SRCMF). In P. Bennett, M. Durrell, S. Scheible and R. Whitt (eds.), New Methods in Historical Corpus Linguistics, Corpus Linguistics and International Perspectives on Language, CLIP Vol. 3. Tübingen: Narr, pp. 75-82. [halshs-01122079]
Statistics of UD Old French SRCMF
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – PRON – PROPN – SCONJ – VERB
Features
Definite – Morph – NumType – Polarity – Poss – PronType – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advmod – advmod:obl – amod – appos – aux – aux:pass – case – case:det – cc – cc:nc – ccomp – compound – conj – cop – csubj – det – discourse – dislocated – expl – fixed – flat – iobj – mark – mark:advmod – mark:obj – mark:obl – nmod – nsubj – nsubj:advmod – nsubj:obj – nummod – obj – obj:advmod – obj:advneg – obj:obl – obl – obl:advmod – parataxis – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 17,678 sentences and 170,741 tokens.
- All tokens in this corpus are followed by a space.
- This corpus contains 1 type of word with spaces. Example: ambe .ii.
- This corpus contains 139 types of words that contain both letters and punctuation. Examples: l', qu', s', n', d', m', .i., t', c', j', jusqu', l'en, .ii., .iii., q', .iiii., g', .xii., entr', .xx., .vii., .c., ensembl', quanqu', un', ·l, .xxx., .v., tresqu', .x., entresqu', .vi., .xv., .xxiiii., .ix., josqu', .IIII.C., an.ii., cest', qui', ·s, Ço', ç', .XL., .l., .viii., jesqu', ·il, .VII.C., .lx.
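The exact definition used for "letters" and "punctuation" in these statistics is not given here. The sketch below assumes they correspond to the Unicode general categories L* and P* (so the middle dot in ·l counts as punctuation) and treats any whitespace as marking a word with spaces; `has_letters_and_punct` and `has_space` are hypothetical names.

```python
import unicodedata


def has_letters_and_punct(token: str) -> bool:
    """True if the token mixes letters (Unicode L*) with punctuation (P*).

    The category choice is an assumption about how the statistics
    above define 'letters' and 'punctuation'.
    """
    categories = [unicodedata.category(ch) for ch in token]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


def has_space(token: str) -> bool:
    """True for 'words with spaces' such as 'ambe .ii.'."""
    return any(ch.isspace() for ch in token)


for form in ["l'", ".i.", "·l", "an.ii.", "estre", "ambe .ii."]:
    print(form, has_letters_and_punct(form), has_space(form))
```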
Morphology
Tags
- This corpus uses 12 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, PRON, PROPN, SCONJ, VERB
- This corpus does not use the following tags: NUM, PART, SYM, PUNCT, X
- This corpus contains 1 lemma tagged as pronoun (PRON): _ (lemmas are not annotated in this corpus, so the placeholder _ is the only value)
- This corpus contains 1 lemma tagged as determiner (DET): _
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
- This corpus contains 1 lemma tagged as auxiliary (AUX): _
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- There are 3 (de)verbal forms:
  - Fin
    - AUX: fu, est, soit, ert, furent, iert, estoit, fust, seit, sera
    - VERB: est, a, ad, dist, fu, ot, fet, avoit, ai, estoit
  - Inf
    - AUX: estre, estr', iestre
    - VERB: estre, dire, venir, avoir, aler, parler, faire, veoir, fere, feire
  - Part
    - VERB: fait, dit, mort, mis, fet, esté, venuz, pris, morz, ocis
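The lists in this section group word forms by feature value and UPOS. A short script can approximate them; the sketch below assumes "most frequent forms" means FORM-column values ranked by count, keeps the raw feature value string (so a multi-value such as Prs,Rel stays one key, as in the lists below), and skips multiword-token ranges and empty nodes. `top_forms_by_feature` is a hypothetical name, not the tool that generated these statistics.

```python
from collections import Counter, defaultdict


def top_forms_by_feature(conllu_text: str, feature: str, k: int = 10) -> dict:
    """Map each value of `feature` to {UPOS: k most frequent forms}.

    Hypothetical helper, not the tool that produced these statistics.
    The raw value string is kept, so 'Prs,Rel' stays a single key.
    """
    counts: dict = defaultdict(lambda: defaultdict(Counter))
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Keep only ordinary word lines: 10 columns and an integer ID
        # (this skips comments, blank lines, ranges like 1-2 and empty nodes).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            if name == feature:
                counts[value][upos][form] += 1
    return {
        value: {upos: [f for f, _ in c.most_common(k)] for upos, c in by_upos.items()}
        for value, by_upos in counts.items()
    }
```

Calling it with `feature="VerbForm"` on a file would give a nested mapping shaped like the Fin / Inf / Part lists above.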
Nominal Features
- Definite
  - Def
    - ADP: au, des, del, el, as, al, dou, du, ou, es
    - DET: la, li, le, l', les, lo, lu, lé, lis, las
  - Ind
    - DET: un, une, .i., uns, un', unes, I, u·, úne, U
Degree and Polarity
- Polarity
  - Int
    - ADV: enne, en, ene
  - Neg
    - ADV: ne, n', mie, pas, non, point, nen, nun, nes, nient
    - PRON: nel, nes, nu, nen, nem, net, nul
Verbal Features
- Tense
  - Past
    - VERB-Part: fait, dit, mort, mis, fet, esté, venuz, pris, morz, ocis
  - Pres
    - VERB-Part: querant, curant, plorant, recreant, parlant, recreanz, trenchant, veant, curanz, dolans
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - ADP: au, des, del, el, as, al, dou, du, ou, es
    - DET: la, li, le, l', les, un, une, .i., uns, un'
  - Dem
    - ADP: an
    - ADV: en, i, an, í, em, u, o, ent, n, ·n
    - DET: ceste, cest, cele, cel, ces, cil, cez, cist, ce, icest
    - PRON: ce, cil, ço, çó, celui, cele, cels, c', ces, ceo
  - Ind
    - ADJ: autre, meïsmes, tel, altre, nule, meïsme, autres, tex, altres, tiex
    - DET: tel, toz, nule, tuit, tote, nul, autre, tot, tuz, toutes
    - PRON: autre, tuit, rien, nus, uns, l'en, en, un, autres, hom
  - Int
    - DET: quel, qel, quele, quels, Qanz, itels
    - PRON: que, qui, coi, ou, qu', quoi, quei, ki, liquels, q'
  - Ord
    - ADJ: premier, tierce, cinquieme, premiere, premiers, tierz, disme, premeraine, premierz, prime
  - Prs
    - PRON: il, vos, li, le, l', je, s', se, ele, me
    - SCONJ: S'
  - Prs,Rel
    - PRON: qui, que, ki, qu', ou, cui, quoi, dunt, u, don
  - Rel
    - DET: quel, quele, quelque, quiex, qel, quels, qual, quex, quanz, ques
- NumType
  - Card
    - ADJ: .ii., .iii., dui, troi, deus, premer, .vii., dous, premereins, .iiii.
    - DET: dous, cent, .ii., milie, trois, .xii., deus, mil, set, .iiii.
    - PRON: milie, trois, dui, .ii., andui, deus, un, troi, uns, dous
  - Ord
    - DET: tierz, premiere, tierce
    - PRON: tierz, quarte, terce, disme, quarz, sedme, noefme, premere, quinte, siste
- Poss
  - Yes
    - ADJ: mien, vostre, suen, sue, men, nostre, soe, meie, moie, miens
    - DET: sa, son, ses, sun, vostre, lor, ma, nostre, mon, mes
    - PRON: suen, mien, noz, suens, vostre, soe, lor, lur, nostre, moie
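In CoNLL-U, one feature can carry several values separated by commas, which is presumably how the Prs,Rel entry arises (PronType=Prs,Rel on relative pronouns such as qui). A minimal sketch of parsing the FEATS column under that assumption; `parse_feats` is a hypothetical name.

```python
def parse_feats(feats: str) -> dict[str, list[str]]:
    """Parse a CoNLL-U FEATS column into {feature: [values]}.

    Features are separated by '|', and multiple values of one feature
    by ','. An entry such as Prs,Rel is assumed to come from PronType=Prs,Rel.
    """
    if feats == "_":
        return {}
    parsed: dict[str, list[str]] = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        parsed[name] = value.split(",")
    return parsed


print(parse_feats("PronType=Prs,Rel"))  # {'PronType': ['Prs', 'Rel']}
print(parse_feats("_"))                 # {}
```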
Other Features
- Morph
  - VFin
    - ADJ: asuage
    - ADP: a, ad
    - ADV: oi
    - CCONJ: Et
    - INTJ: Os
    - NOUN: acorde, aiüe, alge, chastie, curt, dreit, duinst, esrages, estencele, façon
    - PROPN: cuntredie
    - VERB: a
  - VInf
    - ADJ: droiturier, ácustumiers
    - NOUN: deçoivre, Fuïr, clergier, curre, enconbrier, espleiter, parler, pleisir
  - VPar
    - ADJ: dolenz, dolent, avenanz, vaillant, vaillanz, confés, dolanz, flurie, joiant, avenant
    - ADP: voiant, oiant
    - ADV: errant
    - NOUN: semblant, senblant, mort, sanblant, descovert, dit, fait, remanant, ajustee, anchanté
    - PROPN: Flurit, Perdut, Sevree
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as copula (cop). Examples: _ (lemmas are not annotated in this corpus, so the placeholder _ is the only value).
- This corpus uses 1 lemma as auxiliary (aux). Examples: _.
- This corpus uses 1 lemma as passive auxiliary (aux:pass). Examples: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--PRON (1)
  - VERB-Fin--NOUN (2762)
  - VERB-Fin--NOUN-ADP(_) (2)
  - VERB-Fin--PRON (6934)
  - VERB-Fin--PRON-ADP(_) (3)
  - VERB-Inf--NOUN (146)
  - VERB-Inf--PRON (763)
  - VERB-Part--NOUN (867)
  - VERB-Part--PRON (1519)
- obj
  - VERB--NOUN (2)
  - VERB-Fin--NOUN (4700)
  - VERB-Fin--NOUN-ADP(_) (70)
  - VERB-Fin--PRON (4563)
  - VERB-Fin--PRON-ADP(_) (8)
  - VERB-Inf--NOUN (879)
  - VERB-Inf--NOUN-ADP(_) (10)
  - VERB-Inf--PRON (849)
  - VERB-Inf--PRON-ADP(_) (2)
  - VERB-Part--NOUN (754)
  - VERB-Part--NOUN-ADP(_) (6)
  - VERB-Part--PRON (906)
- iobj
  - VERB-Fin--PRON (2209)
  - VERB-Fin--PRON-ADP(_) (328)
  - VERB-Fin--PRON-ADP(_)-ADP(_) (5)
  - VERB-Inf--PRON (199)
  - VERB-Inf--PRON-ADP(_) (59)
  - VERB-Part--PRON (409)
  - VERB-Part--PRON-ADP(_) (70)
  - VERB-Part--PRON-ADP(_)-ADP(_) (3)
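The pattern labels above appear to encode the parent UPOS with its VerbForm value, the child UPOS, and any ADP case dependents of the child with their lemma in parentheses (always _ here, since lemmas are not available). The sketch below rebuilds such labels for one sentence under that reading. The input tuple layout, the restriction to the exact labels nsubj, obj and iobj (subtypes such as nsubj:obj are left out), and the function names are all assumptions, not the documented behaviour of the statistics generator.

```python
from collections import Counter
from typing import Optional

# (id, lemma, upos, feats, head, deprel); this tuple layout is an assumption.
Token = tuple[int, str, str, str, int, str]

CORE_RELATIONS = {"nsubj", "obj", "iobj"}  # exact labels; subtypes excluded


def verb_form(feats: str) -> Optional[str]:
    """Return the VerbForm value from a FEATS string, or None."""
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "VerbForm":
            return value
    return None


def argument_patterns(sentence: list[Token]) -> Counter:
    """Count (deprel, 'VERB-<VerbForm>--<UPOS>-ADP(<lemma>)...') patterns."""
    by_id = {tok[0]: tok for tok in sentence}
    patterns: Counter = Counter()
    for tok_id, _lemma, upos, _feats, head, deprel in sentence:
        if deprel not in CORE_RELATIONS or upos not in ("NOUN", "PRON"):
            continue
        parent = by_id.get(head)
        if parent is None or parent[2] != "VERB":
            continue  # only verb parents are considered
        form = verb_form(parent[3])
        parent_sig = "VERB" + (f"-{form}" if form else "")
        # Append each ADP 'case' dependent of the child, with its lemma.
        child_sig = upos + "".join(
            f"-ADP({t[1]})" for t in sentence
            if t[4] == tok_id and t[2] == "ADP" and t[5] == "case"
        )
        patterns[(deprel, f"{parent_sig}--{child_sig}")] += 1
    return patterns
```

A parent VERB with no VerbForm value yields labels like VERB--PRON, which matches the form of the first nsubj entry above.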
Relations Overview
- This corpus uses 14 relation subtypes: acl:relcl, advmod:obl, aux:pass, case:det, cc:nc, mark:advmod, mark:obj, mark:obl, nsubj:advmod, nsubj:obj, obj:advmod, obj:advneg, obj:obl, obl:advmod
- The following 7 relation types are not used in this corpus at all: clf, list, orphan, goeswith, reparandum, punct, dep
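Both overview figures can be checked against the 37 universal relations of UD v2: subtypes are labels containing a colon, and unused types are universal relations that never occur, not even as the base of a subtype. A minimal sketch, with the used labels copied from the Relations list of this page; `UD_RELATIONS` and `USED` are illustrative names.

```python
# The 37 universal dependency relations of UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Labels copied from the Relations list of this page.
USED = (
    "acl acl:relcl advcl advmod advmod:obl amod appos aux aux:pass case "
    "case:det cc cc:nc ccomp compound conj cop csubj det discourse dislocated "
    "expl fixed flat iobj mark mark:advmod mark:obj mark:obl nmod nsubj "
    "nsubj:advmod nsubj:obj nummod obj obj:advmod obj:advneg obj:obl obl "
    "obl:advmod parataxis root vocative xcomp"
).split()

subtypes = sorted(label for label in USED if ":" in label)
unused = sorted(UD_RELATIONS - {label.split(":")[0] for label in USED})

print(len(subtypes))  # 14
print(unused)         # ['clf', 'dep', 'goeswith', 'list', 'orphan', 'punct', 'reparandum']
```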