UD Kurmanji MG
Language: Kurmanji (code: kmr)
Family: Indo-European, Iranian
This treebank has been part of Universal Dependencies since the UD v2.1 release.
The following people have contributed to making this treebank part of UD: Memduh Gökırmak, Francis Tyers.
Repository: UD_Kurmanji-MG
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-SA 4.0
Genre: fiction, wiki
Questions, comments? General annotation questions (either Kurmanji-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [memduhg (æt) gmail • com, ftyers (æt) hse • ru]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually, natively in UD style |
Description
The UD Kurmanji corpus is a corpus of Kurmanji Kurdish. It contains fiction and encyclopaedic texts in roughly equal measure. It has been annotated natively in accordance with the UD annotation scheme.
UD Kurmanji is a Kurmanji (Northern Kurdish) treebank developed with the UD framework. It consists of a Sherlock Holmes story, “The Adventure of the Speckled Band”, translated in 1944 by Celadet Bedirxan in the magazine Ronahi, and sentences from the Kurmanji Wikipedia.
Acknowledgments
If you use this treebank in your work, please cite:
@inproceedings{gokirmak:2017,
  author = {Memduh Gökırmak and Francis M. Tyers},
  title = {A Dependency Treebank for Kurmanji Kurdish},
  booktitle = {Proceedings of the Fourth International Conference on Dependency Linguistics (DepLing, 2017)},
  pages = {64–73},
  year = 2017
}
Statistics of UD Kurmanji MG
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
AdpType – Aspect – Case – Definite – Degree – Evident – Gender – Mood – Number – NumType – Person – Polarity – PronType – Reflex – Tense – VerbForm
Relations
acl – advcl – advmod – advmod:neg – amod – appos – aux – case – case:circ – cc – ccomp – compound – compound:lvc – compound:nn – compound:redup – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – mark – nmod – nmod:dat – nmod:poss – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 754 sentences, 10188 tokens and 10260 syntactic words.
- This corpus contains 1293 tokens (13%) that are not followed by a space.
- This corpus contains 73 types of words with spaces. Examples: bû bû, da bû, dîti bû, goti bû, keti bû, anî bû, bihîsti bû, bû bûn, gihaşti bûn, kiri bûn, miri bû, nas kir, nikarî bû, vêxisti bû, wêran dikin, xuya dikir, xuya kir, Diviya bû, Stok moranê, anî bûn, ava dibe, ava dike, ava kiribûn, ava kirin, avêti bû, belav kir, berda bûn, bicih dike, ceza dikir, da bûye, da bûyê, dagir kirin, dagir kiriye, danî bû, dest pê dike, dest pê kir, dikarî bû, dûr bixe, gerandi bû, gihandi bû, gihaşti bû, girti bû, girtî bû, girtî bûn, hati bû, hati bûm, hişti bû, kar kir, kiri bû, kom kirin
- This corpus contains 60 types of words that contain both letters and punctuation. Examples: Dr., 15'ê, 1932'an, 1991'ê, 4'ê, 12'ê, 14'ê, 1500'an, 1534'an, 1603'an, 1604'an, 17'emîn, 1788'an, 1821'ê, 1825'an, 1829'an, 1883'an, 1915'î, 1929'an, 1940'ê, 1944'î, 1949'î, 1951'î, 1954'î, 1960'î, 1961'î, 1970’î, 1975'an, 1977'an, 1978'an, 1980'an, 1980'î, 1983'an, 1986'an, 1990'î, 1997'an, 1998'an, 20'an, 2001-ê, 2003'î, 2005'ê, 2008'a, 2010'ê, 2012'a, 2015'an, 27'ê, 28'ê, 3'an, 30'ê, 6'emîn
- This corpus contains 72 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 11 types of multi-word tokens. Examples: jê, pê, lê, tê, emê, ezê, Honê, Tiwê, em, le, lev.
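The counts in this list are related: a multi-word token is a single surface token that expands into several syntactic words, so 10260 words minus 10188 tokens leaves 72 extra words, which matches 72 two-word multi-word tokens. The following Python sketch shows one way such counts could be derived from CoNLL-U text. The names `conllu_counts` and `row` and the two toy sentences are assumptions made for illustration; they are not taken from the treebank or from the UD tooling.

```python
def conllu_counts(text):
    """Count sentences, surface tokens, syntactic words, multi-word tokens,
    and surface tokens marked SpaceAfter=No in a CoNLL-U string."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the current multi-word token
    for line in text.splitlines():
        if not line.strip():          # blank line ends a sentence
            in_sentence = False
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if not in_sentence:
            in_sentence = True
            covered_until = 0
            counts["sentences"] += 1
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:             # empty node, neither token nor word
            continue
        no_space = "SpaceAfter=No" in misc.split("|")
        if "-" in tok_id:             # multi-word token range such as 1-2
            counts["mwts"] += 1
            counts["tokens"] += 1
            counts["no_space"] += no_space
            covered_until = int(tok_id.split("-")[1])
            continue
        counts["words"] += 1
        if int(tok_id) > covered_until:   # word is its own surface token
            counts["tokens"] += 1
            counts["no_space"] += no_space
    return counts


def row(tok_id, form, misc="_"):
    """Build a 10-column CoNLL-U line with unused columns left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 7 + [misc])


# Toy input: the segmentation is hypothetical and only illustrates the format.
sample = "\n".join([
    "# toy sentence 1",
    row("1-2", "jê"),
    row("1", "ji"),
    row("2", "wî"),
    row("3", "pirsî", "SpaceAfter=No"),
    row("4", "."),
    "",
    "# toy sentence 2",
    row("1", "Ew"),
    row("2", "hat", "SpaceAfter=No"),
    row("3", "."),
    "",
])

result = conllu_counts(sample)
print(result)
```

On the toy input the word count exceeds the token count by one, the size of the single two-word multi-word token minus one, which is the same relationship as in the corpus totals.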
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 7 word types tagged as particles (PART): _, de, dê, ji, jî, ma, ê
- This corpus contains 18 lemmas tagged as pronouns (PRON): em, ev, ew, ez, gelek, heryek, herçî, hev, hevdû, hevûdin, hûn, ku, kî, tu, tukesî, xwe, yekî, çi
- This corpus contains 19 lemmas tagged as determiners (DET): det, ev, ew, ewqas, gelek, heman, hemî, hemû, her, herdu, herçî, hin, hindek, pir, tu, yekî, çend, çi, çiqas
- Out of the above, 7 lemmas occurred sometimes as PRON and sometimes as DET: ev, ew, gelek, herçî, tu, yekî, çi
- This corpus contains 2 lemmas tagged as auxiliaries (AUX): bûn, hebûn
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: bûn
- There are 3 (de)verbal forms:
- Fin
- AUX: e, ye, bû, bûn, dibe, hebû, bûye, heye, hene, in
- VERB: tê, da, kir, hatiye, gote, hat, dike, hate, got, hatine
- Inf
- VERB: kirin, bikaranîn, çêkirin, gotin, avakirin, dîtin, girêdan, weşandin, zanîn, amadekirin
- Part
- VERB: mirî, qewimî, bû, dagirtî, daliqandî, diyarkirî, dîti, girtî, keti, parastî
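The PRON/DET overlap reported in the list above can be reproduced as a set intersection of the two lemma lists. This is a small sketch using only the lemmas printed on this page; the variable names are illustrative.

```python
# Lemma lists copied from the PRON and DET entries of this page.
pron_lemmas = {
    "em", "ev", "ew", "ez", "gelek", "heryek", "herçî", "hev", "hevdû",
    "hevûdin", "hûn", "ku", "kî", "tu", "tukesî", "xwe", "yekî", "çi",
}
det_lemmas = {
    "det", "ev", "ew", "ewqas", "gelek", "heman", "hemî", "hemû", "her",
    "herdu", "herçî", "hin", "hindek", "pir", "tu", "yekî", "çend", "çi",
    "çiqas",
}

# Lemmas that occur under both tags.
shared = pron_lemmas & det_lemmas
print(len(shared), sorted(shared))
```

The intersection has 7 members, in line with the count given for lemmas that occur both as PRON and as DET.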
Nominal Features
- Gender
- Fem
- ADP: ya, a, Stêrên
- DET: vê, wê, _
- NOUN: sala, oda, xweha, kurdî, banû, caran, dema, odê, qesirê, salên
- PRON: wê, _, vê, we, ve
- PROPN: Stonêrê, Tirkiyeyê, Cûlyayê, Ewropayê, Hindistanê, Amedê, Stenbolê, Stokmoranê, roma, Amerîkaya
- Fem,Masc
- ADP: yên, ên, en
- DET: her, ev, hin, çend, hinek, ew, van, hemî, wan, hemû
- PRON: xwe, min, me, em, ew, ez, te, wan, _, ev
- PROPN: Hegel
- Masc
- ADP: yê, ê, ye
- DET: vî, wî
- NOUN: zirbavê, gund, navê, serê, zimanê, nav, derî, mirov, dengê, dest
- PRON: wî, _, ewî, Gelek, gelekê, wi
- PROPN: Holmes, Rweylot, Stonêrê, Stonêr, Şerlok, Wetsin, Feqiyê, Keya, Mistentiq, Îsa
- Number
- Plur
- ADP: yên, ên, Stêrên, en
- AUX-Fin: bûn, hene, in, ne, bûne, bibin, dibin, hebûn, bû bûn, hebûne
- DET: gelek, van, hemî, wan, hemû, gellek, pir
- NOUN: kurdan, caran, salên, salan, çavên, helbestên, navên, berhemên, destên, nivîsên
- NUM: du, sê, pênc, hezar, sed, çar, 1932'an, hezaran, sedan, 14an
- PRON: me, em, wan, emê, we, ew, Honê, hûn, Gelek, gelekê
- PROPN: Badînan, Botan
- VERB-Fin: hatine, bikin, dikin, tên, herin, kirin, derketin, didin, digirin, dihatin
- Plur,Sing
- DET: ev, hin, çend, hinek, ew, hindik, hinekî
- PRON: xwe, ev, hev, ew, kû, _, ku, xwê, çi
- Sing
- ADP: ya, a, yê, ê, ye
- AUX-Fin: e, ye, bû, dibe, hebû, bûye, heye, bû bû, im, dibû
- DET: vê, her, wê, vî, wî, Herçî, _, ewqas, herdu, yekî
- NOUN: sala, oda, xweha, navê, zirbavê, gund, serê, zimanê, derî, kurdî
- NUM: yek, yekê
- PRON: min, wî, _, ez, wê, ew, te, tu, ezê, tukesî
- PROPN: Holmes, Rweylot, Stonêrê, Stonêr, Tirkiyeyê, Şerlok, Cûlyayê, Ewropayê, Hindistanê, Wetsin
- VERB: tê, da, kir, hatiye, gote, hat, dike, hate, got, kiriye
- VERB-Fin: tê, da, kir, hatiye, gote, hat, dike, hate, got, kiriye
- Case
- Con
- ADJ: saliya, derûniyê, germa, kelha, kurê, orjinalê, saliyê, tevahiya, tirşê
- NOUN: sala, oda, xweha, zirbavê, navê, serê, zimanê, dema, dengê, aliyê
- PROPN: Stonêrê, Amerîkaya, Feqiyê, Evdirehmanê, Kurdistana, Badînan, Cizîra, Efrîqaya, Emerîkaya, Gulçîna
- X: mûrên
- Nom
- DET: ev, ew
- NOUN: kurdî, banû, mirov, nav, tişt, gor, gorî, hûr, mar, seh
- NUM: du, yek, sê, pênc, çar, deh, dido, dwanzde, dû, penc
- PRON: em, ew, ez, ev, tu, emê, ezê, Honê, hûn, Eve
- PROPN: Holmes, Rweylot, Stonêr, roma, Şerlok, Botan, Keya, Teyran, Abdusamet, Elî
- Obl
- DET: vê, wê, vî, wî, van, wan, _
- NOUN: gund, kurdan, caran, derî, odê, qesirê, dest, cih, salan, demê
- NUM: yekê, 1932'an, 14an, 1500'an, 1534'an, 1603'an, 1604'an, 1788'an, 1825'an, 1829'an
- PRON: min, me, wî, _, wê, te, wan, we, vê, ewî
- PROPN: Holmes, Stonêrê, Rweylot, Tirkiyeyê, Cûlyayê, Ewropayê, Hindistanê, Amedê, Stenbolê, Stokmoranê
- Voc
- NOUN: zanyariyên
- Definite
- Def
- ADP: Stêrên
- NOUN: sala, oda, xweha, zirbavê, gund, navê, serê, zimanê, kurdan, derî
- NUM: du, yek, yekê, sê, pênc, çar, deh, dido, dwanzde, dû
- PROPN: Tirkiyeyê, Ewropayê, Hindistanê, Amedê, Stenbolê, Stokmoranê, roma, Amerîkaya, Botan, Germanistanê
- X: mûrên
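Entries such as "Fem,Masc" and "Plur,Sing" in the list above follow the UD convention of writing several values of one feature as a comma-separated list inside the FEATS column. A minimal sketch of reading such a column in Python is given below; the function name `parse_feats` and the example feature string are assumptions for illustration and are not taken from the treebank.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into {feature name: list of values}."""
    if feats == "_":                  # '_' means the word has no features
        return {}
    parsed = {}
    for pair in feats.split("|"):     # features are separated by '|'
        name, _, value = pair.partition("=")
        parsed[name] = value.split(",")   # several values are separated by ','
    return parsed


# Hypothetical FEATS value used only to show the multi-value case.
example = parse_feats("Case=Obl|Gender=Fem,Masc|Number=Plur")
print(example)
```

Keeping every value as a list means single-valued and multi-valued features can be handled by the same downstream code.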
Degree and Polarity
- Degree
- Cmp
- ADJ: zêdetir, bêtir, xeternaktir, çêtir, bilindtir, dewlementir, girîngtir, meztir
- Pos
- ADJ: mezin, nû, bilind, aciz, ecêb, belek, dirêj, kûr, navîn, reş
- Sup
- ADJ: Zêdetirîn, aktîftirîn, dijwartirîn
- Polarity
- Neg
- ADV: ne
- AUX-Fin: nîne, tune
- DET: tu, ti
- VERB-Fin: meke, nayê, nehatiye, nikarî bû, nizanî, xuya dikir, ceza dikir, dernexist, maye, nade
Verbal Features
- Aspect
- Perf
- VERB-Fin: hate, hatine, hatiye, nehatiye, da bûye, nehatime, nehatine
- Prog
- AUX-Fin: dibû, dibûm
- VERB-Fin: dida, dikir, dihate, dadiket, dihatin, digeriya, digirt, rûdinişt, digeriyan, digot
- Mood
- Imp
- AUX-Fin: nebe
- VERB-Fin: meke, bikin, Rahêje, berdin, bike, bikelînin, binêre, ke, rûne, werbigirin
- Ind
- AUX-Fin: e, ye, bû, bûn, dibe, hebû, bûye, heye, hene, in
- VERB-Fin: tê, da, kir, hatiye, gote, hat, dike, hate, got, hatine
- Opt
- VERB-Fin: bikira, bidîta, bigota, bihata, bistandana
- Sub
- AUX-Fin: be, hebe, bibe, bibim, bibin, bibî, bin, bit, nebe, nebit
- VERB-Fin: bike, bikin, bidî, bêje, herin, bibêje, herî, berde, bibînin, bidin
- Tense
- Fut
- AUX-Fin: bibin, hebit
- PART: ê, _, dê, de
- VERB-Fin: bibihîsin, biborînin, bibîne, bibînim, bibînin, bidin, bikarim, bikarin, bikî, bixebitin
- Past
- AUX-Fin: bû, bûn, hebû, bûye, bûne, dibû, hebûn, bûm, bûya, biwa
- VERB-Fin: da, kir, hatiye, gote, hat, hate, got, hatine, kiriye, girt
- VERB-Part: mirî, qewimî, bû, dagirtî, daliqandî, diyarkirî, dîti, girtî, keti, parastî
- Pqp
- AUX-Fin: bû bû, bû bûn
- VERB-Fin: da bû, dîti bû, goti bû, keti bû, anî bû, bihîsti bû, gihaşti bûn, hatibû, kiri bûn, miri bû
- Pres
- AUX-Fin: e, ye, dibe, heye, hene, in, ne, im, be, dibin
- VERB-Fin: tê, dike, divêt, dide, bike, bikin, dikin, tên, bidî, bêje
- Evident
- Nfh
- AUX-Fin: bûye, bûne, bûya, biwa, hebûne
- VERB-Fin: gote, hatiye, kiriye, dihate, maye, daye, digeriya, girtiye, avête, dane
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- DET: vê, ev, wê, ew, vî, van, wî, wan, _
- NOUN: demê, derê, navê, şanî, armancê, awayî, babetê, beşdarî, caran, gundî
- PRON: _, wan, ev, vê, ve
- Ind
- DET: her, hin, gelek, çend, tu, hinek, heman, hemî, hemû, ti
- NOUN: dengekî, tiştek, tiştekî, carekê, kesekî, malbateke, meymûnek, mirovekî, nişkekê, odeke
- PRON: tukesî, yekî, Gelek, Herçî, Heçî, gelekê, heryekê, hevdu, ve, vê
- Int
- ADV: ka
- DET: çi, çiqas
- PRON: çi, Kî
- Prs
- DET: wî
- PRON: xwe, min, me, wî, em, ew, ez, wê, te, tu
- Rcp
- PRON: hevûdin
- Rel
- ADV: çi
- PRON: kû, ku
- NumType
- Card
- NUM: du, yek, yekê, 4, siseyan, yekem, sê, 1, 10, 15'ê
- Reflex
- Yes
- PRON: xwe, hev, _, xwê
- Person
- 1
- AUX-Fin: bûn, im, me, bibim, bibin, hene, bûm
- PRON: min, me, em, ez, emê, ezê
- VERB-Fin: herin, dizanin, ketin, rabûm, bidin, derketin, dibêjim, dibînim, dixwazim, dîtin
- 2
- AUX-Fin: bibî, nebe, bibin, yî, î
- PRON: te, tu, we, Honê, hûn, Tiwê, hon
- VERB-Fin: bidî, bikin, herî, meke, bikî, dixwazî, kirî, nizanî, Rahêje, anî
- 3
- AUX-Fin: e, ye, bû, dibe, hebû, bûye, heye, bûn, hene, in
- DET: wî
- PRON: wî, ew, wê, _, we, ewî, Ev, Eve, wan, wi
- VERB: tê, da, kir, hatiye, gote, hat, dike, hate, got, hatine
- VERB-Fin: tê, da, kir, hatiye, gote, hat, dike, hate, got, hatine
Other Features
- AdpType
- Post
- ADP: de, re, ve, ya, a, da, yên, yê, ê, ên
- PRON: gelekê
- Prep
- ADP: di, bi, ji, li, ser, bo, ber, piştî, jê, pê
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: bûn.
- This corpus uses 4 lemmas as auxiliaries (aux). Examples: hatin, dê, dan, bûn.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN-Nom (1)
- VERB-Fin--NOUN (9)
- VERB-Fin--NOUN-Con (85)
- VERB-Fin--NOUN-Con-ADP(yê) (3)
- VERB-Fin--NOUN-Con-ADP(yê)-ADP(yê) (1)
- VERB-Fin--NOUN-Nom (72)
- VERB-Fin--NOUN-Nom-ADP(bi) (1)
- VERB-Fin--NOUN-Obl (27)
- VERB-Fin--PRON (7)
- VERB-Fin--PRON-Nom (107)
- VERB-Fin--PRON-Obl (91)
- VERB-Fin--PRON-Obl-ADP(tenê) (1)
- VERB-Inf--NOUN-Con (34)
- VERB-Inf--NOUN-Con-ADP(yê) (3)
- VERB-Inf--NOUN-Nom (28)
- VERB-Inf--NOUN-Obl (4)
- VERB-Inf--PRON-Nom (5)
- VERB-Part--NOUN-Con (1)
- VERB-Part--PRON (1)
- VERB-Part--PRON-Nom (1)
- obj
- VERB-Fin--NOUN-Con (131)
- VERB-Fin--NOUN-Con-ADP(yê) (6)
- VERB-Fin--NOUN-Nom (63)
- VERB-Fin--NOUN-Nom-ADP(ve) (1)
- VERB-Fin--NOUN-Obl (58)
- VERB-Fin--NOUN-Voc-ADP(yê) (1)
- VERB-Fin--PRON (26)
- VERB-Fin--PRON-Nom (24)
- VERB-Fin--PRON-Obl (9)
- VERB-Inf--NOUN-Con (2)
- VERB-Inf--NOUN-Nom (2)
- VERB-Part--NOUN-Nom (1)
- VERB-Part--PRON-Nom (1)
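The pattern labels above combine the parent verb's VerbForm with the child's part of speech and Case. The sketch below shows how such labels could be counted in Python. It is a simplification under stated assumptions: the token dictionaries, the function name `arg_patterns`, and the toy sentence with its annotation are hypothetical, and the `ADP(...)` suffix seen in some labels is not modelled.

```python
from collections import Counter


def arg_patterns(sentence, relations=("nsubj", "obj")):
    """Count parent--child labels for VERB parents and NOUN/PRON children."""
    by_id = {tok["id"]: tok for tok in sentence}
    counts = Counter()
    for tok in sentence:
        parent = by_id.get(tok["head"])
        if tok["deprel"] not in relations or parent is None:
            continue
        if parent["upos"] != "VERB" or tok["upos"] not in ("NOUN", "PRON"):
            continue
        # VerbForm and Case are appended only when present, which gives
        # labels such as "VERB--NOUN-Nom" or "VERB-Fin--PRON".
        left = "-".join(x for x in ("VERB", parent["feats"].get("VerbForm")) if x)
        right = "-".join(x for x in (tok["upos"], tok["feats"].get("Case")) if x)
        counts[(tok["deprel"], f"{left}--{right}")] += 1
    return counts


# Toy sentence with an illustrative annotation, not taken from the treebank.
toy = [
    {"id": 1, "form": "Ez", "upos": "PRON", "feats": {"Case": "Nom"},
     "head": 3, "deprel": "nsubj"},
    {"id": 2, "form": "wî", "upos": "PRON", "feats": {"Case": "Obl"},
     "head": 3, "deprel": "obj"},
    {"id": 3, "form": "dibînim", "upos": "VERB", "feats": {"VerbForm": "Fin"},
     "head": 0, "deprel": "root"},
]

patterns = arg_patterns(toy)
for (relation, label), n in sorted(patterns.items()):
    print(relation, label, n)
```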
Verbs with Reflexive Core Objects
- This corpus contains 8 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: dan xwe, avêtin xwe, kirin xwe, berdan xwe, gihandin xwe, kişandin xwe, pêçandin xwe, xemilandin xwe
Relations Overview
- This corpus uses 7 relation subtypes: advmod:neg, case:circ, compound:lvc, compound:nn, compound:redup, nmod:dat, nmod:poss
- The following 7 relation types are not used in this corpus at all: iobj, vocative, expl, clf, list, goeswith, reparandum
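Both figures in this overview can be checked against the relation inventory listed earlier on this page: subtypes are the labels that contain a colon, and unused relations are the universal relations whose base label never appears. The Python sketch below does this check; the variable names are illustrative, and the universal inventory is the 37-relation set of UD v2 written out by hand.

```python
# Relations used in this treebank, copied from the Relations list on this page.
used = """acl advcl advmod advmod:neg amod appos aux case case:circ cc ccomp
compound compound:lvc compound:nn compound:redup conj cop csubj dep det
discourse dislocated fixed flat mark nmod nmod:dat nmod:poss nsubj nummod
obj obl orphan parataxis punct root xcomp""".split()

# The 37 universal relations of UD v2.
universal = """acl advcl advmod amod appos aux case cc ccomp clf compound conj
cop csubj dep det discourse dislocated expl fixed flat goeswith iobj list mark
nmod nsubj nummod obj obl orphan parataxis punct reparandum root vocative
xcomp""".split()

# A subtype is any label of the form base:subtype.
subtypes = sorted(rel for rel in used if ":" in rel)

# A universal relation counts as used if it appears as the base of any label.
used_bases = {rel.split(":")[0] for rel in used}
unused = sorted(set(universal) - used_bases)

print(len(subtypes), subtypes)
print(len(unused), unused)
```

Both lists come out with 7 members, matching the two bullet points above.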