UD Yoruba YTB
Language: Yoruba (code: yo)
Family: Niger-Congo, Defoid
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Adédayọ̀ Olúòkun, Daniel Zeman, Seyi Williams, Ọlájídé Ishola.
Repository: UD_Yoruba-YTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-SA 4.0
Genre: bible
Questions, comments? General annotation questions (either Yoruba-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [zeman (æt) ufal • mff • cuni • cz]. The treebank is developed directly in the UD repository, so bug fixes can be submitted as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually, natively in UD style |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
Parts of the Yoruba Bible, hand-annotated natively in Universal Dependencies.
…
Acknowledgments
…
References
- (citation)
Statistics of UD Yoruba YTB
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Case – Number – NumType – Person – PronType – Typo
Relations
acl – advcl – advmod – amod – aux – case – cc – ccomp – compound – compound:prt – compound:svc – conj – cop – csubj – det – discourse – expl – goeswith – iobj – list – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 100 sentences and 2666 tokens.
- This corpus contains 459 tokens (17%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 4 types of words that contain both letters and punctuation. Examples: kárùn-ún, kìn-ín-ní, níhìn-ín, níhín-ín
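Counts like the ones above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page: it assumes the standard 10-column CoNLL-U layout and the `SpaceAfter=No` attribute in the MISC column, and the function name `tokenization_stats` and the toy fragment `SAMPLE` (with placeholder word forms) are hypothetical.

```python
def tokenization_stats(conllu_text):
    """Return (sentences, words, words not followed by a space)."""
    sentences = 0
    words = 0
    no_space_after = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a token line
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space_after += 1
    return sentences, words, no_space_after


# Hypothetical toy fragment with placeholder forms, not real treebank data.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\tw1\tw1\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tw2\tw2\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "# sent_id = toy-2",
    "1\tw3\tw3\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
])

stats = tokenization_stats(SAMPLE)

# The percentage reported above follows from the two published counts.
share_without_space = round(100 * 459 / 2666)
```

Run on the released `.conllu` file of the treebank, the same function would be expected to give the sentence and token counts listed above, provided the file matches the release these statistics describe.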
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: INTJ, SYM
- This corpus contains 4 word types tagged as particles (PART): kì, kìí, kò, mo
- This corpus contains 40 lemmas tagged as pronouns (PRON): a, ara, i, mi, mo, mí, ohun, rẹ, rẹ̀, ti, tirẹ̀, tiwọn, tèmi, tí, u, un, wa, wá, wọn, wọ́n, yín, Òun, àwa, àwọn, á, èmi, èwo, èyí, èéṣe, ìwọ, í, ó, ú, ún, ẹ, ẹni, ẹnìkan, ẹ̀yin, ọ, ọ́
- This corpus contains 5 lemmas tagged as determiners (DET): gbogbo, náà, wọ̀nyí, yìí, àwọn
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: àwọn
- This corpus contains 23 lemmas tagged as auxiliaries (AUX): baà, bá, jẹ́, kí, le, lè, má, máa, ni, ní, se, sì, sí, ti, tí, tíì, wà, yóò, í, ó, ń, ṣe, ṣeé
- Out of the above, 9 lemmas occurred sometimes as AUX and sometimes as VERB: jẹ́, lè, máa, ní, sí, wà, yóò, ń, ṣe
- This corpus does not use the VerbForm feature.
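The tag inventory and the PRON/DET overlap can be cross-checked from the lists given above. This is a small sketch under stated assumptions: the lemma lists are copied from this page rather than read from the treebank, the helper `nfc` is a hypothetical name, and Unicode NFC normalization is applied because Yoruba diacritics can be encoded with either precomposed or combining characters.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so visually identical lemmas compare equal."""
    return unicodedata.normalize("NFC", text)


# The 17 universal POS tags defined by UD v2.
ALL_UPOS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
            "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]

# Tags listed on this page as used in the corpus.
USED_UPOS = ("ADJ ADP ADV AUX CCONJ DET NOUN NUM PART PRON PROPN PUNCT "
             "SCONJ VERB X").split()

unused_upos = [tag for tag in ALL_UPOS if tag not in USED_UPOS]

# Lemma lists copied from the statistics above.
PRON_LEMMAS = {nfc(x) for x in (
    "a, ara, i, mi, mo, mí, ohun, rẹ, rẹ̀, ti, tirẹ̀, tiwọn, tèmi, tí, u, un, "
    "wa, wá, wọn, wọ́n, yín, Òun, àwa, àwọn, á, èmi, èwo, èyí, èéṣe, ìwọ, í, "
    "ó, ú, ún, ẹ, ẹni, ẹnìkan, ẹ̀yin, ọ, ọ́"
).split(", ")}
DET_LEMMAS = {nfc(x) for x in "gbogbo, náà, wọ̀nyí, yìí, àwọn".split(", ")}

# Lemmas that appear under both tags.
pron_and_det = PRON_LEMMAS & DET_LEMMAS

print(unused_upos)
print(sorted(pron_and_det))
```

The AUX/VERB overlap cannot be checked the same way from this page alone, because the full list of VERB lemmas is not given here.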
Nominal Features
- Number
  - Plur
    - PRON: wọn, a, wọ́n, ẹ̀yin, yín, ẹ, wa, àwa, tiwọn
  - Sing
    - PRON: ó, rẹ̀, èmi, mi, i, ìwọ, un, tirẹ̀, mo, mí
- Case
  - Acc
    - PRON: wọn, mi, i, mí, wa, wọ́n, ọ́, èmi, ẹ, ìwọ
  - Gen
    - PRON: rẹ̀, un, yín, tirẹ̀, Òun, á, rẹ, tiwọn, ú, tèmi
  - Nom
    - PRON: ó, èmi, wọn, a, ìwọ, ẹ̀yin, mo, wọ́n, àwa, ẹ
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - PRON: èyí
  - Emp
    - PRON: ara
  - Ind
    - PRON: ẹni, àwọn, ẹnìkan
  - Int
    - PRON: èwo, èéṣe
  - Prs
    - PRON: ó, wọn, rẹ̀, èmi, mi, i, a, ìwọ, un, wọ́n
  - Rel
    - PRON: tí
- NumType
  - Card
    - NUM: kan, Mẹ́ẹ̀dógún, kárùn-ún, méjì, mẹ́rin, Ọ̀kan
- Person
  - 1
    - PRON: èmi, mi, a, mo, mí, wa, àwa, tèmi
  - 2
    - PRON: ìwọ, ẹ̀yin, yín, ẹ, ọ́, rẹ, ọ
  - 3
    - PRON: ó, wọn, rẹ̀, i, un, wọ́n, tirẹ̀, Òun, á, tiwọn
Other Features
- Typo
  - Yes
    - PRON: wọ́n, ọ́
Syntax
Auxiliary Verbs and Copula
- This corpus uses 10 lemmas as copulas (cop). Examples: ni, wà, jẹ́, ní, mi, rẹ̀, sí, yóò, ń, ṣe.
- This corpus uses 23 lemmas as auxiliaries (aux). Examples: ń, kí, ni, ti, yóò, jẹ́, lè, máa, ṣe, bá, má, í, baà, se, le, nì, sì, sí, tí, tíì, ìbá, ó, ṣeé.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (64)
- VERB--NOUN-ADP(kí) (1)
- VERB--NOUN-ADP(sí) (1)
- VERB--PRON (19)
- VERB--PRON-ADP(lẹ́yìn) (1)
- VERB--PRON-Gen (1)
- VERB--PRON-Nom (123)
- VERB--PRON-Nom-ADP(nínú) (3)
- obj
- VERB--NOUN (66)
- VERB--NOUN-ADP(fún) (2)
- VERB--NOUN-ADP(jáde) (1)
- VERB--NOUN-ADP(kúrò) (2)
- VERB--NOUN-ADP(kúrò)-ADP(lọ́wọ́) (1)
- VERB--NOUN-ADP(láti) (3)
- VERB--NOUN-ADP(láti)-ADP(àárin) (1)
- VERB--NOUN-ADP(láti)-ADP(ọ̀dọ̀) (1)
- VERB--NOUN-ADP(lẹ́bàá) (1)
- VERB--NOUN-ADP(ní) (4)
- VERB--NOUN-ADP(nínú) (2)
- VERB--NOUN-ADP(nípa)-ADP(ti) (1)
- VERB--NOUN-ADP(pẹ̀lú) (1)
- VERB--NOUN-ADP(sí) (3)
- VERB--NOUN-ADP(sí)-ADP(àárin) (1)
- VERB--NOUN-ADP(sókè) (1)
- VERB--NOUN-ADP(ti) (2)
- VERB--PRON (4)
- VERB--PRON-ADP(nínú) (1)
- VERB--PRON-Acc (30)
- VERB--PRON-Acc-ADP(fún) (10)
- VERB--PRON-Acc-ADP(lẹ́yìn) (1)
- VERB--PRON-Acc-ADP(lọ́wọ́) (1)
- VERB--PRON-Acc-ADP(nínú) (1)
- VERB--PRON-Acc-ADP(sí) (4)
- VERB--PRON-Acc-ADP(ti) (1)
- VERB--PRON-Acc-ADP(ṣáájú) (1)
- VERB--PRON-Gen (17)
- VERB--PRON-Gen-ADP(fún) (14)
- VERB--PRON-Gen-ADP(kúrò)-ADP(lọ́dọ̀) (1)
- VERB--PRON-Gen-ADP(lọ́wọ́) (1)
- VERB--PRON-Gen-ADP(ní) (1)
- VERB--PRON-Gen-ADP(nínú) (1)
- VERB--PRON-Gen-ADP(sọ́dọ̀) (3)
- VERB--PRON-Nom (3)
- iobj
- VERB--NOUN (5)
- VERB--NOUN-ADP(sí) (1)
- VERB--PRON-Gen (1)
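Pattern labels of the form `VERB--PRON-Acc-ADP(fún)` can be approximated with a short script. The sketch below is an assumption about how such labels might be built, not the procedure used to produce this page: it takes each NOUN or PRON child of a VERB, appends the child's `Case` value if present, and appends the lemma of every ADP attached to that child. The names `parse_sentences` and `argument_patterns` and the toy fragment `TOY` (with placeholder forms and lemmas) are hypothetical.

```python
from collections import Counter


def parse_sentences(conllu_text):
    """Yield sentences as lists of token dicts (basic words only)."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges and empty nodes
        feats = {}
        if cols[5] != "_":
            feats = dict(f.split("=", 1) for f in cols[5].split("|"))
        sentence.append({
            "id": int(cols[0]), "lemma": cols[2], "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def argument_patterns(conllu_text, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    counts = Counter()
    for sent in parse_sentences(conllu_text):
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            parent = by_id.get(tok["head"])
            if parent is None or parent["upos"] != "VERB":
                continue
            if tok["deprel"] not in relations:
                continue
            if tok["upos"] not in ("NOUN", "PRON"):
                continue
            pattern = "VERB--" + tok["upos"]
            case = tok["feats"].get("Case")
            if case:
                pattern += "-" + case
            # Append adpositions attached to the child, in surface order.
            for dep in sent:
                if dep["head"] == tok["id"] and dep["upos"] == "ADP":
                    pattern += "-ADP(" + dep["lemma"] + ")"
            counts[(tok["deprel"], pattern)] += 1
    return counts


# Hypothetical toy sentence with placeholder forms, not real treebank data.
TOY = "\n".join([
    "1\tp1\tp1\tPRON\t_\tCase=Nom\t2\tnsubj\t_\t_",
    "2\tv1\tv1\tVERB\t_\t_\t0\troot\t_\t_",
    "3\ta1\tadp1\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tp2\tp2\tPRON\t_\tCase=Acc\t2\tobj\t_\t_",
    "5\tn1\tn1\tNOUN\t_\t_\t2\tiobj\t_\t_",
    "",
])

toy_counts = argument_patterns(TOY)
```

Passing a longer `relations` tuple, for example one that includes `obl`, would extend the same count to oblique arguments.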
Relations Overview
- This corpus uses 2 relation subtypes: compound:prt, compound:svc
- The following 8 relation types are not used in this corpus at all: vocative, dislocated, appos, clf, fixed, flat, reparandum, dep
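The two figures above follow from comparing the relation labels used in this corpus with the 37 universal relations of UD v2. A minimal sketch, with the used labels copied from the Relations list on this page (the variable names are illustrative):

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = [
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
]

# Relation labels listed on this page as used in the corpus.
USED_LABELS = [
    "acl", "advcl", "advmod", "amod", "aux", "case", "cc", "ccomp",
    "compound", "compound:prt", "compound:svc", "conj", "cop", "csubj", "det",
    "discourse", "expl", "goeswith", "iobj", "list", "mark", "nmod", "nsubj",
    "nummod", "obj", "obl", "orphan", "parataxis", "punct", "root", "xcomp",
]

# A subtype is any label of the form "relation:subtype".
subtypes = sorted(label for label in USED_LABELS if ":" in label)

# Reduce every label to its universal part before comparing.
used_universal = {label.split(":")[0] for label in USED_LABELS}
unused = [rel for rel in UNIVERSAL_RELATIONS if rel not in used_universal]

print(subtypes)
print(unused)
```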