UD Coptic Scriptorium
Language: Coptic (code: cop)
Family: Afro-Asiatic, Egyptian
This treebank has been part of Universal Dependencies since the UD v1.4 release.
The following people have contributed to making this treebank part of UD: Mitchell Abrams, Elizabeth Davidson, Amir Zeldes.
Repository: UD_Coptic-Scriptorium
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY 4.0
Genre: bible, fiction, nonfiction
Questions, comments? General annotation questions (either Coptic-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [amir • zeldes (æt) georgetown • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually, natively in UD style |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | assigned by a program, not checked manually |
Relations | annotated manually, natively in UD style |
Description
UD Coptic contains manually annotated Sahidic Coptic texts, currently from the Gospel of Mark, Shenoute of Atripe’s “Not Because a Fox Barks”, the Letters of Besa, and several short stories from the Apophthegmata Patrum.
The Coptic Universal Dependency Treebank is a manually annotated corpus of Sahidic Coptic texts, currently containing excerpts from the Sahidic New Testament Gospel of Mark, Archimandrite Shenoute of Atripe’s “Not Because a Fox Barks”, the Letters of Besa, and several short stories from the Apophthegmata Patrum (Sayings of the Desert Fathers). Detailed information about the treebank is available here:
http://copticscriptorium.org/treebank.html
The data was digitized and manually annotated for part of speech as part of the Coptic Scriptorium project. For individual credit and further information see:
http://copticscriptorium.org/
Coptic POS tags come from the Coptic Scriptorium tag set, which is available from the project and treebank websites.
Acknowledgments
The underlying POS tagged material was produced as part of the projects Coptic Scriptorium, KOMeT and KELLIA, funded by the NEH in the USA and BMBF and DFG in Germany (see http://copticscriptorium.org/ for more details). Treebank annotation was done mainly by Mitchell Abrams, Liz Davidson and Amir Zeldes. Thanks are also due to Israel Avrahamy, Asael Benyami, Yinon Kahan and Oran Szachter for their contributions.
Statistics of UD Coptic Scriptorium
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Definite – Gender – Gender[psor] – Number – Number[psor] – NumType – Person – Polarity – Poss – PronType – Reflex – VerbForm
Relations
acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – cop – csubj – dep – det – discourse – dislocated – fixed – flat – iobj – mark – nmod – nsubj – nummod – obj – obl – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 840 sentences, 10366 tokens and 22057 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 11 types of words that contain both letters and punctuation. Examples: ·ⲻ, [....]ⲥ, [...]ϥ̣, [ⲉⲃⲟ]ⲗ, ϩⲏ[..]ⲉ, ϭⲙ[ϣⲓ]ⲛⲉ, ⲉ[.....], ⲉⲃ[........], ⲟⲩⲇ[.......], ⲡ̣[…]ⲡⲟⲥ, ⲡⲁ[…]ϥⲟϭⲉ
- This corpus contains 6491 multi-word tokens. On average, one multi-word token consists of 2.80 syntactic words.
- There are 4100 types of multi-word tokens. Examples: ⲛⲁϥ, ⲙⲙⲟⲥ, ⲙⲡⲛⲟⲩⲧⲉ, ⲛⲁⲩ, ⲙⲙⲟϥ, ⲛⲧϩⲉ, ⲉⲣⲟϥ, ⲧⲏⲣⲟⲩ, ⲙⲙⲟⲟⲩ, ⲡⲉϫⲁϥ, ⲉⲣⲟⲟⲩ, ⲛⲧⲉⲩⲛⲟⲩ, ⲉϥϫⲱ, ⲛⲏⲧⲛ, ⲛⲣⲱⲙⲉ, ⲛϩⲏⲧϥ, ⲛϩⲏⲧ, ⲛϩⲏⲧⲟⲩ, ⲛⲁⲕ, ⲉⲧⲙⲙⲁⲩ, ⲡⲛⲟⲩⲧⲉ, ⲁⲩⲉⲓ, ⲛⲥⲱϥ, ⲡⲉⲭⲣⲓⲥⲧⲟⲥ, ⲡⲣⲣⲟ, ⲁϥⲉⲓ, ⲛⲧⲉⲡⲛⲟⲩⲧⲉ, ⲉⲣⲱⲧⲛ, ⲙⲙⲟⲕ, ⲙⲡϫⲟⲉⲓⲥ, ⲛϩⲏⲧⲧⲏⲩⲧⲛ, ⲛⲙⲙⲁϥ, ⲁϥⲃⲱⲕ, ⲉⲧⲃⲉⲡⲁⲓ, ⲛⲁⲥ, ⲛⲧⲉⲓϩⲉ, ⲧϩⲉ, ⲉⲧⲟⲩⲁⲁⲃ, ⲉⲧⲥⲏϩ, ⲙⲡⲉⲭⲣⲓⲥⲧⲟⲥ, ⲛⲙⲙⲁⲩ, ⲛⲟⲩⲱⲧ, ϣⲁⲣⲱⲧⲛ, ⲁϥϫⲟⲟⲥ, ⲉⲩϫⲱ, ⲙⲙⲟⲓ, ⲙⲡⲉⲛϫⲟⲉⲓⲥ, ⲛϣⲉ, ⲧⲏⲣϥ, ⲧⲏⲣⲥ.
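The counts above follow the usual CoNLL-U distinction between surface tokens and syntactic words: a multi-word token is a range line such as `1-2`, and the words it covers are listed on the following lines. The sketch below shows one way such counts could be derived; it is a minimal illustration, not the script that produced this page. The helper names (`row`, `count_units`) and the toy sentence, which uses made-up Latin-letter placeholders rather than corpus data, are assumptions.

```python
def row(tok_id, form):
    """Build one 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 8)


# Toy sentence with made-up Latin-letter placeholders (not corpus data).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "pejaf"),  # multi-word token covering words 1 and 2
    row("1", "peja"),
    row("2", "f"),
    row("3-4", "naf"),    # multi-word token covering words 3 and 4
    row("3", "na"),
    row("4", "f"),
    row("5", "."),        # ordinary token that is also a single word
    "",
])


def count_units(conllu_text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    sentences = tokens = words = mwts = mwt_words = 0
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():           # a blank line ends the sentence
            in_sentence = False
            continue
        if line.startswith("#"):       # sentence-level comment
            continue
        if not in_sentence:
            in_sentence = True
            sentences += 1
            covered_until = 0
        tok_id = line.split("\t")[0]
        if "-" in tok_id:              # range ID: multi-word token
            start, end = (int(part) for part in tok_id.split("-"))
            mwts += 1
            mwt_words += end - start + 1
            tokens += 1
            covered_until = end
        elif "." in tok_id:            # empty node: neither token nor word
            continue
        else:                          # plain integer ID: syntactic word
            words += 1
            if int(tok_id) > covered_until:
                tokens += 1            # word outside any multi-word token
    return {
        "sentences": sentences,
        "tokens": tokens,
        "words": words,
        "mwts": mwts,
        "mwt_words": mwt_words,
    }


# Consistency check on the figures reported above:
# tokens = multi-word tokens + words outside multi-word tokens.
reported = {"tokens": 10366, "words": 22057, "mwts": 6491}
words_in_mwts = reported["words"] - (reported["tokens"] - reported["mwts"])
average_words_per_mwt = round(words_in_mwts / reported["mwts"], 2)
```

Under that definition of a token, the reported totals imply 18182 syntactic words inside multi-word tokens, which is consistent with the stated average of 2.80 words per multi-word token.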
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: INTJ, SYM
- This corpus contains 33 word types tagged as particles (PART): ϩⲁⲙⲏⲛ, ϩⲏⲏⲛⲉ, ϩⲏⲏⲧⲉ, ϩⲛ, ϫⲉ, ϭⲉ, ⲁⲣⲁ, ⲅⲁⲣ, ⲇⲉ, ⲉ, ⲉϩⲉ, ⲉⲓⲉ, ⲉⲓⲥ, ⲉⲛⲉ, ⲉⲣⲉ, ⲉⲧⲃⲉ, ⲉⲧⲉ, ⲙⲉⲛ, ⲙⲙⲟ, ⲙⲙⲟⲛ, ⲙⲛⲛⲥⲁ, ⲛ, ⲛϭⲓ, ⲛⲁ, ⲛⲉ, ⲛⲧ, ⲛⲧⲉ, ⲟⲩⲛ, ⲟⲩⲟⲓ, ⲡⲉ, ⲣⲱ, ⲭⲱⲣⲓⲥ, ⲱ
- This corpus contains 54 lemmas tagged as pronouns (PRON): ϩⲁϩⲧⲛ, ϩⲱⲱ_ⲁⲛⲟⲕ, ϯ, ⲁ, ⲁ_ⲛⲧⲟ, ⲁϣ, ⲁⲛⲟⲕ, ⲁⲛⲟⲛ, ⲅ, ⲉ_ⲛⲧⲟ, ⲉϫⲛ_ⲛⲧⲟ, ⲉⲕⲉ, ⲉⲛⲉ, ⲉⲣϣⲁⲛ_ⲁⲛⲟⲕ, ⲉⲣϣⲁⲛ_ⲛⲧⲟϥ, ⲉⲣϣⲁⲛ_ⲛⲧⲟⲕ, ⲉⲣϣⲁⲛ_ⲛⲧⲟⲟⲩ, ⲉⲣϣⲁⲛ_ⲛⲧⲱⲧⲛ, ⲉⲣⲉ, ⲉⲣⲉ_ⲁⲛⲟⲕ, ⲉⲣⲉ_ⲁⲛⲟⲛ, ⲉⲣⲉ_ⲛⲧⲟ, ⲉⲣⲉ_ⲛⲧⲟϥ, ⲉⲣⲉ_ⲛⲧⲟⲟⲩ, ⲉⲣⲉ_ⲛⲧⲱⲧⲛ, ⲉⲥ, ⲉⲧⲉⲣⲉ_ⲛⲧⲟ, ⲙⲙⲓⲛⲙⲙⲟ_ⲛⲧⲟ, ⲙⲡⲉ_ⲛⲧⲟ, ⲛ, ⲛ_ⲛⲧⲟ, ⲛⲉ, ⲛⲉⲣⲉ_ⲛⲧⲟ, ⲛⲓⲙ, ⲛⲥⲁ_ⲛⲧⲟ, ⲛⲧⲉ, ⲛⲧⲉ_ⲁⲛⲟⲕ, ⲛⲧⲉⲧⲛ, ⲛⲧⲟ, ⲛⲧⲟϥ, ⲛⲧⲟⲕ, ⲛⲧⲟⲟⲩ, ⲛⲧⲟⲥ, ⲛⲧⲱⲧⲛ, ⲟⲩ, ⲟⲩⲏⲣ, ⲡ, ⲡⲉ, ⲣⲟ, ⲥϥ, ⲥⲉ, ⲧⲉⲧ, ⲧⲉⲧⲛ, ⲧⲱⲛ
- This corpus contains 26 lemmas tagged as determiners (DET): ϥ, ϩⲛ, ϭⲉ, ⲕⲉ, ⲛ, ⲛⲁ, ⲛⲟⲩ, ⲛⲟⲩⲕ, ⲛⲧⲟⲟⲩ, ⲟⲩ, ⲡ, ⲡⲁ, ⲡⲁⲓ, ⲡⲉ, ⲡⲉϥ, ⲡⲉⲓ, ⲡⲉⲕ, ⲡⲉⲛ, ⲡⲉⲥ, ⲡⲉⲧⲛ, ⲡⲉⲩ, ⲡⲏ, ⲡⲓ, ⲡⲟⲩ, ⲡⲱⲧⲛ, ⲧ
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: ⲛ, ⲛⲧⲟⲟⲩ, ⲟⲩ, ⲡ, ⲡⲉ
- This corpus contains 19 lemmas tagged as auxiliaries (AUX): ϣⲁ, ϣⲁⲣⲉ, ⲁ, ⲉⲣⲉ, ⲙⲁ, ⲙⲁⲣⲉ, ⲙⲉ, ⲙⲉⲣⲉ, ⲙⲡⲁⲧⲉ, ⲙⲡⲉ, ⲙⲡⲣⲧⲣⲉ, ⲛ, ⲛⲁ, ⲛⲉ, ⲛⲉⲣⲉ, ⲛⲛⲉ, ⲛⲧⲉ, ⲧⲁⲣ, ⲧⲁⲣⲉ
- Out of the above, 4 lemmas occurred sometimes as AUX and sometimes as VERB: ϣⲁ, ⲙⲁ, ⲙⲉ, ⲛⲁ
- There are 2 (de)verbal forms:
- VerbForm=Fin
- VERB: ϫⲱ, ϣⲱⲡⲉ, ⲉⲓ, ⲃⲱⲕ, ϯ, ⲡⲉϫⲁ, ⲛⲁⲩ, ⲥⲱⲧⲙ, ϫⲟⲟ, ⲣ
- VerbForm=Inf
- VERB: ϯ, ϣⲁϫⲉ, ϫⲓ, ⲉⲓ, ⲉⲓⲙⲉ, ⲛⲁⲩ, ⲣ, ⲧⲟⲩϫⲉ, ϫⲟⲟ, ϭⲱ
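The tag statistics above reduce to simple set operations: the unused tags are the difference between the 17 UPOS tags and the tags observed, and a lemma counts as ambiguous between two tags when it has been seen with both. The following is a minimal sketch of that logic. The function names are hypothetical, and the lemma pairs in the usage comment are made-up placeholders, not corpus data.

```python
# The 17 universal part-of-speech tags defined by UD.
UPOS_ALL = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})

# Tags this page lists as used in the corpus.
USED = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X",
}


def unused_tags(used):
    """Return the UPOS tags that never occur, in alphabetical order."""
    return sorted(UPOS_ALL - set(used))


def ambiguous_lemmas(pairs, tag_a, tag_b):
    """Return lemmas observed with both tag_a and tag_b.

    `pairs` is an iterable of (lemma, upos) tuples, for example one tuple
    per syntactic word read from the LEMMA and UPOS columns of a CoNLL-U file.
    """
    tags_by_lemma = {}
    for lemma, upos in pairs:
        tags_by_lemma.setdefault(lemma, set()).add(upos)
    return sorted(
        lemma for lemma, tags in tags_by_lemma.items()
        if {tag_a, tag_b} <= tags
    )


# Usage with placeholder data (hypothetical lemmas, not from the corpus):
# ambiguous_lemmas([("p", "DET"), ("p", "PRON"), ("pai", "DET")], "PRON", "DET")
```

Applied to the tag inventory listed above, `unused_tags(USED)` yields the two tags the page reports as unused, INTJ and SYM.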
Nominal Features
- Gender=Fem
- DET: ⲧ, ⲧⲉ, ⲧⲉϥ, ⲧⲉⲓ, ⲧⲁⲓ, ⲧⲉⲩ, ⲧⲁ, ⲧⲟⲩ, ⲧⲉⲥ, ⲧⲉⲕ
- PRON: ⲥ, ⲧⲉ, ⲉ, ⲉⲣⲟ, ⲁⲣ, ⲛⲉⲣ, ⲙⲙⲟ, ⲉⲣ, ⲛⲧⲟⲥ, ⲉⲣⲉ
- Gender=Masc
- DET: ⲡ, ⲡⲉ, ⲡⲉϥ, ⲡⲁⲓ, ⲡⲉⲛ, ⲡⲉⲕ, ⲡⲁ, ⲡⲉⲓ, ⲡⲉⲩ, ⲡⲓ
- PRON: ϥ, ⲕ, ⲡⲉ, ⲛⲧⲟϥ, ⲅ, ⲛⲧⲟⲕ, ⲉϥϣⲁⲛ, ⲉⲕϣⲁⲛ, ⲡ, ⲉϥⲉ
- Number=Plur
- DET: ⲛ, ⲛⲉ, ⲛⲉϥ, ⲛⲉⲩ, ⲛⲁⲓ, ⲛⲉⲧⲛ, ⲛⲉⲕ, ⲛⲁ, ⲛⲓ, ⲛⲉⲛ
- PRON: ⲩ, ⲟⲩ, ⲧⲛ, ⲧⲉⲧⲛ, ⲛ, ⲥⲉ, ⲛⲉ, ⲧⲏⲩⲧⲛ, ⲛⲧⲱⲧⲛ, ⲉⲩⲉ
- Number=Sing
- DET: ⲡ, ⲧ, ⲟⲩ, ⲡⲉ, ϩⲉⲛ, ⲡⲉϥ, ⲧⲉ, ⲡⲁⲓ, ⲩ, ⲡⲉⲛ
- PRON: ϥ, ⲥ, ⲕ, ⲡⲉ, ⲓ, ϯ, ⲧⲉ, ⲁⲛⲟⲕ, ⲛⲧⲟϥ, ⲉ
- Definite=Def
- ADV: ⲙⲙⲓⲛⲙⲙⲟ, ⲙⲙⲓⲛⲙⲙⲱ
- DET: ⲡ, ⲛ, ⲧ, ⲡⲉ, ⲡⲉϥ, ⲧⲉ, ⲡⲁⲓ, ⲛⲉ, ⲛⲉϥ, ⲛⲉⲩ
- PRON: ϥ, ⲩ, ⲟⲩ, ⲥ, ⲕ, ⲧⲛ, ⲓ, ⲧⲉⲧⲛ, ⲛ, ⲥⲉ
- Definite=Ind
- DET: ⲟⲩ, ϩⲉⲛ, ⲩ
Degree and Polarity
- Polarity=Neg
- ADV: ⲁⲛ, ⲛ, ⲧⲙ, ⲙⲡⲣ, ⲙ, ⲟⲩⲕ, ⲁⲙ, ⲟⲩ
- CCONJ: ⲟⲩⲇⲉ, ⲙⲙⲟⲛ
- PART: ⲙⲙⲟⲛ
- X: ⲟⲩ
Verbal Features
Pronouns, Determiners, Quantifiers
- PronType=Art
- DET: ⲡ, ⲛ, ⲧ, ⲟⲩ, ⲡⲉ, ϩⲉⲛ, ⲧⲉ, ⲕⲉ, ⲛⲉ, ⲩ
- PronType=Dem
- DET: ⲡⲁⲓ, ⲛⲁⲓ, ⲧⲉⲓ, ⲡⲉⲓ, ⲧⲁⲓ, ⲛⲓ, ⲡⲓ, ⲛⲉⲓ, ⲛⲏ
- PronType=Ind
- PRON: ⲛⲓⲙ, ⲟⲩ
- PronType=Int
- PRON: ⲟⲩ, ⲛⲓⲙ, ⲁϣ, ⲧⲱⲛ, ⲟⲩⲏⲣ
- PronType=Prs
- ADV: ⲙⲙⲓⲛⲙⲙⲟ, ⲙⲙⲓⲛⲙⲙⲱ
- DET: ⲡⲉϥ, ⲛⲉϥ, ⲛⲉⲩ, ⲡⲉⲛ, ⲧⲉϥ, ⲡⲉⲕ, ⲡⲁ, ⲛⲉⲧⲛ, ⲛⲉⲕ, ⲡⲉⲩ
- PRON: ϥ, ⲩ, ⲟⲩ, ⲥ, ⲕ, ⲧⲛ, ⲓ, ⲧⲉⲧⲛ, ⲛ, ⲥⲉ
- PronType=Rcp
- NOUN: ⲉⲣⲏⲩ
- PronType=Tot
- ADV: ⲧⲏⲣ
- NumType=Card
- NUM: ⲟⲩⲁ, ϣⲉ, ⲙⲏⲧ, ⲙⲛⲧⲥⲛⲟⲟⲩⲥ, ⲥⲛⲁⲩ, ⲧⲃⲁ, ϣⲏⲧ, ϣⲟⲙⲛⲧ, ⲙⲁⲁⲃ, ⲥⲉ
- Poss=Yes
- DET: ⲡⲉϥ, ⲛⲉϥ, ⲛⲉⲩ, ⲡⲉⲛ, ⲧⲉϥ, ⲡⲉⲕ, ⲡⲁ, ⲛⲉⲧⲛ, ⲛⲉⲕ, ⲡⲉⲩ
- PRON: ⲟⲩ, ϥ, ⲥ, ⲕ, ⲧⲛ, ⲛ, ⲩ, ⲧ, ⲧⲏⲩⲧⲛ, ⲉ
- Reflex=Yes
- ADV: ⲙⲙⲓⲛⲙⲙⲟ, ⲙⲙⲓⲛⲙⲙⲱ
- PRON: ⲙⲙⲓⲛⲙⲙⲟ
- Person=1
- DET: ⲡⲉⲛ, ⲡⲁ, ⲛⲁ, ⲧⲁ, ⲛⲉⲛ, ⲧⲉⲛ
- PRON: ⲓ, ⲛ, ϯ, ⲧⲛ, ⲁⲛⲟⲕ, ⲁⲛⲟⲛ, ⲁⲛⲅ, ⲧⲁ, ⲉⲓ, ⲧ
- Person=2
- DET: ⲡⲉⲕ, ⲛⲉⲧⲛ, ⲛⲉⲕ, ⲡⲟⲩ, ⲧⲟⲩ, ⲛⲟⲩ, ⲡⲉⲧⲛ, ⲧⲉⲕ, ⲧⲉⲧⲛ
- PRON: ⲕ, ⲧⲉⲧⲛ, ⲧⲛ, ⲧⲏⲩⲧⲛ, ⲛⲧⲱⲧⲛ, ⲉ, ⲅ, ⲛⲧⲟⲕ, ⲧⲉ, ⲉⲣⲟ
- Person=3
- DET: ⲡⲉϥ, ⲛⲉϥ, ⲛⲉⲩ, ⲧⲉϥ, ⲡⲉⲩ, ⲧⲉⲩ, ⲧⲉⲥ, ⲡⲉⲥ, ⲛⲉⲥ
- PRON: ϥ, ⲩ, ⲥ, ⲟⲩ, ⲥⲉ, ⲛⲧⲟϥ, ⲉⲩⲉ, ⲛⲧⲟⲟⲩ, ⲉϥϣⲁⲛ, ⲉⲩϣⲁⲛ
- Gender[psor]=Fem
- DET: ⲡⲟⲩ, ⲧⲟⲩ, ⲛⲟⲩ, ⲧⲉⲥ, ⲡⲉⲥ, ⲛⲉⲥ
- Gender[psor]=Masc
- DET: ⲡⲉϥ, ⲛⲉϥ, ⲧⲉϥ, ⲡⲉⲕ, ⲛⲉⲕ, ⲧⲉⲕ
- Number[psor]=Plur
- DET: ⲛⲉⲩ, ⲡⲉⲛ, ⲛⲉⲧⲛ, ⲡⲉⲩ, ⲧⲉⲩ, ⲛⲉⲛ, ⲡⲉⲧⲛ, ⲧⲉⲛ, ⲧⲉⲧⲛ
- Number[psor]=Sing
- DET: ⲡⲉϥ, ⲛⲉϥ, ⲧⲉϥ, ⲡⲉⲕ, ⲡⲁ, ⲛⲉⲕ, ⲛⲁ, ⲧⲁ, ⲡⲟⲩ, ⲧⲟⲩ
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 3 lemmas as copulas (cop). Examples: ⲡⲉ, ⲡ, ⲛⲉ.
- This corpus uses 37 lemmas as auxiliaries (aux). Examples: ⲁ, ⲛⲁ, ⲛⲧⲉ, ⲛⲧⲉⲣⲉ, ⲙⲡⲉ, ϣⲁⲣⲉ, ⲛⲛⲉ, ϣ, ⲉⲣϣⲁⲛ, ⲙⲁⲣⲉ, ⲙⲉⲣⲉ, ϣⲁⲛⲧⲉ, ⲙⲛ, ⲙⲡⲁⲧⲉ, ⲟⲩⲛ, ⲉ, ⲉⲣϣⲁⲛ_ⲛⲧⲟⲕ, ⲙⲉ, ⲉⲣϣⲁⲛ_ⲛⲧⲟϥ, ⲉⲣⲉ, ⲙⲡⲣⲧⲣⲉ, ⲛ, ⲧⲁⲣ, ⲧⲁⲣⲉ, ϣⲁ, ϣⲁⲁⲣⲉ, ⲉϣ, ⲉⲓⲥ, ⲉⲣⲉ_ⲁⲛⲟⲕ, ⲉⲧⲉⲣⲉ, ⲙⲁ, ⲙⲙⲛⲧⲉ, ⲙⲛⲧⲉ, ⲛⲉϣ, ⲛⲉⲣⲉ, ⲡⲁⲓ, ⲡⲉϫⲉ.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (47)
- VERB--NOUN-ADP(ⲛ) (2)
- VERB--NOUN-ADP(ⲛ)-ADP(ⲛ) (1)
- VERB--PRON (13)
- VERB-Fin--NOUN (229)
- VERB-Fin--NOUN-ADP(ϩⲓⲣⲛ) (1)
- VERB-Fin--NOUN-ADP(ⲛ) (2)
- VERB-Fin--NOUN-ADP(ⲡ) (1)
- VERB-Fin--PRON (1666)
- obj
- VERB--NOUN (1)
- VERB--NOUN-ADP(ⲛ) (1)
- VERB--PRON (2)
- VERB-Fin--NOUN (146)
- VERB-Fin--NOUN-ADP(ϩⲛ) (4)
- VERB-Fin--NOUN-ADP(ⲉ) (3)
- VERB-Fin--NOUN-ADP(ⲙⲛ) (1)
- VERB-Fin--NOUN-ADP(ⲛ) (188)
- VERB-Fin--NOUN-ADP(ⲛ)-ADP(ⲛ) (4)
- VERB-Fin--NOUN-ADP(ⲛⲧⲛ) (1)
- VERB-Fin--PRON (308)
- VERB-Fin--PRON-ADP(ⲉ) (4)
- VERB-Fin--PRON-ADP(ⲉϩⲣⲁⲓ) (1)
- VERB-Fin--PRON-ADP(ⲛ) (160)
- VERB-Fin--PRON-ADP(ⲛⲁ) (4)
- VERB-Fin--PRON-ADP(ⲛⲧⲛ) (1)
- VERB-Inf--NOUN (11)
- VERB-Inf--NOUN-ADP(ⲛ) (7)
- VERB-Inf--PRON (19)
- VERB-Inf--PRON-ADP(ⲛ) (3)
- VERB-Inf--PRON-ADP(ⲛⲁ) (1)
- iobj
- VERB--NOUN (2)
- VERB--PRON (26)
- VERB--PRON-ADP(ⲛ) (2)
- VERB-Fin--PRON (1)
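The pattern labels in the lists above appear to combine the parent verb's `VerbForm` value (if any), the child's UPOS tag, and the lemmas of any adpositions attached to the child. The sketch below counts such patterns for one sentence under that reading. It is an illustration only: the label format is inferred from this page, the function names are hypothetical, and the toy sentence uses made-up Latin-letter placeholders rather than corpus data.

```python
from collections import Counter


def row(*cols):
    """Join the ten CoNLL-U columns of one word line with tabs."""
    return "\t".join(cols)


# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
# Placeholder sentence (hypothetical, not from the treebank).
SAMPLE = "\n".join([
    row("1", "a", "a", "AUX", "_", "_", "3", "aux", "_", "_"),
    row("2", "f", "f", "PRON", "_", "_", "3", "nsubj", "_", "_"),
    row("3", "sotm", "sotm", "VERB", "_", "VerbForm=Fin", "0", "root", "_", "_"),
    row("4", "n", "n", "ADP", "_", "_", "6", "case", "_", "_"),
    row("5", "t", "t", "DET", "_", "_", "6", "det", "_", "_"),
    row("6", "sme", "sme", "NOUN", "_", "_", "3", "obj", "_", "_"),
    row("7", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
])


def parse_sentence(block):
    """Map word ID to lemma, UPOS, features, head and relation."""
    words = {}
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():      # skip multi-word token ranges and empty nodes
            continue
        feats = {}
        if cols[5] != "_":
            feats = dict(item.split("=", 1) for item in cols[5].split("|"))
        words[int(cols[0])] = {
            "lemma": cols[2], "upos": cols[3], "feats": feats,
            "head": int(cols[6]), "deprel": cols[7],
        }
    return words


def argument_patterns(words, relations=("nsubj", "obj", "iobj")):
    """Count (relation, pattern) pairs for verb parents with NOUN/PRON children."""
    counts = Counter()
    for word_id, word in words.items():
        parent = words.get(word["head"])
        if parent is None or word["deprel"] not in relations:
            continue
        if parent["upos"] != "VERB" or word["upos"] not in ("NOUN", "PRON"):
            continue
        verb = "VERB"
        if "VerbForm" in parent["feats"]:
            verb += "-" + parent["feats"]["VerbForm"]
        # Adpositions attached to the child via the `case` relation, in word order.
        cases = [
            other["lemma"] for _, other in sorted(words.items())
            if other["head"] == word_id
            and other["deprel"] == "case"
            and other["upos"] == "ADP"
        ]
        child = word["upos"] + "".join(f"-ADP({lemma})" for lemma in cases)
        counts[(word["deprel"], f"{verb}--{child}")] += 1
    return counts


patterns = argument_patterns(parse_sentence(SAMPLE))
```

For the placeholder sentence, this yields one `nsubj` pattern `VERB-Fin--PRON` and one `obj` pattern `VERB-Fin--NOUN-ADP(n)`, mirroring the shape of the entries listed above.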