UD Amharic ATT
Language: Amharic (code: am)
Family: Afro-Asiatic, Semitic
This treebank has been part of Universal Dependencies since the UD v2.2 release.
The following people have contributed to making this treebank part of UD: Binyam Ephrem, Gashaw Arutie, Tsegay Woldemariam, Juan Ignacio Navarro Horñiacek.
Repository: UD_Amharic-ATT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-SA 4.0
Genre: grammar-examples, fiction, nonfiction, bible, news
Questions, comments? General annotation questions (either Amharic-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [binephrem (æt) gmail • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually, natively in UD style |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD_Amharic-ATT is a manually developed treebank for Amharic. Sentences were collected from grammar books, fiction, biographies, religious texts and news.
The treebank is manually annotated for POS tags, morphological features and dependency relations. Because Amharic is a morphologically rich, pro-drop language with clitic doubling, clitics have been segmented manually.
Acknowledgments
The treebank was developed by Binyam Ephrem, Gashaw Arutie, and Tsegay Woldemariam. The syntactic annotation was checked and corrected manually by Binyam Ephrem.
References
- Binyam Ephrem Seyoum, Yusuke Miyao and Baye Yimam Mekonnen. 2018. Universal Dependencies for Amharic. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 2216–2222, Miyazaki, Japan: European Language Resources Association (ELRA).
Statistics of UD Amharic ATT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Case – Gender – Mood – Number – NumType – Person – Polarity – Poss – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – amod – aux – case – cc – ccomp – clf – compound – compound:svc – conj – cop – csubj – csubj:pass – dep – det – discourse – expl – fixed – flat – goeswith – iobj – mark – nmod – nsubj – nsubj:pass – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 1074 sentences, 5245 tokens and 10010 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 5 types of words that contain both letters and punctuation. Examples: ዬ_ው, ኣ_ት, ኣለ_ቅኔ, እየ_ዞርክ, ኧ_ሁ
- This corpus contains 2672 multi-word tokens. On average, one multi-word token consists of 2.78 syntactic words.
- There are 1857 types of multi-word tokens. Examples: ነው, ልጁ, ሄደ, አለ, ልጆቹ, ብሎ, ልብሱን, ልጁን, መጽሐፉን, መጣ, ቢሆን, ናት, አስተማሪው, እንደሆነ, ሲል, አልማዝን, ይሻላል, ቤቱን, ይመስላል, ሆነ, ምሳውን, በሩን, ብዬ, አለች, ከሆነ, ይሄዳል, ለምን, ለአልማዝ, መንገዱ, በላ, ናቸው, ኖሮ, አላውቅም, አይደለም, ከእናቱ, ይህን, ይሆን, ሄደች, ለመሄድ, ለአስቴር, መሬቱ, ሞተ, ሥራውን, በሄድኩ, በመኪና, በድንገት, ብትሆን, ተኮሰ, አለበት, አልመጣም.
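Counts like the ones above can be derived from a CoNLL-U file along the following lines. This is a minimal sketch, not the script that generated this page: `row` and `conllu_stats` are hypothetical helper names, and the two-sentence sample uses placeholder ASCII forms rather than real treebank data. It relies only on the CoNLL-U conventions that multi-word tokens carry range IDs such as `1-2` and empty nodes carry decimal IDs such as `1.1`.

```python
def row(tok_id, form):
    """Build one 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join([tok_id, form] + ["_"] * 8)


def conllu_stats(text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "mwt_words": 0}
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multi-word token
    for line in text.split("\n"):
        if not line.strip():          # a blank line ends the sentence
            in_sentence = False
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        if not in_sentence:
            in_sentence = True
            covered_until = 0
            stats["sentences"] += 1
        tok_id = line.split("\t")[0]
        if "-" in tok_id:             # multi-word token range, e.g. "1-2"
            start, end = (int(part) for part in tok_id.split("-"))
            stats["mwts"] += 1
            stats["tokens"] += 1
            stats["mwt_words"] += end - start + 1
            covered_until = end
        elif "." in tok_id:           # empty node, not a syntactic word
            continue
        else:                         # ordinary syntactic word
            stats["words"] += 1
            if int(tok_id) > covered_until:
                stats["tokens"] += 1  # word that is its own surface token
    return stats


# Placeholder data: sentence 1 has one 2-word token, sentence 2 one 3-word token.
sample = "\n".join([
    "# placeholder sentence 1",
    row("1-2", "ab"), row("1", "a"), row("2", "b"), row("3", "c"), row("4", "."),
    "",
    "# placeholder sentence 2",
    row("1", "d"), row("2-4", "efg"), row("2", "e"), row("3", "f"), row("4", "g"),
    "",
])

stats = conllu_stats(sample)
print(stats)
print(stats["mwt_words"] / stats["mwts"])  # average words per multi-word token
```

The same arithmetic gives a consistency check on the reported figures: 5245 - 2672 = 2573 tokens are single words, which leaves 10010 - 2573 = 7437 syntactic words inside multi-word tokens, and 7437 / 2672 ≈ 2.78.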
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: SYM, X
- This corpus contains 17 word types tagged as particles (PART): ም, በ, ን, አለ, አል, አይ, ኣ, ኣለ, ኣል, ኣን, ኣይ, እም, እየ, ከ, ው, የ, ያለ
- This corpus contains 69 lemmas tagged as pronouns (PRON): ሁ, ሁል, ህ, ለምን, ል, መቼ, መች, ማን, ማንም, ምነው, ምን, ምንም, ምንድ, ምንድን, ስንት, ሽ, ት, ን, ኝ, አንተ, ኡ, ኡት, ኢ, ኣ, ኣ_ት, ኣሁ, ኣት, ኣቸው, ኣች, ኣችሁ, ኣችን, ኣን, ኣንተ, ኣው, ኤ, እ, እርሱ, እርስ, እሱ, እሷ, እነሱ, እኔ, እን, እንዴት, እንግዲያ, እኛ, ኦ, ኧ, ኧ_ሁ, ኧሁ, ኧህ, ኧሽ, ኧት, ኧች, ኧን, ኧኝ, ኧው, ኩ, ክ, ኸ, ዋ, ው, ዎ, የት, የትም, የትኛ, የትኛው, ዬ, ይ
- This corpus contains 21 lemmas tagged as determiners (DET): ምነው, ብዙ, ኡ, ኤ, እነ, እንዲህ, እዚህ, እዚያ, እየ, ዋ, ዋን, ው, ውን, ያ, ዬ, ዬ_ው, ይህ, ይህን, ይቺ, ይች, ይኸው
- Out of the above, 6 lemmas occurred sometimes as PRON and sometimes as DET: ምነው, ኡ, ኤ, ዋ, ው, ዬ
- This corpus contains 16 lemmas tagged as auxiliaries (AUX): ሆን, ማለት, ብል, ቻል, ችል, ነ, ነበር, ን, ኖር, ኖሮ, አል, አይደል, አድርግ, ኣለ, ኣል, እየ
- Out of the above, 12 lemmas occurred sometimes as AUX and sometimes as VERB: ሆን, ማለት, ብል, ቻል, ችል, ነበር, ኖር, አል, አይደል, አድርግ, ኣለ, ኣል
- There are 2 (de)verbal forms (VerbForm):
  - Conv
    - VERB: ገዝት, ብል, መጥት, በልት, ከፍት, ዘንብ, ይዝ, ሄድ, ሠራርት, ረጥብ
  - Vnoun
    - NOUN: መሄድ, መምጣት, መሆን, መሥራት, መግደል, መክፈል, መስራት, መቆየት, መታመም, መጻፍ
    - VERB: መምጣት, ማለፍ, መሄጃ, መሆን, መሆኛ, መስል, መስረቅ, መኖር, መወሰን, መውደቅ
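The "sometimes as X and sometimes as Y" counts in the tag list can be computed by grouping UPOS tags per lemma. The sketch below is one way to do that, under stated assumptions: `lemmas_with_both_tags` is a hypothetical helper, and the (lemma, UPOS) pairs are a small illustrative sample chosen to agree with the lists above, not an extract from the corpus.

```python
from collections import defaultdict


def lemmas_with_both_tags(pairs, tag_a, tag_b):
    """Return lemmas that occur at least once with each of the two UPOS tags."""
    tags_by_lemma = defaultdict(set)
    for lemma, upos in pairs:
        tags_by_lemma[lemma].add(upos)
    return sorted(
        lemma for lemma, tags in tags_by_lemma.items() if {tag_a, tag_b} <= tags
    )


# Illustrative pairs only; a real run would read the LEMMA and UPOS columns
# of every syntactic word in the CoNLL-U file.
pairs = [
    ("ው", "PRON"), ("ው", "DET"),
    ("ዬ", "PRON"), ("ዬ", "DET"),
    ("እኔ", "PRON"), ("ይህ", "DET"),
    ("ሆን", "AUX"), ("ሆን", "VERB"),
    ("ነ", "AUX"),
]

pron_det = lemmas_with_both_tags(pairs, "PRON", "DET")
aux_verb = lemmas_with_both_tags(pairs, "AUX", "VERB")
print(pron_det)
print(aux_verb)
```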
Nominal Features
- Gender
  - Com
    - PRON: ኝ, ኧሁ, ሁ, ን, ኤ, እ
  - Fem
    - DET: ዋ
    - PRON: ኧች, ት, ኣት, ኣ, ዋ, ኢ, ሽ, ኣች, ኧሽ, ኡ
  - Masc
    - DET: ው
    - PRON: ኧ, ይ, ት, ኦ, ው, ኡ, ህ, ኧት, ኧህ, ክ
    - VERB: ይ
  - Neut
    - PRON: ኝ, ኧሁ, ሁ, ኣቸው, ን, ኡ, ኣሁ, ኧ, ኧኝ, ዋ
- Number
  - Dual
    - PRON: ኢ
  - Plur
    - ADJ: ብዙዎች, ተሰዳጆች
    - NOUN: ልጆች, ተማሪዎች, ሰዎች, ወታደሮች, ጓደኞች, ልጁች, መጽሐፎች, ሴቶች, በጎች, አይኖች
    - NUM: መቶዎች
    - PRON: ኡ, ኣቸው, ኣችን, ን, ኧን, እን, ኣችሁ, ኧ, ት, ኡት
  - Sing
    - DET: ው
    - PRON: ኧ, ይ, ት, ኧው, ኧች, ኦ, ው, ኝ, ኣት, ኡ
    - VERB: ይ
- Case
  - Abl
    - ADP: ለ, ከ
  - Ben
    - ADP: ል, ለ, በ, ብ
  - Ins
    - ADP: በ
  - Loc
    - ADP: በ, ከ, ወደ, እ, ላይ, እሰከ
  - Mal
    - ADP: ብ, በ, ል
Degree and Polarity
- Polarity
  - Neg
    - PART: ኣል, አል, ኣለ, ያለ, አለ, ኣይ, ም, አይ
Verbal Features
- Mood
  - Jus
    - VERB: ሂድ, ብላ, ላክ, መጣ, ሰጥ, ስበር, አሳይ, አውጣ, ውረድ, ዘጋ
- Tense
  - Past
    - AUX: ነበር
- Voice
  - Cau
    - VERB: አስወሰድ, አስገደል, አስያዝ, አሳርፍ, አሳጠብ, አስመሽ, አስረዘም, አስሸለም, አስሸከም, አስቀመስ
  - Pass
    - VERB: ተሻል, ተሰረቅ, ተደሰት, ተቀመጥ, ተለወጥ, ተመለስ, ተመኝ, ተሸለም, ተበደር, ተገነዘብ
    - VERB-Conv: ተብል, ተቸግር, ተይዝ, ታስር
  - Rcp
    - VERB: አጋደል, ሰባበር, ተለዋወጥ, ተነጋገር, ተናነቅ, ተንከባከብ, ተወራውር, ተደባደብ, ተገዳደል, ተጋደል
  - Trans
    - VERB: አለቀስ, አመጥ, አነሥ, አነበብ, አደናቀፍ, አገነፍ, አገኘ, አግዝ, አጠብ
Pronouns, Determiners, Quantifiers
- NumType
  - Card
    - NUM: አንድ, ሁለት, ሦስት, ብዙ, አስር, 1.85, ስምንት, ሶስት, ሺህ, በጣም
- Poss
  - Yes
    - PRON: ኡ, ዋ, ኤ, ህ, ኣችን, ኣቸው, ው, ዬ, ሽ, ኣችሁ
- Person
  - 1
    - PRON: ኝ, ኤ, እ, ሁ, ኧሁ, ኩ, ኧኝ, ን, ኧን, እን
  - 2
    - PRON: ህ, ኧህ, ት, ሽ, ኢ, ክ, ኣችን, ኧ, ኣችሁ, ኧሽ
  - 3
    - DET: ው
    - PRON: ኧ, ይ, ኡ, ት, ኧው, ኧች, ኦ, ው, ኣት, ኣ
    - VERB: ይ
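In CoNLL-U files these features are stored in the FEATS column as `Name=Value` pairs joined by `|`, with `_` for an empty set. A minimal sketch of reading that column follows; `parse_feats` is a hypothetical helper, and the example string is assembled from feature names and values listed on this page rather than copied from a treebank token.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


# Hypothetical FEATS value; whether these three features co-occur on a
# particular token in the treebank is an assumption made for illustration.
feats = parse_feats("Gender=Masc|Number=Sing|Person=3")
print(feats)
print(parse_feats("_"))
```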
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 7 lemmas as copulas (cop). Examples: ን, ነበር, ሆን, አይደል, መልካም, ነ, ጮማ.
- This corpus uses 19 lemmas as auxiliaries (aux). Examples: ኣል, ን, ነበር, ሆን, ችል, ኖር, እየ, ብል, ማለት, ሻል, ቻል, ኖሮ, አል, አይደል, አድርግ, ኣለ, ወዴት, ወድ, ጀመር.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (238)
  - VERB--NOUN-ADP(ን) (3)
  - VERB--NOUN-ADP(የ) (1)
  - VERB--PRON (872)
  - VERB-Conv--NOUN (4)
  - VERB-Conv--NOUN-ADP(ን) (1)
  - VERB-Conv--PRON (34)
  - VERB-Vnoun--PRON (9)
- obj
  - VERB--NOUN (201)
  - VERB--NOUN-ADP(ለ) (1)
  - VERB--NOUN-ADP(መ) (1)
  - VERB--NOUN-ADP(ን) (205)
  - VERB--NOUN-ADP(ከ)-ADP(በስተቀር) (1)
  - VERB--NOUN-ADP(ወደ) (2)
  - VERB--NOUN-ADP(የ)-ADP(ን) (1)
  - VERB--PRON (76)
  - VERB--PRON-ADP(ን) (9)
  - VERB-Conv--NOUN (9)
  - VERB-Conv--NOUN-ADP(ን) (11)
  - VERB-Conv--PRON-ADP(ን) (1)
  - VERB-Vnoun--NOUN (1)
  - VERB-Vnoun--NOUN-ADP(ለ) (1)
  - VERB-Vnoun--NOUN-ADP(ን) (3)
- iobj
  - VERB--NOUN (5)
  - VERB--NOUN-ADP(ን) (7)
  - VERB--PRON (12)
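Pattern labels of this kind can be built from the dependency tree of each sentence. The sketch below shows one possible reading of the notation and is an assumption, not the procedure behind the counts above: it takes `VERB-Conv` and `VERB-Vnoun` to reflect the parent's VerbForm feature, and `-ADP(lemma)` to list the child's adposition dependents attached as `case`. The function name `argument_patterns`, the dictionary layout and the placeholder lemmas `N1`, `N2` and `V` are all hypothetical.

```python
from collections import Counter


def argument_patterns(sentence, relations=("nsubj", "obj", "iobj")):
    """Count parent--child patterns for verb parents and NOUN/PRON children."""
    by_id = {word["id"]: word for word in sentence}
    counts = Counter()
    for child in sentence:
        parent = by_id.get(child["head"])
        if child["deprel"] not in relations or parent is None:
            continue
        if parent["upos"] != "VERB" or child["upos"] not in ("NOUN", "PRON"):
            continue
        parent_label = "VERB"
        verb_form = parent["feats"].get("VerbForm")
        if verb_form in ("Conv", "Vnoun"):   # assumed source of the suffix
            parent_label += "-" + verb_form
        child_label = child["upos"]
        for dep in sentence:                 # adpositions marking the child
            if (dep["head"] == child["id"] and dep["deprel"] == "case"
                    and dep["upos"] == "ADP"):
                child_label += f"-ADP({dep['lemma']})"
        counts[(child["deprel"], f"{parent_label}--{child_label}")] += 1
    return counts


# Placeholder sentence: subject noun, object noun marked by ን, finite verb.
sentence = [
    {"id": 1, "lemma": "N1", "upos": "NOUN", "feats": {}, "head": 4, "deprel": "nsubj"},
    {"id": 2, "lemma": "N2", "upos": "NOUN", "feats": {}, "head": 4, "deprel": "obj"},
    {"id": 3, "lemma": "ን", "upos": "ADP", "feats": {}, "head": 2, "deprel": "case"},
    {"id": 4, "lemma": "V", "upos": "VERB", "feats": {}, "head": 0, "deprel": "root"},
]

counts = argument_patterns(sentence)
for (relation, pattern), n in sorted(counts.items()):
    print(relation, pattern, n)
```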
Relations Overview
- This corpus uses 3 relation subtypes: compound:svc, csubj:pass, nsubj:pass
- The following 6 relation types are not used in this corpus at all: vocative, dislocated, appos, list, orphan, reparandum
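The two figures above follow from comparing the relation labels listed under Relations with the 37 universal relations of UD v2. A short sketch of that comparison is given below; the variable names are illustrative, and the `USED` list is copied from the Relations section of this page.

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels listed for this treebank under "Relations".
USED = """
acl advcl advmod amod aux case cc ccomp clf compound compound:svc conj cop
csubj csubj:pass dep det discourse expl fixed flat goeswith iobj mark nmod
nsubj nsubj:pass nummod obj obl parataxis punct root xcomp
""".split()

# A subtype is written as universal_relation:subtype.
subtypes = sorted(rel for rel in USED if ":" in rel)
base_types = {rel.split(":")[0] for rel in USED}
unused = sorted(UNIVERSAL_RELATIONS - base_types)

print(len(subtypes), subtypes)
print(len(unused), unused)
```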