UD Irish IDT
Language: Irish (code: ga)
Family: Indo-European, Celtic
This treebank has been part of Universal Dependencies since the UD v1.0 release.
The following people have contributed to making this treebank part of UD: Teresa Lynn, Jennifer Foster.
Repository: UD_Irish-IDT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-SA 3.0
Genre: news, fiction, web, legal
Questions, comments? General annotation questions (either Irish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [teresa • lynn (æt) adaptcentre • ie; jennifer • foster (æt) dcu • ie]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | assigned by a program, with some manual corrections, but not a full manual verification |
Features | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Relations | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Description
A Universal Dependencies 1020-sentence treebank for modern Irish.
The Irish UD Treebank is a conversion of the Irish Dependency Treebank (IDT).
IDT development was part of a PhD research project by Teresa Lynn at Dublin City University, Ireland (Lynn, 2016). The IDT data has been released on [GitHub](https://github.com/tlynn747/IrishDependencyTreebank). The treebank contains 1020 sentences taken from the New Corpus of Ireland-Irish (NCII), with text from books, newswire, websites and other media. These sentences are a subset of a gold-standard POS-tagged corpus for Irish.
The conversion from the IDT annotation scheme to the UD annotation scheme was designed by Teresa Lynn and Jennifer Foster at Dublin City University, Ireland. The mapping to UD is reported in Lynn et al. (2016).
The UD Treebank is split into three sets as follows:
- 454 trees (test)
- 445 trees - 11,533 tokens (dev)
- 121 trees - 3425 tokens (train)
Note: the split was formerly 150 test, 150 dev, 720 train, but the data was re-split as above for the CoNLL 2017 shared task on dependency parsing.
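The tree counts above can be checked against the released files. The following is a minimal Python sketch, assuming the standard CoNLL-U layout (ten tab-separated columns, `#` comment lines, a blank line between sentences). The function name `count_trees_and_words` and the two-sentence sample are illustrative inventions and are not taken from the treebank.

```python
# Tree counts per split as listed above; they should add up to the
# 1020 sentences of the treebank.
SPLIT_TREES = {"test": 454, "dev": 445, "train": 121}
assert sum(SPLIT_TREES.values()) == 1020


def count_trees_and_words(conllu_text):
    """Return (trees, words) for a string in CoNLL-U format."""
    trees = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ...".
            continue
        if not in_sentence:
            trees += 1
            in_sentence = True
        token_id = line.split("\t")[0]
        # Count only plain integer IDs; multiword-token ranges ("1-2")
        # and empty nodes ("1.1") are skipped.
        if token_id.isdigit():
            words += 1
    return trees, words


# Invented sample (hypothetical, not taken from the treebank).
SAMPLE = (
    "# text = Tá sé anseo.\n"
    "1\tTá\tbí\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tsé\tsé\tPRON\t_\t_\t1\tnsubj\t_\t_\n"
    "3\tanseo\tanseo\tADV\t_\t_\t1\tadvmod\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# text = Níl sé anseo.\n"
    "1\tNíl\tbí\tVERB\t_\tPolarity=Neg\t0\troot\t_\t_\n"
    "2\tsé\tsé\tPRON\t_\t_\t1\tnsubj\t_\t_\n"
    "3\tanseo\tanseo\tADV\t_\t_\t1\tadvmod\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

print(count_trees_and_words(SAMPLE))  # (2, 8)
```

To apply the sketch to real data, read one split file as UTF-8 text and pass its contents to `count_trees_and_words`.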
Acknowledgments
We wish to thank all of the contributors to the original IDT annotation, including Elaine Uí Dhonnchadha for her gold POS-tagged corpus and linguistic advice. We would also like to acknowledge linguistic advice offered by Kevin Scannell in the conversion to UD effort.
This research is partially supported by Science Foundation Ireland through the ADAPT Centre for Digital Content Technology. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Statistics of UD Irish IDT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Abbr – Case – Definite – Degree – Dialect – Foreign – Form – Gender – Mood – NounType – Number – NumType – PartType – Person – Polarity – Poss – PrepForm – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl:relcl – advcl – advmod – amod – appos – case – case:voc – cc – ccomp – compound – compound:prt – conj – cop – csubj:cleft – csubj:cop – det – discourse – fixed – flat – flat:name – list – mark – mark:prt – nmod – nmod:poss – nsubj – nummod – obj – obl – obl:prep – obl:tmod – parataxis – punct – root – vocative – xcomp – xcomp:pred
Tokenization and Word Segmentation
- This corpus contains 1020 sentences and 23964 tokens.
- This corpus contains 2516 tokens (10%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 204 types of words that contain both letters and punctuation. Examples: d', b', fho-alt, (a), (b), a', m', (c), Co., 'n, Uimh., s', t-am, ", O', a'm, n', n-a, n-oibreacha, n-áirítear, nua-aimseartha, t-airgead, t-ábhar, (d), (i), (ii), Anne-Marie, Ard-Chomhairle, Ard-Mhúsaem, I.R., J., P., bhfo-alt, h-Íde, mb', meán-suidhte, n-athair, n-éireoidh, nea-mbrí, t-eolas, t-ionad, 's, (W), (e), (f), (iii), (iv), (vi), -e, .i.
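In CoNLL-U files, a token that is not followed by a space carries `SpaceAfter=No` in the MISC column (the tenth column). The sketch below shows one way such a count could be computed, and checks the reported share. The function name `space_after_stats` and the one-sentence sample are hypothetical; as a simplification, multiword-token ranges and empty nodes are ignored.

```python
def space_after_stats(conllu_text):
    """Return (tokens_without_following_space, total_tokens)."""
    no_space = 0
    total = 0
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only well-formed rows with a plain integer ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        total += 1
        # MISC holds "_" or "|"-separated Key=Value items.
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return no_space, total


# Invented sample (hypothetical, not taken from the treebank).
SAMPLE = (
    "# text = Tá sé anseo.\n"
    "1\tTá\tbí\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tsé\tsé\tPRON\t_\t_\t1\tnsubj\t_\t_\n"
    "3\tanseo\tanseo\tADV\t_\t_\t1\tadvmod\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

print(space_after_stats(SAMPLE))  # (1, 4)

# Share reported above: 2516 of 23964 tokens.
share = 100 * 2516 / 23964
print(round(share, 1))  # 10.5
```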
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 34 word types tagged as particles (PART): Mac, Mc, Mhic, Nic, O', Uí, a, a', ab, an, ar, ba, d', de, do, faoina, go, gur, ina, inar, is, le, lena, lenar, n', n-a, nach, ná, nár, ní, níor, o, trasna, Ó
- This corpus contains 28 lemmas tagged as pronouns (PRON): cad, ceachtar, cibé, cé, céard, ea, eisean, féin, iad, ise, iúd, mise, muid, mé, pé, seisean, seo, siad, sibh, sin, sinn, siúd, sé, sí, tusa, tú, é, í
- This corpus contains 21 lemmas tagged as determiners (DET): a, an, aon, cad, cibé, cé, do, eile, gach, gach_uile, iomaí, leath, mo, na, s, seo, sin, siúd, uile, ár, úd
- Out of the above, 6 lemmas occurred sometimes as PRON and sometimes as DET: cad, cibé, cé, seo, sin, siúd
- This corpus contains 9 lemmas tagged as auxiliaries (AUX): ar, cad, cé, is, má, ní, seo, sin, sé
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: ar
- There are 4 (de)verbal forms:
- Cop
- AUX: is, ba, gur, ní, nach, b', gurb, níor, ar, gurbh
- SCONJ: más, dar, ós, Sular, murab
- X: Caidé
- Inf
- NOUN: fáil, bheith, chur, dhéanamh, rá, dul, thabhairt, cur, tabhairt, bhaint
- Part
- ADJ: déanta, bunaithe, imithe, leagtha, ráite, scríofa, tugtha, Aontaithe, bailithe, briste
- Vnoun
- NOUN: dul, faire, obair, teacht, éirí, déanamh, iarraidh, brath, cur, breathnú
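The PRON/DET overlap reported in the list above can be reproduced from the two lemma lists themselves with a set intersection. This is only a consistency check on the figures printed on this page; the variable names `PRON_LEMMAS` and `DET_LEMMAS` are illustrative.

```python
# Lemma lists copied from the PRON and DET items above.
PRON_LEMMAS = set(
    "cad ceachtar cibé cé céard ea eisean féin iad ise iúd mise muid mé "
    "pé seisean seo siad sibh sin sinn siúd sé sí tusa tú é í".split()
)
DET_LEMMAS = set(
    "a an aon cad cibé cé do eile gach gach_uile iomaí leath mo na s "
    "seo sin siúd uile ár úd".split()
)

# Lemmas that occur under both tags.
both = PRON_LEMMAS & DET_LEMMAS

print(len(PRON_LEMMAS), len(DET_LEMMAS), len(both))  # 28 21 6
print(sorted(both))
```

The three counts match the 28 PRON lemmas, 21 DET lemmas and 6 shared lemmas stated above.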
Nominal Features
- Fem
- ADJ: mhaith, mhór, bheag, chéanna, iomlán, poiblí, óg, fada, inrátaithe, luachmhar
- ADP: uirthi, ina, di, aici, á, inti, dá, léi, chuici, léithe
- DET: na, a
- NOUN: chuid, leith, bhliain, áit, uair, bliain, gcuid, cuid, aghaidh, gceist
- PRON: sí, í, ise, hí
- PROPN: Gaeilge, hÉireann, Éirinn, Ghaeilge, Mháire, Fraince, nGaeilge, Éire, Meiriceá, Máire
- X: open
- Masc
- ADJ: mór, éigin, beag, chéanna, fada, óg, ard, bán, deireanach, eachtraigh
- ADP: ann, air, ina, leis, aige, á, dá, dó, chuige, lena
- AUX-Cop: Sé
- DET: a
- NOUN: rud, duine, fear, lá, daoine, chéile, alt, am, ceann, measc
- PRON: sé, é, seisean, eisean, hé
- PROPN: Átha, Bhaile, Seán, mBaile, Phádraig, Bhreandán, Chill, Chorcaí, Eoin, Fianna
- VERB: Tá's
- X: fá, Kill
- Plur
- ADJ: beaga, éagsúla, difriúla, aisteacha, deonacha, lansacha, maithe, móra, poiblí, príomhúla
- ADP: acu, ina, orthu, dóibh, leo, sna, againn, dúinn, á, díobh
- DET: na, a, ár
- NOUN: daoine, dhaoine, tíortha, blianta, rudaí, rialacha, scrúduithe, Ballstáit, scéalta, seirbhísí
- PRON: iad, siad, muid, hiad, siadsan, sibh, sinn, sinne, iadsan
- PROPN: mBaile, Eorpach, Mumhan, gCill, hIceadha, Éireannaigh, Comharsain, Doirí, Fíoncheannaithe, Gaeil
- VERB: Táimid, bhíodar, Amharcaigí, Bhíomar, Casaimid, Chaitheamar, Chuamar, Chuiridís, Creidimidne, Fuaireamar
- Sing
- ADJ: éigin, mór, chéanna, mhaith, mhór, óg, fada, beag, iomlán, ard
- ADP: sa, ann, den, ina, air, san, leis, ón, don, á
- AUX-Cop: Sé, Cén
- DET: an, na, a, mo, do, m', a', 'n, cén, d'
- NOUN: bith, níos, rud, duine, chuid, fear, féidir, lá, leith, oiread
- PRON: sé, é, sí, mé, í, tú, cén, mise, ea, seisean
- PROPN: Gaeilge, Átha, Bhaile, Seán, hÉireann, Éirinn, Ghaeilge, Mháire, nGaeilge, Éire
- VERB: féach, bíodh, déan, bhíos, cuir, rabhas, Tabhair, faigh, glacaim, leanas
- X: fá, dein, Bhraitheas, Kill, chuireas, domhsa, open
- Dat
- NOUN: gcrích, cionn, Tigh, láimh, chionn, chois, gcionn, leabhair, mbliana, ndíg
- PROPN: Éirinn, hÉirinn
- Gen
- ADJ: beaga, eachtraigh, mhóir, speisialta, éagsúla, Caitlicí, Ultacha, bána, cháilithigh, deonacha
- DET: na
- NOUN: airgid, cinn, tíre, bliana, cathrach, chloig, hoíche, teanga, Fómhair, Gaeltachta
- NOUN-Inf: Fiosraithe, bhreithnithe, chleachta, claonta, cosanta, craptha, dhéanta, dumpála, eagraithe, fuaraithe
- PROPN: Gaeilge, Átha, Bhaile, hÉireann, mBaile, Fraince, Chill, Mumhan, hEorpa, Chaoin
- NomAcc
- ADJ: éigin, mór, chéanna, mhaith, mhór, óg, beaga, fada, beag, iomlán
- NOUN: rud, chuid, duine, fear, lá, leith, bhliain, áit, daoine, uair
- PROPN: Seán, Ghaeilge, Gaeilge, Mháire, nGaeilge, Éire, Bhreandán, Chorcaí, Fianna, John
- VERB: Tá's
- X: Kill, open
- Voc
- NOUN: Chapaillín, Dhochtúir, Oideachais, bhithiúnaigh, chúil, dhaoine, fheara, ghrá, naofacht, pheaitín
- PROPN: Dhoráid, Mháiréad, Phádraig, Pháidín, Sheáin, Tom
- Def
- DET: an, na, gach, chuile, a', a, 'n, ngach, iomaí
- NOUN: fear, bhliain, saol, leabhar, lá, hoíche, áit, bliana, chloig, méid
- NUM: dá, dhá
- PROPN: Gaeilge, hÉireann, Ghaeilge, hEorpa, Coileánach, Afraic, Bheilg, Breathnach, Ceallach, Chaitlín
- X: achan
- Ind
- NOUN: láimh, Criosanna, dlíthe, gnóthaí, oibre, uibheacha
Degree and Polarity
- Cmp,Sup
- ADJ: mó, fearr, fhearr, mhó, déanaí, báine, caoile, ceolmhaire, ciallmhaire, daoire
- Pos
- ADJ: amháin, maith, léir, mór, áirithe, fada, mó, beag, céanna, náisiúnta
- NOUN: ceart
- Neg
- AUX-Cop: ní, nach, níor, nár, níorbh, nárbh
- PART: ní, nach, níor, ná, nár, n'
- VERB: raibh, níl, bheidh, bhfuil, fhaca, bhain, chuireann, dhéanfadh, mbeidh, thugann
- X: chan, cha, dein, ná
Verbal Features
- Cnd
- AUX-Cop: Ba, mba
- VERB: bheadh, mbeadh, dtiocfadh, dhéanfadh, rachadh, bhféadfadh, bhféadfaí, chuirfeadh, dtuigfí, fhéadfadh
- Imp
- PART: ná
- VERB: bhíodh, féach, bíodh, déan, cuir, dhéanadh, Tabhair, faigh, mbínn, mbíodh
- X: dein
- Ind
- VERB: bhí, tá, raibh, atá, bhfuil, bheidh, beidh, thug, tháinig, mbeidh
- X: dhein, Bhraitheas, chuireas, deineadh
- Int
- AUX-Cop: nach, an, Cén, cad, nár
- PART: nach
- Sub
- PART: go
- VERB: Roinne, chroma, n-imí, raibh, shéide
- Fut
- VERB: bheidh, beidh, mbeidh, caithfidh, déanfaidh, cuirfidh, déanfar, féadfaidh, féadfidh, measfaidh
- Past
- AUX-Cop: ba, b', gur, níor, gurbh, mba, nár, ab, níorbh, ar
- PART: gur, níor, ar, nár
- SCONJ: Sular, murab
- SCONJ-Cop: Sular, murab
- VERB: bhí, raibh, thug, tháinig, chuir, dúirt, bhíodh, cuireadh, rinneadh, rinne
- X: dhein, chuireas, deineadh
- Pres
- AUX-Cop: is, gur, ní, nach, gurb, an, ar, sea
- VERB: tá, atá, bhfuil, níl, deir, bhaineann, dar, adeir, chuireann, cuirtear
- X: Bhraitheas
- Auto
- VERB: cuireadh, rinneadh, cuirtear, deonadh, dhéantar, dtagraítear, déanfar, déantar, faightear, shonraítear
- X: deineadh
Pronouns, Determiners, Quantifiers
- Art
- ADP: sa, den, san, ón, don, faoin, sna, fén, ins
- AUX-Cop: Cén
- DET: an, na, a, a', 'n
- Dem
- AUX-Cop: Seo, Sin
- DET: seo, sin, eile, úd, s', siúd
- PRON: sin, seo, siúd, shin, san, in, iúd, súd
- X: san, so
- Emp
- ADP: againne, agamsa, agatsa, domsa, leatsa, liomsa, airsean, dósan, leosan, tríothusan
- PRON: mise, seisean, eisean, ise, siadsan, sinne, iadsan, tusa
- Ind
- DET: aon, cibé, uile, haon, n-uile
- PRON: pé, Cibé, ceachtar, cheachtar
- Int
- ADV: conas, cá
- DET: cad, cén
- PRON: cad, cé, cén, céard
- Prs
- ADP: á, dhá
- Rel
- AUX-Cop: ba, nach, is, ab, nár, nárbh
- PART: a, ina, nach, inar, ar, lena, n-a, nár, faoina, lenar
- VERB: atá, leanas, eireos, atáid, atáimse, bhíos, chaoinfeas, chuireas, fhéadas, rachas
- X: ná
- Card
- NOUN: céad
- NUM: dhá, trí, céad, seacht, aon, ceithre, fiche, sé, dá, cúig
- Ord
- NUM: chéad, dara, 10ú, gcéad, 11ú, 17ú, 18ú, 3ú, cheathrú, dtríú
- Poss
- Yes
- ADP: ina, á, dá, lena, faoina, arna, óna, dhá, lenár, dár
- DET: a, mo, do, m', ár, d'
- Reflex
- Yes
- NOUN: fhéin
- PRON: féin
- 1
- ADP: liom, agam, againn, dúinn, orm, dom, linn, chugainn, chugam, a'm
- DET: mo, m', ár
- PRON: mé, muid, mise, sinn, sinne
- VERB: bhíos, rabhas, Táimid, glacaim, leanas, mbeinn, mbínn, Bhíomar, Casaimid, Chaitheamar
- X: Bhraitheas, chuireas, domhsa
- 2
- ADP: leat, agat, duit, ort, agatsa, leatsa, uait, agaibh, asat, oraibh
- DET: do, d'
- PRON: tú, sibh, thusa, thú, tusa
- VERB: féach, déan, cuir, Tabhair, faigh, éist, Amharcaigí, BUAIL, Cheapfá, Lig
- X: dein
- 3
- ADP: ann, ina, air, leis, acu, á, dá, aige, orthu, dó
- AUX-Cop: Sé
- DET: a
- PRON: sé, é, sí, iad, siad, í, ea, seisean, eisean, ise
- VERB: bíodh, bhíodar, Chuiridís, Sheoladar, atáid, bheidís, bogadh, chríochnaíodar, chuadar, chuireadar
- X: fá
Other Features
- Abbr
- Yes
- PROPN: UNDDSMS
- SYM: post@clubsult.com
- X: Co., Uimh., A, FÁS, I.R., IO, J., P., RTÉ, SEIF
- Dialect
- Connaught
- X: Caidé
- Munster
- VERB: dhein
- X: san, so, age, dein, des, dhein, fachta, Bhraitheas, chuireas, deineadh
- Ulster
- X: chan, fá, cha, Caidé, achan, domhsa
- X-Cop: Caidé
- Foreign
- Yes
- ADJ: necessary, other, white
- ADJ-Part: white
- ADP: by
- NOUN: Office, Regeneration, tasks, cheap
- PROPN: major
- VERB: deemed
- X: -e, Comptroller-General, Cyrano, Forget, I, Love, May, September, The, TrueType
- Form
- Ecl
- ADJ: gcéanna
- AUX-Cop: mba
- DET: ngach, gach
- NOUN: gcuid, gcás, gceist, ndóigh, bhfeidhm, gcónaí, gcomhairle, dtús, mbealach, ndáil
- NOUN-Inf: bhfeidhmiú, bhfíorú, gcailliúint, gcomhlíonadh, gcraitheadh, gcraoladh, gcur, ndéanamh, ngabháil, ngoradh
- NOUN-Vnoun: titim
- NUM: gcéad, dtríú, gceithre
- PART: n-a
- PROPN: mBaile, nGaeilge, Mumhan, gCill, nGaillimh, Mullen, Mí, Sergeant, gCambridge, gCuan
- VERB: raibh, bhfuil, mbeidh, mbeadh, mbíonn, mbaineann, bhfaca, dtiocfadh, ndeachaigh, bhfuair
- Emp
- NOUN: liostasa
- VERB: Creidimidne, atáimse, deirimse, gcaithfeadsa, nílirse
- X: domhsa
- HPref
- ADJ: háirithe, hamháin, hiontach, han-luath, hiomlán, hálainn, héifeachtach, héifeachtúil
- DET: haon
- NOUN: háit, hordú, haigne, heagraíochtaí, hOllscoile, haghaidh, hainm, hainmhithe, halt, ham
- NOUN-Inf: himeacht, hithe, hordú, húsáid
- NUM: haon
- PROPN: h-Íde, hÉirinn, mí
- Len
- ADJ: mhaith, cheart, chóir, Bhriotáineach, céanna, chéanna, mhór, chomhionann, chultúrtha, chuí
- ADJ-Part: Bhunaithe
- ADP: dhaoibh, dhom, dhíobh, dhúinn
- ADV: Thuaidh
- NOUN: bheith, chur, dhéanamh, chuid, chéile, thabhairt, fho-alt, bhaint, fhios, chineál
- NOUN-Inf: bheith, chur, dhéanamh, thabhairt, bhaint, chaitheamh, fháil, dhul, sheoladh, choinneáil
- NUM: chéad, dhó, cheathrú, thrí, cheithre, sheacht, sheasca, thríú
- PRON: cheachtar, thusa, thú
- PROPN: Bhaile, Átha, Mháire, Phádraig, Bhreandán, Chill, Bhéal, Cholm, Chonamara, Ghaeilge
- VERB: bhí, bheidh, raibh, thug, tháinig, bheadh, chuir, bhíonn, bhíodh, bhaineann
- X: dhein, chuireas
- VF
- AUX-Cop: b', gurb, gurbh, ab, níorbh, mb', nárbh
- NounType
- NotSlender
- ADJ: beaga, difriúla, príomhúla, seachtracha, aisteacha, bríomhara, crúcacha, deasa, deonacha, eachtracha
- Slender
- ADJ: éagsúla, aisteacha, bhradacha, chuí, cháilitheacha, dheonacha, ghlasa, mhéadracha, orgánacha, phríomhúla
- Strong
- ADJ: beaga, éagsúla, Ultacha, bána, deonacha, difriúla, inmheánacha, maithe, móra, núicléacha
- NOUN: daoine, oibreacha, ban, n-oibreacha, ndaoine, ceoltóirí, gCailíní, nduillí, orduithe, rudaí
- PROPN: Fíoncheannaithe
- Weak
- ADJ: orgánach, plaisteach, saor
- NOUN: Fiontar, súl, Náisiún, bhflaitheas, breiseán, fear, mballstát, Bhéal, Ealaíon, Foras
- PROPN: Eorpach, Éireannach
- PartType
- Ad
- PART: go, le
- Cmpl
- PART: go, nach, nár, ná
- Comp
- AUX-Cop: ba
- NOUN: níos
- PART: ní
- Cop
- PART: a
- Deg
- ADP: dá
- PART: a
- Inf
- PART: a, do, trasna, a'
- Num
- PART: a
- Pat
- PART: Ó, de, Mac, Uí, Nic, Ní, O', Mc, Mhic, O
- Vb
- PART: a, d', ní, gur, do, nach, níor, ar, an, ná
- X: chan, cha, ná
- Voc
- ADJ: Eachtraigh, fáinneach, ghil, ghrinn, uaisle
- PART: a
- PrepForm
- Cmpd
- ADP: i, go, ar, de, tar, in, le, os, thar, faoi
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: is, má.
- This corpus does not contain auxiliaries attached via the aux relation.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (20)
- VERB--NOUN-Dat (2)
- VERB--NOUN-Gen (11)
- VERB--NOUN-NomAcc (606)
- VERB--NOUN-NomAcc-ADP(le) (1)
- VERB--PRON (415)
- obj
- VERB--NOUN (4)
- VERB--NOUN-Gen (10)
- VERB--NOUN-NomAcc (332)
- VERB--NOUN-NomAcc-ADP(as) (1)
- VERB--NOUN-NomAcc-ADP(le) (1)
- VERB--PRON (73)
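Counts of this kind can be approximated by pairing each `nsubj` or `obj` dependent with its head and recording the head's UPOS, the child's UPOS and the child's `Case` feature. The sketch below is one possible way to do this, not the procedure used to generate this page. The function name `core_argument_patterns` and the sample sentences are hypothetical, and the adposition suffix seen in patterns such as `ADP(le)` is not modelled.

```python
from collections import Counter


def core_argument_patterns(conllu_text):
    """Count (deprel, 'VERB--CHILD[-Case]') patterns for nsubj and obj."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        # Index the rows of one sentence by token ID.
        rows = {}
        for line in block.splitlines():
            if not line.strip() or line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                rows[cols[0]] = cols
        for cols in rows.values():
            upos, feats, head, deprel = cols[3], cols[5], cols[6], cols[7]
            parent = rows.get(head)
            if deprel not in ("nsubj", "obj") or parent is None:
                continue
            # Only verb parents with noun or pronoun children.
            if parent[3] != "VERB" or upos not in ("NOUN", "PRON"):
                continue
            feat_map = dict(f.split("=", 1) for f in feats.split("|") if "=" in f)
            case = feat_map.get("Case")
            child = upos + ("-" + case if case else "")
            counts[(deprel, "VERB--" + child)] += 1
    return counts


# Invented sample (hypothetical, not taken from the treebank).
SAMPLE = (
    "# text = Tá sé anseo.\n"
    "1\tTá\tbí\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tsé\tsé\tPRON\t_\t_\t1\tnsubj\t_\t_\n"
    "3\tanseo\tanseo\tADV\t_\t_\t1\tadvmod\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
    "# text = Chonaic an fear an leabhar.\n"
    "1\tChonaic\tfeic\tVERB\t_\tTense=Past\t0\troot\t_\t_\n"
    "2\tan\tan\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tfear\tfear\tNOUN\t_\tCase=NomAcc|Number=Sing\t1\tnsubj\t_\t_\n"
    "4\tan\tan\tDET\t_\t_\t5\tdet\t_\t_\n"
    "5\tleabhar\tleabhar\tNOUN\t_\tCase=NomAcc|Number=Sing\t1\tobj\t_\tSpaceAfter=No\n"
    "6\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

for (deprel, pattern), n in sorted(core_argument_patterns(SAMPLE).items()):
    print(deprel, pattern, n)
```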
Relations Overview
- This corpus uses 11 relation subtypes: acl:relcl, case:voc, compound:prt, csubj:cleft, csubj:cop, flat:name, mark:prt, nmod:poss, obl:prep, obl:tmod, xcomp:pred
- The following 2 main types are not used alone, they are always subtyped: acl, csubj
- The following 9 relation types are not used in this corpus at all: iobj, expl, dislocated, aux, clf, orphan, goeswith, reparandum, dep
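The three statements above follow from the relation inventory listed under Relations together with the 37 universal relations of UD v2. The following sketch rederives them with set operations; the variable names are illustrative.

```python
# Relation labels listed under "Relations" on this page.
USED = set("""
acl:relcl advcl advmod amod appos case case:voc cc ccomp compound
compound:prt conj cop csubj:cleft csubj:cop det discourse fixed flat
flat:name list mark mark:prt nmod nmod:poss nsubj nummod obj obl
obl:prep obl:tmod parataxis punct root vocative xcomp xcomp:pred
""".split())

# The 37 universal dependency relations of UD v2.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop
csubj dep det discourse dislocated expl fixed flat goeswith iobj list
mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root
vocative xcomp
""".split())

# Subtyped labels have the form "main:subtype".
subtypes = {r for r in USED if ":" in r}
bare = USED - subtypes

# Main types that only ever appear with a subtype.
always_subtyped = {r.split(":")[0] for r in subtypes} - bare

# Universal relations that appear neither alone nor subtyped.
main_types_used = {r.split(":")[0] for r in USED}
unused = UNIVERSAL - main_types_used

print(len(subtypes))            # 11
print(sorted(always_subtyped))  # ['acl', 'csubj']
print(sorted(unused))
```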