UD for Welsh
Tokenization and Word Segmentation
- In general, Welsh is tokenized like English. Upper case is used for the first word of a sentence, for proper names, and for the names of months and weekdays.
- A notable difference concerns the apostrophe, which usually belongs to the following word. Examples:
- Mae o’n dod («He is coming»): Mae, o, ’n, dod
- i’m weld («to see me»): i, ’m, weld
- â’u ffrindiau («with their friends»): â, ’u, ffrindiau
- Other shortened forms: ’w, ’i, ’th
- In some cases, the apostrophe goes with the preceding word:
- … yw f’enw i («… is my name»): yw, f’, enw, i
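The apostrophe rules above can be sketched as a small splitting function. This is a minimal sketch, not the treebank's actual tokenizer: the clitic inventories `ENCLITICS` and `PROCLITICS` are assumptions built only from the examples given here, and the function names are hypothetical.

```python
import re
from typing import List

# Assumption: enclitic forms whose apostrophe attaches to them (o’n -> o + ’n).
# Built from the examples above only; not an exhaustive inventory.
ENCLITICS = {"n", "m", "u", "w", "i", "th"}
# Assumption: proclitic forms that keep the apostrophe (f’enw -> f’ + enw).
PROCLITICS = {"f"}
APOSTROPHES = "’'"  # typographic and ASCII apostrophe

# A word is a run of letters/digits/apostrophes; any other non-space char is punctuation.
WORD_OR_PUNCT = re.compile(r"[\w’']+|[^\w\s]")


def split_apostrophe(token: str) -> List[str]:
    """Split one orthographic word at an apostrophe, following the rules above."""
    for i, ch in enumerate(token):
        if ch not in APOSTROPHES:
            continue
        before, after = token[:i], token[i + 1:]
        if before.lower() in PROCLITICS and after:
            return [before + ch, after]      # apostrophe stays with the preceding part
        if after.lower() in ENCLITICS and before:
            return [before, ch + after]      # apostrophe goes with the following part
    return [token]


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    for raw in WORD_OR_PUNCT.findall(text):
        tokens.extend(split_apostrophe(raw))
    return tokens
```

With these assumptions, `tokenize("Mae o’n dod")` yields Mae, o, ’n, dod. A real tokenizer would need a complete clitic inventory and a way to resolve ambiguous cases.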
Morphology
Tags
- PROPN: the XPOS distinguishes between person, place and organisation (and propn for all other types)
- DET is used for the definite article (XPOS: art)
- PRON: six subclasses (XPOS):
- dem: demonstrative pronouns (hwn, hon, etc.)
- refl: reflexive pronouns (hun, hunan, etc.)
- rel: relative pronoun (a)
- indep: independent pronouns (used in subject position)
- dep: dependent pronouns (used in object position and as possessives, e.g. fy nhŷ «my house», fy ngweld «[to] see me»)
- pron: interrogatives and others (beth, neb, pwy, rhai, sawl)
- AUX is used in three cases:
- for the auxiliary verb bod, if inflected and in copula position
- for TAM markers: yn (XPOS: impf), wedi (perf), newydd (perf), heb (perf), ar (post)
- for preverbals (y, a, mi, fe)
- VERB is used for all verbs, including bod when it is the main verb (followed by a verbnoun). Verbnouns, however, are tagged NOUN (with XPOS verbnoun), since they function syntactically as nouns: the direct object is expressed by a genitive construction and the subject is marked with a preposition.
- ADP: inflected prepositions have the XPOS cprep; other prepositions have the XPOS prep.
- PART is used only for the predicative marker yn, which triggers soft mutation on the following word, unlike the TAM marker yn, which triggers no mutation, and the preposition yn, which triggers nasal mutation. The predicative yn is used before nouns and adjectives in head position: Mae Siôn yn athro «Siôn is a teacher», Roedd Nia yn gyflym «Nia was fast».
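One way to use the XPOS/UPOS correspondences above is as a consistency check over annotated tokens. The sketch below is hypothetical: the table is assembled from the list above (assuming place is the spelling of that proper-noun subclass), tags the list does not name (preverbals, inflected bod, predicative yn) are deliberately left out, and the function names are illustrative only.

```python
from typing import Iterable, List, Tuple

# XPOS -> UPOS correspondences as stated in the tag list above.
XPOS_TO_UPOS = {
    # proper-noun subclasses
    "person": "PROPN", "place": "PROPN", "organisation": "PROPN", "propn": "PROPN",
    # definite article
    "art": "DET",
    # pronoun subclasses
    "dem": "PRON", "refl": "PRON", "rel": "PRON",
    "indep": "PRON", "dep": "PRON", "pron": "PRON",
    # TAM markers
    "impf": "AUX", "perf": "AUX", "post": "AUX",
    # verbnouns behave syntactically as nouns
    "verbnoun": "NOUN",
    # inflected and plain prepositions
    "cprep": "ADP", "prep": "ADP",
}


def check_upos(xpos: str, upos: str) -> bool:
    """True if the pair agrees with the table, or if the XPOS is not covered by it."""
    expected = XPOS_TO_UPOS.get(xpos)
    return expected is None or expected == upos


def find_mismatches(rows: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """rows are (form, upos, xpos) triples; return those contradicting the table."""
    return [row for row in rows if not check_upos(row[2], row[1])]
```

For example, a verbnoun mistakenly tagged VERB is reported, while wedi tagged AUX with XPOS perf passes.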
Features
- An additional feature indicates initial mutation:
- Mutation=AM,NM,SM for aspirate, nasal or soft mutation
- Person=Impers for the impersonal form of verbs: cyhoeddwyd y llyfr y llynedd «one published the book last year» (cf. French «on a publié le livre l’an dernier» or German «man hat das Buch letztes Jahr veröffentlicht»). Impersonal forms are usually translated with passive forms in English, French or German («the book was published last year»).
- Polarity=Aff,Neg to distinguish the affirmative and negative forms of the auxiliary bod
- Tense=Cond,Future,Imperative,Imperfect,Pluperfect,Present,Subj
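The Mutation values can be illustrated by comparing a word form with its lemma. This sketch assumes that lemmas are the unmutated citation forms and covers only the initial-consonant changes of standard Welsh grammar (h-prothesis and other details are ignored); `mutation_value` is a hypothetical helper for illustration, not how the treebank assigns the feature.

```python
from typing import Optional, Tuple

# Initial-consonant changes for the three mutations (simplified).
SOFT = {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
        "g": "", "m": "f", "ll": "l", "rh": "r"}
NASAL = {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"}
ASPIRATE = {"p": "ph", "t": "th", "c": "ch"}

# Welsh digraphs treated as a single initial letter; among them only
# ll and rh appear in the tables above (soft mutation).
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")


def split_initial(word: str) -> Tuple[str, str]:
    """Split a word into its initial letter (possibly a digraph) and the rest."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph, word[len(digraph):]
    return word[:1], word[1:]


def mutation_value(lemma: str, form: str) -> Optional[str]:
    """Guess the Mutation value (AM, NM or SM) by comparing form and lemma."""
    lemma, form = lemma.lower(), form.lower()
    if form == lemma:
        return None  # unmutated (or mutation not visible, e.g. vowel-initial)
    initial, rest = split_initial(lemma)
    for value, table in (("AM", ASPIRATE), ("NM", NASAL), ("SM", SOFT)):
        if initial in table and form == table[initial] + rest:
            return value
    return None
```

For words from this document, cyflym → gyflym gives SM and tŷ → nhŷ gives NM. Because it looks only at spelling, the function cannot tell a genuinely mutated form from an unrelated word that happens to match.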
Syntax
- Verbnouns function as nouns in Welsh: the direct object is expressed by a genitive construction (like the possessor of other nouns), and subjects (unless linked indirectly via an xcomp relation) are attached using a prepositional phrase. Currently, however, we still use nsubj, obj, obl, csubj, ccomp and xcomp for dependents of verbnouns, as opposed to nmod etc. for nouns.
- Welsh-specific dependency relations:
- case:pred, used only to attach the predicative yn (PART) to its head noun or adjective
- Other subtyped relations:
- acl:relcl
- flat:name
- obl:agent (agents for impersonal verb forms)
- The following multi-word expressions use the fixed dependency relation
- o hyd «always»
- ar hyd «along»
- hyd at «as far as»
- hyd yn oed «even»
- dim ond «only»
- i mewn «into»
- o fewn «within»
- ynglŷn â «in connection with»
- oddi ar «since»
- oddi yma «from here»
- oddi wrth «from»
- oddi mewn «within»
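The fixed attachments can be sketched as a string matcher that, following the general UD convention for fixed, attaches every later word of an expression to its first word. The function name and the (dependent, head) index-pair output are assumptions for illustration; because matching ignores context, it would also fire on free combinations that happen to share the same words.

```python
from typing import List, Tuple

# The multi-word expressions listed above, as lower-cased token sequences.
FIXED_EXPRESSIONS = [
    ("o", "hyd"), ("ar", "hyd"), ("hyd", "at"), ("hyd", "yn", "oed"),
    ("dim", "ond"), ("i", "mewn"), ("o", "fewn"), ("ynglŷn", "â"),
    ("oddi", "ar"), ("oddi", "yma"), ("oddi", "wrth"), ("oddi", "mewn"),
]
# Try longer expressions first at each position (matters only if one
# expression were a prefix of another).
_BY_LENGTH = sorted(FIXED_EXPRESSIONS, key=len, reverse=True)


def fixed_arcs(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (dependent, head) pairs, 0-based, for fixed expressions in tokens."""
    words = [t.lower() for t in tokens]
    arcs: List[Tuple[int, int]] = []
    i = 0
    while i < len(words):
        for expr in _BY_LENGTH:
            n = len(expr)
            if tuple(words[i:i + n]) == expr:
                # later words of the expression attach to its first word
                arcs.extend((i + k, i) for k in range(1, n))
                i += n
                break
        else:
            i += 1  # no expression starts here
    return arcs
```

For instance, `fixed_arcs(["dim", "ond", "un"])` returns `[(1, 0)]`, i.e. ond attaches to dim with fixed.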
Treebanks
There is one Welsh UD treebank.