Tokenization and Word Segmentation
Instruction: Describe the general rules for delimiting words (for example, based on whitespace and punctuation) and exceptions to these rules. Specify whether words with spaces and/or multiword tokens occur. Include links to further language-specific documentation if available.
Tags
Instruction: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.
Features
Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.
Core Arguments, Oblique Arguments and Adjuncts
- Subjects have the following characteristics:
- Word order:
- Case marking:
- Passivisation:
- Control:
- Relativisation:
- Objects have the following characteristics:
- Word order:
- Case marking:
- Passivisation:
- Control:
- Relativisation:
Relations Overview
- The following subtypes are used in Erzya:
- nmod:poss for
- aux:neg for the negative auxiliary verb
- nsubj:exist for
- nsubj:cop for
- acl:relcl for
- acl:conv for
- advmod:comp for
- obl:tmod for
- cop:exist for
- xcomp:ds for
- obl:agent for
- flat:name for
- compound:appos for
- aux:q for
- nmod:gsubj for
- nmod:gobj for
- obl:exist for
- cop:negexist for
- compound:svc for
- nmod:comp for
- advmod:tmod for
- nmod:own for
- nmod:bahuv for
- cc:preconj for
- cop:neg for
- cop:own for
- compound:redup for
- aux:opt for
- advcl:tcl for
- advcl:conv for
- nummod:equ for
- nsubj:coploc for
- nsubj:copbelong for
- nmod:appos for
- csubj:cop for
- cop:existneg for
- cop:belong for
- conj:obj for
- compound:equ for
- compound:coll for
- aux:negexist for
- advmod:qnt for
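Each label in the list above pairs a universal relation with a subtype after the colon. As a rough illustration of that structure, the sketch below groups the listed labels by their universal relation. The helper names `split_label` and `group_by_relation` are hypothetical and are not part of any UD tool; the label list is copied from the list above.

```python
from collections import defaultdict

# Relation subtypes copied verbatim from the list above.
SUBTYPES = [
    "nmod:poss", "aux:neg", "nsubj:exist", "nsubj:cop", "acl:relcl",
    "acl:conv", "advmod:comp", "obl:tmod", "cop:exist", "xcomp:ds",
    "obl:agent", "flat:name", "compound:appos", "aux:q", "nmod:gsubj",
    "nmod:gobj", "obl:exist", "cop:negexist", "compound:svc", "nmod:comp",
    "advmod:tmod", "nmod:own", "nmod:bahuv", "cc:preconj", "cop:neg",
    "cop:own", "compound:redup", "aux:opt", "advcl:tcl", "advcl:conv",
    "nummod:equ", "nsubj:coploc", "nsubj:copbelong", "nmod:appos",
    "csubj:cop", "cop:existneg", "cop:belong", "conj:obj", "compound:equ",
    "compound:coll", "aux:negexist", "advmod:qnt",
]


def split_label(label):
    """Split 'relation:subtype' into (relation, subtype).

    A label without a colon yields (label, None).
    (Hypothetical helper, for illustration only.)
    """
    base, _, subtype = label.partition(":")
    return base, subtype or None


def group_by_relation(labels):
    """Map each universal relation to its subtypes, in listed order."""
    grouped = defaultdict(list)
    for label in labels:
        base, subtype = split_label(label)
        if subtype is not None:
            grouped[base].append(subtype)
    return dict(grouped)


if __name__ == "__main__":
    for relation, subtypes in sorted(group_by_relation(SUBTYPES).items()):
        print(f"{relation}: {', '.join(subtypes)}")
```

Grouping the labels this way makes it easier to see which universal relations carry the most subtypes (for example `cop` and `nmod`) when the descriptions are filled in.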