
This page pertains to UD version 2.

UD English EWT

Language: English (code: en)
Family: Indo-European, Germanic

This treebank has been part of Universal Dependencies since the UD v1.0 release.

The following people have contributed to making this treebank part of UD: Natalia Silveira, Timothy Dozat, Christopher Manning, Sebastian Schuster, John Bauer, Miriam Connor, Marie-Catherine de Marneffe, Nathan Schneider, Sam Bowman, Hanzhi Zhu, Daniel Galbraith.

Repository: UD_English-EWT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-SA 4.0

Genre: blog, social, reviews, email

Questions, comments? General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [syntacticdependencies (æt) lists • stanford • edu]. Development of the treebank happens in the UD repository, but not directly in the final CoNLL-U files. You may submit bug fixes as pull requests against the dev branch, but the changes must be made to the source files in the folder called not-to-release rather than to the final CoNLL-U files. Contact the treebank maintainers if in doubt.

Annotation Source
Lemmas assigned by a program, with some manual corrections, but not a full manual verification
UPOS annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS annotated manually
Features assigned by a program, not checked manually
Relations annotated manually, natively in UD style

Description

A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).

The corpus comprises 254,830 words and 16,622 sentences, taken from five genres of web media: weblogs, newsgroups, emails, reviews, and Yahoo! answers. See the LDC2012T13 documentation for more details on the sources of the sentences. The trees were automatically converted into Stanford Dependencies and then hand-corrected to Universal Dependencies. All the basic dependency annotations have been single-annotated; a limited portion of them has been double-annotated, and subsequent corrections have been made to improve consistency. Other aspects of the treebank, such as Universal POS, features and enhanced dependencies, have mainly been produced automatically, with very limited hand-correction.
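Word and sentence counts like the ones above can be recomputed from the CoNLL-U files. The following is a minimal sketch, assuming the standard CoNLL-U layout (tab-separated columns, `#` comment lines, blank lines between sentences) and counting only lines with a plain integer ID as words, so multiword-token ranges such as `2-3` and empty nodes such as `1.1` are skipped. The function name and the sample sentence are made up for illustration and are not taken from the treebank; whether the published figures were computed exactly this way is not stated here.

```python
def count_sentences_and_words(conllu_text):
    """Return (sentence_count, word_count) for a string in CoNLL-U format."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# text = ...".
            continue
        token_id = line.split("\t", 1)[0]
        # Plain integer IDs are syntactic words; "2-3" (multiword token
        # range) and "1.1" (empty node) are not counted.
        if token_id.isdigit():
            words += 1
            if not in_sentence:
                sentences += 1
                in_sentence = True
    return sentences, words


# Hand-made sample (an assumption for illustration, not treebank data).
SAMPLE = "\n".join([
    "# text = I don't know.",
    "\t".join(["1", "I", "I", "PRON", "PRP", "_", "4", "nsubj", "_", "_"]),
    "\t".join(["2-3", "don't", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["2", "do", "do", "AUX", "VBP", "_", "4", "aux", "_", "_"]),
    "\t".join(["3", "n't", "not", "PART", "RB", "_", "4", "advmod", "_", "_"]),
    "\t".join(["4", "know", "know", "VERB", "VBP", "_", "0", "root", "_", "_"]),
    "\t".join(["5", ".", ".", "PUNCT", ".", "_", "4", "punct", "_", "_"]),
    "",
])

print(count_sentences_and_words(SAMPLE))
```

To apply this to a real file, read it as UTF-8 text and pass the contents to the function.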

Acknowledgments

Annotation of the Universal Dependencies English Web Treebank was carried out by (in order of size of contribution):

Creation of the CoNLL-U files, including calculating UPOS, feature, and lemma information was primarily done by

The construction of the Universal Dependencies English Web Treebank was partially funded by a gift from Google, Inc., which we gratefully acknowledge.

Statistics of UD English EWT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Abbr, Case, Definite, Degree, Foreign, Gender, Mood, Number, NumType, Person, Poss, PronType, Reflex, Tense, Typo, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, cc:preconj, ccomp, compound, compound:prt, conj, cop, csubj, csubj:pass, dep, det, det:predet, discourse, dislocated, expl, fixed, flat, flat:foreign, goeswith, iobj, list, mark, nmod, nmod:npmod, nmod:poss, nmod:tmod, nsubj, nsubj:pass, nummod, obj, obl, obl:npmod, obl:tmod, orphan, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
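The restriction to verb parents and noun or pronoun children can be sketched as a filter over CoNLL-U rows. This is only an illustration, assuming the standard ten-column CoNLL-U layout and reading "nouns or pronouns" literally as the UPOS tags NOUN and PRON; whether proper nouns (PROPN) are included in the statistics is not stated here, so the tag set is a parameter. The function name and the sample sentence are hypothetical.

```python
from collections import Counter


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count DEPREL labels where the head is a VERB and the child is nominal."""
    counts = Counter()
    # Sentences are separated by blank lines.
    for block in conllu_text.strip().split("\n\n"):
        rows = []
        upos_by_id = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            # Keep only ordinary words: ten columns and an integer ID.
            if len(cols) != 10 or not cols[0].isdigit():
                continue
            rows.append(cols)
            upos_by_id[cols[0]] = cols[3]  # ID -> UPOS
        for cols in rows:
            upos, head, deprel = cols[3], cols[6], cols[7]
            if upos in child_upos and upos_by_id.get(head) == "VERB":
                counts[deprel] += 1
    return counts


# Hand-made sample (an assumption for illustration, not treebank data).
SAMPLE_SENTENCE = "\n".join([
    "# text = She gave him a book.",
    "\t".join(["1", "She", "she", "PRON", "PRP", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "gave", "give", "VERB", "VBD", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "him", "he", "PRON", "PRP", "_", "2", "iobj", "_", "_"]),
    "\t".join(["4", "a", "a", "DET", "DT", "_", "5", "det", "_", "_"]),
    "\t".join(["5", "book", "book", "NOUN", "NN", "_", "2", "obj", "_", "_"]),
    "\t".join(["6", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
    "",
])

print(verb_nominal_relations(SAMPLE_SENTENCE))
```

Passing `child_upos=("NOUN", "PROPN", "PRON")` would also count proper-noun children, if that reading is preferred.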

Verbs with Reflexive Core Objects

Relations Overview