UD English EWT
Language: English (code: en)
Family: Indo-European, Germanic
This treebank has been part of Universal Dependencies since the UD v1.0 release.
The following people have contributed to making this treebank part of UD: Natalia Silveira, Timothy Dozat, Christopher Manning, Sebastian Schuster, John Bauer, Miriam Connor, Marie-Catherine de Marneffe, Nathan Schneider, Sam Bowman, Hanzhi Zhu, Daniel Galbraith.
Repository: UD_English-EWT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2
License: CC BY-SA 4.0
Genre: blog, social, reviews, email
Questions, comments?
General annotation questions (either English-specific or cross-linguistic) can be raised in the main UD issue tracker.
You can report bugs in this treebank in the treebank-specific issue tracker on GitHub.
If you want to collaborate, please contact [syntacticdependencies (æt) lists • stanford • edu].
Development of the treebank happens in the UD repository but not directly in the final CoNLL-U files.
You may submit bug fixes as pull requests against the dev branch, but you have to go to the folder called not-to-release and locate the source files there.
Contact the treebank maintainers if in doubt.
Annotation | Source |
---|---|
Lemmas | assigned by a program, with some manual corrections, but not a full manual verification |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | annotated manually |
Features | assigned by a program, not checked manually |
Relations | annotated manually, natively in UD style |
Description
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
The corpus comprises 254,830 words and 16,622 sentences, taken from five genres of web media: weblogs, newsgroups, emails, reviews, and Yahoo! answers. See the LDC2012T13 documentation for more details on the sources of the sentences. The trees were automatically converted into Stanford Dependencies and then hand-corrected to Universal Dependencies. All the basic dependency annotations have been single-annotated, a limited portion of them have been double-annotated, and subsequent correction has been done to improve consistency. Other aspects of the treebank, such as Universal POS, features and enhanced dependencies, have mainly been done automatically, with very limited hand-correction.
Acknowledgments
Annotation of the Universal Dependencies English Web Treebank was carried out by (in order of size of contribution):
- Natalia Silveira
- Timothy Dozat
- Sebastian Schuster
- Miriam Connor
- Marie-Catherine de Marneffe
- Nathan Schneider
- Samuel Bowman
- Hanzhi Zhu
- Daniel Galbraith
- Christopher Manning
- John Bauer
Creation of the CoNLL-U files, including calculating UPOS, feature, and lemma information, was primarily done by:
- Sebastian Schuster
- Natalia Silveira
The construction of the Universal Dependencies English Web Treebank was partially funded by a gift from Google, Inc., which we gratefully acknowledge.
Statistics of UD English EWT
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X
Features
Abbr – Case – Definite – Degree – Foreign – Gender – Mood – Number – NumType – Person – Poss – PronType – Reflex – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – cc:preconj – ccomp – compound – compound:prt – conj – cop – csubj – csubj:pass – dep – det – det:predet – discourse – dislocated – expl – fixed – flat – flat:foreign – goeswith – iobj – list – mark – nmod – nmod:npmod – nmod:poss – nmod:tmod – nsubj – nsubj:pass – nummod – obj – obl – obl:npmod – obl:tmod – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 16622 sentences and 254829 tokens.
- This corpus contains 34370 tokens (13%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 924 types of words that contain both letters and punctuation. Examples: 's, n't, 'm, 'll, 've, 're, 'd, Dr., e-mail, Mr., ’s, U.S., st., Inc., etc., Sept., vs., W., .doc, carol.st.clair@enron.com, it's, 01-Feb-02, n’t, Dec., Ft., Oct., alt.animals.cat, p&l, :D, Corp., Ms., No., Non-Bondad, PG&E, S., Yahoo!, i.e., A., Analysis_0712, D.C., E., ENRON.XLS, MEH-risk, Sha'lan, b/c, co., ekrapels@esaibos.com, enrongss.xls, p.m., 80's
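Counts like the ones above can be reproduced from the CoNLL-U files with a few lines of code. The following is a minimal sketch, not the script that generated this page: it assumes the standard 10-column tab-separated CoNLL-U layout, treats every line with a plain integer ID as one word, and reads the missing-space flag from `SpaceAfter=No` in the MISC column. The `SAMPLE` text and the function names are made up for illustration.

```python
# Two hypothetical sentences in CoNLL-U format (not taken from the treebank).
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = It's fine.\n"
    "1\tIt\tit\tPRON\tPRP\tCase=Nom\t3\tnsubj\t_\tSpaceAfter=No\n"
    "2\t's\tbe\tAUX\tVBZ\tVerbForm=Fin\t3\tcop\t_\t_\n"
    "3\tfine\tfine\tADJ\tJJ\tDegree=Pos\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "# text = Thanks!\n"
    "1\tThanks\tthanks\tNOUN\tNN\tNumber=Sing\t0\troot\t_\tSpaceAfter=No\n"
    "2\t!\t!\tPUNCT\t.\t_\t1\tpunct\t_\t_\n"
)


def read_conllu(text):
    """Yield each sentence as a list of rows (each row is a list of columns)."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):  # skip comment lines
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def basic_counts(text):
    """Return (sentences, words, words not followed by a space)."""
    n_sent = n_words = n_nospace = 0
    for sentence in read_conllu(text):
        n_sent += 1
        for cols in sentence:
            # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
            if not cols[0].isdigit():
                continue
            n_words += 1
            if "SpaceAfter=No" in cols[9].split("|"):
                n_nospace += 1
    return n_sent, n_words, n_nospace


print(basic_counts(SAMPLE))  # (2, 6, 3)
```

Applied to the figures reported above, 34370 / 254829 is about 13.5%, which matches the 13% shown once the fraction is dropped.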
Morphology
Tags
- This corpus uses 17 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
- This corpus contains 21 word types tagged as particles (PART): ', 's, -s, 2, `s, a, n, n't, na, not, nt, n’t, ot, s, t, ta, the, to, too, ’, ’s
- This corpus contains 81 lemmas tagged as pronouns (PRON): 'em, 's, anybody, anyone, anything, em, ever, everybody, everyone, everything, he, hers, herself, himself, i, is, it, it's, its, itself, mine, mines, my, myself, nobody, nothing, one, our, ours, ourselves, s, self, she, somebody, someone, something, that, the, thei, their, theirs, themselves, then, there, these, they, they're, theyy, thier, this, those, thou, thy, ti, u, ur, us, use, waht, we, what, whatever, which, who, who's, whoever, whom, whomever, whoooooo, whose, wtf, ya, ya'll, ye, yo, you, you're, your, yours, yourself, yourselves
- This corpus contains 23 lemmas tagged as determiners (DET): a, all, another, any, both, each, either, every, half, many, neither, no, quite, some, such, that, the, these, this, those, what, whatever, which
- Out of the above, 8 lemmas occurred sometimes as PRON and sometimes as DET: that, the, these, this, those, what, whatever, which
- This corpus contains 14 lemmas tagged as auxiliaries (AUX): be, can, could, do, get, have, may, might, must, ought, shall, should, will, would
- Out of the above, 5 lemmas occurred sometimes as AUX and sometimes as VERB: be, can, do, get, have
- There are 4 (de)verbal forms:
- Fin
- AUX: is, will, can, would, was, are, could, do, should, have
- VERB: have, is, had, said, has, are, want, need, let, think
- Ger
- AUX: being, having, getting
- VERB: going, getting, looking, following, including, taking, having, using, doing, regarding
- Inf
- AUX: be, have, do, get, of, am, 've, are, b
- VERB: have, get, know, do, go, make, take, see, like, find
- Part
- AUX: been
- VERB: going, had, attached, done, made, used, based, called, doing, looking
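The overlap statements above (lemmas seen both as PRON and as DET, or both as AUX and as VERB) amount to a set intersection over the LEMMA and UPOS columns. A minimal sketch under that assumption follows; the sample rows and the helper name `lemmas_by_upos` are hypothetical and not part of any UD tool.

```python
from collections import defaultdict

# Hypothetical CoNLL-U rows: "that" appears once as PRON and once as DET.
SAMPLE = (
    "1\tThat\tthat\tPRON\tDT\tNumber=Sing|PronType=Dem\t2\tnsubj\t_\t_\n"
    "2\tworks\twork\tVERB\tVBZ\tVerbForm=Fin\t0\troot\t_\t_\n"
    "\n"
    "1\tI\tI\tPRON\tPRP\tCase=Nom\t2\tnsubj\t_\t_\n"
    "2\tlike\tlike\tVERB\tVBP\tVerbForm=Fin\t0\troot\t_\t_\n"
    "3\tthat\tthat\tDET\tDT\tNumber=Sing|PronType=Dem\t4\tdet\t_\t_\n"
    "4\tplace\tplace\tNOUN\tNN\tNumber=Sing\t2\tobj\t_\t_\n"
)


def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas observed with that tag."""
    table = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit():             # ignore ranges and empty nodes
            table[cols[3]].add(cols[2])   # column 4 = UPOS, column 3 = LEMMA
    return table


table = lemmas_by_upos(SAMPLE)
shared = sorted(table["PRON"] & table["DET"])
print(shared)  # ['that']
```

The same intersection with `table["AUX"] & table["VERB"]` would give the AUX/VERB overlap.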
Nominal Features
- Fem
- PRON: she, her, herself
- Masc
- PRON: he, his, him, himself
- Neut
- PRON: it, its, itself
- Plur
- DET: these, those
- NOUN: people, years, days, things, questions, times, months, guys, friends, places
- PRON: they, we, their, our, them, us, those, these, themselves, 's
- PROPN: americans, Beatles, Iraqis, Palestinians, Islands, Tigers, Shiites, Seas, Muslims, Christians
- VERB: associates, rays
- Sing
- ADJ: Global, Pakistani, criminal, female, middle
- ADV: best
- AUX-Fin: is, was, has, 's, am, does, s, ’s, `s, ai
- DET: this, that
- INTJ: appetit
- NOUN: time, service, place, thanks, food, way, year, day, number, pm
- NUM: 9/11
- PRON: i, it, my, he, me, this, his, that, him, she
- PROPN: Bush, US, al, Iraq, enron, united, Iran, New, China, states
- SYM: %, 1%P701!.doc
- VERB: is, has, was, says, 's, makes, seems, needs, looks, comes
- VERB-Fin: is, has, was, says, 's, makes, seems, needs, looks, comes
- X: URSULA
- Acc
- PRON: me, it, you, them, him, us, her, yourself, myself, themselves
- Nom
- PRON: i, you, it, they, we, he, she
- Def
- DET: the
- PRON: The
- Ind
- DET: a, an
Degree and Polarity
- Cmp
- ADJ: more, better, less, larger, bigger, earlier, older, smaller, higher, worse
- ADV: later, better, longer, less, earlier, sooner, further, closer, higher, faster
- Pos
- ADJ: good, great, other, new, many, last, same, few, sure, little
- ADV: well, far, soon, long, hard, early, late, close, little, high
- INTJ: Bon
- NOUN: equivalant
- PROPN: Central, Modern, english
- Sup
- ADJ: best, most, worst, cheapest, largest, latest, easiest, highest, oldest, biggest
- ADV: least, best, worst, highest, longest
Verbal Features
- Imp
- AUX-Fin: do, be, get
- VERB-Fin: let, go, see, take, try, get, make, give, call, put
- Ind
- AUX-Fin: is, was, are, do, have, has, were, 's, am, 'm
- VERB-Fin: have, is, had, said, has, are, want, need, know, think
- Past
- AUX-Fin: was, were, did, had, got, 'd, wase
- AUX-Part: been
- VERB-Fin: had, said, was, got, took, came, went, did, told, called
- VERB-Part: had, attached, done, made, used, based, called, given, seen, sent
- Pres
- AUX-Fin: is, are, do, have, has, 's, am, 'm, does, 've
- VERB-Fin: have, is, has, are, want, need, know, think, thank, get
- VERB-Part: going, doing, looking, working, trying, getting, having, coming, making, planning
- Pass
- VERB-Part: attached, made, used, told, done, sent, called, born, appreciated, given
Pronouns, Determiners, Quantifiers
- Art
- DET: the, a, an
- PRON: The
- Dem
- ADV: there, then, here, that
- DET: this, that, these, those
- PRON: this, that, those, these
- Int
- ADV: when, how, why, where, whenever, ever, wherever, however, were, y
- DET: what, which, whatever
- PRON: what, which, who, whom, whatever, whose, who's, Wtf, ever, waht
- Prs
- PRON: i, you, it, they, my, we, he, your, me, their
- Rel
- ADV: where, that, when, why, how, were, wherein
- DET: what, whhich
- PRON: that, who, which, whom, what, whose
- Card
- NUM: one, two, 2, 3, 5, 1, 10, 4, three, 20
- Mult
- ADV: once, twice
- Ord
- ADJ: first, second, third, 5th, fourth, 19th, 2nd, 1st, 20th, 21st
- ADV: first
- Poss
- Yes
- PRON: my, your, their, his, our, its, her, whose, out, ur
- Reflex
- Yes
- PRON: yourself, myself, themselves, itself, himself, ourselves, herself, yourselves
- 1
- AUX-Fin: am, was
- PRON: i, my, we, me, our, us, myself, 's, ourselves, s
- VERB-Fin: was, am
- 2
- PRON: you, your, yourself, ur, yourselves
- 3
- AUX-Fin: is, was, has, 's, does, s, ’s, `s, ai, gets
- PRON: it, they, he, their, his, them, him, she, her, its
- VERB-Fin: is, has, was, says, 's, makes, seems, needs, looks, comes
Other Features
- Abbr
- Yes
- ADP: o, thru, w, ta, vs, f, b/c, w/, 2, 4
- ADV: aka
- AUX-Fin: shal, wud
- AUX-Inf: b
- CCONJ: n, 'n, VS
- DET: da, dat, sm
- INTJ: wel
- NOUN: luv, b, c, r., syd, yrs
- PART: na, ta, 2, a
- PRON: ur, any1, wht
- SCONJ: b/c, 4, bc, cos, coz, cus, ig, w/out
- VERB-Fin: wan, ar, hav
- VERB-Ger: playin
- VERB-Inf: hav, wan
- VERB-Part: gon
- Yes
- Foreign
- Yes
- X: la, a, de, del, guerre, hoc, non, Acedraz, Déjà, Hochrenaissance
- Yes
- Typo
- Yes
- ADJ: accomdating, hid, particlular, wierd
- ADP: then, a, and, of, aboout, abou, admidst, aground, amoung, becuse
- ADV: to, a, it, that, their
- AUX-Fin: woud, a, ar, as, cold, hav, hvae, made, most, my
- CCONJ: an, adn, a, ad=nd, amd, ans, at, of
- DET: and, teh, aa, dthat, he, ssome, te, then, ther, whhich
- NOUN: catagory, Unlce, appartment, begiinning, cge, eneedle, hlep, ocnversation, oone, peiod
- PART: too, ot
- PRON: out, you
- PROPN: Thanksgiv8ing
- SCONJ: becuse, wether, I'd, Seince, Whie, altough, ask, beacuse, becouse, then
- VERB-Fin: fixeded, preceded, reffered
- VERB-Ger: drive
- VERB-Inf: accomodate, bare, critisize, endevour, hlep, reccommend
- VERB-Part: botn, excepted
- Yes
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): be.
- This corpus uses 16 lemmas as auxiliaries (aux). Examples: have, be, will, do, can, would, could, should, may, might, must, shall, get, better, se, to.
- This corpus uses 5 lemmas as passive auxiliaries (aux:pass). Examples: be, get, become, have, would.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (1998)
- VERB-Fin--PRON (794)
- VERB-Fin--PRON-Acc (3)
- VERB-Fin--PRON-Nom (4746)
- VERB-Ger--NOUN (85)
- VERB-Ger--PRON (23)
- VERB-Ger--PRON-Acc (7)
- VERB-Ger--PRON-Nom (186)
- VERB-Inf--NOUN (658)
- VERB-Inf--PRON (293)
- VERB-Inf--PRON-Acc (37)
- VERB-Inf--PRON-Nom (2611)
- VERB-Part--NOUN (458)
- VERB-Part--PRON (144)
- VERB-Part--PRON-Acc (4)
- VERB-Part--PRON-Nom (1266)
- obj
- VERB-Fin--NOUN (3499)
- VERB-Fin--NOUN-ADP('s) (1)
- VERB-Fin--PRON (340)
- VERB-Fin--PRON-Acc (870)
- VERB-Fin--PRON-Nom (63)
- VERB-Ger--NOUN (1042)
- VERB-Ger--PRON (47)
- VERB-Ger--PRON-Acc (118)
- VERB-Ger--PRON-Nom (10)
- VERB-Inf--NOUN (3197)
- VERB-Inf--NOUN-ADP('s) (1)
- VERB-Inf--NOUN-ADP(of) (1)
- VERB-Inf--PRON (326)
- VERB-Inf--PRON-Acc (789)
- VERB-Inf--PRON-Nom (88)
- VERB-Part--NOUN (770)
- VERB-Part--PRON (123)
- VERB-Part--PRON-Acc (92)
- VERB-Part--PRON-Nom (4)
- iobj
- VERB-Fin--NOUN (17)
- VERB-Fin--PRON (4)
- VERB-Fin--PRON-Acc (173)
- VERB-Fin--PRON-Nom (5)
- VERB-Ger--NOUN (9)
- VERB-Ger--PRON-Acc (24)
- VERB-Inf--NOUN (16)
- VERB-Inf--PRON (2)
- VERB-Inf--PRON-Acc (161)
- VERB-Inf--PRON-Nom (6)
- VERB-Part--NOUN (5)
- VERB-Part--PRON-Acc (14)
- VERB-Part--PRON-Nom (1)
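Pattern labels such as `VERB-Fin--PRON-Nom` combine the parent's UPOS and VerbForm with the child's UPOS and Case. The sketch below shows one plausible way to tally them; it is an assumption about how the labels are built, not the original statistics script, and it ignores the adposition suffix seen in labels like `NOUN-ADP('s)`. The sample sentence and function names are illustrative.

```python
from collections import Counter

# Hypothetical sentence: "She saw him ."
SAMPLE = (
    "1\tShe\tshe\tPRON\tPRP\tCase=Nom|Number=Sing|Person=3\t2\tnsubj\t_\t_\n"
    "2\tsaw\tsee\tVERB\tVBD\tMood=Ind|Tense=Past|VerbForm=Fin\t0\troot\t_\t_\n"
    "3\thim\the\tPRON\tPRP\tCase=Acc|Number=Sing|Person=3\t2\tobj\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)


def sentences(text):
    """Yield sentences as lists of rows, keeping only plain integer IDs."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():
                sent.append(cols)
    if sent:
        yield sent


def feats(cols):
    """Parse the FEATS column ("A=x|B=y" or "_") into a dict."""
    if cols[5] == "_":
        return {}
    return dict(item.split("=", 1) for item in cols[5].split("|"))


def label(cols, feature):
    """UPOS, optionally suffixed with one feature value, e.g. 'VERB-Fin'."""
    value = feats(cols).get(feature)
    return f"{cols[3]}-{value}" if value else cols[3]


def core_argument_patterns(text):
    """Count (relation, 'parent--child') patterns for verb parents."""
    counts = Counter()
    for sent in sentences(text):
        by_id = {cols[0]: cols for cols in sent}
        for child in sent:
            head = by_id.get(child[6])      # None for the root (HEAD = 0)
            if head is None or head[3] != "VERB":
                continue
            if child[3] in {"NOUN", "PRON"} and child[7] in {"nsubj", "obj", "iobj"}:
                pattern = f"{label(head, 'VerbForm')}--{label(child, 'Case')}"
                counts[(child[7], pattern)] += 1
    return counts


for (relation, pattern), n in sorted(core_argument_patterns(SAMPLE).items()):
    print(relation, pattern, n)
```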
Verbs with Reflexive Core Objects
- This corpus contains 58 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: find yourself, save yourself, ask yourself, blow himself, burn itself, consider themselves, describe themselves, do yourself, feel yourself, give yourself, protect ourselves, work themselves, absent himself, absent yourself, adapt itself, ally itself, avail myself, blow herself, bunker themselves, call himself, cloak himself, commit ourselves, compose himself, contradict themselves, embarrass himself, enjoy myself, enjoy yourself, explode himself, explode yourself, find himself, find themselves, get myself, hurt themselves, imagine yourself, introduce herself, introduce myself, keep himself, keep myself, kill themselves, land herself, leave yourself, make yourself, manifest itself, misrepresent themselves, organize themselves, picture yourself, present yourself, pride themselves, prove himself, put yourself
- Out of those, 1 lemma occurred more than once, but never without a reflexive dependent. Example: absent
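The two figures above can be approximated as follows. This sketch assumes that a reflexive core object is a dependent with relation `obj` or `iobj` carrying `Reflex=Yes` in FEATS, and that sentences are separated by a single empty line; the exact definition used to produce this page may differ. The `row` helper and the sample sentences are hypothetical.

```python
from collections import Counter


def row(idx, form, lemma, upos, feats, head, deprel):
    """Build one 10-column CoNLL-U line (XPOS, DEPS and MISC left as '_')."""
    return "\t".join([str(idx), form, lemma, upos, "_", feats, str(head), deprel, "_", "_"])


# Four made-up sentences: "absent" is always reflexive, "find" only sometimes.
SAMPLE = "\n\n".join("\n".join(sent) for sent in [
    [row(1, "He", "he", "PRON", "Case=Nom", 2, "nsubj"),
     row(2, "absented", "absent", "VERB", "VerbForm=Fin", 0, "root"),
     row(3, "himself", "himself", "PRON", "Reflex=Yes", 2, "obj")],
    [row(1, "You", "you", "PRON", "Case=Nom", 2, "nsubj"),
     row(2, "absented", "absent", "VERB", "VerbForm=Fin", 0, "root"),
     row(3, "yourself", "yourself", "PRON", "Reflex=Yes", 2, "obj")],
    [row(1, "I", "I", "PRON", "Case=Nom", 2, "nsubj"),
     row(2, "found", "find", "VERB", "VerbForm=Fin", 0, "root"),
     row(3, "myself", "myself", "PRON", "Reflex=Yes", 2, "obj")],
    [row(1, "I", "I", "PRON", "Case=Nom", 2, "nsubj"),
     row(2, "found", "find", "VERB", "VerbForm=Fin", 0, "root"),
     row(3, "it", "it", "PRON", "Case=Acc", 2, "obj")],
])


def reflexive_stats(text):
    """Return (lemmas seen with a reflexive object, lemmas never seen without one)."""
    total, reflexive = Counter(), Counter()
    for block in text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        rows = [r for r in rows if r[0].isdigit()]
        # IDs of heads that govern at least one reflexive obj/iobj.
        refl_heads = {r[6] for r in rows
                      if r[7] in {"obj", "iobj"} and "Reflex=Yes" in r[5].split("|")}
        for r in rows:
            if r[3] == "VERB":
                total[r[2]] += 1
                if r[0] in refl_heads:
                    reflexive[r[2]] += 1
    with_reflexive = set(reflexive)
    only_reflexive = {lemma for lemma in with_reflexive
                      if total[lemma] > 1 and reflexive[lemma] == total[lemma]}
    return with_reflexive, only_reflexive


with_reflexive, only_reflexive = reflexive_stats(SAMPLE)
print(sorted(with_reflexive))  # ['absent', 'find']
print(sorted(only_reflexive))  # ['absent']
```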
Relations Overview
- This corpus uses 13 relation subtypes: acl:relcl, aux:pass, cc:preconj, compound:prt, csubj:pass, det:predet, flat:foreign, nmod:npmod, nmod:poss, nmod:tmod, nsubj:pass, obl:npmod, obl:tmod
- The following relation type is not used in this corpus at all: clf
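Both statements in this overview can be cross-checked against the relation inventory listed in the Relations section of this page. The sketch below copies that inventory into `USED` and compares it with the 37 universal relations of UD v2 (`UNIVERSAL`); the variable names are illustrative.

```python
# The 37 universal dependency relations defined in UD v2.
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations listed for UD English EWT in the Relations section of this page.
USED = """
acl acl:relcl advcl advmod amod appos aux aux:pass case cc cc:preconj ccomp
compound compound:prt conj cop csubj csubj:pass dep det det:predet discourse
dislocated expl fixed flat flat:foreign goeswith iobj list mark nmod
nmod:npmod nmod:poss nmod:tmod nsubj nsubj:pass nummod obj obl obl:npmod
obl:tmod orphan parataxis punct reparandum root vocative xcomp
""".split()

# A subtype is any relation label of the form "universal:extension".
subtypes = sorted(rel for rel in USED if ":" in rel)

# A universal relation counts as used if it appears bare or as the base of a subtype.
used_universal = {rel.split(":")[0] for rel in USED}
unused = sorted(UNIVERSAL - used_universal)

print(len(subtypes))  # 13
print(unused)         # ['clf']
```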