This page pertains to UD version 2.

Universal Dependencies

Universal Dependencies (UD) is a framework for cross-linguistically consistent grammatical annotation and an open community effort with over 200 contributors producing more than 100 treebanks in over 70 languages.

If you want to receive news about Universal Dependencies, you can subscribe to the UD mailing list. If you want to discuss individual annotation questions, use the GitHub issue tracker.

Current UD Languages

Information about language families (and genera for families with multiple branches) is mostly taken from WALS Online (IE = Indo-European).

Afrikaans 1 49K IE, Germanic

Afrikaans treebanks

AfriBooms 49K
UD Afrikaans-AfriBooms is a conversion of the AfriBooms Dependency Treebank, originally annotated with a simplified PoS set and dependency relations according to a subset of the Stanford tag set. The corpus consists of public government documents.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Akkadian 1 1K Afro-Asiatic, Semitic

Akkadian treebanks

PISANDUB 1K
A small set of sentences from Babylonian royal inscriptions.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Amharic 1 10K Afro-Asiatic, Semitic

Amharic treebanks

ATT 10K
UD_Amharic-ATT is a manually developed treebank for Amharic. Sentences were collected from grammar books, fiction, biographies, religious texts and news.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Ancient Greek 2 417K IE, Greek

Ancient Greek treebanks

PROIEL 214K
UD_Ancient_Greek-PROIEL is converted from the Ancient Greek data in the PROIEL treebank, and consists of the New Testament plus selections from Herodotus.

 

Perseus 202K
This Universal Dependencies Ancient Greek Treebank consists of an automatic conversion of a selection of passages from the Ancient Greek and Latin Dependency Treebank 2.1.

 

See here for comparative statistics of Ancient Greek treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Arabic 3 1,042K Afro-Asiatic, Semitic

Arabic treebanks

PADT 282K
The Arabic-PADT UD treebank is based on the [Prague Arabic Dependency Treebank](http://ufal.mff.cuni.cz/padt/) (PADT), created at Charles University in Prague.

 

PUD 20K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Luma Ateyah, Martin Popel, Daniel Zeman, Nizar Habash, Dima Taji
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

NYUAD 738K
The NYUAD Arabic UD treebank is based on the Penn Arabic Treebank (PATB), parts 1, 2, and 3, through conversion to CATiB dependency trees.

 

See here for comparative statistics of Arabic treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Armenian 1 22K IE, Armenian

Armenian treebanks

ArmTDP 22K
The ArmTDP Eastern Armenian UD treebank is based on the ՀայՇտեմ-ArmTDP-East dataset (2.0), created by the ArmTDP team led by Marat M. Yavrumyan at the Yerevan State University.

 

Language documentation

See the language documentation page.
Bambara 1 13K Mande

Bambara treebanks

CRB 13K
The UD Bambara treebank is a section of the Corpus Référence du Bambara annotated natively with Universal Dependencies.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Basque 1 121K Basque

Basque treebanks

BDT 121K
The Basque UD treebank is based on an automatic conversion from part of the Basque Dependency Treebank (BDT), created at the University of the Basque Country by the IXA NLP research group. The treebank consists of 8,993 sentences (121,443 tokens) and covers mainly literary and journalistic texts.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Belarusian 1 8K IE, Slavic

Belarusian treebanks

HSE 8K
The Belarusian UD treebank is based on a sample of the news texts included in the Belarusian-Russian parallel subcorpus of the Russian National Corpus, online search available at: http://ruscorpora.ru/search-para-be.html.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Breton 1 10K IE, Celtic

Breton treebanks

KEB 10K
UD Breton-KEB is a treebank of Breton that has been manually annotated according to the Universal Dependencies guidelines. The tokenisation guidelines and morphological annotation come from a finite-state morphological analyser of Breton released as part of the [Apertium project](http://www.apertium.org).

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Bulgarian 1 156K IE, Slavic

Bulgarian treebanks

BTB 156K
UD_Bulgarian-BTB is based on the HPSG-based BulTreeBank, created at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The original consists of 215,000 tokens (over 15,000 sentences). All the texts were processed automatically at the tokenization, morphological and chunk levels. Then, the full syntactic analysis was performed manually by trained annotators.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Buryat 1 10K Mongolic

Buryat treebanks

BDT 10K
The UD Buryat treebank was manually annotated natively in UD and contains grammar book sentences, along with news and some fiction.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Cantonese 1 6K Sino-Tibetan

Cantonese treebanks

HK 6K
The Cantonese-HK UD treebank was manually annotated by Tak-sum Wong and Herman H. M. Leung at City University of Hong Kong, by finely transcribing three films shot by students from the School of Creative Media. The data are in Traditional Chinese. These trees form a parallel treebank with those in Chinese-HK.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Catalan 1 531K IE, Romance

Catalan treebanks

AnCora 531K
Catalan data from the AnCora corpus.

 

Language documentation

See the language documentation page.
Chinese 4 160K Sino-Tibetan

Chinese treebanks

GSD 123K
Traditional Chinese Universal Dependencies Treebank annotated and converted by Google.

 

PUD 21K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Josie Li, Cheuk Ying Li, Martin Popel, Daniel Zeman, Herman Leung
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

HK 8K
A treebank manually annotated at the City University of Hong Kong. It contains subtitles of three films shot by students from the School of Creative Media as well as the official record of proceedings of the Legislative Council of Hong Kong. Traditional Chinese characters. This treebank is parallel with UD_Cantonese-HK.

 

CFL 7K
The Chinese-CFL UD treebank is manually annotated by Keying Li with minor manual revisions by Herman Leung and John Lee at City University of Hong Kong, based on essays written by learners of Mandarin Chinese as a foreign language. The data is in Simplified Chinese.

 

See here for comparative statistics of Chinese treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Classical Chinese 1 34K Sino-Tibetan

Classical Chinese treebanks

Mencius 34K ?
Classical Chinese Universal Dependencies Treebank annotated and converted by Institute for Research in Humanities, Kyoto University.
  • Contributors: Koichi Yasuoka, Christian Wittern, Tomohiko Morioka, Takumi Ikeda, Naoki Yamazaki, Yoshihiro Nikaido, Shingo Suzuki, Shigeki Moro, Yuan Li, Hiroyuki Shirasu, Kazunori Fujita
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Coptic 1 22K Afro-Asiatic, Egyptian

Coptic treebanks

Scriptorium 22K
UD Coptic contains manually annotated Sahidic Coptic texts, currently from the Gospel of Mark, Shenoute of Atripe's "Not Because a Fox Barks", the Letters of Besa, and several short stories from the Apophthegmata Patrum.

 

Language documentation

See the language documentation page.
Croatian 1 197K IE, Slavic

Croatian treebanks

SET 197K
The Croatian UD treebank is based on the SETimes-HR corpus.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Czech 5 2,222K IE, Slavic

Czech treebanks

PDT 1,506K
The Czech-PDT UD treebank is based on the Prague Dependency Treebank 3.0 (PDT), created at Charles University in Prague.

 

CAC 494K
The UD_Czech-CAC treebank is based on the Czech Academic Corpus 2.0 (CAC; Český akademický korpus; ČAK), created at Charles University in Prague.

 

FicTree 167K
FicTree is a treebank of Czech fiction, automatically converted into the UD format. The treebank was built at Charles University in Prague.

 

PUD 18K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Václava Kettnerová, Jan Hajič jr., Silvie Cinková, Zdeňka Urešová, Milan Straka, Jan Hajič, Jaroslava Hlaváčová, Daniel Zeman
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

CLTT 35K
The UD_Czech-CLTT treebank is based on the Czech Legal Text Treebank 1.0, created at Charles University in Prague.

 

See here for comparative statistics of Czech treebanks.

Language documentation

See the language documentation page.
Danish 2 100K IE, Germanic

Danish treebanks

DDT 100K
The Danish UD treebank is a conversion of the Danish Dependency Treebank.

 

DTB -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Dutch 2 307K IE, Germanic

Dutch treebanks

Alpino 208K
This corpus consists of samples from various treebanks annotated at the University of Groningen using the Alpino annotation tools and guidelines.

 

LassySmall 98K
This corpus contains sentences from the Wikipedia section of the Lassy Small Treebank. Universal Dependency annotation was generated automatically from the original annotation in Lassy.

 

See here for comparative statistics of Dutch treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
English 6 586K IE, Germanic

English treebanks

ParTUT 49K
UD_English-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin, and consisting of a variety of text genres, including talks, legal texts and Wikipedia articles, among others.

 

GUM 80K
Universal Dependencies version of syntax annotations from the GUM corpus (https://corpling.uis.georgetown.edu/gum/)

 

EWT 254K
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
  • Contributors: Natalia Silveira, Timothy Dozat, Christopher Manning, Sebastian Schuster, John Bauer, Miriam Connor, Marie-Catherine de Marneffe, Nathan Schneider, Sam Bowman, Hanzhi Zhu, Daniel Galbraith
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

PUD 21K
This is the English portion of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies (http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Jesse Kirchner, Lorenzo Lambertino, Martin Popel, Daniel Zeman, Christopher Manning, Sebastian Schuster, Siva Reddy
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

LinES 82K
UD English_LinES is the English half of the LinES Parallel Treebank with the original dependency annotation first automatically converted into Universal Dependencies and then partially reviewed. Its contents cover literature, an online manual and Europarl data.

 

ESL 97K
UD English-ESL / Treebank of Learner English (TLE) contains manual POS tag and dependency annotations for 5,124 English as a Second Language (ESL) sentences drawn from the Cambridge Learner Corpus First Certificate in English (FCE) dataset.
  • Contributors: Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz, Margarita Misirpashayeva
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

See here for comparative statistics of English treebanks.

Language documentation

See the language documentation page.
Erzya 1 15K Uralic, Mordvin

Erzya treebanks

JR 15K
UD Erzya is the original annotation (CoNLL-U) for texts in the Erzya language; it originally consists of a sample from a number of fiction authors writing originals in Erzya.

 

Language documentation

See the language documentation page.
Estonian 1 434K Uralic, Finnic

Estonian treebanks

EDT 434K
UD Estonian is a converted version of the Estonian Dependency Treebank (EDT), originally annotated in the Constraint Grammar (CG) annotation scheme, and consisting of genres of fiction, newspaper texts and scientific texts. The treebank contains 30,723 trees, 434,245 tokens.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Faroese 1 10K IE, Germanic

Faroese treebanks

OFT 10K
This is a treebank of Faroese based on the Faroese Wikipedia.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Finnish 3 377K Uralic, Finnic

Finnish treebanks

FTB 159K
FinnTreeBank 1 consists of manually annotated grammatical examples from VISK. The UD version of FinnTreeBank 1 was converted from a native annotation model with a script.

 

TDT 202K
UD_Finnish-TDT is based on the Turku Dependency Treebank (TDT), a broad-coverage dependency treebank of general Finnish covering numerous genres. The conversion to UD was followed by extensive manual checks and corrections, and the treebank closely adheres to the UD guidelines.

 

PUD 15K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).

 

See here for comparative statistics of Finnish treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
French 8 1,132K IE, Romance

French treebanks

ParTUT 28K
UD_French-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin, and consisting of a variety of text genres, including talks, legal texts and Wikipedia articles, among others.

 

GSD 400K
The French UD was converted in 2015 from the content head version of the universal dependency treebank v2.0 (https://github.com/ryanmcd/uni-dep-tb). It has been updated since 2015 independently of the previous source.
  • Contributors: Marie-Catherine de Marneffe, Bruno Guillaume, Ryan McDonald, Alane Suhr, Joakim Nivre, Matias Grioni, Carly Dickerson, Guy Perrier
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

Sequoia 70K
UD_French-Sequoia is an automatic conversion of the [French Sequoia corpus](http://deep-sequoia.inria.fr).

 

Spoken 34K
A Universal Dependencies corpus for spoken French.

 

PUD 24K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Jana Strnadová, Gauthier Caron, Martin Popel, Daniel Zeman, Marie-Catherine de Marneffe
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

FTB 573K
The Universal Dependency version of the French Treebank (Abeillé et al., 2003), hereafter UD_French-FTB, is a treebank of sentences from the newspaper Le Monde, initially manually annotated with morphological information and phrase-structure and then converted to the Universal Dependencies annotation scheme.
  • Contributors: Marie Candito, Bruno Guillaume, Teresa Lynn, Héctor Martínez Alonso, Benoît Sagot, Djamé Seddah, Eric Villemonte de la Clergerie
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

CrapBank -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

FQB -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

See here for comparative statistics of French treebanks.

Language documentation

See the language documentation page.
Galician 2 164K IE, Romance

Galician treebanks

TreeGal 25K
The Galician-TreeGal is a treebank for Galician developed at LyS Group (Universidade da Coruña).

 

CTG 138K
The Galician UD treebank is based on the automatic parsing of the Galician Technical Corpus (http://sli.uvigo.gal/CTG) created at the University of Vigo by the TALG NLP research group.

 

See here for comparative statistics of Galician treebanks.

Language documentation

See the language documentation page.
German 3 354K IE, Germanic

German treebanks

GSD 292K
The German UD is converted from the content head version of the [universal dependency treebank v2.0 (legacy)](https://github.com/ryanmcd/uni-dep-tb).

 

PUD 21K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Georg Rehm, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Sebastian Bank, Martin Popel, Daniel Zeman
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

FRAG 40K
Fragments of German aesthetic essays from the late 18th century.

 

See here for comparative statistics of German treebanks.

Language documentation

See the language documentation page.
Gothic 1 55K IE, Germanic

Gothic treebanks

PROIEL 55K
The UD Gothic treebank is based on the Gothic data from the PROIEL treebank, and consists of Wulfila's Bible translation.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Greek 1 63K IE, Greek

Greek treebanks

GDT 63K
The Greek UD treebank (UD_Greek-GDT) is derived from the Greek Dependency Treebank (http://gdt.ilsp.gr), a resource developed and maintained by researchers at the Institute for Language and Speech Processing/Athena R.C. (http://www.ilsp.gr).

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Hebrew 1 161K Afro-Asiatic, Semitic

Hebrew treebanks

HTB 161K
A Universal Dependencies Corpus for Hebrew.

 

Language documentation

See the language documentation page.
Hindi 2 375K IE, Indic

Hindi treebanks

HDTB 351K
The Hindi UD treebank is based on the Hindi Dependency Treebank (HDTB), created at IIIT Hyderabad, India.

 

PUD 23K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Esha Banerjee, Pinkey Nainwani, Martin Popel, Daniel Zeman
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

See here for comparative statistics of Hindi treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Hindi English 1 26K Code switching

Hindi English treebanks

HIENCS 26K
The Hindi-English Code-switching treebank is based on code-switching tweets of Hindi and English multilingual speakers (mostly Indian) on Twitter. The treebank is manually annotated using the UD scheme. The training and evaluation sets were separately annotated by different annotators using UD v2 and v1 guidelines respectively. The evaluation sets are automatically converted from UD v1 to v2.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Hungarian 1 42K Uralic, Ugric

Hungarian treebanks

Szeged 42K
The Hungarian UD treebank is derived from the Szeged Dependency Treebank (Vincze et al. 2010).

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Indonesian 2 141K Austronesian, Malayo-Sumbawan

Indonesian treebanks

GSD 121K
The Indonesian UD is converted from the content head version of the [universal dependency treebank v2.0 (legacy)](https://github.com/ryanmcd/uni-dep-tb).

 

PUD 19K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Ruli Manurung, Muh Shohibussirri, Martin Popel, Daniel Zeman
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

See here for comparative statistics of Indonesian treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Irish 1 23K IE, Celtic

Irish treebanks

IDT 23K
A Universal Dependencies 1020-sentence treebank for modern Irish.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Italian 5 502K IE, Romance

Italian treebanks

ISDT 298K
The Italian corpus annotated according to the UD annotation scheme was obtained by conversion from ISDT (Italian Stanford Dependency Treebank), released for the dependency parsing shared task of Evalita-2014 (Bosco et al. 2014).

 

ParTUT 55K
UD_Italian-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin, and consisting of a variety of text genres, including talks, legal texts and Wikipedia articles, among others.

 

PoSTWITA 124K
PoSTWITA-UD is a collection of Italian tweets annotated in Universal Dependencies that can be exploited for the training of NLP systems to enhance their performance on social media texts.

 

PUD 23K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Antonio Stella, Davide Rovati, Martin Popel, Daniel Zeman, Maria Simi, Manuela Sanguinetti
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

VIT -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

See here for comparative statistics of Italian treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Japanese 5 1,688K Japanese

Japanese treebanks

GSD 184K
This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation. The original sentences are from Google UDT 2.0.
  • Contributors: Hiroshi Kanayama, Masayuki Asahara, Yusuke Miyao, Takaaki Tanaka, Ryan McDonald, Joakim Nivre, Daniel Zeman, Yuji Matsumoto, Shinsuke Mori, Sumire Uematsu
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

PUD 26K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Atsuko Shimada, Anna Trukhina, Martin Popel, Daniel Zeman, Hiroshi Kanayama
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

Modern 14K
This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation. The original sentences are from the 'Corpus of Historical Japanese' (CHJ).

 

BCCWJ 1,273K
This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation. The original sentences are from the 'Balanced Corpus of Contemporary Written Japanese' (BCCWJ).
  • Contributors: Mai Omura, Masayuki Asahara, Yusuke Miyao, Takaaki Tanaka, Hiroshi Kanayama, Yuji Matsumoto, Shinsuke Mori, Sumire Uematsu, Yugo Murawaki
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

KTC 189K
Please add a summary section to the treebank readme file

 

See here for comparative statistics of Japanese treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kazakh 1 10K Turkic, Northwestern

Kazakh treebanks

KTB 10K
The UD Kazakh treebank is a combination of text from various sources including Wikipedia, some folk tales, sentences from the UDHR, news and phrasebook sentences. Sentence IDs include partial document identifiers.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Komi Zyrian 2 3K Uralic, Permic

Komi Zyrian treebanks

Lattice 2K
UD Komi-Zyrian Lattice is a treebank of written standard Komi-Zyrian.

 

IKDP 1K
This treebank consists of dialectal transcriptions of spoken Komi-Zyrian. The current texts are short recorded segments from different areas where the Iźva dialect of the Komi language is spoken.

 

See here for comparative statistics of Komi Zyrian treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Korean 5 446K Korean

Korean treebanks

Kaist 350K
The KAIST Korean Universal Dependency Treebank was generated by Chun et al. (2018) from the constituency trees in the [KAIST Tree-Tagging Corpus](http://semanticweb.kaist.ac.kr/home/index.php/Corpus4).

 

GSD 80K
The Google Korean Universal Dependency Treebank was first converted from the [Universal Dependency Treebank v2.0 (legacy)](https://github.com/ryanmcd/uni-dep-tb), and then enhanced by Chun et al. (2018).

 

PUD 16K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Sookyoung Kwak, Yongseok Cho, Martin Popel, Daniel Zeman
  • Repository master dev
  • README
  • Treebank hub page
  • Download

 

Penn -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Sejong -
Please add a summary section to the treebank readme file

 

See here for comparative statistics of Korean treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kurmanji 1 10K IE, Iranian

Kurmanji treebanks

MG 10K
The UD Kurmanji corpus is a corpus of Kurmanji Kurdish. It contains fiction and encyclopaedic texts in roughly equal measure. It has been annotated natively in accordance with the UD annotation scheme.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Latin 3 582K IE, Latin

Latin treebanks

PROIEL 199K
The Latin PROIEL treebank is based on the Latin data from the PROIEL treebank, and contains most of the Vulgate New Testament translations plus selections from Caesar's Gallic War, Cicero's Letters to Atticus, Palladius' Opus Agriculturae and the first book of Cicero's De officiis.

 

ITTB 353K
Latin data from the _Index Thomisticus_ Treebank. Data are taken from the _Index Thomisticus_ corpus by Roberto Busa SJ, which contains the complete work by Thomas Aquinas (1225–1274; Medieval Latin) and by 61 other authors related to Thomas.

 

Perseus 29K
This Universal Dependencies Latin Treebank consists of an automatic conversion of a selection of passages from the Ancient Greek and Latin Dependency Treebank 2.1.

 

See here for comparative statistics of Latin treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Latvian 1 152K IE, Baltic

Latvian treebanks

LVTB 152K
The Latvian UD Treebank is based on the Latvian Treebank ([LVTB](http://sintakse.korpuss.lv)), which is being created at the University of Latvia, Institute of Mathematics and Computer Science, [Artificial Intelligence Laboratory](http://ailab.lv).

 

Language documentation

See the language documentation page.
Lithuanian 2 46K IE, Baltic

Lithuanian treebanks

HSE 5K
Lithuanian treebank annotated manually (dependencies) using the Morphological Annotator by CCL, Vytautas Magnus University (http://tekstynas.vdu.lt/) and manual disambiguation. A pilot version which includes news and an essay by Tomas Venclova is available here.

 

ALKSNIS 40K
The Lithuanian dependency treebank ALKSNIS.

 

Language documentation

See the language documentation page.
Maltese 1 44K Afro-Asiatic, Semitic

Maltese treebanks

MUDT 44K
MUDT (Maltese Universal Dependencies Treebank) is a manually annotated treebank of Maltese, a Semitic language of Malta descended from North African Arabic with a significant amount of Italo-Romance influence. MUDT was designed as a balanced corpus with four major genres represented roughly equally.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Marathi 1 3K IE, Indic

Marathi treebanks

UFAL 3K
UD Marathi is a manually annotated treebank consisting primarily of stories from Wikisource, and parts of an article on Wikipedia.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Naija 1 12K Creole

Naija treebanks

NSC 12K
A Universal Dependencies corpus for spoken Naija (Nigerian Pidgin).

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
North Sami 1 26K Uralic, Sami

North Sami treebanks

Giella 26K
This is a North Sámi treebank based on a manually disambiguated and function-labelled gold-standard corpus of North Sámi produced by the Giellatekno team at UiT Norgga árktalaš universitehta.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Norwegian 3 625K IE, Germanic

Norwegian treebanks

Bokmaal 310K
The Norwegian UD treebank is based on the Bokmål section of the Norwegian Dependency Treebank (NDT), which is a syntactic treebank of Norwegian. NDT has been automatically converted to the UD scheme by Lilja Øvrelid at the University of Oslo.

 

Nynorsk 301K
The Norwegian UD treebank is based on the Nynorsk section of the Norwegian Dependency Treebank (NDT), which is a syntactic treebank of Norwegian. NDT has been automatically converted to the UD scheme by Lilja Øvrelid at the University of Oslo.

 

NynorskLIA 13K
This Norwegian treebank is based on the LIA treebank of transcribed spoken Norwegian dialects. The treebank has been automatically converted to the UD scheme by Lilja Øvrelid at the University of Oslo.

 

See here for comparative statistics of Norwegian treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Old Church Slavonic 1 57K IE, Slavic

Old Church Slavonic treebanks

PROIEL 57K
The Old Church Slavonic (OCS) UD treebank is based on the Old Church Slavonic data from the PROIEL treebank and contains the text of the Codex Marianus New Testament translation.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Old French 1 170K IE, Romance

Old French treebanks

SRCMF 170K
UD_Old_French-SRCMF is a conversion of (part of) the SRCMF corpus (Syntactic Reference Corpus of Medieval French [srcmf.org](http://srcmf.org/)).

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Persian 1 152K IE, Iranian

Persian treebanks

Seraji 152K
The Persian Universal Dependency Treebank (Persian UD) is based on the Uppsala Persian Dependency Treebank (UPDT). The conversion of the UPDT to Universal Dependencies was performed semi-automatically with extensive manual checks and corrections.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Polish 2 214K IE, Slavic

Polish treebanks

LFG 130K
The LFG Enhanced UD treebank of Polish is based on a corpus of LFG (Lexical Functional Grammar) syntactic structures generated by an LFG grammar of Polish, POLFIE, and manually disambiguated by human annotators.

 

SZ 83K
The UD Polish treebank is based on “Składnica zależnościowa” (the Polish dependency treebank) version 0.5.

 

See here for comparative statistics of Polish treebanks.

Language documentation

See the language documentation page.
Portuguese 3 570K IE, Romance

Portuguese treebanks

Bosque 227K
This Universal Dependencies (UD) Portuguese treebank is based on the Constraint Grammar converted version of the Bosque, which is part of the Floresta Sintá(c)tica treebank. It contains both European (CETEMPúblico) and Brazilian (CETENFolha) variants.
  • Contributors: Alexandre Rademaker, Eckhard Bick, Fabricio Chalub, Cláudia Freitas, Guilherme Paulino-Passos, Luisa Rocha, Isabela Soares-Bastos, Livy Real, Valeria de Paiva, Daniel Zeman, Martin Popel, David Mareček, Natalia Silveira, André Martins

 

GSD 319K
The Brazilian Portuguese UD is converted from the [Google Universal Dependency Treebank v2.0 (legacy)](https://github.com/ryanmcd/uni-dep-tb).

 

PUD 23K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Gustavo Mendonça, Larissa Rinaldi, Martin Popel, Daniel Zeman, Valeria de Paiva

 

See here for comparative statistics of Portuguese treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Romanian 2 413K IE, Romance

Romanian treebanks

RRT 218K
The Romanian UD treebank (called RoRefTrees) (Barbu Mititelu et al., 2016) is the reference treebank in UD format for standard Romanian.

 

Nonstandard 195K
The Romanian Non-standard UD treebank (called UAIC-RoDia) is based on the UAIC-RoDia Treebank.

 

See here for comparative statistics of Romanian treebanks.

Language documentation

See the language documentation page.
Russian 4 1,247K IE, Slavic

Russian treebanks

GSD 99K
Russian Universal Dependencies Treebank annotated and converted by Google.

 

SynTagRus 1,107K
Russian data from the SynTagRus corpus.

 

Taiga 20K
A Universal Dependencies treebank based on data samples extracted from the Taiga Corpus and the MorphoRuEval-2017 text collections.

 

PUD 19K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Tatiana Lando, Olga Loginova, Martin Popel, Daniel Zeman, Kira Droganova

 

See here for comparative statistics of Russian treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Sanskrit 1 1K IE, Indic

Sanskrit treebanks

UFAL 1K
A small Sanskrit treebank of sentences from Pañcatantra, an ancient Indian collection of interrelated fables by Vishnu Sharma.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Serbian 1 86K IE, Slavic

Serbian treebanks

SET 86K
The Serbian UD treebank is based on the SETimes-SR corpus.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Slovak 1 106K IE, Slavic

Slovak treebanks

SNK 106K
The Slovak UD treebank is based on data originally annotated as part of the Slovak National Corpus, following the annotation style of the Prague Dependency Treebank.

 

Language documentation

See the language documentation page.
Slovenian 2 170K IE, Slavic

Slovenian treebanks

SSJ 140K
The Slovenian UD Treebank is a rule-based conversion of the ssj500k treebank, the largest collection of manually syntactically annotated data in Slovenian, originally annotated in the JOS annotation scheme.

 

SST 29K
The Spoken Slovenian UD Treebank (SST) is the first syntactically annotated corpus of spoken Slovenian, based on a sample of the reference GOS corpus, a collection of transcribed audio recordings of monologic, dialogic and multi-party spontaneous speech in different everyday situations.

 

See here for comparative statistics of Slovenian treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Spanish 3 1,004K IE, Romance

Spanish treebanks

AnCora 549K
Spanish data from the AnCora corpus.

 

GSD 431K
The Spanish UD is converted from the content-head version of the [universal dependency treebank v2.0 (legacy)](https://github.com/ryanmcd/uni-dep-tb).

 

PUD 23K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Hector Fernandez Alcalde, Laura Moreno Romero, Martin Popel, Daniel Zeman, Héctor Martínez Alonso

 

See here for comparative statistics of Spanish treebanks.

Language documentation

See the language documentation page.
Swedish 3 195K IE, Germanic

Swedish treebanks

Talbanken 96K
The Swedish-Talbanken treebank is based on Talbanken, a treebank developed at Lund University in the 1970s.

 

LinES 79K
UD Swedish_LinES is the Swedish half of the LinES Parallel Treebank with UD annotations. All segments are translations from English and the sources cover literary genres, online manuals and Europarl data.

 

PUD 19K
Swedish-PUD is the Swedish part of the Parallel Universal Dependencies (PUD) treebanks.

 

See here for comparative statistics of Swedish treebanks.

Language documentation

See the language documentation page.
Swedish Sign Language 1 1K Sign Language

Swedish Sign Language treebanks

SSLC 1K
The Universal Dependencies treebank for Swedish Sign Language (ISO 639-3: swl) is derived from the Swedish Sign Language Corpus (SSLC) from the Department of Linguistics, Stockholm University.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Tagalog 1 <1K Austronesian, Central Philippine

Tagalog treebanks

TRG <1K
UD_Tagalog-TRG is a UD treebank manually annotated using sentences from a grammar book.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Tamil 1 9K Dravidian, Southern

Tamil treebanks

TTB 9K
The UD Tamil treebank is based on the Tamil Dependency Treebank created at Charles University in Prague by Loganathan Ramasamy.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Telugu 1 6K Dravidian, South Central

Telugu treebanks

MTG 6K
The Telugu UD treebank was created directly in UD by manually annotating sentences from a grammar book.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Thai 1 22K Tai-Kadai

Thai treebanks

PUD 22K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Rattima Nitisaroj, Yanin Sawanakunanon, Martin Popel, Daniel Zeman

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Turkish 3 74K Turkic, Southwestern

Turkish treebanks

IMST 57K
The UD Turkish Treebank, also called the IMST-UD Treebank, is a semi-automatic conversion of the IMST Treebank (Sulubacak et al., 2016).

 

PUD 16K
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the [CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/).
  • Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Savas Cetin, Martin Popel, Daniel Zeman, Francis Tyers, Çağrı Çöltekin

 

BOUN -
A Turkish treebank annotated at Boğaziçi University.

 

See here for comparative statistics of Turkish treebanks.

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Ukrainian 1 116K IE, Slavic

Ukrainian treebanks

IU 116K
Gold-standard Universal Dependencies corpus for Ukrainian, originally developed for UD by the [Institute for Ukrainian](https://mova.institute), an NGO. [[українською](https://mova.institute/золотий_стандарт)]

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Upper Sorbian 1 11K IE, Slavic

Upper Sorbian treebanks

UFAL 11K
A small treebank of Upper Sorbian based mostly on Wikipedia.

 

Language documentation

See the language documentation page.
Urdu 1 138K IE, Indic

Urdu treebanks

UDTB 138K
The Urdu Universal Dependency Treebank was automatically converted from the Urdu Dependency Treebank (UDTB), which is part of an ongoing effort to create multi-layered treebanks for Hindi and Urdu.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Uyghur 1 40K Turkic, Southeastern

Uyghur treebanks

UDT 40K
The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at Xinjiang University in Ürümqi, China.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Vietnamese 1 43K Austro-Asiatic, Viet-Muong

Vietnamese treebanks

VTB 43K
The Vietnamese UD treebank is a conversion of the constituent treebank created in the VLSP project (https://vlsp.hpda.vn/).

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Warlpiri 1 <1K Pama-Nyungan

Warlpiri treebanks

UFAL <1K
A small treebank of grammatical examples in Warlpiri, taken from linguistic literature.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Yoruba 1 2K Niger-Congo, Defoid

Yoruba treebanks

YTB 2K
Parts of the Yoruba Bible, hand-annotated natively in Universal Dependencies.

 

Language documentation

See the language documentation page.

Upcoming UD Languages

Bengali 2 - IE, Indic

Bengali treebanks

BRU -
Please add a summary section to the treebank readme file

 

DDS -
Please add a summary section to the treebank readme file

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Bhojpuri 1 - IE, Indic

Bhojpuri treebanks

BHTB -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Dargwa 1 - Nakho-Dagestanian

Dargwa treebanks

Mehweb -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Georgian 1 - Kartvelian

Georgian treebanks

GNC -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kannada 1 - Dravidian, Southern

Kannada treebanks

MKG -
Examples from Modern Kannada Grammar by S. N. Sridhar.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kyrgyz 1 - Turkic, Northwestern

Kyrgyz treebanks

KTB -
... 1-2 sentences (see http://universaldependencies.org/release_checklist.html#the-readme-file for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Pnar 1 - Austro-Asiatic, Khasian

Pnar treebanks

PTB -
UD Pnar-PTB is a conversion from the Ring (2017) dataset ([doi:10.21979/N9/KVFGBZ](http://dx.doi.org/10.21979/N9/KVFGBZ)) that underpins a grammatical description of the Pnar language (Ring 2015, [http://hdl.handle.net/10356/62519](http://hdl.handle.net/10356/62519)). The corpus consists of folktales and interviews transcribed, translated, and interlinearized.

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Romansh 2 - IE, Romance

Romansh treebanks

Rumgr -
Please add a summary section to the treebank readme file

 

Sursilv -
Please add a summary section to the treebank readme file

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Shipibo Konibo 1 - Panoan

Shipibo Konibo treebanks

PUCP -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Sindhi 1 - IE, Indic

Sindhi treebanks

MazharDootio -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Somali 1 - Afro-Asiatic, Cushitic

Somali treebanks

STB -
Please add a summary section to the treebank readme file

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Sorani 1 - IE, Iranian

Sorani treebanks

MG -
Please add a summary section to the treebank readme file

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.
Welsh 1 - IE, Celtic

Welsh treebanks

CCG -
Corpws Cystrawennol y Gymraeg

 

Language documentation

See the language documentation page.
Wolof 1 - Niger-Congo, Northern Atlantic

Wolof treebanks

WTB -
... 1-2 sentences (see [release checklist](http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...

 

Language documentation

The language hub documentation has not yet been created or ported from the UDv1 documentation.

Disclaimer: Our use of flags to symbolise languages is only intended as a visual enhancement of the website and should not be interpreted as a political statement in any way.

Download

The data is released through LINDAT/CLARIN.