
This page pertains to UD version 2.

UD Bulgarian BTB

Language: Bulgarian (code: bg)
Family: Indo-European, Slavic

This treebank has been part of Universal Dependencies since the UD v1.1 release.

The following people have contributed to making this treebank part of UD: Kiril Simov, Petya Osenova, Martin Popel.

Repository: UD_Bulgarian-BTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.2

License: CC BY-NC-SA 3.0

Genre: news, legal, fiction

Questions, comments? General annotation questions (either Bulgarian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [kivs (æt) bultreebank • org, petya (æt) bultreebank • org]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD

Description

UD_Bulgarian-BTB is based on the HPSG-based BulTreeBank, created at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The original consists of 215,000 tokens (over 15,000 sentences).

All the texts were processed automatically at the tokenization, morphological and chunk levels. The full syntactic analysis was then performed manually by trained annotators.

UD_Bulgarian-BTB consists of 156,149 tokens (11,138 sentences). This subset of BulTreeBank excludes ellipses and some rare phenomena. The conversion was done semi-automatically by Kiril Simov, applying a set of rules and constraints to keep the result consistent.

The remaining sentences will be converted in future releases. The original version is freely available for research upon request.

Acknowledgments

The original treebank was developed in a project (2001-2004) funded by the Volkswagen Stiftung, Federal Republic of Germany, under the Programme “Cooperation with Natural and Engineering Scientists in Central and Eastern Europe”. The project was carried out mainly at IICT-BAS in close cooperation with researchers at the Seminar für Sprachwissenschaft (SfS), Eberhard-Karls-Universität, Tübingen, Germany. Link: http://bultreebank.org/ The conversion of BulTreeBank into the Universal Dependencies format was supported by the EU Project QTLeap. Link: http://qtleap.eu/

We would like to thank all our colleagues who contributed to the annotation of the original treebank: Elisaveta Balabanova, Dimitar Dojkov, Maggie Ivanchukova, Sia Kolkovska, Milena Slavcheva, Petya Osenova. We would also like to thank the annotator and validator of the UD version of the treebank: Stanislava Kancheva.

Statistics of UD Bulgarian BTB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

Animacy, Aspect, Case, Definite, Degree, Foreign, Gender, Mood, Number, NumType, Person, Polarity, Poss, PronType, Reflex, Tense, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, conj, cop, csubj, csubj:pass, det, discourse, expl, fixed, flat, goeswith, iobj, mark, nmod, nsubj, nsubj:pass, nummod, obj, obl, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
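
As an illustration of the restriction above, the following sketch counts dependency relations whose parent is a verb and whose child is a noun or pronoun. It assumes the standard 10-column CoNLL-U layout used by UD releases. The sample sentence, the names `read_sentences` and `verb_nominal_relations`, and the choice to match only the UPOS tags VERB, NOUN and PRON (leaving out PROPN and AUX) are illustrative assumptions, not the procedure used to produce the statistics on this page.

```python
from collections import Counter

# Hypothetical three-token sample in CoNLL-U format (not taken from the treebank).
SAMPLE = """\
# text = Жената чете книга
1\tЖената\tжена\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tчете\tчета\tVERB\t_\t_\t0\troot\t_\t_
3\tкнига\tкнига\tNOUN\t_\t_\t2\tobj\t_\t_

"""

# Assumption: "nouns or pronouns" is read literally as UPOS NOUN or PRON.
NOMINAL = {"NOUN", "PRON"}


def read_sentences(lines):
    """Yield each sentence as a list of 10-column token rows."""
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sent:
                yield sent
                sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if cols[0].isdigit():
                sent.append(cols)
    if sent:
        yield sent


def verb_nominal_relations(lines):
    """Count DEPREL labels on edges from a VERB parent to a NOUN/PRON child."""
    counts = Counter()
    for sent in read_sentences(lines):
        upos = {int(c[0]): c[3] for c in sent}  # token ID -> UPOS
        for c in sent:
            if c[3] in NOMINAL and c[6].isdigit():
                if upos.get(int(c[6])) == "VERB":
                    counts[c[7]] += 1
    return counts


counts = verb_nominal_relations(SAMPLE.splitlines())
```

To run the same count on a real file, pass an open file object instead of `SAMPLE.splitlines()`, for example `verb_nominal_relations(open(path, encoding="utf-8"))`, where `path` points to a CoNLL-U file from the repository.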

Verbs with Reflexive Core Objects

Relations Overview