Shallow parsing

Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and then links them to higher order units that have discrete grammatical meanings (noun groups or phrases, verb groups, etc.). While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns (e.g. as specified by Regular Expressions), approaches that use machine learning techniques (classifiers, topic modeling, etc.) can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents.^[1] That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher level meanings depending on the context of the sentence.

It is a technique widely used in natural language processing. It is similar to the concept of lexical analysis for computer languages. Under the name of the Shallow Structure Hypothesis, it is also used as an explanation for why second language learners often fail to parse complex sentences correctly.^[2]

References[edit]

Citations[edit]

^ Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing. Singapore: Pearson Education Inc. pp. 577–586.
^ Clahsen, Felser, Harald, Claudia (2006). "Grammatical Processing in Language Learners". Applied Psycholinguistics. 27: 3–42. doi:10.1017/S0142716406060024.

Sources[edit]

"NP Chunking (State of the art)". Association for Computational Linguistics. Retrieved 2016-01-30.
Abney, Steven (1991), Parsing By Chunks (PDF), Kluwer Academic Publishers, pp. 257–278.

External links[edit]

Apache OpenNLP OpenNLP includes a chunker.
GATE General Architecture for Text Engineering GATE includes a chunker.
NLTK chunking
Illinois Shallow Parser Shallow Parser Demo

v t e Natural language processing
General terms	Natural language understanding Text corpus Speech corpus Stopwords Bag-of-words AI-complete n-gram (Bigram, Trigram)
Text analysis	Text segmentation Part-of-speech tagging Text chunking Compound term processing Collocation extraction Stemming Lemmatisation Named-entity recognition Coreference resolution Sentiment analysis Concept mining Parsing Word-sense disambiguation Ontology learning Terminology extraction Truecasing
Automatic summarization	Multi-document summarization Sentence extraction Text simplification
Machine translation	Computer-assisted Example-based Rule-based Neural
Automatic identification and data capture	Speech recognition Speech synthesis Optical character recognition Natural language generation
Topic model	Pachinko allocation Latent Dirichlet allocation Latent semantic analysis
Computer-assisted reviewing	Automated essay scoring Concordancer Grammar checker Predictive text Spell checker Syntax guessing
Natural language user interface	Automated online assistant Chatbot Interactive fiction Question answering Voice user interface

Shallow parsing

Contents

References[edit]

Citations[edit]

Sources[edit]

External links[edit]

See also[edit]

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Interaction

Tools

Print/export

Languages