Collocation extraction

Collocation extraction is the task of extracting collocations automatically from a corpus using a computer.

Within the area of corpus linguistics, collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance. 'Crystal clear', 'middle management', 'nuclear family', and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots' or 'motor cyclist'.

The traditional method of performing collocation extraction is to find a formula based on the statistical quantities of those words to calculate a score associated to every word pairs. Proposed formulas are mutual information, t-test, z test, chi-squared test and likelihood ratio.^[1]

External links[edit]

References[edit]

^ Manning, C. D.; Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press. ISBN 978-0-262-13360-9.

This computational linguistics-related article is a stub. You can help Wikipedia by expanding it.

[1] Manning, C. D.; Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press. ISBN 978-0-262-13360-9.

[1]

v t e Natural language processing
General terms	Natural language understanding Text corpus Speech corpus Stopwords Bag-of-words AI-complete n-gram (Bigram, Trigram)
Text analysis	Text segmentation Part-of-speech tagging Text chunking Compound term processing Collocation extraction Stemming Lemmatisation Named-entity recognition Coreference resolution Sentiment analysis Concept mining Parsing Word-sense disambiguation Ontology learning Terminology extraction Truecasing
Automatic summarization	Multi-document summarization Sentence extraction Text simplification
Machine translation	Computer-assisted Example-based Rule-based Neural
Automatic identification and data capture	Speech recognition Speech synthesis Optical character recognition Natural language generation
Topic model	Pachinko allocation Latent Dirichlet allocation Latent semantic analysis
Computer-assisted reviewing	Automated essay scoring Concordancer Grammar checker Predictive text Spell checker Syntax guessing
Natural language user interface	Automated online assistant Chatbot Interactive fiction Question answering Voice user interface

Collocation extraction

See also[edit]

External links[edit]

References[edit]

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Interaction

Tools

Print/export

Languages