Collocation and contingency tables
In a corpus, some words tend to appear together, as in "I need a strong coffee", "John kicked the bucket", and "He is a heavy smoker". J. R. Firth drew attention to such words, which are not combined randomly into a phrase or sentence. He coined the term collocation for these combinations and argued that the meaning of a word is in part determined by its characteristic collocations. In natural language processing (NLP), word combinations of this kind play an important role.
Word combinations that are considered collocations can be compound nouns, idiomatic expressions, or lexically restricted combinations. This variability in definition is reflected in the range of terms in use, such as multi-word expressions (MWE), multi-word units (MWU), bigrams, and idioms.
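For the simplest of these cases, a bigram of two adjacent words, co-occurrence in a corpus can be summarised in a 2x2 contingency table. The following is a minimal sketch in Python, not a definitive implementation. The function name bigram_contingency, the cell names o11 to o22, and the toy sentence are assumptions made for illustration; a real corpus would need proper tokenisation and far more data.

```python
from collections import Counter

def bigram_contingency(tokens, w1, w2):
    """Return the 2x2 table of observed counts for the adjacent pair (w1, w2).

    Rows: first word is w1 / is not w1.
    Columns: second word is w2 / is not w2.
    """
    bigrams = list(zip(tokens, tokens[1:]))  # all adjacent word pairs
    counts = Counter(bigrams)
    n = len(bigrams)

    o11 = counts[(w1, w2)]  # w1 immediately followed by w2
    first = sum(c for (a, _), c in counts.items() if a == w1)
    second = sum(c for (_, b), c in counts.items() if b == w2)
    o12 = first - o11            # w1 followed by some other word
    o21 = second - o11           # w2 preceded by some other word
    o22 = n - o11 - o12 - o21    # pairs involving neither position
    return [[o11, o12], [o21, o22]]

# Hypothetical toy corpus, lower-cased and split on whitespace.
tokens = "i need a strong coffee and he needs a strong tea".split()
table = bigram_contingency(tokens, "strong", "coffee")
for row in table:
    print(row)
```

The four cells always sum to the number of bigrams in the token sequence, which is a quick sanity check on the counts before any association measure is computed from them.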
Collocations can be observed in corpora and can be quantified. Multi-word expressions have to be stored as units if their complete meaning is to be understood. Three characteristic properties emerge...