Hidden Markov Models (HMM) – part-of-speech
This recipe brings in the first hard-core linguistic capability of LingPipe: part-of-speech (POS) tagging, which assigns each word its grammatical category. What are the verbs, nouns, adjectives, and so on in text?
How to do it...
Let's jump right in and drag ourselves back to those awkward middle-school years in English class or our equivalent:
As always, head over to your friendly command prompt and type the following:
java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar: com.lingpipe.cookbook.chapter9.PosTagger
The system will respond with a prompt to which we will add a Jorge Luis Borges quote:
INPUT> Reality is not always probable, or likely.
The system will respond delightfully to this quote with:
Reality_nn is_bez not_* always_rb probable_jj ,_, or_cc likely_jj ._.
Appended to each token is _ with a part-of-speech tag; nn is noun, rb is adverb, and so on. The complete tag set and description of the corpus of the tagger can be found at http...
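If you want to work with the tagger's output programmatically, the token_tag format described above can be split back into pairs. The following is a minimal sketch in plain Java, not part of LingPipe or the cookbook's code; the class name TaggedOutputParser and its parse method are hypothetical names chosen for illustration. It assumes that tags never contain an underscore, so the last underscore in each item separates the token from its tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a LingPipe class): splits "token_tag" items
// from a whitespace-separated output line into {token, tag} pairs.
public class TaggedOutputParser {

    public static List<String[]> parse(String taggedLine) {
        List<String[]> pairs = new ArrayList<>();
        for (String item : taggedLine.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank input yields a single empty item
            }
            // Assumption: the tag itself holds no underscore, so the
            // LAST underscore marks the token/tag boundary.
            int split = item.lastIndexOf('_');
            if (split < 0) {
                // No tag attached; keep the token with an empty tag.
                pairs.add(new String[] {item, ""});
            } else {
                pairs.add(new String[] {
                    item.substring(0, split),
                    item.substring(split + 1)
                });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String output =
            "Reality_nn is_bez not_* always_rb probable_jj ,_, or_cc likely_jj ._.";
        // Print one token and its tag per line, separated by a tab.
        for (String[] pair : parse(output)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Splitting on the last underscore rather than the first keeps punctuation items such as ,_, and ._. intact, and would also cope with a token that itself contains an underscore.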