Natural Language Processing with Java and LingPipe Cookbook

Breck Baldwin, Krishna Dayanidhi

Over 60 effective recipes to develop your Natural Language Processing (NLP) skills quickly and effectively

Book Details

ISBN 13: 9781783284672
Paperback: 312 pages

About This Book

  • Build effective natural language processing applications
  • Transition from ad hoc methods to advanced machine learning techniques
  • Use advanced techniques such as logistic regression, conditional random fields, and latent Dirichlet allocation

Who This Book Is For

This book is for experienced Java developers with NLP needs, whether academics, industrialists, or hobbyists. A basic knowledge of NLP terminology will be beneficial.

Table of Contents

Chapter 1: Simple Classifiers
  • Deserializing and running a classifier
  • Getting confidence estimates from a classifier
  • Getting data from the Twitter API
  • Applying a classifier to a .csv file
  • Evaluation of classifiers – the confusion matrix
  • Training your own language model classifier
  • How to train and evaluate with cross validation
  • Viewing error categories – false positives
  • Understanding precision and recall
  • How to serialize a LingPipe object – classifier example
  • Eliminate near duplicates with the Jaccard distance
  • How to classify sentiment – simple version

Chapter 2: Finding and Working with Words
  • Introduction to tokenizer factories – finding words in a character stream
  • Combining tokenizers – lowercase tokenizer
  • Combining tokenizers – stop word tokenizers
  • Using Lucene/Solr tokenizers
  • Using Lucene/Solr tokenizers with LingPipe
  • Evaluating tokenizers with unit tests
  • Modifying tokenizer factories
  • Finding words for languages without white spaces

Chapter 3: Advanced Classifiers
  • A simple classifier
  • Language model classifier with tokens
  • Naïve Bayes
  • Feature extractors
  • Logistic regression
  • Multithreaded cross validation
  • Tuning parameters in logistic regression
  • Customizing feature extraction
  • Combining feature extractors
  • Classifier-building life cycle
  • Linguistic tuning
  • Thresholding classifiers
  • Train a little, learn a little – active learning

Chapter 4: Tagging Words and Tokens
  • Interesting phrase detection
  • Foreground- or background-driven interesting phrase detection
  • Hidden Markov Models (HMM) – part-of-speech
  • N-best word tagging
  • Confidence-based tagging
  • Training word tagging
  • Word-tagging evaluation
  • Conditional random fields (CRF) for word/token tagging
  • Modifying CRFs

Chapter 5: Finding Spans in Text – Chunking
  • Sentence detection
  • Evaluation of sentence detection
  • Tuning sentence detection
  • Marking embedded chunks in a string – sentence chunk example
  • Paragraph detection
  • Simple noun phrases and verb phrases
  • Regular expression-based chunking for NER
  • Dictionary-based chunking for NER
  • Translating between word tagging and chunks – BIO codec
  • HMM-based NER
  • Mixing the NER sources
  • CRFs for chunking
  • NER using CRFs with better features

Chapter 6: String Comparison and Clustering
  • Distance and proximity – simple edit distance
  • Weighted edit distance
  • The Jaccard distance
  • The Tf-Idf distance
  • Using edit distance and language models for spelling correction
  • The case restoring corrector
  • Automatic phrase completion
  • Single-link and complete-link clustering using edit distance
  • Latent Dirichlet allocation (LDA) for multitopic clustering

Chapter 7: Finding Coreference Between Concepts/People
  • Named entity coreference with a document
  • Adding pronouns to coreference
  • Cross-document coreference
  • The John Smith problem

What You Will Learn

  • Master a broad range of classification techniques for text data
  • Track people, concepts, and things in data, within and across documents
  • Understand why evaluation matters when creating NLP applications, and how to carry it out
  • Apply best practices to common text-analytics problems
  • Tune systems for high performance and trade off various aspects of the performance curve
  • Become a master in customizing NLP systems at all levels
  • Build systems for non-tokenized languages such as Chinese and Japanese

In Detail

NLP is at the core of web search, intelligent personal assistants, marketing, and much more, and LingPipe is a toolkit for processing text using computational linguistics.

This book starts with the foundational but powerful techniques of language identification, sentiment classifiers, and evaluation frameworks. It goes on to detail how to build a robust framework to solve common NLP problems, before ending with advanced techniques for complex heterogeneous NLP systems.

This is a recipe and tutorial book that guides you through building NLP apps with minimal fuss and maximal impact.

