Natural Language Processing with Java and LingPipe Cookbook

You're reading from Natural Language Processing with Java and LingPipe Cookbook

Product type: Book
Published in: Nov 2014
ISBN-13: 9781783284672
Pages: 312
Edition: 1st Edition


Applying a classifier to a .csv file


Now, we can test our language ID classifier on the data we downloaded from Twitter. This recipe will show you how to run the classifier on the .csv file and will set the stage for the evaluation step in the next recipe.

How to do it...

Applying a classifier to the .csv file is straightforward! Just perform the following steps:

  1. Get a command prompt and run:

    java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar:lib/twitter4j-core-4.0.1.jar:lib/opencsv-2.4.jar com.lingpipe.cookbook.chapter1.ReadClassifierRunOnCsv
    
  2. This uses the default CSV file from the distribution, data/disney.csv, iterates over each line of the file, and applies the language ID classifier stored in models/3LangId.LMClassifier to it:

    InputText: When all else fails #Disney
    Best Classified Language: english
    InputText: ES INSUPERABLE DISNEY !! QUIERO VOLVER:(
    Best Classified Language: Spanish
    
  3. You can also pass the input .csv file as the first argument and the path to the serialized classifier as the second.
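The argument handling in step 3 can be sketched as follows. This is a minimal illustration, not the cookbook's code: the class `ArgPaths` and its methods `resolve` and `missingFiles` are hypothetical names. It takes the positional arguments when present, falls back to the recipe's defaults otherwise, and reports any path that does not point to an existing file before deserialization is attempted.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the cookbook's code) that resolves the two
// optional positional arguments described in step 3.
public class ArgPaths {
    static final String DEFAULT_INPUT = "data/disney.csv";
    static final String DEFAULT_CLASSIFIER = "models/3LangId.LMClassifier";

    // Returns {inputPath, classifierPath}: args[0] overrides the input file,
    // args[1] overrides the serialized classifier.
    static String[] resolve(String[] args) {
        String inputPath = args.length > 0 ? args[0] : DEFAULT_INPUT;
        String classifierPath = args.length > 1 ? args[1] : DEFAULT_CLASSIFIER;
        return new String[] {inputPath, classifierPath};
    }

    // Collects every path that is not an existing regular file, so a typo
    // can be reported before any loading happens.
    static List<String> missingFiles(String[] paths) {
        List<String> missing = new ArrayList<String>();
        for (String path : paths) {
            if (!new File(path).isFile()) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String[] paths = resolve(args);
        System.out.println("Input file: " + paths[0]);
        System.out.println("Classifier: " + paths[1]);
        List<String> missing = missingFiles(paths);
        if (!missing.isEmpty()) {
            System.err.println("Missing file(s): " + missing);
        }
    }
}
```

Run with no arguments, it prints the two default paths and, unless both files exist relative to the working directory, lists them as missing.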

How it works…

We will deserialize a classifier from the externalized model that was described in the previous recipes. Then, we will iterate through each line of the .csv file and call the classify method of the classifier. The code in main() is:

// Optional arguments: args[0] is the input .csv file, args[1] the serialized classifier.
String inputPath = args.length > 0 ? args[0] : "data/disney.csv";
String classifierPath = args.length > 1 ? args[1] : "models/3LangId.LMClassifier";
// Deserialize the classifier; readObject() returns Object, hence the unchecked cast.
@SuppressWarnings("unchecked")
BaseClassifier<CharSequence> classifier
    = (BaseClassifier<CharSequence>) AbstractExternalizable.readObject(new File(classifierPath));
// Read the data rows (header removed) and classify the text column of each one.
List<String[]> lines = Util.readCsvRemoveHeader(new File(inputPath));
for (String[] line : lines) {
  String text = line[Util.TEXT_OFFSET];
  Classification classified = classifier.classify(text);
  System.out.println("InputText: " + text);
  System.out.println("Best Classified Language: " + classified.bestCategory());
}
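To experiment with the loop without the LingPipe jars, the classify-each-row pattern can be sketched in plain Java. This is an illustration under assumptions: the nested `Classifier` interface is a hypothetical stand-in for LingPipe's `BaseClassifier<CharSequence>` that returns only the best category, and `tally` is a hypothetical helper that also counts how often each category wins, a quick sanity check on the output before any proper evaluation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Dependency-free sketch of the classify-each-row loop.
public class TallyBestCategories {

    // Hypothetical stand-in for BaseClassifier<CharSequence>.
    interface Classifier {
        String bestCategory(String text);
    }

    // Classifies the text column of each row, prints the result in the same
    // format as the recipe, and counts each best category (sorted by name).
    static Map<String, Integer> tally(List<String[]> rows, int textOffset, Classifier classifier) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String[] row : rows) {
            String text = row[textOffset];
            String best = classifier.bestCategory(text);
            System.out.println("InputText: " + text);
            System.out.println("Best Classified Language: " + best);
            Integer previous = counts.get(best);
            counts.put(best, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy stub, NOT a language model: labels any text containing "QUIERO"
        // as spanish and everything else as english, just to exercise the loop.
        Classifier stub = new Classifier() {
            public String bestCategory(String text) {
                return text.contains("QUIERO") ? "spanish" : "english";
            }
        };
        List<String[]> rows = Arrays.asList(
            new String[] {"When all else fails #Disney"},
            new String[] {"ES INSUPERABLE DISNEY !! QUIERO VOLVER:("});
        System.out.println(tally(rows, 0, stub));
    }
}
```

Swapping the stub for a real deserialized classifier only requires an adapter that calls `classify(text).bestCategory()`.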

The preceding code builds on the previous recipes and introduces nothing particularly new. Util.readCsvRemoveHeader, shown as follows, reads the .csv file from disk, skips its first (header) line, and returns only the rows that have a non-null, non-empty value in the TEXT_OFFSET position:

public static List<String[]> readCsvRemoveHeader(File file) throws IOException {
  FileInputStream fileIn = new FileInputStream(file);
  InputStreamReader inputStreamReader = new InputStreamReader(fileIn, Strings.UTF8);
  CSVReader csvReader = new CSVReader(inputStreamReader);
  List<String[]> rows = new ArrayList<String[]>();
  try {
    csvReader.readNext();  // skip the header row
    String[] row;
    while ((row = csvReader.readNext()) != null) {
      // Keep only rows with a non-null, non-empty value in the text column;
      // the length check also guards against short rows.
      if (row.length <= TEXT_OFFSET
          || row[TEXT_OFFSET] == null
          || row[TEXT_OFFSET].equals("")) {
        continue;
      }
      rows.add(row);
    }
  } finally {
    csvReader.close();  // close the reader even if parsing fails
  }
  return rows;
}
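The filtering rule in readCsvRemoveHeader can be separated from file I/O so that it is testable on in-memory rows. The sketch below is hypothetical: the class `CsvRowFilter`, the method `dropHeaderAndBlankText`, and the two-column toy rows are assumptions, not the cookbook's data layout. Parsing is still best left to opencsv, because tweets can contain commas and quotes that a plain `String.split` would mangle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the header-skipping and blank-text filtering of
// readCsvRemoveHeader, applied to rows that have already been parsed.
public class CsvRowFilter {

    static List<String[]> dropHeaderAndBlankText(List<String[]> rawRows, int textOffset) {
        List<String[]> kept = new ArrayList<String[]>();
        // Start at index 1 to skip the header row; an empty input yields an empty list.
        for (int i = 1; i < rawRows.size(); i++) {
            String[] row = rawRows.get(i);
            // Drop rows that lack the text column or whose text is null or empty.
            if (row.length <= textOffset || row[textOffset] == null || row[textOffset].isEmpty()) {
                continue;
            }
            kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy rows with an assumed two-column layout; the text is at index 1.
        List<String[]> raw = Arrays.asList(
            new String[] {"lang", "text"},                   // header: skipped
            new String[] {"", "When all else fails #Disney"},
            new String[] {"", ""},                           // empty text: dropped
            new String[] {""});                              // short row: dropped
        for (String[] row : dropHeaderAndBlankText(raw, 1)) {
            System.out.println(row[1]);
        }
    }
}
```

Only the second toy row survives the filter, so main prints its text once.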