You're reading from  Mastering Text Mining with R

Product type: Book
Published in: Dec 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781783551811
Edition: 1st Edition
Author (1)
KUMAR ASHISH

Ashish Kumar is a seasoned data science professional, a published author and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems in both hands-on and leadership roles. Natural Language Processing, IoT Analytics, R Shiny product development, and Ensemble ML methods are his core areas of expertise. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the next hip beach around and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.

Named entity recognition


Named entity recognition is a subprocess in the natural language processing pipeline. It identifies the names and numbers in the input document: names can be those of a person, a company, or a location, and numbers can be amounts of money or percentages, to name a few. To perform named entity recognition, we will use the Apache OpenNLP TokenNameFinderModel API. To invoke the code from the R environment, we will use the openNLP R package:

  1. Load the required libraries:

    library(rJava)
    library(NLP)
    library(openNLP)
  2. Create a sample text; we will extract the entities from this text:

    txt <- " IBM is an MNC with headquarters in New York. Oracle is a cloud company in California. James works in IBM. Oracle hired John for cloud expertise. They give 100% to their profession"
  3. Convert the text to a String object for processing:

    txt_str <- as.String(txt)
  4. We will process the text through the MaxEnt sentence token annotator and the MaxEnt word token annotator, both available in R packages and...
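
The steps above can be sketched end to end as follows. This is a minimal sketch, not necessarily the author's exact code: it assumes the openNLP package's `Maxent_Sent_Token_Annotator()`, `Maxent_Word_Token_Annotator()`, and `Maxent_Entity_Annotator()` constructors, and it assumes that the separate English model package `openNLPmodels.en` is installed before the entity annotators are run (that part is skipped otherwise). The variable names are illustrative.

```r
library(NLP)
library(openNLP)

# Same sample text as in step 2, converted to a String object as in step 3.
txt <- " IBM is an MNC with headquarters in New York. Oracle is a cloud company in California. James works in IBM. Oracle hired John for cloud expertise. They give 100% to their profession"
txt_str <- as.String(txt)

# Sentence and word token annotators.
sent_annotator <- Maxent_Sent_Token_Annotator()
word_annotator <- Maxent_Word_Token_Annotator()

# Apply both annotators in order; the word annotator relies on the
# sentence annotations produced by the first one.
tokens <- NLP::annotate(txt_str, list(sent_annotator, word_annotator))

# Subsetting the String with annotations returns the matching text spans.
sentences <- txt_str[subset(tokens, type == "sentence")]
words <- txt_str[subset(tokens, type == "word")]

# Assumption: the entity models live in openNLPmodels.en, which is
# installed separately from openNLP. Skip this part if it is missing.
if (requireNamespace("openNLPmodels.en", quietly = TRUE)) {
  entity_annotators <- list(
    Maxent_Entity_Annotator(kind = "person"),
    Maxent_Entity_Annotator(kind = "organization"),
    Maxent_Entity_Annotator(kind = "location")
  )
  # Entity annotators need the sentence and word annotations as input.
  annotated <- NLP::annotate(txt_str, entity_annotators, tokens)
  entities <- subset(annotated, type == "entity")
  if (length(entities) > 0) {
    # Each entity annotation stores its category in the "kind" feature.
    print(data.frame(
      text = txt_str[entities],
      kind = vapply(entities$features, function(f) f$kind, character(1))
    ))
  }
}
```

Other values of `kind`, such as `"money"` or `"percentage"`, can be passed to `Maxent_Entity_Annotator()` in the same way to look for the numeric entities mentioned earlier, provided the corresponding models are available.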
