
You're reading from Natural Language Processing and Computational Linguistics

Product type: Book
Published in: Jun 2018
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781788838535
Edition: 1st Edition
Author: Bhargav Srinivasa-Desikan

Bhargav Srinivasa-Desikan is a research engineer working for INRIA in Lille, France. He is part of the MODAL (Models of Data Analysis and Learning) team, where he works on metric learning, predictor aggregation, and data visualization. He is a regular contributor to the Python open source community, and completed Google Summer of Code in 2016 with Gensim, where he implemented Dynamic Topic Models. He is a regular speaker at PyCons and PyDatas across Europe and Asia, and conducts tutorials on text analysis using Python.

Chapter 6. NER-Tagging and Its Applications

In the previous chapter, we saw how to use POS-tagging, a very powerful tool in spaCy's language pipeline. We will now explore another interesting component: NER-tagging (named entity recognition). We will discuss what exactly NER-tagging is from both a linguistic and a text analysis point of view, look at detailed examples of its usage, and learn how to train our own NER-tagger with spaCy. Following are the topics we will cover in this chapter:

  • What is NER-tagging?
  • NER-tagging in Python
  • Training your NER-tagger
  • NER-tagging examples and visualization

NER-tagging examples and visualization

One of spaCy's most impressive offerings is its visualization suite and API, in particular displaCy [17]. We used it in the previous chapter to visualize part-of-speech tags. While displaCy is most impressive when visualizing dependency parses (which we will see in the next chapter), it does not do a half-bad job with entities either.
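The following is a minimal sketch of rendering entities with displaCy, assuming spaCy v2.1 or later is installed. To keep it runnable without downloading a trained model, it uses a blank English pipeline and sets the entities by hand; in practice they would be predicted by a trained pipeline loaded with `spacy.load` (for example `en_core_web_sm`). The example sentence and the output file name `entities.html` are made up for illustration.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank English pipeline: a tokenizer only, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("Elon Musk founded SpaceX in California.")
# Tokens: Elon(0) Musk(1) founded(2) SpaceX(3) in(4) California(5) .(6)

# Entities set by hand (illustrative); a trained pipeline would predict these.
doc.ents = [
    Span(doc, 0, 2, label="PERSON"),
    Span(doc, 3, 4, label="ORG"),
    Span(doc, 5, 6, label="GPE"),
]

# style="ent" selects the entity visualizer; page=True wraps the markup in a
# full HTML page, and jupyter=False returns it as a string instead of
# displaying it inline in a notebook.
html = displacy.render(doc, style="ent", page=True, jupyter=False)

# Hypothetical output file name; open it in a browser to see the highlights.
with open("entities.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Each entity appears highlighted with its label. Calling `displacy.serve(doc, style="ent")` instead serves the same view from a local web server.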

Fig 6.4: An example using a news excerpt from an article about Elon Musk on https://www.wired.com

In the example above, spaCy has identified the entities quite well. Even the Elon Musk page is marked as an organization, which is arguably a fair reading. This could be due to the mention of Tesla before it or of official pages after it; we cannot be sure. We do see an interesting mistake again here, where Twitter is tagged as a geopolitical entity (GPE). We could let this slide if we consider that Facebook and Twitter are becoming big enough to be countries! But jokes aside, it is not always easy to deal with...
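To inspect the tagger's output as text rather than visually, we can loop over `doc.ents`. The sketch below is hypothetical and assumes spaCy v2.1 or later: the entities, including the mislabelled "Twitter", are set by hand to stand in for model predictions, and the `EXPECTED` lookup and the helpers `review_entities` and `apply_corrections` are illustrative names, not part of spaCy. `spacy.explain` returns a short description of a label. Relabelling with a lookup like this is only a simple post-processing patch; training or updating the model, the subject of the chapter's section on training an NER-tagger, is the more general fix.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only, no trained model
doc = nlp("Elon Musk posted on Twitter about Tesla.")
# Tokens: Elon(0) Musk(1) posted(2) on(3) Twitter(4) about(5) Tesla(6) .(7)

# Hand-set stand-in for model output, reproducing the kind of error discussed
# above: "Twitter" tagged as a geopolitical entity (GPE).
doc.ents = [
    Span(doc, 0, 2, label="PERSON"),
    Span(doc, 4, 5, label="GPE"),
    Span(doc, 6, 7, label="ORG"),
]

# Hypothetical lookup of labels we trust more than the tagger's output.
EXPECTED = {"Twitter": "ORG"}


def review_entities(doc, expected):
    """List (text, label, description, suggested_label) for each entity."""
    rows = []
    for ent in doc.ents:
        suggestion = expected.get(ent.text)
        if suggestion == ent.label_:
            suggestion = None  # prediction already agrees with the lookup
        rows.append((ent.text, ent.label_, spacy.explain(ent.label_), suggestion))
    return rows


def apply_corrections(doc, expected):
    """Relabel any entity whose text appears in the lookup."""
    doc.ents = [
        Span(doc, ent.start, ent.end, label=expected.get(ent.text, ent.label_))
        for ent in doc.ents
    ]
    return doc


for text, label, description, suggestion in review_entities(doc, EXPECTED):
    note = f"  (suggest {suggestion})" if suggestion else ""
    print(f"{text:<10} {label:<7} {description}{note}")

apply_corrections(doc, EXPECTED)
print([(ent.text, ent.label_) for ent in doc.ents])
```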

Summary

We have seen once again how well spaCy handles computational linguistics tasks, and how useful NER-tagging can be. Although NER-tagging is used as a text analysis task, the underlying model is a statistical one. Understanding this provides context for building our own models, should we want to, or for updating the existing model that spaCy uses.

In the next chapter, we will see how spaCy handles the final computational linguistics topic we cover: dependency parsing.

References


[1] A survey of named entity recognition and classification: https://nlp.cs.nyu.edu/sekine/papers/li07.pdf

[2] Annotation Sub-Types: https://catalog.ldc.upenn.edu/docs/LDC2005T33/BBN-Types-Subtypes.html

[3] Named Entity Recognition and Resolution for Literary Studies: https://pure.uva.nl/ws/files/2676433/168352_2014_VanDalenOskam_07_Namescape.pdf

[4] Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data: https://repository.upenn.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1162&context=cis_papers

[5] Natural Language Processing: Semantic Aspects: https://books.google.fr/books?id=YXv6AQAAQBAJ&source=gbs_navlinks_s

[6] Stanford NER: https://nlp.stanford.edu/software/CRF-NER.shtml

[7] Testing NLTK and Stanford NER Taggers for Accuracy: https://pythonprogramming.net/testing-stanford-ner-taggers-for-accuracy/?completed=/named-entity-recognition-stanford-ner-tagger/

[8] How to Use Stanford Named Entity Recognizer (NER) in Python...
