Reader small image

You're reading from  Natural Language Processing with TensorFlow - Second Edition

Product typeBook
Published inJul 2022
Reading LevelIntermediate
PublisherPackt
ISBN-139781838641351
Edition2nd Edition
Languages
Right arrow
Author (1)
Thushan Ganegedara
Thushan Ganegedara
author image
Thushan Ganegedara

Thushan is a seasoned ML practitioner with 4+ years of experience in the industry. Currently he is a senior machine learning engineer at Canva; an Australian startup that founded the online visual design software, Canva, serving millions of customers. His efforts are particularly concentrated in the search and recommendations group working on both visual and textual content. Prior to Canva, Thushan was a senior data scientist at QBE Insurance; an Australian Insurance company. Thushan was developing ML solutions for use-cases related to insurance claims. He also led efforts in developing a Speech2Text pipeline there. He obtained his PhD specializing in machine learning from the University of Sydney in 2018.
Read more about Thushan Ganegedara

Right arrow

Named Entity Recognition with RNNs

Now let’s look at our first task: using an RNN to identify named entities in a text corpus. This task is known as Named Entity Recognition (NER). We will be using a modified version of the well-known CoNLL 2003 (which stands for Conference on Computational Natural Language Learning - 2003) dataset for NER.

CoNLL 2003 is available for multiple languages, and the English data was generated from a Reuters Corpus that contains news stories published between August 1996 and August 1997. The database we’ll be using is found at https://github.com/ZihanWangKi/CrossWeigh and is called CoNLLPP. It is a more closely curated version than the original CoNLL, which contains errors in the dataset induced by incorrectly understanding the context of a word. For example, in the phrase “Chicago won …” Chicago was identified as a location, whereas it is in fact an organization. This exercise is available in ch06_rnns_for_named_entity_recognition...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Natural Language Processing with TensorFlow - Second Edition
Published in: Jul 2022Publisher: PacktISBN-13: 9781838641351

Author (1)

author image
Thushan Ganegedara

Thushan is a seasoned ML practitioner with 4+ years of experience in the industry. Currently he is a senior machine learning engineer at Canva; an Australian startup that founded the online visual design software, Canva, serving millions of customers. His efforts are particularly concentrated in the search and recommendations group working on both visual and textual content. Prior to Canva, Thushan was a senior data scientist at QBE Insurance; an Australian Insurance company. Thushan was developing ML solutions for use-cases related to insurance claims. He also led efforts in developing a Speech2Text pipeline there. He obtained his PhD specializing in machine learning from the University of Sydney in 2018.
Read more about Thushan Ganegedara