
The Applied AI and Natural Language Processing Workshop

By Krishna Sankar , Jeffrey Jackovich , Ruze Richards
About this book
Are you fascinated by applications like Alexa and Siri and how they process information within seconds and return accurate results? Are you looking for a practical guide that will teach you how to build intelligent applications that can revolutionize the world of artificial intelligence? The Applied AI and NLP Workshop will take you on a practical journey where you will learn how to build artificial intelligence (AI) and natural language processing (NLP) applications with Amazon Web Services (AWS). Starting with an introduction to AI and machine learning, this book will explain how Amazon S3, or Amazon Simple Storage Service, works. You’ll then integrate AI with AWS to build serverless services and use Amazon’s NLP service Comprehend to perform text analysis on a document. As you advance, the book will help you get to grips with topic modeling to extract and analyze common themes in a set of documents with unknown topics. You’ll also work with Amazon Lex to create and customize a chatbot for task automation and use Amazon Rekognition for detecting objects, scenes, and text in images. By the end of The Applied AI and NLP Workshop, you’ll be equipped with the knowledge and skills needed to build scalable intelligent applications with AWS.
Publication date:
July 2020
Publisher
Packt
Pages
384
ISBN
9781800208742

 

2. Analyzing Documents and Text with Natural Language Processing

Overview

This chapter describes the use of Amazon Comprehend to summarize text documents and create Lambda functions to analyze the texts. You will learn how to develop services by applying the serverless computing paradigm, and use Amazon Comprehend to examine texts to determine their primary language. You will extract information such as entities (people or places), key phrases (noun phrases that are indicative of the content), emotional sentiments, and topics from a set of documents.

By the end of this chapter, you will be able to set up a Lambda function to process and analyze imported text using Comprehend and extract structured information from scanned paper documents using Amazon Textract.

 

Introduction

Since 2006, when Amazon formally launched its Elastic Compute Cloud (EC2) web service, cloud computing has grown from a developer service to mission-critical infrastructure. The spectrum of applications is broad: most highly scalable consumer platforms, such as Netflix, are based on AWS, as are many pharmaceutical and genomics companies and organizations such as the BBC, The Weather Channel, BMW, and Canon. As of January 2020, there are about 143 distinct AWS services spanning 25 categories, from compute and storage to quantum technologies, robotics, and machine learning. In this book, we will cover a few of them, as shown in the following diagram:

Figure 2.1: Amazon AI services covered

a

S3 is the versatile object store that we use to store the inputs to our AI services as well as the outputs from those services. You have been working with S3 since Chapter 1, An Introduction to AWS.

b
...
 

Serverless Computing

Serverless computing is a relatively new architecture that takes a different spin on the cloud application architecture. Let's start with a traditional on-premises, server-based architecture.

Usually, a traditional application architecture starts with a set of computer hardware, a host operating system, virtualization, containers, and an application stack consisting of libraries and frameworks tied together by networking and storage. On top of all this, we write business logic. In essence, to maintain a business capability, we have to maintain the server hardware, operating system patches, updates, library updates, and so forth. We also have to worry about scalability, fault tolerance, and security at the least.

With cloud computing, the application architecture no longer depends on owning computer hardware, and it gains elasticity. We still have to maintain the OS, libraries, patches, and so on. This is where serverless computing comes in: in the words of...

 

Amazon Comprehend

Amazon Comprehend is a multilingual text analytics service with a broad spectrum of capabilities: it can extract key phrases and entities, detect languages, model topics, and perform sentiment analysis as well as syntax analysis. Some of the applications of Amazon Comprehend include:

  • Understanding the main themes and topics of various unstructured text items such as support tickets, social media posts, customer feedback, customer complaints, and business documents such as contracts and medical records.
  • Knowledge management by categorizing business documents such as internal procedures, white papers, notes and descriptions, media posts, and emails.
  • Brand monitoring—effectively responding to social media posts, reviews, and other user-generated content from various channels. Respond faster by prioritizing the content as well as routing the content to the appropriate person or process...
 

What Is an NLP Service?

Amazon Comprehend is an NLP service. The overall goal of an NLP service is to make machines understand our spoken and written language. Virtual assistants, such as Alexa or Siri, use NLP to produce insights from input data. The input data is structured by a language, which has a unique grammar, syntax, and vocabulary. Thus, processing text data requires identifying the language first and then applying that language's rules to identify the document's information. NLP's general task is to capture this information as a numerical representation. This general task is split into specific tasks, such as identifying languages, entities, key phrases, emotional sentiments, and topics.

Figure 2.4: Amazon Comprehend data flow

As we discussed earlier, Amazon Comprehend uses pre-trained models to perform document analysis tasks. This is valuable because it enables a business to develop capabilities without going through an exhaustive AI model training...

 

Using Amazon Comprehend to Inspect Text and Determine the Primary Language

Amazon Comprehend is used to search and examine text and then gather insights across a variety of topics (health, media, telecom, education, government, and so on) and languages. Thus, the first step in analyzing text data and using more complex features (such as topic, entity, and sentiment analysis) is to determine the dominant language, which ensures the accuracy of the more in-depth analysis. Two operations determine the primary language of a text: DetectDominantLanguage and BatchDetectDominantLanguage.

Both operations expect the text in the UTF-8 format with a length of at least 20 characters and a maximum of 5,000 bytes. If you are sending a list, it should not contain more than 25 items.
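The two operations and the limits quoted above can be sketched with the AWS SDK for Python. This is a minimal sketch rather than the book's own listing: the helper names (check_text, dominant_language, dominant_languages) are hypothetical, the limits are copied from the paragraph above and not re-checked against the current service documentation, and the client argument is assumed to be a boto3 Comprehend client.

```python
MIN_CHARS = 20    # minimum length quoted in the text above
MAX_BYTES = 5000  # maximum UTF-8 size quoted in the text above
MAX_BATCH = 25    # maximum number of items in one batch call


def check_text(text):
    """Raise ValueError if the text falls outside the quoted limits."""
    if len(text) < MIN_CHARS:
        raise ValueError("text is shorter than 20 characters")
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("text is larger than 5,000 bytes of UTF-8")


def best_language(languages):
    """Pick the entry with the highest confidence score."""
    return max(languages, key=lambda lang: lang["Score"])


def dominant_language(client, text):
    """Return (language_code, score) for a single string."""
    check_text(text)
    response = client.detect_dominant_language(Text=text)
    best = best_language(response["Languages"])
    return best["LanguageCode"], best["Score"]


def dominant_languages(client, texts):
    """Batch variant: return one language code per input, in input order."""
    if len(texts) > MAX_BATCH:
        raise ValueError("a batch may hold at most 25 items")
    for text in texts:
        check_text(text)
    response = client.batch_detect_dominant_language(TextList=texts)
    codes = [None] * len(texts)  # items that failed stay None
    for result in response["ResultList"]:
        codes[result["Index"]] = best_language(result["Languages"])["LanguageCode"]
    return codes
```

With AWS credentials configured, a client can be created with boto3.client("comprehend") and passed as the first argument. Keeping the client as a parameter also lets the helpers be exercised with a stand-in object during testing.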

The response includes what language was identified using a two-letter code. The following table shows the language codes...

 

Extracting Information from a Set of Documents

At a business level, knowing if and why a customer is angry or happy when they contact a virtual assistant is extremely important for retaining the customer. At an NLP level, this requires extracting more information and applying a more complex algorithm. The additional information to extract and quantify comprises entities, key phrases, emotional sentiment, and topics.
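As a rough illustration of the "if and why" idea, the sketch below pairs a sentiment call (whether the customer is unhappy) with a key-phrase call (what the message is about). The function name summarize_message and the 0.8 score threshold are assumptions made for this example; detect_sentiment and detect_key_phrases are the boto3 Comprehend client methods, and client is assumed to be such a client.

```python
def summarize_message(client, text, language_code="en", min_score=0.8):
    """Return the overall sentiment and the confident key phrases of a message."""
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)

    label = sentiment["Sentiment"]  # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"
    # SentimentScore is keyed by the capitalized label, e.g. "Negative".
    confidence = sentiment["SentimentScore"][label.capitalize()]

    return {
        "sentiment": label,
        "confidence": confidence,
        # Keep only the phrases the service is reasonably confident about.
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]
                        if p["Score"] >= min_score],
    }
```

The threshold is a tuning knob rather than a recommended value; a lower value returns more phrases at the cost of more noise.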

Detecting Named Entities—AWS SDK for Python (boto3)

An entity is a broader concept—it is something that has an identity of its own. An entity can be a person or a place, a company name or an organization; it can also be a number (say quantity, price, number of days) or a date, a title, a policy number, or a medical code. For example, in the text "Martin lives at 27 Broadway St.", Martin might be detected as a PERSON, while 27 Broadway St might be detected as a LOCATION.
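A minimal sketch of entity detection with boto3 follows. The helper name extract_entities and the grouping by type are choices made for this illustration, and client is assumed to be a boto3 Comprehend client. Which types the service actually assigns to a given text is not guaranteed here.

```python
from collections import defaultdict


def extract_entities(client, text, language_code="en", min_score=0.0):
    """Group detected entities by type, keeping those at or above min_score."""
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    by_type = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            # Each entity carries its text, its type, and a confidence score.
            by_type[entity["Type"]].append((entity["Text"], entity["Score"]))
    return dict(by_type)
```

Calling extract_entities(client, "Martin lives at 27 Broadway St.") would return a dictionary keyed by entity type, with each value listing the matched text spans and their scores.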

Entities also have a score to indicate the confidence level that the entity type was...

 

Setting Up a Lambda Function and Analyzing Imported Text Using Comprehend

We have used Amazon Comprehend to do various NLP tasks, such as detecting entities and key phrases and carrying out sentiment analysis.

Integrating Comprehend and AWS Lambda for responsive NLP

In this topic, we will be integrating AWS Lambda functions with Comprehend, which provides a more powerful, scalable infrastructure. You can use AWS Lambda to run your code in response to events, such as changes to data in an Amazon S3 bucket.

Executing code in response to events provides a real-world solution for developing scalable software architecture. Overall, this extends our data pipeline so that it can handle larger data volumes and more complex NLP operations.
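The event-driven flow described here can be sketched as a Lambda handler that reacts to an S3 upload. This is an illustrative outline under stated assumptions, not the chapter's exercise code: the helper names object_locations and analyze_object are hypothetical, the uploaded object is assumed to be a small UTF-8 text file, and boto3 is imported inside the handler only so that the helpers can be tried out without the SDK installed.

```python
import json
import urllib.parse


def object_locations(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def analyze_object(s3, comprehend, bucket, key):
    """Read one text object from S3 and run two Comprehend calls on it."""
    text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    language = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode=language)
    return {"bucket": bucket, "key": key,
            "language": language, "sentiment": sentiment["Sentiment"]}


def lambda_handler(event, context):
    """Entry point that Lambda invokes for each S3 event."""
    import boto3  # bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    comprehend = boto3.client("comprehend")
    results = [analyze_object(s3, comprehend, bucket, key)
               for bucket, key in object_locations(event)]
    print(json.dumps(results))  # printed output goes to CloudWatch Logs
    return results
```

In a deployed function, the clients are often created once at module level so that warm invocations can reuse them. The function's execution role would also need permission to read the bucket and to call Comprehend.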

What Is AWS Lambda?

AWS Lambda is a compute service that runs code without provisioning or managing servers. AWS Lambda executes code only when needed, and scales automatically. AWS Lambda runs your code on a high-availability compute...

 

Amazon Textract

Another interesting Amazon NLP service is Textract. Essentially, Textract can extract information from documents, usually business documents such as tax forms, legal documents, medical forms, bank forms, patent registrations, and so forth. It is an optical character recognition (OCR) solution for scanning structured documents, suitable for robotic process automation (RPA). Textract is a relatively new service: it was previewed in November 2018 and became generally available in May 2019.

The advantage of Textract is that it understands documents and can extract tables and/or key-value pairs suitable for downstream processing. A lot of business processes, such as health insurance processing, tax preparation, loan application processing, monitoring and evaluation of existing loans, compliance evaluation, and engineering evaluations take in these documents, usually processing them manually to extract information and then start digital processes. Using Amazon Textract, the manual...
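To make the key-value extraction concrete, the sketch below walks the block structure that Textract returns when forms analysis is requested. It is a simplified outline under assumptions: the helper names (block_text, key_value_pairs, analyze_form) are hypothetical, only WORD children are read (selection elements such as checkboxes are ignored), and client is assumed to be a boto3 Textract client pointed at a document already stored in S3.

```python
def block_text(block, blocks_by_id):
    """Join the WORD children of a block into a single string."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response):
    """Map each detected form key to the text of its linked value."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "VALUE":
                for value_id in relationship["Ids"]:
                    values.append(block_text(blocks_by_id[value_id], blocks_by_id))
        pairs[block_text(block, blocks_by_id)] = " ".join(values)
    return pairs


def analyze_form(client, bucket, name):
    """Run forms analysis on a document in S3 and return its key-value pairs."""
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": name}},
        FeatureTypes=["FORMS"],
    )
    return key_value_pairs(response)
```

The resulting dictionary is the kind of structured output that a downstream process could consume instead of manually keyed data.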

 

Summary

In this chapter, we started with high-level concepts around Amazon AI services and serverless computing. On a conceptual level, you learned about serverless computing as well as the various AI services available on the AWS platform.

Overall, the culmination of these independent functions provides the foundation for building complex machine learning-based NLP applications (for example, Siri, Alexa, and so on). Knowing how and why the individual functions operate will allow you to build your own AWS-based NLP applications.

Then, we dived into the details of Amazon Comprehend—how Comprehend's DetectDominantLanguage method is structured, and how to pass in both strings and a list of strings. You learned how to extract entities, sentiments, key phrases, and topics, which provide the data for complex NLP. This allows Amazon Comprehend to become more efficient by automating text analysis upon a text document that's been uploaded to S3.

You also learned how...

About the Authors
  • Krishna Sankar

    Krishna Sankar is an AI data scientist with Volvo Cars focusing on autonomous vehicles. Previously, he was the chief data scientist at blackarrow, where he focused on optimizing user experience via inference, intelligence, and interfaces. He was the director of a data science/bioinformatics startup and also worked as a distinguished engineer at Cisco. He has been speaking at various conferences as well as guest lecturing at the Naval Postgraduate School. His other passion is Lego Robotics. You'll find him at the St. Louis FLL World Competition as the robot design judge.

  • Jeffrey Jackovich

    Jeffrey Jackovich is a curious data scientist with a background in health-tech and mergers and acquisitions. He has extensive business-oriented healthcare knowledge but enjoys analyzing all types of data with R and Python. He loves the challenges involved in the data science process, and his ingenious demeanor was tempered while serving as a Peace Corps volunteer in Morocco. He is completing a master of science in computer information systems at Boston University, with a data analytics concentration.

  • Ruze Richards

    Ruze Richards is a data scientist and cloud architect who has spent most of his career building high-performance analytics systems for enterprises and startups. He is passionate about AI and machine learning. He began his career as a physicist, felt excited about neural networks, and started working at AT&T Bell Labs to further pursue this area of interest. He is thrilled to spread the knowledge and help people achieve their goals.
