The Applied AI and Natural Language Processing Workshop

By Krishna Sankar, Jeffrey Jackovich, Ruze Richards
About this book

Are you fascinated by applications such as Alexa and Siri and how they process information and return accurate results within seconds? Are you looking for a practical guide that will teach you how to build intelligent applications that can revolutionize the world of artificial intelligence? The Applied AI and NLP Workshop will take you on a practical journey where you will learn how to build artificial intelligence (AI) and natural language processing (NLP) applications with Amazon Web Services (AWS).

Starting with an introduction to AI and machine learning, this book will explain how Amazon Simple Storage Service (Amazon S3) works. You'll then integrate AI with AWS to build serverless services and use Amazon's NLP service, Amazon Comprehend, to perform text analysis on a document. As you advance, the book will help you get to grips with topic modeling to extract and analyze common themes in a set of documents with unknown topics. You'll also work with Amazon Lex to create and customize a chatbot for task automation and use Amazon Rekognition to detect objects, scenes, and text in images.

By the end of The Applied AI and NLP Workshop, you’ll be equipped with the knowledge and skills needed to build scalable intelligent applications with AWS.

Publication date: July 2020
Publisher: Packt
Pages: 384
ISBN: 9781800208742

 

1. An Introduction to AWS

Overview

In this chapter, we start off with the basic concepts of cloud computing, Artificial Intelligence (AI), and Machine Learning (ML). These are the foundational elements that we will be working with throughout this book. The guided instructions in this chapter will equip you with the skills necessary to store and retrieve data with Amazon Simple Storage Service (S3) while you learn the core concepts of this technology. Next, you will apply your S3 knowledge by importing and exporting text data via the management console and the Command Line Interface (CLI). By the end of the chapter, you will be able to confidently work with the management console and the CLI so that you can test AI and ML services.

 

Introduction

We are in an era of unprecedented computing capabilities: serverless computing with autonomous functions that can scale elastically from zero to a million users and back to zero in seconds; innovative, intelligent bot frameworks that can live in a cloud-based contact center we can spin up with a small amount of configuration; and the ability to extract text from images, tables, and scanned documents such as medical records, business statements, and tax documents.

Of course, we are talking about the cloud services available at our fingertips, specifically from Amazon. In 2004, Amazon first offered cloud computing as a service, and now (according to Forbes) the cloud market is worth over $30 billion, growing at a rate of 30-50% yearly. More and more people prefer to do their computing in the cloud.

So, what is cloud computing? It is a set of computing services that you can use as much of as you need and can afford, paying as you go. This is why enterprises are switching from hosting their own infrastructure to the cloud: you get not only a cost-efficient way of doing your computing, but also access to an ever-wider variety of services.

While there is a huge set of cloud services offered by Amazon, in this book, we will work with Amazon Web Services (AWS) for Artificial Intelligence (AI) and Machine Learning (ML). In the process, we will also use AWS Lambda for serverless computing, Amazon Simple Storage Service (S3) for storage, and Amazon API Gateway for networking and content delivery.

This chapter will introduce you to the AWS interface and will teach you how to store and retrieve data with Amazon Simple Storage Service (S3). Then, you will apply your S3 knowledge by importing and exporting text data via the management console and the CLI. Lastly, you will learn how to locate and test AI and ML services.

In later chapters, you will get a chance to apply Natural Language Processing (NLP) techniques to analyze documents, program serverless computing, use AI/ML services for topic and theme extraction, construct your own fully capable contact center with its own telephone number, develop bots that answer calls in your own contact center, and finally, program image analysis with ML to extract text from images (such as street signs) and perform facial recognition. Overall, it is going to be an interesting journey that will end with us commanding an infrastructure of vast resources for AI and ML.

 

How Is AWS Special?

Today, there are many cloud providers. According to the Canalys analysis (https://www.canalys.com/static/press_release/2020/Canalys---Cloud-market-share-Q4-2019-and-full-year-2019.pdf), as of Q4 2019, AWS is the top vendor, owning nearly a third of the overall public cloud infrastructure market (32%) and leading by a wide margin over Microsoft (18%), Google (6%), and Alibaba (5%).

These numbers vary depending on the source, and they may change in the future, but all sources agree that Amazon is the largest provider at the moment. One reason for this is that Amazon offers a very large array of cloud services. In fact, one of its competitive advantages is exactly that: a very broad and deep cloud computing ecosystem. For example, in the area of ML, Amazon supports thousands of use cases, with the professed goal of providing every imaginable ML service on AWS. This explains our focus on doing ML on AWS.

What Is ML?

ML and AI go hand in hand. ML is the art and science of predicting real-world outcomes based on knowledge of the world and its history. You build a model, based on a formula or a process, that produces the prediction, and you train that model using data.

AI is a wider area of science, which includes, together with ML, all the ways of imitating human behavior and capabilities. However, the way people use these terms varies, depending on who you ask. People also tend to use whichever term is currently most popular, often for search engine optimization. In this book, we will take the liberty of using the two terms interchangeably.

ML is essential to learn in today's world because it is an integral part of every industry's competitive and operational data strategy. More specifically, ML insights from NLP power chatbots; ML insights are used throughout the financial industry; and ML applications enable efficient online recommendation engines, such as friend suggestions on Facebook, Netflix showing movies you will probably like, and more items to consider on Amazon.

What Is AI?

AI is intelligence that's demonstrated by machines. More specifically, it refers to any device that perceives its environment and takes actions that increase its chance of successfully achieving its goals. Contemporary examples are understanding human speech, competing at the highest levels of strategic games (such as Chess and Go), and autonomous cars.

AI is important because it adds intelligence to existing products. Products that are currently used will be further improved with AI capabilities; for example, Siri was added to a new generation of Apple products. Conversational chatbots can be combined with large amounts of data to improve technologies at home and in the office.

In this chapter, we will introduce you to the first few AWS services that will start you on the way to doing ML on AWS. Whenever we can, we will stick to the free tier of AWS. The free tier lasts for 1 year and limits the computing resources you can use. Readers willing to invest a few dollars in learning with a regular AWS account will find the money well spent. Another alternative is to use packaged labs, such as Qwiklabs, which let you run labs at will, with the added convenience that the labs are shut down for you, so you will not incur accidental charges by leaving your machines running.

 

What Is Amazon S3?

S3 is an online cloud object storage and retrieval service. Instead of data being associated with a server, S3 storage is server-independent and can be accessed over the internet. Data stored in S3 is managed as objects using an Application Programming Interface (API) that is accessible via the internet (HTTPS).

The benefits of using S3 are as follows:

  • Amazon S3 runs on the largest global cloud infrastructure and is designed for 99.999999999% (11 nines) of data durability.
  • It provides the widest range of options to transfer data.
  • It allows you to run big data analytics without moving data into a separate analytics system.
  • It supports security standards and compliance certificates.
  • It offers a flexible set of storage management and administration capabilities.

    Note

    For more information, visit https://aws.amazon.com/s3/.

Why Use S3?

S3 is a place to store and retrieve your files. It is recommended for storing static content such as text files, images, audio files, and video files. For example, S3 can act as a static web server if the website consists exclusively of HTML, images, and other static assets; the files simply need to be uploaded to a bucket that is configured for website hosting. In addition, S3 can be used to store user-generated images and text files.
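As a rough sketch of the static-hosting idea mentioned above, using the AWS CLI that we set up later in this chapter (the bucket name is a placeholder, and the bucket must additionally allow public read access for visitors to see the pages):

# Upload the site's HTML and images (placeholder names)
aws s3 sync ./my-site s3://my-website-bucket/
# Turn on static website hosting for the bucket
aws s3 website s3://my-website-bucket/ --index-document index.html --error-document error.html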

However, the two most important applications of S3 are as follows:

  • To store static data from web pages or mobile apps
  • To implement big data analytics

It can easily be used in conjunction with additional AWS ML and infrastructure services. For example, text documents imported into Amazon S3 can be analyzed by code running in an AWS Lambda function that calls Amazon Comprehend. We will cover AWS Lambda and Amazon Comprehend in Chapter 2, Analyzing Documents and Text with Natural Language Processing, and Chapter 3, Topic Modeling and Theme Extraction.
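To give a flavor of this combination, here is a minimal command-line sketch; the bucket and file names are placeholders, the shell syntax is for Linux/macOS, and it assumes the AWS CLI has been configured as described later in this chapter. It streams a small text object from S3 straight into Amazon Comprehend's real-time sentiment API (which accepts up to 5,000 bytes of text):

# The trailing "-" streams the S3 object to standard output,
# which is then passed to Comprehend as the text to analyze
aws comprehend detect-sentiment --language-code en \
    --text "$(aws s3 cp s3://my-text-bucket/review.txt -)"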

The Basics of Working on AWS with S3

The first step to accessing S3 is to create an AWS free-tier account, which provides access to the AWS Management Console. The AWS Management Console is a web application that provides one way to access all of AWS's powerful storage and ML/AI services.

The second step is to understand access levels. AWS manages users and permissions through Identity and Access Management (IAM); the same email address and password that you use for the Management Console are used to access IAM.

AWS Free-Tier Account

AWS provides a free-tier account (within individual free-usage limits for each service), and S3 is one of the included storage services. This lets you test services to optimize your ML and AI workflows, maximizing cost savings and reducing errors before you make a large investment.

AWS Account Setup and Navigation

Generally, you need an AWS account with Amazon. A good description of the steps is available at https://support.sou.edu/kb/articles/amazon-web-services-account-creation. The steps might vary a little bit, as Amazon might make changes to its processes.

The general steps are:

  1. Create a personal account (if needed; many of you might already be Amazon customers), which might also need a security check.
  2. Create an AWS account. AWS account creation requires credit card information, although you can also redeem promotional credit codes.
  3. The AWS free tier offers limited capability for 1 year. The details are at https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&all-free-tier.sort-order=asc.

Downloading the Support Materials for This Book

In this book, you will be programming AWS APIs using Jupyter notebooks, uploading images for AI services and text files to S3, and even writing short code for Lambda functions. These files and programs are located in a GitHub repository, https://packt.live/2O67hxH. You can download the files using the Download ZIP button and then unzip the file:

Figure 1.1: Download support files from GitHub

As an example, we have downloaded the files into the Documents/aws-book/The-Applied-AI-and-Natural-Language-Processing-with-AWS directory:

Figure 1.2: Support files from GitHub in a local directory

A Word about Jupyter Notebooks

Some of the programs in this book use Jupyter notebooks to run. You will recognize them by the .ipynb file extension. If you haven't already used Jupyter notebooks, please follow the installation and setup instructions in the Preface.

Importing and Exporting Data into S3

The way AWS handles big data is by providing the AWS Import/Export service, which allows you to transfer large amounts of data to AWS.

It works like this: you mail your storage device to AWS, and AWS transfers the data using Amazon's high-speed internal network. Your big data is loaded into AWS by the next business day after the device arrives. Once the data has been loaded, the storage device is returned to the owner. This is a more cost-efficient way of transferring huge amounts of data, and it is much faster than transferring it over the internet.

If the amount of data that you need to put into S3 is relatively small, you can simply upload it from your computer. Today, with the increasing capacity of broadband networks, "small" keeps getting bigger; our guideline is 1 TB. Once you have more than this, you may need to think of faster ways to put the data in S3. One of them is the AWS Import/Export Disk service (https://aws.amazon.com/snowball/disk/details/), where you package your data on a device provided by AWS and ship it to them. Significant amounts of data can then be loaded within a day or a few days.

How S3 Differs from a Filesystem

S3 can store almost any type of file, so it is easy to confuse it with a traditional filesystem. However, S3 differs from a traditional filesystem in a few ways. What a folder is in a traditional filesystem, a bucket is in S3; what a file is in a traditional filesystem, an object is in S3. S3 uses the term objects because you can store any type of data (that is, more than just files) in buckets.

Another difference is how objects are accessed. Objects stored in buckets can be reached through a web service endpoint (that is, a URL that you can open in a web browser such as Chrome or Firefox), so each bucket requires a globally unique name. The naming restrictions are similar to those for selecting a URL when creating a new website: you need to select a unique name, by the same logic that your house has a unique address.

For example, if you created a bucket (with public permission settings) named myBucket and then uploaded a text file named pos_sentiment__leaves_of_grass.txt to the bucket, the object would be accessible from a web browser via the corresponding subdomain.
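For instance, the virtual-hosted URL would take roughly the following form (the hostname may also include the bucket's region, and real bucket names must be lowercase):

https://mybucket.s3.amazonaws.com/pos_sentiment__leaves_of_grass.txt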

 

Core S3 Concepts

The S3 hierarchy includes the following concepts:

  • Type of data storage: S3 is a key-value store. You provide a unique key, and AWS stores your data as a value. You retrieve the data using the key.
  • Keys: The key is the name assigned to an object that uniquely identifies it inside a bucket. All objects in a bucket have one key associated with them.
  • Objects: Objects are what you store. They are not updatable: if you need to change one byte in the value, you will have to upload the entire object again.
    Figure 1.3: Object storage using a unique key and myBucket

  • Bucket: Just like a folder, a bucket is a container where you store objects. Buckets are created at the root level and do not have a filesystem hierarchy. More specifically, you can have multiple buckets, but you cannot have sub-buckets within a bucket. Buckets are the containers for objects; you can control access to a bucket (create, delete, and list the objects in it), view its access logs, and select the geographical region where Amazon S3 will store it.
  • Region: Region refers to the geographical region, such as us-west-2 or ap-south-1, where S3 stores a bucket, based on the user's preference. The region can be selected when creating a bucket, and the location should be based on where the data will be accessed the most. Overall, the specific region selection has the biggest impact if S3 is used to store files for a website that is accessed exclusively from a specific geographic region.

    Objects of different forms stored in a bucket look as follows:

    Figure 1.4: Object storage

S3 Operations

The S3 API is quite simple, and it includes the following operations for the entity in question:

  • Bucket: Create, delete, and list keys in a bucket
  • Object: Write, read, and delete

Here's an example:

Figure 1.5: Object stored in myBucket
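To make these operations concrete, here is a sketch of how each one maps onto AWS CLI commands, which we configure later in this chapter; the bucket and file names are placeholders:

aws s3 mb s3://my-example-bucket                            # Bucket: create
aws s3 cp ./notes.txt s3://my-example-bucket/notes.txt      # Object: write (upload)
aws s3 ls s3://my-example-bucket                            # Bucket: list keys
aws s3 cp s3://my-example-bucket/notes.txt ./notes-copy.txt # Object: read (download)
aws s3 rm s3://my-example-bucket/notes.txt                  # Object: delete
aws s3 rb s3://my-example-bucket                            # Bucket: delete (must be empty)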

 

Data Replication

Amazon replicates data within a region across multiple servers located in Amazon's data centers. The benefits of this replication include high availability and durability. More specifically, when you create a new object in S3, the data is saved in S3; however, the change still needs to be replicated across these servers. Overall, replication may take some time, and you might notice delays resulting from the various replication mechanisms.

After you delete an object, replication lag can cause the deleted data to still be displayed until the deletion is fully replicated. Similarly, creating an object and immediately trying to display it in the object list might fail at first as a result of the same replication delay.
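If a script needs to wait until a newly written object is fully visible before reading it back, the CLI offers a built-in waiter for exactly this situation. A minimal sketch with placeholder names:

# Upload an object, then block until S3 reports that the object exists
aws s3 cp ./report.txt s3://my-example-bucket/report.txt
aws s3api wait object-exists --bucket my-example-bucket --key report.txt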

The REST Interface

S3's native interface is a Representational State Transfer (REST) API, and it is recommended to always use HTTPS requests to perform any S3 operations. The two higher-level interfaces that we will use to interact with S3 are the AWS Management Console and the AWS CLI. Accessing objects with the API is quite simple and includes the following operations for the entity in question (a short example of handing out an HTTPS link to an object follows the list):

  • Bucket: Create, delete, or list keys in a bucket
  • Object: Write, read, or delete
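Because every object ultimately sits behind an HTTPS endpoint, you can also hand out a time-limited HTTPS link to a private object; this is called a presigned URL. A minimal sketch, assuming a placeholder bucket and key and a CLI configured as described later in this chapter:

# Generate an HTTPS link to a private object that stays valid for one hour (3,600 seconds)
aws s3 presign s3://my-example-bucket/notes.txt --expires-in 3600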

Exercise 1.01: Using the AWS Management Console to Create an S3 Bucket

In this exercise, we will prepare a place on AWS to store data for ML. To import a file, you need to have access to the Amazon S3 console:

  1. You should have already completed the account setup detailed earlier in this chapter. Go to https://aws.amazon.com/ and click My Account and then AWS Management Console to open the AWS Management Console in a new browser tab:
    Figure 1.6: Accessing the AWS Management Console via the user's account

  2. Click inside the search bar located under AWS services, as shown here:
    Figure 1.7: Searching AWS services

  3. Type S3 into the search bar and an auto-populated list will appear. Then, click the S3 Scalable Storage in the Cloud option:
    Figure 1.8: Selecting the S3 service

  4. Now we need to create an S3 bucket. In the S3 dashboard, click the Create bucket button. If this is the first time that you are creating a bucket, your screen will look like this:
    Figure 1.9: Creating a bucket

    If you have already created S3 buckets, your dashboard will list all the buckets you have created. Enter a unique bucket name: Bucket names must be unique across S3. If you encounter a naming issue, please refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.

    Region: If a default region is auto-populated, then keep the default location. If it is not auto populated, select a region near your current location.

  5. Click the Next button to continue the creation of the bucket:
    Figure 1.10: The Create bucket window

  6. An S3 bucket provides the property options Versioning, Server Access Logging, Tags, Object-Level Logging, and Default Encryption; however, we will not enable them.
  7. Your bucket will be displayed in the bucket list, as shown here:
    Figure 1.11: The bucket has been created

In this exercise, we have created a place for our files to be stored on the cloud. In the next exercise, we will learn the process of storing and retrieving our files from this place.

Exercise 1.02: Importing and Exporting the File with Your S3 Bucket

In this exercise, we will show you how to place your data in S3 on Amazon, and how to retrieve it from there.

Follow these steps to complete this exercise:

Importing a file:

  1. Click the bucket's name to navigate to the bucket:
    Figure 1.12: Navigate to the bucket

  2. You are on the bucket's home page. Select Upload:
    Figure 1.13: Uploading a file into the bucket

  3. To select a file to upload, click Add files:
    Figure 1.14: Adding a new file to the bucket

  4. We will upload the pos_sentiment__leaves_of_grass.txt file from the https://packt.live/3e9lwfR GitHub repository. The best way is to download the repository to your local disk. Then you can select the file:
    Figure 1.15: Selecting the file to upload to the S3 bucket

  5. After selecting a file to upload, select Next:
    Figure 1.16: Selecting the file to upload to the bucket

  6. Click the Next button and leave the default options selected:
    Figure 1.17: The permissions page while uploading the file

  7. You can set property settings for your object, such as Storage class, Encryption, and Metadata. However, leave the default values as they are and then click the Next button:
    Figure 1.18: Setting the properties

  8. Click the Upload button to upload the files:
    Figure 1.19: Uploading the files

  9. You will be directed to your object on your bucket's home screen:
    Figure 1.20: Files uploaded to the bucket

Exporting a file:

  1. Select the checkbox next to the file to export (Red Marker #1 – see the following screenshot). This populates the file's information display screen. Click Download (Red Marker #2 – see the following screenshot) to retrieve the text file:
    Figure 1.21: Exporting the file

The file will download, as shown in the bottom left-hand corner of the screen:

Figure 1.22: Downloading the file to export

In this exercise, you learned how to import a file to and export a file from your Amazon S3 bucket. As you can see, the process is quite easy thanks to the simple user interface.

 

The AWS CLI

The CLI is an open-source tool built on the AWS SDK for Python (Boto) to perform setups, determine whether calls work as intended, verify status information, and more. The CLI provides another access tool for all AWS services, including S3. Unlike the Management Console, the CLI can be automated via scripts.

To authenticate the CLI against your AWS account, you need an access key ID and a secret access key, which the CLI stores in a local configuration file. Next, you will install and then configure the AWS CLI.
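One common way to install the CLI is via pip, the Python package manager (AWS also provides standalone installers, and the exact steps can vary by platform and CLI version):

# Install the AWS CLI and confirm that it is on your path
pip install awscli
aws --version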

Exercise 1.03: Configuring the CLI

In this exercise, we will configure the CLI with our AWS access key ID and AWS secret access key. Follow these steps to complete the exercise:

  1. First, go to the AWS Management Console and then IAM. You might have to log in to the account. Then, click Users:
    Figure 1.23: The Management Console home page with the Users option highlighted

  2. In the upper-right corner of the signed-in AWS Management Console, click My Security Credentials:
    Figure 1.24: My Security Credentials

  3. Next, click Continue to Security Credentials:
    Figure 1.25: Security Credentials

  4. Click the Access keys (access key ID and secret access key) option:
    Figure 1.26: Accessing key generation

  5. Then, click Create New Access Key:
    Figure 1.27: Creating a new access key

  6. Click Download Key File to download the key file:
    Figure 1.28: Downloading the key file

    The rootkey.csv file that contains the keys will be downloaded. You can view the details by opening the file.

    Note

    Store the keys in a safe location. Protect your AWS account and never share, email, or store keys in a non-secure location. An AWS representative will never request your keys, so be vigilant when it comes to potential phishing scams.

  7. Open Command Prompt and type aws configure.
  8. You will be prompted for four input variables. Enter your information, then press Enter after each input:
    AWS Access Key ID
    AWS Secret Access Key 
    Default region 
    Default output format (json)
  9. The region name is obtained from your console (Oregon is displayed here, but yours is determined by your location; the region code for Oregon, for example, is us-west-2):
    Figure 1.29: Location search

  10. The codes for regions are obtained from the following Available Regions list:
    Figure 1.30: List of available regions

  11. The Command Prompt's final input variable will look as follows.
    Then, press Enter:
    Figure 1.31: The last step in the AWS CLI configuration in Command Prompt

You can change the configuration anytime by entering the aws configure command.

In this exercise, you configured the security credentials for your AWS account. We will use these credentials to access the AWS APIs in the rest of the book.
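To confirm that the CLI can actually reach your account (this will also help with step 1 of the activity at the end of this chapter), two quick commands are useful:

# Show the access key source, profile, and region the CLI is currently using
aws configure list
# Ask AWS who you are authenticated as; an account ID and ARN in the output mean the keys work
aws sts get-caller-identity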

 

CLI Usage

When using a command, specify at least one path argument. The two path arguments are LocalPath and S3Uri:

  • LocalPath: This represents the path of a local file or directory, which can be written as an absolute or relative path.
  • S3Uri: This represents the location of an S3 object, prefix, or bucket. The command form is s3://myBucketName/myKey. The path argument must begin with s3:// to indicate that the path argument refers to an S3 object.

The overall command structure is aws s3 <Command> [<Arg> …]. The following table shows the different commands with a description and an example:

Figure 1.32: Command list
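For quick reference, the most commonly used commands look like this (bucket and file names are placeholders):

aws s3 mb s3://my-example-bucket                 # make (create) a bucket
aws s3 rb s3://my-example-bucket                 # remove an empty bucket
aws s3 ls                                        # list your buckets
aws s3 ls s3://my-example-bucket                 # list the keys in a bucket
aws s3 cp ./file.txt s3://my-example-bucket/     # copy a file to (or from) S3
aws s3 mv ./file.txt s3://my-example-bucket/     # move a file to (or from) S3
aws s3 rm s3://my-example-bucket/file.txt        # remove an object
aws s3 sync ./folder s3://my-example-bucket/     # synchronize a local folder with a bucket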

 

Recursion and Parameters

Importing files one at a time is time-consuming, especially if you have many files in a folder that need to be imported. A simple solution is to use a recursive procedure. A recursive procedure is one that can call itself and saves you, the user, from entering the same import command for each file.

Performing a recursive CLI command requires passing a parameter to the API. This sounds complicated, but it is incredibly easy. First, a parameter is simply a name or option that is passed to a program to affect the operation of the receiving program. In our case, the parameter is --recursive, and the entire command to perform the recursive copy is as follows:

aws s3 cp s3://myBucket . --recursive

With this command, all the S3 objects in the bucket are copied to the specified directory:

Figure 1.33: Parameter list
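The same parameter works in the other direction when importing a whole folder, and it is often combined with the --exclude and --include filters. A short sketch with placeholder names:

# Upload every file in a local folder to the bucket
aws s3 cp ./my-documents s3://myBucket --recursive
# Upload only the text files and skip everything else
aws s3 cp ./my-documents s3://myBucket --recursive --exclude "*" --include "*.txt"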

Activity 1.01: Putting the Data into S3 with the CLI

Let's start with a note about the terminology used in this activity. Putting data into S3 can also be called uploading. Getting it from there is called downloading. Sometimes, it is also called importing and exporting. Please do not confuse this with AWS Import/Export, which is a specific AWS service for sending a large amount of data to AWS or getting it back from AWS.

In this activity, we will be using the CLI to create a bucket in S3 and import a second text file. Suppose that you are creating a chatbot. You have identified text documents that contain content that will allow your chatbot to interact with customers more effectively. Before the text documents can be parsed, they need to be uploaded to an S3 bucket. Once they are in S3, further analysis will be possible. To ensure that this has happened correctly, you will need to install Python, set up the Amazon CLI tools, and have a user authenticated with the CLI:

  1. Configure the CLI and verify that it can successfully connect to your AWS environment.
  2. Create a new S3 bucket.
  3. Import your text file into the bucket.
  4. Export the file from the bucket and verify the exported objects.

    Note

    The solution for this activity can be found via this link.
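If you would like to check your command syntax as you work through the steps, a skeleton of one possible sequence looks like the following (bucket and file names are placeholders; the complete worked solution is available via the link above):

aws sts get-caller-identity                                        # 1. verify the CLI can connect
aws s3 mb s3://my-chatbot-text-bucket                              # 2. create a new bucket
aws s3 cp ./my-second-text-file.txt s3://my-chatbot-text-bucket/   # 3. import the text file
aws s3 cp s3://my-chatbot-text-bucket/my-second-text-file.txt ./   # 4. export it back
aws s3 ls s3://my-chatbot-text-bucket                              #    and list the bucket's objects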

 

Using the AWS Console to Identify ML Services

The AWS Console provides a web-based interface to navigate, discover, and utilize AWS services for AI and ML. In this topic, we will explore two ways to use the console to find ML services. We will also test an ML API with text data retrieved from a website.

Exercise 1.04: Navigating the AWS Management Console

In this exercise, we will navigate the AWS Management Console to locate ML services. Starting from the console, https://console.aws.amazon.com/console/, and only using console search features, let's navigate to the Amazon Lex (https://console.aws.amazon.com/lex/) service information page:

  1. Click https://console.aws.amazon.com/console/ to navigate to the AWS Console. You might have to log in to your AWS account. Then, click Services:
    Figure 1.34: AWS Console

  2. Scroll down the page to view all the ML services. Then, click Amazon Lex. If Lex is not available in your region, consider switching to a different region:
    Figure 1.35: Options for ML

  3. You will be redirected to the Amazon Lex home screen:
    Figure 1.36: Amazon Lex home screen

You will get a chance to work with Amazon Lex in Chapter 5, Using Speech with the Chatbot. For now, you can click the different Learn More links to get to know Lex's features a bit better. If you're itching to try it out right away, you may click Get Started.

Locating new AWS services is an essential skill for discovering more tools to provide solutions for your data projects. Now, let's test the API features of Amazon Comprehend.

Exercise 1.05: Testing the Amazon Comprehend API Features

Now that you have mastered S3, let's do a quick exercise that extends beyond storing a file and prepares you for the rest of the chapters. In this exercise, we will display text analysis output by using a partial text file input in the API explorer. Exploring an API is a skill that saves development time by making sure that the output is in the desired format for your project. Here, we will test the AWS Comprehend text analysis features.

Note

You will work with Comprehend in more detail in Chapter 4, Conversational Artificial Intelligence. We will also introduce the various AWS AI services and how to work with them. Here, we are doing an exercise to get you familiar with interacting with AWS in multiple ways.

Here is the user story: suppose that you are creating a chatbot. Before taking any steps, we first need to understand the business goal or objective. Then we need to select the relevant AWS services; for example, if our business goal is related to storage, we will look at the storage domain.

You have identified a business topic and the corresponding text documents with content that will allow the chatbot to make your business successful. Your next step is to identify/verify an AWS service to parse the text document for sentiment, language, key phrases, and entities. Amazon's AI services include AWS Comprehend, which does this very well.

Before investing time in writing a complete program, you need to test the AWS service's features via the AWS Management Console's interface. To ensure that this happens correctly, you will need to search the web for an article (written in English or Spanish) that contains the subject matter that you're interested in. You are aware that exploring APIs is a skill that can save development time by ensuring that the output is in the desired format for your project.

Now that we have the user story, let's carry out this task:

As with Exercise 1.01, Using the AWS Management Console to Create an S3 Bucket, you should already have completed the account setup detailed earlier in this chapter.

  1. Go to https://aws.amazon.com/ and click My Account and then AWS Management Console to open the AWS Management Console in a new browser tab:
    Figure 1.37: Accessing the AWS Management Console via the user's account

  2. Click inside the search bar (under Find Services) in the AWS Management Console to search for Amazon Comprehend:
    Figure 1.38: Searching for AWS services

  3. Type in amazon comp. As you type, Amazon will autocomplete and show the services that match the name typed in the search box:
    Figure 1.39: Selecting the AWS service

  4. You will see the Amazon Comprehend landing page:
    Figure 1.40: The Amazon Comprehend page

  5. Click Launch Amazon Comprehend and you will be directed to the Real-time analysis page. You can either use their built-in model or you can provide a custom one. We will use their built-in model:
    Figure 1.41: Real-time analysis

    You can input text and click Analyze; the left-hand menu also links to Topic modeling and the Documentation. The Real-time analysis page is a GUI for exploring the API, and the right side provides real-time output for the text you input. Let's copy a poem by Walt Whitman from http://www.gutenberg.org/cache/epub/1322/pg1322.txt and analyze it.

  6. Click Clear text to clear the default sample text. Then, open the following URL in a new tab: http://www.gutenberg.org/cache/epub/1322/pg1322.txt.
  7. Copy the first poem and paste it in the Input text box:
    Figure 1.42: Amazon Comprehend real-time analysis screen

  8. Click Analyze to see the output:
    Figure 1.43: Analyzing the output

  9. Review the Entities, Key phrases, and Language tabs and click the Sentiment tab to view the sentiment analysis:
    Figure 1.44: Sentiment tab results

  10. You can try other tabs. The language will show English with 99% confidence, and the Syntax tab is interesting and has lots of information. The Key phrases tab underlines the key phrases and lists them:
    Figure 1.45: Key phrases tab results

Try some other text – maybe movie comments from IMDb or comments from Amazon product reviews – and see how Amazon Comprehend handles sentiment. A cool thing to try would be sarcasm or even comments that change their polarity at the last minute, for example, "The book is really good, but the movie is dreadful" or "The screenplay and direction were done by people who couldn't fathom what was good about the novel," for interesting results.
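Once the CLI is configured, you can run the same kind of check from the command line, which is handy when you start scripting these experiments; a small sketch using one of the sample sentences above (Linux/macOS shell syntax):

aws comprehend detect-sentiment --language-code en \
    --text "The book is really good, but the movie is dreadful"
# Related real-time operations worth trying on the same text:
aws comprehend detect-key-phrases --language-code en --text "The book is really good, but the movie is dreadful"
aws comprehend detect-entities --language-code en --text "The book is really good, but the movie is dreadful"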

The Utility of the AWS Console Interface to AI Services

The Comprehend console interface is very useful for testing ideas. As you will see in later chapters, we can use a similar interface for Amazon Textract to see whether we can extract tables and other information from forms such as tax returns, company statements such as profit and loss statements or balance sheets, medical forms, and so forth.

While we need programming and development to develop a robotic process automation application, the console interface helps us to quickly test our business hypotheses. For example, maybe you want to automate a loan processing pipeline in which you are manually entering information from different documents. To see if any AWS AI services would fit the need, you can scan all the relevant documents and test them with the AWS Textract console. Later, in Chapter 5, Computer Vision and Image Processing, you will work with scanned documents and Amazon Textract.

You can also check how accurately the AWS built-in models can extract the required information. Maybe you will need custom models, or maybe the documents are not easily understandable by a machine, but you can discover these issues early and plan accordingly. Maybe your application involves medical record handling, which might require more sophisticated custom models. In fact, you can upload a custom model and test it in the console as well.

 

Summary

In this chapter, we started with understanding the basics of cloud computing, AWS, ML, and AI. We then explored S3, created buckets, and exported and imported data to and from S3. At the same time, we explored the AWS command line and its uses. Finally, we worked with the console interface of AWS Comprehend as an example of testing various ideas that relate to analyzing texts and documents.

In the next chapter, you will learn more about AWS AI services, serverless computing, and how to analyze text documents using natural language processing (NLP). Researching new AWS services is essential for discovering additional solutions to solve many machine learning problems that you are working on. Additionally, as you saw, AWS has multiple ways of interacting with its services to help test business ideas, evaluate AI/ML models, and do quick prototyping.

About the Authors
  • Krishna Sankar

    Krishna Sankar is an AI data scientist with Volvo Cars focusing on autonomous vehicles. Previously, he was the chief data scientist at blackarrow, where he focused on optimizing user experience via inference, intelligence, and interfaces. He was the director of a data science/bioinformatics startup and also worked as a distinguished engineer at Cisco. He has been speaking at various conferences as well as guest lecturing at the Naval Postgraduate School. His other passion is Lego Robotics; you'll find him at the St. Louis FLL World Competition as a robot design judge.

  • Jeffrey Jackovich

    Jeffrey Jackovich is a curious data scientist with a background in health-tech and mergers and acquisitions. He has extensive business-oriented healthcare knowledge but enjoys analyzing all types of data with R and Python. He loves the challenges involved in the data science process, and his ingenious demeanor was tempered while serving as a Peace Corps volunteer.

  • Ruze Richards

    Ruze Richards is a data scientist and cloud architect who has spent most of his career building high-performance analytics systems for enterprises and startups. He is passionate about AI and machine learning. He began his career as a physicist, felt excited about neural networks, and started working at AT&T Bell Labs to further pursue this area of interest. He is thrilled to spread the knowledge and help people achieve their goals.
