You're reading from Natural Language Understanding with Python (1st Edition, Packt, June 2023, ISBN-13: 9781804613429).
Author: Deborah A. Dahl

Deborah A. Dahl is the principal at Conversational Technologies, with over 30 years of experience in natural language understanding technology. She has developed numerous natural language processing systems for research, commercial, and government applications, including a system for NASA, and speech and natural language components on Android. She has taught over 20 workshops on natural language processing, consulted on many natural language processing applications for her customers, and written over 75 technical papers. This is Deborah's fourth book on natural language understanding topics. Deborah has a PhD in linguistics from the University of Minnesota and postdoctoral studies in cognitive science from the University of Pennsylvania.


Selecting Libraries and Tools for Natural Language Understanding

This chapter will get you set up to process natural language. We will begin by discussing how to install Python, and then we will discuss general software development tools such as JupyterLab and GitHub. We will also review major Python natural language processing (NLP) libraries, including the Natural Language Toolkit (NLTK), spaCy, and TensorFlow/Keras.

Natural language understanding (NLU) technology has benefited from a wide assortment of very capable, freely available tools. While these tools are very powerful, there is no one library that can do all of the NLP tasks needed for all applications, so it is important to understand what the strengths of the different libraries are and how to combine them.

Making the best use of these tools will greatly accelerate any NLU development project. These tools include the Python language itself, development tools such as JupyterLab, and a number of specific natural language...

Technical requirements

To run the examples in this chapter, you will need the following software:

  • Python 3
  • pip or conda (preferably pip)
  • JupyterLab
  • NLTK
  • spaCy
  • Keras

The next sections will go over the process of installing these packages, which should be installed in the order in which they are listed here.
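Assuming you are using pip (as recommended above), the whole stack can be installed in one pass, in the order listed. Note that spaCy's trained pipelines are downloaded as a separate step; `en_core_web_sm` below is spaCy's small English pipeline:

```shell
# Install the tools in the order listed above
pip install jupyterlab
pip install nltk
pip install spacy
pip install keras

# spaCy trained pipelines are downloaded separately from the library itself
python -m spacy download en_core_web_sm
```

If `pip` on your system points at Python 2, substitute `pip3` (and `python3`) in the commands above.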

Installing Python

The first step in setting up your development environment is to install Python. If you have already installed Python on your system, you can skip to the next section, but do make sure that your Python installation includes Python 3, which is required by most NLP libraries. You can check your Python version by entering the following command in a command-line window, and the version will be displayed:

 $ python --version

Note that if you have both Python 2 and Python 3 installed, you may have to run the python3 --version command to check the Python 3 version. If you don’t have Python 3, you’ll need to install it. Some NLP libraries require not just Python 3 but Python 3.7 or greater, so if your version of Python is older than 3.7, you’ll need to update it.
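You can also check the version from inside Python itself, which is handy when several interpreters are installed and you are not sure which one a command launches:

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, ...)
print(sys.version_info.major, sys.version_info.minor)

# Most of the NLP libraries in this chapter expect at least Python 3.7
if sys.version_info < (3, 7):
    print("Python 3.7 or newer is recommended for the libraries in this chapter")
else:
    print("This Python version is recent enough")
```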

Python runs on almost any operating system that you choose to use, including Windows, macOS, and Linux. Python can be downloaded for your operating system from http://www.python...

Developing software – JupyterLab and GitHub

The development environment can make all the difference in the efficiency of the development process. In this section, we will discuss two popular development resources: JupyterLab and GitHub. If you are familiar with other Python integrated development environments (IDEs), then you can go ahead and use the tools that you’re familiar with. However, the examples discussed in this book will be shown in a JupyterLab environment.

JupyterLab

JupyterLab is a cross-platform coding environment that makes it easy to experiment with different tools and techniques without requiring a lot of setup time. It operates in a browser environment but doesn’t require a cloud server—a local server is sufficient.

Installing JupyterLab is done with the following pip command:

$ pip install jupyterlab

Once JupyterLab is installed, you can run it using the following command:

$ jupyter lab

This command should be run...

Exploring the libraries

In this section, we will review several of the major Python libraries that are used in NLP; specifically, NLTK, spaCy, and Keras. These are very useful libraries, and they can perform most basic NLP tasks. However, as you gain experience with NLP, you will also find additional libraries that may be appropriate for specific tasks, and you are encouraged to explore those.
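As a first taste of how little setup these libraries need, here is a minimal spaCy sketch. It uses `spacy.blank("en")`, which creates a bare English pipeline containing only the rule-based tokenizer, so no trained model download is required; the only assumption is that spaCy itself is installed:

```python
import spacy

# A blank pipeline has no trained components, only the tokenizer
nlp = spacy.blank("en")

doc = nlp("NLTK, spaCy, and Keras each have their strengths.")
print([token.text for token in doc])
```

Running a trained pipeline (for part-of-speech tags, entities, and so on) works the same way, except that you load a downloaded model such as `en_core_web_sm` instead of a blank one.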

Using NLTK

NLTK (https://www.nltk.org/) is a very popular open source Python library that greatly reduces the effort involved in developing natural language applications by providing support for many frequently performed tasks. NLTK also includes many corpora (sets of ready-to-use natural language texts) that can be used for exploring NLP problems and testing algorithms.

In this section, we will go over what NLTK can do, and then discuss the NLTK installation process.
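As a quick illustration of the "frequently performed tasks" that NLTK supports, here is a tokenization sketch. The Treebank tokenizer is used deliberately because, unlike `nltk.word_tokenize`, it needs no downloaded data files; the only assumption is that NLTK itself is installed:

```python
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()

# Splits words and punctuation into separate tokens, Penn Treebank style
tokens = tokenizer.tokenize("NLTK reduces the effort of building NLP applications.")
print(tokens)
```

Many other NLTK features, such as its bundled corpora and the `punkt` sentence tokenizer, require a one-time `nltk.download()` call, which we will return to during installation.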

As we discussed in Chapter 3, many distinct tasks can be performed in an NLU pipeline as the processing...

Looking at an example

To illustrate some of these concepts, we’ll work through an example using JupyterLab where we explore a sentiment analysis (SA) task for movie reviews. We’ll look at how we can apply the NLTK and spaCy packages to get some ideas about what the data is like, which will help us plan further processing.

The corpus (or dataset) that we’ll be looking at is a popular set of 2,000 movie reviews, classified as to whether the writer expressed a positive or negative sentiment about the movie (http://www.cs.cornell.edu/people/pabo/movie-review-data/).

Dataset citation

Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the ACL, 2005.

This is a good example of the task of SA, which was introduced in Chapter 1.
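The corpus is distributed as plain-text files sorted into `pos/` and `neg/` directories, one review per file. The stdlib sketch below builds a tiny stand-in with that same layout and loads it into (text, label) pairs; the directory structure mirrors the real dataset, but the review texts here are invented placeholders:

```python
import os
import tempfile

# Build a toy corpus with the same pos/ and neg/ layout as the real dataset
root = tempfile.mkdtemp()
samples = {
    "pos": ["a wonderful, moving film", "great performances throughout"],
    "neg": ["a tedious mess", "the plot never comes together"],
}
for label, reviews in samples.items():
    os.makedirs(os.path.join(root, label))
    for i, text in enumerate(reviews):
        with open(os.path.join(root, label, f"cv{i}.txt"), "w") as f:
            f.write(text)

# Load every review as a (text, label) pair
data = []
for label in ("pos", "neg"):
    folder = os.path.join(root, label)
    for name in sorted(os.listdir(folder)):
        with open(os.path.join(folder, name)) as f:
            data.append((f.read(), label))

print(len(data))  # 4 reviews in the toy corpus; 2,000 in the real one
```

The same loading loop works unchanged on the real dataset once it is downloaded, by pointing `root` at the unpacked corpus directory.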

Setting up JupyterLab

We’ll be working with JupyterLab, so let’s start it up. As we saw earlier, you can start JupyterLab by simply typing the...

Summary

In this chapter, we covered the major development tools and Python libraries that are used in NLP application development. We discussed the JupyterLab development environment and the GitHub software repository system. The major libraries that we covered were NLTK, spaCy, and Keras. Although this is by no means an exhaustive list of NLP libraries, it’s sufficient to get a start on almost any NLP project.

We covered installation and basic usage for the major libraries, and we provided some suggested tips on selecting libraries. We summarized some useful auxiliary packages, and we concluded with a simple example of how the libraries can be used to do some NLP tasks.

The topics discussed in this chapter have given you a basic understanding of the most useful Python packages for NLP, which you will be using for the rest of the book. In addition, the discussion in this chapter has given you a start on understanding the principles for selecting tools for future projects...
