You're reading from  Natural Language Understanding with Python

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781804613429
Edition: 1st Edition
Author (1)
Deborah A. Dahl

Deborah A. Dahl is the principal at Conversational Technologies, with over 30 years of experience in natural language understanding technology. She has developed numerous natural language processing systems for research, commercial, and government applications, including a system for NASA, and speech and natural language components on Android. She has taught over 20 workshops on natural language processing, consulted on many natural language processing applications for her customers, and written over 75 technical papers. This is Deborah's fourth book on natural language understanding topics. Deborah has a PhD in linguistics from the University of Minnesota and completed postdoctoral studies in cognitive science at the University of Pennsylvania.

Data exploration

Data exploration, sometimes called exploratory data analysis (EDA), is the process of taking a first look at your data to see what kinds of patterns it contains and to get an overall perspective on the full dataset. These patterns and this overall perspective will help us identify the most appropriate processing approaches. Because some NLU techniques are very computationally intensive, we want to avoid wasting time on a technique that is inappropriate for a particular dataset. Data exploration can help us narrow down the options for techniques at the very beginning of our project. Visualization is a great help in data exploration because it is a quick way to get the big picture of patterns in the data.

The most basic information about a corpus that we would want to explore includes the number of words, the number of distinct words, the average length of documents, and the number of documents in each...
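
As a rough illustration of the first few statistics named above, the following minimal sketch computes the number of words, the number of distinct words, and the average document length for a tiny corpus. The corpus contents, the `corpus_stats` function name, and the lowercase whitespace tokenization are all assumptions made for this example and are not taken from the book; a real project would likely use a proper tokenizer.

```python
# Hypothetical toy corpus; a real dataset would be loaded from files.
corpus = [
    "the movie was great",
    "the plot was dull and the acting was worse",
    "great acting and a great plot",
]

def corpus_stats(docs):
    """Return basic exploration statistics for a list of document strings."""
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    all_tokens = [tok for doc in tokenized for tok in doc]
    return {
        "num_documents": len(docs),
        "num_words": len(all_tokens),
        "num_distinct_words": len(set(all_tokens)),
        # Average length in tokens; guard against an empty corpus.
        "avg_document_length": len(all_tokens) / len(docs) if docs else 0.0,
    }

stats = corpus_stats(corpus)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Comparing the total word count with the distinct word count gives a quick sense of how repetitive the vocabulary is, which can be one input when weighing candidate techniques.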
