You're reading from Natural Language Processing with TensorFlow - Second Edition

Product typeBook

Published inJul 2022

Reading LevelIntermediate

PublisherPackt

ISBN-139781838641351

Edition2nd Edition

Languages

Python

Tools

TensorFlow

Concepts

Mobile Application Development

Author (1)

Thushan Ganegedara

Tasks of Natural Language Processing

NLP has a multitude of real-world applications. A good NLP system is one that performs many NLP tasks. When you search for today’s weather on Google or use Google Translate to find out how to say, “How are you?” in French, you rely on a subset of such tasks in NLP. We will list some of the most ubiquitous tasks here, and this book covers most of these tasks:

Tokenization: Tokenization is the task of separating a text corpus into atomic units (for example, words or characters). Although it may seem trivial for a language like English, tokenization is an important task. For example, in the Japanese language, words are not delimited by spaces or punctuation marks.
Word-Sense Disambiguation (WSD): WSD is the task of identifying the correct meaning of a word. For example, in the sentences, The dog barked at the mailman and Tree bark is sometimes used as a medicine, the word bark has two different meanings. WSD is critical for tasks such as question answering.
Named Entity Recognition (NER): NER attempts to extract entities (for example, person, location, and organization) from a given body of text or a text corpus. For example, the sentence, John gave Mary two apples at school on Monday will be transformed to [John]name gave [Mary]name [two]number apples at [school]organization on [Monday]time. NER is an imperative topic in fields such as information retrieval and knowledge representation.
Part-of-Speech (PoS) tagging: PoS tagging is the task of assigning words to their respective parts of speech. It can either be basic tags such as noun, verb, adjective, adverb, and preposition, or it can be granular such as proper noun, common noun, phrasal verb, verb, and so on. The Penn Treebank project, a popular project focusing PoS, defines a comprehensive list of PoS tags at https://www.ling.upenn.edu/courses/ling001/penn_treebank_pos.html.
Sentence/synopsis classification: Sentence or synopsis (for example, movie reviews) classification has many use cases such as spam detection, news article classification (for example, political, technology, and sport), and product review ratings (that is, positive or negative). This is achieved by training a classification model with labeled data (that is, reviews annotated by humans, with either a positive or negative label).
Text generation: In text generation, a learning model (for example, a neural network) is trained with text corpora (a large collection of textual documents), and it then predicts new text that follows. For example, language modeling can output an entirely new science fiction story by using existing science fiction stories for training.

Recently, OpenAI released a language model known as OpenAI-GPT-2, which can generate incredibly realistic text. Furthermore, this task plays a very important role in understanding language, which helps a downstream decision-support model get off the ground quickly.

Question Answering (QA): QA techniques possess a high commercial value, and such techniques are found at the foundation of chatbots and VA (for example, Google Assistant and Apple Siri). Chatbots have been adopted by many companies for customer support. Chatbots can be used to answer and resolve straightforward customer concerns (for example, changing a customer’s monthly mobile plan), which can be solved without human intervention. QA touches upon many other aspects of NLP such as information retrieval and knowledge representation. Consequently, all this makes developing a QA system very difficult.
Machine Translation (MT): MT is the task of transforming a sentence/phrase from a source language (for example, German) to a target language (for example, English). This is a very challenging task, as different languages have different syntactical structures, which means that it is not a one-to-one transformation. Furthermore, word-to-word relationships between languages can be one-to-many, one-to-one, many-to-one, or many-to-many. This is known as the word alignment problem in MT literature.

Finally, to develop a system that can assist a human in day-to-day tasks (for example, VA or a chatbot) many of these tasks need to be orchestrated in a seamless manner. As we saw in the previous example where the user asks, “Can you show me a good Italian restaurant nearby?” several different NLP tasks, such as speech-to-text conversion, semantic and sentiment analyses, question answering, and machine translation, need to be completed. In Figure 1.1, we provide a hierarchical taxonomy of different NLP tasks categorized into several different types. It is a difficult task to attribute an NLP task to a single classification. Therefore, you can see some tasks spanning multiple categories. We will split the categories into two main types: language-based (light-colored with black text) and problem formulation-based (dark-colored with white text). The linguistic breakdown has two categories: syntactic (structure-based) and semantic (meaning-based). The problem formulation-based breakdown has three categories: preprocessing tasks (tasks that are performed on text data before feeding to a model), discriminative tasks (tasks where we attempt to assign an input text to one or more categories from a set of predefined categories) and generative tasks (tasks where we attempt to generate a new textual output). Of course, this is one classification among many. But it will show how difficult it is to assign a specific NLP task to a specific category.

Figure 1.1: A taxonomy of the popular tasks of NLP categorized under broader categories

Having understood the various tasks in NLP, let us now move on to understand how we can solve these tasks with the help of machines. We will discuss both the traditional method and the deep- learning-based approach.

You have been reading a chapter from

Natural Language Processing with TensorFlow - Second Edition

Published in: Jul 2022Publisher: PacktISBN-13: 9781838641351

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at €14.99/month. Cancel anytime

Author (1)

Thushan Ganegedara

Thushan is a seasoned ML practitioner with 4+ years of experience in the industry. Currently he is a senior machine learning engineer at Canva; an Australian startup that founded the online visual design software, Canva, serving millions of customers. His efforts are particularly concentrated in the search and recommendations group working on both visual and textual content. Prior to Canva, Thushan was a senior data scientist at QBE Insurance; an Australian Insurance company. Thushan was developing ML solutions for use-cases related to insurance claims. He also led efforts in developing a Speech2Text pipeline there. He obtained his PhD specializing in machine learning from the University of Sydney in 2018.
Read more about Thushan Ganegedara

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages