You're reading from MATLAB for Machine Learning - Second Edition

Product type: Book

Published in: Jan 2024

Reading Level: Intermediate

Publisher: Packt

ISBN-13: 9781835087695

Edition: 2nd Edition

Languages

MATLAB

Tools

MATLAB

Concepts

Machine Learning

Author (1)

Giuseppe Ciaburro

Natural Language Processing Using MATLAB

Natural language processing (NLP) is the automatic processing of information conveyed through spoken or written language. The task is difficult largely because human language is inherently ambiguous. For machines to learn and interact with the world in ways typical of humans, storing data is not enough: they must also be taught to translate that data into meaningful concepts. As natural language interacts with the environment, it generates predictive knowledge. In this chapter, we will learn the basic concepts of NLP and how to build a model to label sentences.

In this chapter, we’re going to cover the following main topics:

Explaining NLP
Exploring corpora and word and sentence tokenizers
Implementing a MATLAB model to label sentences
Understanding gradient boosting techniques

Technical requirements

In this chapter, we will introduce basic ML concepts. To understand these topics, a basic knowledge of algebra and mathematical modeling is needed. You will also require working knowledge of the MATLAB environment.

To work with the MATLAB code in this chapter, you’ll need the following files (available on GitHub at https://github.com/PacktPublishing/MATLAB-for-Machine-Learning-second-edition):

IMDBSentimentClassification.m
ImdbDataset.xlsx

Explaining NLP

NLP is a field that’s dedicated to the development of technology that enables computers to interact with, understand, and generate human language in a way that mimics natural human communication. This involves various techniques and approaches aimed at processing and analyzing the complexities of natural languages, such as English, Chinese, Arabic, and more. The goal is to bridge the gap between human language and computer language, allowing computers to comprehend and generate text as if they were engaging in a conversation with a human interlocutor (Figure 7.1):

Figure 7.1 – NLP tasks

NLP strives to develop information technology tools for analyzing, comprehending, and creating texts in a manner that resonates with human understanding, mimicking interactions with another human rather than a machine. Natural language, both spoken and written, represents the most instinctive and widespread mode of communication. In contrast...

Exploring corpora and word and sentence tokenizers

The analysis of corpora, words, and sentence tokenization forms the basis for comprehensive language understanding. Corpora provide real-world language data for analysis, words constitute the elements of expression, and sentence tokenization structures the text into meaningful units for further investigation. This trio of concepts plays a central role in advancing linguistic research and enhancing NLP capabilities.
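To make the idea of sentence and word tokenization concrete, here is a minimal sketch that uses only base MATLAB functions (regexp, regexprep, strsplit). The sample text and the splitting rules are assumptions made for illustration; they are not taken from the chapter's files, and a dedicated tokenizer would handle abbreviations, numbers, and other edge cases that this simple rule ignores.

```matlab
% Hypothetical sample text (not from the chapter's dataset).
text = 'NLP is fun. Tokenizers split text into units! Do they handle questions?';

% Sentence tokenization: split at whitespace that follows '.', '!' or '?'.
sentences = regexp(text, '(?<=[.!?])\s+', 'split');

% Word tokenization of the first sentence: lowercase it, drop punctuation,
% and split on whitespace.
cleaned = regexprep(lower(sentences{1}), '[^a-z0-9\s]', '');
words = strsplit(strtrim(cleaned));

fprintf('Number of sentences: %d\n', numel(sentences));
fprintf('Words in first sentence: %s\n', strjoin(words, ', '));
```

The sentence rule here is deliberately naive: it treats every period, exclamation mark, or question mark followed by whitespace as a boundary, so a string such as "Dr. Smith" would be split incorrectly.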

Corpora

In linguistics and NLP, corpora are extensive collections of written or spoken texts that serve as valuable sources of data for linguistic analysis and language-related studies. Corpora provide a diverse range of language samples, enabling researchers to examine patterns, trends, and variations in language usage, syntax, and semantics across different contexts and genres.
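One of the simplest patterns that can be examined in a corpus is word frequency. The following sketch counts word occurrences in a tiny, hypothetical three-document corpus using base MATLAB only; the documents and variable names are illustrative assumptions, and real corpora would first need cleaning and normalization.

```matlab
% Hypothetical toy corpus: each cell is one short document.
corpus = {'the movie was great', ...
          'the plot was weak', ...
          'great cast and great music'};

% Join all documents and split them into word tokens.
allWords = strsplit(strjoin(corpus, ' '));

% Build the vocabulary and count how often each word occurs.
[vocab, ~, idx] = unique(allWords);
counts = accumarray(idx(:), 1);

% Sort the vocabulary by descending frequency and show the top entries.
[sortedCounts, order] = sort(counts, 'descend');
for k = 1:3
    fprintf('%-6s %d\n', vocab{order(k)}, sortedCounts(k));
end
```

In this toy corpus the most frequent word is "great", which already hints at how frequency counts can surface opinion-bearing vocabulary.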

Linguistic corpora represent sizable collections of spoken or written texts, often originating from authentic communication contexts...

Implementing a MATLAB model to label sentences

In this section, we will discuss a topic of wide practical interest: the role that reviews play in influencing customers' decisions.

Introducing sentiment analysis

Sentiment analysis, a technique that utilizes NLP, extracts and analyzes subjective information from text. Analyzing vast datasets reveals collective opinions that impact various domains. While manual sentiment analysis is challenging, automated methods have emerged. However, automating language modeling is complex and costly due to the nuances of human language. Additionally, the methodology varies across languages, increasing complexity.
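As a rough illustration of automated sentiment analysis, the sketch below labels short reviews by counting words from two small word lists. This lexicon-based rule is a swapped-in technique chosen for brevity, not the method used in IMDBSentimentClassification.m (whose contents are not shown here); the word lists and reviews are hypothetical.

```matlab
% Hypothetical sentiment lexicons (assumptions for illustration only).
positiveWords = {'good', 'great', 'excellent', 'enjoyable'};
negativeWords = {'bad', 'boring', 'terrible', 'weak'};

% Hypothetical reviews to label.
reviews = {'A great film with an excellent cast', ...
           'Boring plot and terrible acting', ...
           'The film was released in 2010'};

labels = cell(size(reviews));
for i = 1:numel(reviews)
    % Lowercase, keep only letters and whitespace, then split into tokens.
    cleaned = regexprep(lower(reviews{i}), '[^a-z\s]', '');
    tokens = strsplit(strtrim(cleaned));

    % Score = number of positive words minus number of negative words.
    score = sum(ismember(tokens, positiveWords)) ...
          - sum(ismember(tokens, negativeWords));

    if score > 0
        labels{i} = 'positive';
    elseif score < 0
        labels{i} = 'negative';
    else
        labels{i} = 'neutral';
    end
    fprintf('%-40s -> %s\n', reviews{i}, labels{i});
end
```

A rule like this ignores negation, sarcasm, and context, which is exactly the kind of nuance that makes automated polarity detection hard.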

A major challenge lies in determining the polarity of opinions. Polarity classification is subjective, with one sentence perceived differently by individuals based on their value systems. The rise of social media has heightened interest in sentiment...

Understanding gradient boosting techniques

To improve the performance of an algorithm, we can perform a series of steps and use different techniques, depending on the type of algorithm and the specific problems being addressed. The first approach involves a thorough analysis of the data to identify possible inaccuracies or shortcomings. In addition, many algorithms have parameters that can be adjusted to achieve better performance – not to mention techniques such as feature scaling or feature selection. A popular technique is to combine the capabilities offered by different algorithms to achieve better overall performance.
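The idea of combining several algorithms can be sketched with simple majority voting. In the example below, the predictions of three classifiers are hard-coded, hypothetical values rather than outputs of trained models, and they were chosen so that the classifiers make their mistakes on different samples.

```matlab
% Hypothetical predictions of three classifiers on six samples
% (1 = positive, 0 = negative). One row per classifier.
predictions = [1 0 1 1 1 1;
               1 1 0 1 0 1;
               0 0 1 1 0 0];
trueLabels  =  [1 0 1 1 0 1];

% Majority vote: the most frequent label in each column.
ensemblePrediction = mode(predictions, 1);

% Accuracy of each classifier and of the combined vote.
individualAccuracy = mean(predictions == trueLabels, 2);
ensembleAccuracy = mean(ensemblePrediction == trueLabels);

fprintf('Individual accuracies: %s\n', mat2str(individualAccuracy', 3));
fprintf('Ensemble accuracy:     %.3f\n', ensembleAccuracy);
```

The vote beats every individual classifier here only because their errors do not overlap; when models tend to fail on the same samples, voting brings little or no improvement.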

Approaching ensemble learning

The concept of ensemble learning involves the use of multiple models combined in a way that maximizes performance by exploiting their strengths and mitigating their relative weaknesses. These ensemble learning methods are based on weak learning models that do not achieve high levels of accuracy on their own, but when combined...
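To show how weak learners can be combined in the boosting style named in this section's title, here is a minimal sketch of gradient boosting for regression with squared loss, where each weak learner is a one-split decision stump fitted to the current residuals. The data, learning rate, and number of rounds are assumptions chosen for illustration, and the code uses base MATLAB only; it is not the chapter's implementation.

```matlab
% Hypothetical 1-D regression data with a step at x = 5.
x = (1:10)';
y = [1 1 1 1 1 5 5 5 5 5]';

learningRate = 0.5;   % assumed shrinkage factor
numRounds = 20;       % assumed number of boosting rounds

% Start from a constant model: the mean of the targets.
F = mean(y) * ones(size(y));
initialError = mean((y - F).^2);

for m = 1:numRounds
    % For squared loss, the negative gradient is the residual.
    r = y - F;

    % Fit a decision stump to the residuals: try every split point
    % and keep the one with the lowest squared error.
    bestLoss = inf;
    for t = x(1:end-1)'
        left = x <= t;
        pred = zeros(size(y));
        pred(left) = mean(r(left));
        pred(~left) = mean(r(~left));
        loss = sum((r - pred).^2);
        if loss < bestLoss
            bestLoss = loss;
            bestPred = pred;
        end
    end

    % Add the shrunken stump prediction to the ensemble.
    F = F + learningRate * bestPred;
end

finalError = mean((y - F).^2);
fprintf('MSE before boosting: %.4f\n', initialError);
fprintf('MSE after boosting:  %.2e\n', finalError);
```

Each stump on its own is a poor model, but because every round corrects part of the error left by the previous rounds, the combined prediction approaches the targets. In practice, the Statistics and Machine Learning Toolbox offers fitrensemble and fitcensemble for trained ensembles, assuming that toolbox is available.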

Summary

In this chapter, we studied NLP, which automatically processes information that’s transmitted through spoken or written language. To begin, we analyzed the basic concepts of NLP by identifying the tasks that can be tackled and then moved on to the main approaches concerning text analysis and text generation. We then analyzed corpora, words, and sentence tokenization. Corpora offer authentic language data for examination, with words serving as the fundamental components of expression, and sentence tokenization organizing the text into coherent units for in-depth analysis.

In the second part of this chapter, we analyzed a practical case of using NLP for labeling movie reviews. This is a sentiment analysis problem that aims to automatically identify the polarity of a textual comment. In this example, we were able to practically learn which tools to use in MATLAB to perform this type of analysis. In the final part of this chapter, we analyzed ensemble learning...


Author (1)

Giuseppe Ciaburro

Giuseppe Ciaburro holds a PhD and two master's degrees. He works at the Built Environment Control Laboratory - Università degli Studi della Campania "Luigi Vanvitelli". He has over 25 years of work experience in programming, first in the field of combustion and then in acoustics and noise control. His core programming knowledge is in MATLAB, Python and R. As an expert in AI applications to acoustics and noise control problems, Giuseppe has wide experience in researching and teaching. He has several publications to his credit: monographs, scientific journals, and thematic conferences. He was recently included in the world's top 2% scientists list by Stanford University (2022).