
You're reading from Mastering NLP from Foundations to LLMs

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781804619186
Edition: 1st Edition
Authors (2):
Lior Gazit

Lior Gazit is a highly skilled machine learning professional with a proven track record of building and leading teams that drive business growth. He is an expert in natural language processing and has developed innovative machine learning pipelines and products. He holds a master's degree and has published in peer-reviewed journals and conferences. As a Senior Director of the Machine Learning group in the financial sector and a Principal Machine Learning Advisor at an emerging startup, Lior is a respected leader in the industry, with a wealth of knowledge and experience to share. With much passion and inspiration, Lior is dedicated to using machine learning to drive positive change and growth in his organizations.

Meysam Ghaffari

Meysam Ghaffari is a Senior Data Scientist with a strong background in natural language processing and deep learning. He currently works at MSKCC, where he specializes in developing and improving machine learning and NLP models for healthcare problems. He has over 9 years of experience in machine learning and over 4 years of experience in NLP and deep learning. He received his Ph.D. in Computer Science from Florida State University, his M.S. in Computer Science (Artificial Intelligence) from Isfahan University of Technology, and his B.S. in Computer Science from Iran University of Science and Technology. He also worked as a postdoctoral research associate at the University of Wisconsin-Madison before joining MSKCC.

Preface

This book provides an in-depth introduction to natural language processing (NLP) techniques, starting with the mathematical foundations of machine learning (ML) and working up to advanced NLP applications such as large language models (LLMs) and AI applications. As part of your learning experience, you’ll get to grips with linear algebra, optimization, probability, and statistics, which are essential for understanding and implementing ML and NLP algorithms. You’ll also explore general ML techniques and find out how they relate to NLP. The preprocessing of text data, including methods for cleaning and preparing text for analysis, will follow, right before you learn how to perform text classification, which is the task of assigning a label or category to a piece of text based on its content. The advanced topics of LLMs’ theory, design, and applications will be discussed toward the end of the book, as will the future trends in NLP, which will feature expert opinions on the future of the field. To strengthen your practical skills, you’ll also work on mocked real-world NLP business problems and solutions.

Who this book is for

This book is for technical folks, ranging from deep learning and ML researchers, hands-on NLP practitioners, and ML/NLP educators, to STEM students. Professionals working with text as part of their projects and existing NLP practitioners will also find plenty of useful information in this book. Beginner-level ML knowledge and a basic working knowledge of Python will help you get the best out of this book.

What this book covers

Chapter 1, Navigating the NLP Landscape: A Comprehensive Introduction, explains what the book is about, which topics we will cover, and who can use this book. This chapter will help you decide whether this book is the right fit for you or not.

Chapter 2, Mastering Linear Algebra, Probability, and Statistics for Machine Learning and NLP, has three parts. In the first part, we will review the basics of linear algebra that are needed at different parts of the book. In the next part, we will review the basics of statistics, and finally, we will present basic statistical estimators.

Chapter 3, Unleashing Machine Learning Potentials in NLP, discusses different concepts and methods in ML that can be used to tackle NLP problems. We will discuss general feature selection and classification techniques. We will cover general aspects of ML problems, such as train/test/validation selection, and dealing with imbalanced datasets. We will also discuss performance metrics for evaluating ML models that are used in NLP problems. We will explain the theory behind the methods as well as how to use them in code.

Chapter 4, Streamlining Text Preprocessing Techniques for Optimal NLP Performance, talks about various text preprocessing steps in the context of real-world problems. We will explain which steps suit which needs, based on the scenario that is to be solved. There will be a complete Python pipeline presented and reviewed in this chapter.

Chapter 5, Empowering Text Classification: Leveraging Traditional Machine Learning Techniques, explains how to perform text classification. Theory and implementation will also be explained. A comprehensive Python notebook will be covered as a case study.

Chapter 6, Text Classification Reimagined: Delving Deep into Deep Learning Language Models, covers the problems that can be solved using deep learning neural networks. The different problems in this category will be introduced to you so you can learn how to efficiently solve them. The theory of the methods will be explained here and a comprehensive Python notebook will be covered as a case study.

Chapter 7, Demystifying Large Language Models: Theory, Design, and Langchain Implementation, outlines the motivations behind the development and usage of LLMs, alongside the challenges faced during their creation. Through an examination of state-of-the-art model designs, you will gain comprehensive insights into the theoretical underpinnings and practical applications of LLMs.

Chapter 8, Accessing the Power of Large Language Models: Advanced Setup and Integration with RAG, guides you through setting up LLM applications, both API-based and open source, and delves into prompt engineering and RAGs via LangChain. We will review practical applications in code.

Chapter 9, Exploring the Frontiers: Advanced Applications and Innovations Driven by LLMs, dives into enhancing LLM performance using RAG, exploring advanced methodologies, automatic web source retrieval, prompt compression, API-cost reduction, and collaborative multi-agent LLM teams, pushing the boundaries of current LLM applications. Here, you will review multiple Python notebooks, each handling different advanced solutions to practical use cases.

Chapter 10, Riding the Wave: Analyzing Past, Present, and Future Trends Shaped by LLMs and AI, dives into the transformative impact of LLMs and AI on technology, culture, and society, exploring key trends, computational advancements, the significance of large datasets, and the evolution, purpose, and social implications of LLMs in business and beyond.

Chapter 11, Exclusive Industry Insights: Perspectives and Predictions from World Class Experts, offers a deep dive into future NLP and LLM trends through conversations with experts in legal, research, and executive roles, exploring challenges, opportunities, and the intersection of LLMs with professional practices and ethical considerations.

To get the most out of this book

All the code presented in this book is provided as Jupyter notebooks. The code was developed with Python 3.10.X and is expected to work on later versions as well.
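Before opening a notebook, it can help to confirm that the interpreter is at least the version the code was developed with. The following is a minimal sketch, not taken from the book's notebooks; the function name meets_minimum is a hypothetical helper introduced here for illustration.

```python
import sys


def meets_minimum(version_info=None, minimum=(3, 10)):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper, not part of the book's code.
    When `version_info` is omitted, the running interpreter is checked.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is 3.10 or later.")
    else:
        print("Python is older than 3.10; the notebooks may not run as expected.")
```

Running the same check in the first cell of a Google Colab notebook reports the version of the hosted runtime rather than the local machine.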

Operating system requirements: Windows, macOS, or Linux

Software/hardware covered in the book:

Access to a Python environment via one of the following:

  • Accessing Google Colab, which is free and easy from any browser on any device (recommended)
  • A local/cloud development environment of Python with the ability to install public packages and access OpenAI’s API

Sufficient computation resources, as follows:

  • The previously recommended free access to Google Colab includes a free GPU instance
  • If opting to avoid Google Colab, the local/cloud environment should have a GPU for several code examples
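For a local or cloud environment, a rough first check for the GPU mentioned above is whether NVIDIA's nvidia-smi command-line tool is on the PATH. This is only a sketch under that assumption: finding the tool suggests that an NVIDIA driver is installed, but it does not confirm that a particular deep learning framework can use the GPU, and it says nothing about non-NVIDIA hardware. The function name nvidia_tool_found is a hypothetical helper introduced here.

```python
import shutil


def nvidia_tool_found(which=shutil.which):
    """Return True if the `nvidia-smi` command can be located on PATH.

    `nvidia_tool_found` is a hypothetical helper, not part of the book's code.
    The `which` parameter exists only so the lookup can be swapped out in tests.
    """
    return which("nvidia-smi") is not None


if __name__ == "__main__":
    if nvidia_tool_found():
        print("nvidia-smi found: an NVIDIA driver appears to be installed.")
    else:
        print("nvidia-smi not found: GPU-dependent examples may need Google Colab.")
```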

Because the code examples in this book cover a diverse set of use cases, some of the advanced LLM solutions require an OpenAI account, which provides the API key that those examples need.
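One common way to make an API key available to a notebook without pasting it into a cell is to read it from an environment variable. The sketch below assumes the key is stored in OPENAI_API_KEY, the variable name that OpenAI's Python library reads by default; the function get_openai_api_key is a hypothetical helper, and the book's notebooks may handle the key differently.

```python
import os


def get_openai_api_key(env=None):
    """Return the API key stored in the OPENAI_API_KEY environment variable.

    `get_openai_api_key` is a hypothetical helper, not part of the book's code.
    Raises RuntimeError if the variable is missing or blank.
    """
    if env is None:
        env = os.environ
    # Strip whitespace that often sneaks in when a key is copied and pasted.
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; create a key in your OpenAI account "
            "and export it before running the LLM examples."
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

Keeping the key in the environment rather than in the notebook also avoids committing it by accident when a notebook is shared.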

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Mastering-NLP-from-Foundations-to-LLMs. If the code is updated, the changes will appear in the GitHub repository.

Throughout the book we review complete code notebooks that represent solutions on a professional industry level:

Chapter 4:

  • Ch4_Preprocessing_Pipeline.ipynb
  • Ch4_NER_and_POS.ipynb

Chapter 5:

  • Ch5_Text_Classification_Traditional_ML.ipynb

Chapter 6:

  • Ch6_Text_Classification_DL.ipynb

Chapter 8:

  • Ch8_Setting_Up_Close_Source_and_Open_Source_LLMs.ipynb
  • Ch8_Setting_Up_LangChain_Configurations_and_Pipeline.ipynb

Chapter 9:

  • Ch9_Advanced_LangChain_Configurations_and_Pipeline.ipynb
  • Ch9_Advanced_Methods_with_Chains.ipynb
  • Ch9_Completing_a_Complex_Analysis_with_a_Team_of_LLM_Agents.ipynb
  • Ch9_RAGLlamaIndex_Prompt_Compression.ipynb
  • Ch9_Retrieve_Content_from_a_YouTube_Video_and_Summarize.ipynb

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Now, we add a feature for achieving the syntax. We define the output_parser variable, and we use a different function for generating the output, predict_and_parse().”

A block of code is set as follows:

import pandas as pd
import matplotlib.pyplot as plt
# Load the record dict from URL
import requests
import pickle

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

qa_engineer (to manager_0):
exitcode: 0 (execution succeeded)
Code output:
Figure(640x480)
programmer (to manager_0):
TERMINATE

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “While we chose one particular database, you can refer to the Vector Store page to read more about the different choices.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit www.packtpub.com.

Share Your Thoughts

Once you’ve read Mastering NLP from Foundations to LLMs, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry: with every Packt book, you now get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can also get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the following link: https://packt.link/free-ebook/978-1-80461-918-6
  2. Submit your proof of purchase
  3. That’s it! We’ll send your free PDF and other benefits to your email directly