Mastering NLP from Foundations to LLMs

Product type Book
Published in Apr 2024
Publisher Packt
ISBN-13 9781804619186
Pages 340 pages
Edition 1st Edition
Authors (2): Lior Gazit, Meysam Ghaffari

Table of Contents (14)

Preface
Chapter 1: Navigating the NLP Landscape: A Comprehensive Introduction
Chapter 2: Mastering Linear Algebra, Probability, and Statistics for Machine Learning and NLP
Chapter 3: Unleashing Machine Learning Potentials in Natural Language Processing
Chapter 4: Streamlining Text Preprocessing Techniques for Optimal NLP Performance
Chapter 5: Empowering Text Classification: Leveraging Traditional Machine Learning Techniques
Chapter 6: Text Classification Reimagined: Delving Deep into Deep Learning Language Models
Chapter 7: Demystifying Large Language Models: Theory, Design, and Langchain Implementation
Chapter 8: Accessing the Power of Large Language Models: Advanced Setup and Integration with RAG
Chapter 9: Exploring the Frontiers: Advanced Applications and Innovations Driven by LLMs
Chapter 10: Riding the Wave: Analyzing Past, Present, and Future Trends Shaped by LLMs and AI
Chapter 11: Exclusive Industry Insights: Perspectives and Predictions from World Class Experts
Index
Other Books You May Enjoy

Challenges in developing LLMs

Developing LLMs poses a unique set of challenges, including but not limited to handling massive amounts of data, securing vast computational resources, and managing the risk of introducing or perpetuating bias. The following subsections explain these challenges in detail.

Amounts of data

LLMs require enormous amounts of training data. As model size grows, so does the need for diverse, high-quality data. Collecting and curating datasets at this scale is time-consuming and expensive, and there is a risk of inadvertently including sensitive or inappropriate data in the training set. To give a sense of scale, BERT was trained on 3.3 billion words from Wikipedia and BookCorpus, GPT-2 on 40 GB of text, and GPT-3 on 570 GB of text. Table 7.2 shows the number of parameters and the training data size of a few recent LMs.
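
The figures above mix units (words for BERT, gigabytes for GPT-2 and GPT-3), which makes them hard to compare directly. The following is a rough back-of-the-envelope sketch in Python that puts them on a common footing. The conversion factor `ASSUMED_BYTES_PER_WORD` is an illustrative assumption, not a value from the book, and the helper names are hypothetical; real corpora vary with language, encoding, and tokenization, so the results are order-of-magnitude estimates only.

```python
# Figures quoted in the text; everything else here is an illustrative assumption.
BERT_WORDS = 3.3e9   # words (Wikipedia + BookCorpus)
GPT2_GB = 40         # gigabytes of text
GPT3_GB = 570        # gigabytes of text

# ASSUMPTION: roughly 6 bytes per English word in plain text
# (average word length plus one separating space). Not from the source.
ASSUMED_BYTES_PER_WORD = 6


def words_to_gb(words, bytes_per_word=ASSUMED_BYTES_PER_WORD):
    """Estimate the plain-text size in GB (10**9 bytes) of a word count."""
    return words * bytes_per_word / 1e9


def gb_to_words(gb, bytes_per_word=ASSUMED_BYTES_PER_WORD):
    """Estimate the number of words contained in a plain-text size in GB."""
    return gb * 1e9 / bytes_per_word


# The GPT-3 to GPT-2 ratio needs no assumption: both are given in GB.
gpt3_vs_gpt2 = GPT3_GB / GPT2_GB

print(f"BERT corpus, estimated size: {words_to_gb(BERT_WORDS):.1f} GB")
print(f"GPT-2 corpus, estimated words: {gb_to_words(GPT2_GB) / 1e9:.1f} billion")
print(f"GPT-3 corpus, estimated words: {gb_to_words(GPT3_GB) / 1e9:.1f} billion")
print(f"GPT-3 / GPT-2 data size ratio: {gpt3_vs_gpt2:.2f}x")
```

Only the GPT-3 to GPT-2 ratio follows from the quoted numbers alone. The word and gigabyte conversions scale linearly with the assumed bytes per word, so changing that constant shifts the estimates proportionally.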

...