You're reading from fastText Quick Start Guide

Product type: Book
Published in: Jul 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789130997
Edition: 1st Edition
Languages: Python
Tools: fastText
Concepts: Mobile Application Development
Author: Joydeep Bhattacharjee

Sentence Classification in FastText

In this chapter, we will cover the following topics:

- Sentence classification
- fastText supervised learning:
  - Architecture
  - Hierarchical softmax architecture
  - N-gram features and the hashing trick:
    - The Fowler-Noll-Vo (FNV) hash
  - Word embeddings and their use in sentence classification
- fastText model quantization:
  - Compression:
    - Quantization
    - Vector quantization:
      - Finding the codebook for high-dimensional spaces
    - Product quantization
    - Additional steps

Sentence classification

Sentence classification deals with understanding text written in natural language and determining the classes it may belong to. In a text classification problem, you have a set of documents, where each document d belongs to the corpus X (which contains all the documents), and a finite set of classes C = {c₁, c₂, ..., c_n}. Classes are also called categories or labels. To train a model, you need two things: a classifier, which is generally a well-tested algorithm (this is not strictly necessary, but the algorithm used in fastText is well tested), and a corpus of documents in which each document is labeled with the classes it belongs to.
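As a rough illustration of what such a labeled corpus looks like, the sketch below parses a few lines written in the plain-text format that fastText expects for supervised training, where each label is marked with the default `__label__` prefix. The toy documents, the `spam`/`ham` classes, and the helper name `parse_line` are hypothetical and chosen only for this example.

```python
from collections import Counter

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def parse_line(line):
    """Split one training line into (labels, text)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, " ".join(words)


# Hypothetical toy corpus X: every document d carries at least one label from C.
corpus = [
    "__label__spam win a free prize now",
    "__label__ham meeting moved to friday",
    "__label__spam claim your free reward",
]

parsed = [parse_line(line) for line in corpus]

# The finite set of classes C, recovered from the labeled corpus.
classes = sorted({label for labels, _ in parsed for label in labels})

# Number of documents per class.
label_counts = Counter(label for labels, _ in parsed for label in labels)

print(classes)       # ['ham', 'spam']
print(label_counts)
```

A file containing lines like these is the kind of labeled input a supervised fastText model is trained on.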

Text classification has many practical uses, such as the following:

Creating spam classifiers in email
Page ranking and indexing in search engines
Sentiment...

fastText supervised learning

A fastText classifier is built on top of a linear classifier, specifically a bag-of-words (BoW) classifier. In this section, you will get to know the architecture of the fastText classifier and how it works.
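To make the idea of a linear bag-of-words classifier concrete, here is a minimal pure-Python sketch, not fastText's actual implementation. It assumes a text is represented by the average of its word vectors, each label is scored by a dot product with that average, and a plain softmax turns the scores into probabilities. The tiny 2D vectors, the `sports` and `politics` labels, and all function names are made up for illustration.

```python
import math


def text_vector(tokens, embeddings):
    """Average the vectors of known tokens (a bag-of-words representation)."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(tokens, embeddings, label_vectors):
    """Score each label with a dot product against the averaged text vector."""
    hidden = text_vector(tokens, embeddings)
    labels = sorted(label_vectors)
    scores = [
        sum(h * w for h, w in zip(hidden, label_vectors[label]))
        for label in labels
    ]
    return dict(zip(labels, softmax(scores)))


# Hypothetical, hand-picked 2D vectors; a real model learns these in training.
embeddings = {
    "wrestling": [1.0, 0.0],
    "scene": [0.8, 0.2],
    "election": [0.0, 1.0],
    "vote": [0.1, 0.9],
}
label_vectors = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}

probs = predict("the wrestling scene".split(), embeddings, label_vectors)
print(probs)
```

Because the word vectors are averaged, the order of the words does not change the prediction, which is exactly the bag-of-words property. Unknown words such as "the" are simply skipped in this sketch.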

Architecture

You can think of each piece of text and each label as a vector in space. The coordinates of these vectors are what training adjusts, so that the vector for a text and the vector for its associated label end up close together in space:

Vector representation of the text

In this example, shown in 2D space, you have texts such as "Nigerian Tommy Thompson is also a relative newcomer to the wrestling scene" and "James...

fastText model quantization

Thanks to the efforts of the Facebook AI Research team, there is a way to get vastly smaller models (in terms of the space they take up on disk), as you have seen in the Model quantization section in Chapter 2, Creating Models Using FastText Command Line. Models that take up hundreds of MB can be quantized down to only a couple of MB. For example, if you look at the DBpedia model released by Facebook, which is available at https://fasttext.cc/docs/en/supervised-models.html, you will notice that the regular model (the BIN file) is 427 MB, while the smaller model (the FTZ file) is only 1.7 MB, roughly 250 times smaller.

This reduction in size is achieved by throwing out some of the information that is encoded in the BIN files (or the bigger model). The problem that needs to be solved here is how to keep information that is important and how to identify information...
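The trade-off between size and information can be illustrated with a deliberately simple scheme. The sketch below is a uniform scalar quantizer, not the product quantization that fastText uses. It assumes each weight is stored as a 4-byte float and replaces it with a 1-byte code, and it shows that the restored values are only approximately equal to the originals. The weights and function names are hypothetical.

```python
def quantize(values, levels=256):
    """Map each float to an integer code in [0, levels - 1]."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1) or 1.0  # avoid a zero step for constant input
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Recover approximate floats from the integer codes."""
    return [lo + c * step for c in codes]


# Hypothetical weights; a real model has millions of them.
weights = [-0.75, -0.1, 0.0, 0.33, 0.9]

codes, lo, step = quantize(weights)
restored = dequantize(codes, lo, step)

# The information thrown away shows up as a small reconstruction error.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Assumed storage cost: 4 bytes per float32 weight versus 1 byte per code.
float_bytes = 4 * len(weights)
code_bytes = 1 * len(weights)

print(codes)
print(max_error, float_bytes, code_bytes)
```

With 256 levels, each code fits in a single byte, so the codes take a quarter of the space of the original floats, at the cost of an error of at most half a quantization step per weight.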

Summary

With this chapter, you have completed a deep dive into the theory behind how the fastText model is designed and implemented, its benefits, and the things you need to consider when using it in your ML pipeline.

The next part of the book covers implementation and deployment, starting in the next chapter with how to use fastText in a Python environment.

Author (1)

Joydeep Bhattacharjee

Joydeep Bhattacharjee is a Principal Engineer who works for Nineleaps Technology Solutions. After graduating from National Institute of Technology at Silchar, he started working in the software industry, where he stumbled upon Python. Through Python, he stumbled upon machine learning. Now he primarily develops intelligent systems that can parse and process data to solve challenging problems at work. He believes in sharing knowledge and loves mentoring in machine learning. He also maintains a machine learning blog on Medium.