You're reading from fastText Quick Start Guide

Product type: Book

Published in: Jul 2018

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781789130997

Edition: 1st Edition

Languages: Python

Tools: fastText

Concepts: Mobile Application Development

Author: Joydeep Bhattacharjee

Machine Learning and Deep Learning Models

In almost all of the applications discussed so far, the implicit assumption has been that you are building a new machine learning NLP pipeline from scratch. That may not always be the case. If you already work on an established platform, fastText can still be a good addition that improves your existing pipeline.

This chapter gives you methods and recipes for using fastText with popular frameworks such as scikit-learn, Keras, TensorFlow, and PyTorch. We will look at how to augment fastText word embeddings with other deep neural architectures, such as convolutional neural networks (CNNs) or attention networks, to solve various NLP problems.

The topics covered in this chapter are as follows:

Scikit-learn and fastText
Embeddings
Keras
Embedding layer in Keras
Convolutional neural network...

Scikit-learn and fastText

In this section, we will discuss how to integrate fastText into your statistical models. The most popular library for statistical machine learning in Python is scikit-learn, so we will focus on that.

scikit-learn is one of the most popular machine learning tools, largely because its API is simple and uniform. The typical flow is as follows:

Convert your data into matrix format.
Create an instance of the predictor class.
Call the fit method on that instance with the training data.
Once the model is trained, call predict on it.

This means that you can create a custom classifier by defining the fit and predict methods.
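As a minimal sketch of this fit/predict convention, the hypothetical class below averages the word vectors of each document and assigns the label of the nearest class centroid. The toy two-dimensional vectors stand in for vectors you would export from a trained fastText model, and the nearest-centroid rule is an illustrative choice, not the classifier this chapter goes on to build. In real code you would typically also subclass sklearn.base.BaseEstimator and ClassifierMixin to inherit get_params and score; the sketch sticks to plain Python so that it has no dependencies.

```python
import math
from collections import defaultdict


class AveragedVectorCentroidClassifier:
    """Hypothetical nearest-centroid classifier following the
    scikit-learn fit/predict convention.

    word_vectors is assumed to be a dict mapping tokens to equal-length
    lists of floats, for example vectors exported from a fastText model.
    """

    def __init__(self, word_vectors):
        self.word_vectors = word_vectors
        self.dim = len(next(iter(word_vectors.values())))

    def _embed(self, text):
        # Average the vectors of known tokens; unknown tokens are skipped.
        known = [self.word_vectors[t] for t in text.lower().split()
                 if t in self.word_vectors]
        if not known:
            return [0.0] * self.dim
        return [sum(column) / len(known) for column in zip(*known)]

    def fit(self, X, y):
        # One centroid (mean document vector) per label.
        sums, counts = {}, defaultdict(int)
        for text, label in zip(X, y):
            vec = self._embed(text)
            previous = sums.get(label, [0.0] * self.dim)
            sums[label] = [a + b for a, b in zip(previous, vec)]
            counts[label] += 1
        self.centroids_ = {label: [v / counts[label] for v in total]
                           for label, total in sums.items()}
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X):
        predictions = []
        for text in X:
            vec = self._embed(text)
            # Pick the label whose centroid is closest (Euclidean distance).
            predictions.append(min(
                self.centroids_,
                key=lambda label: math.dist(vec, self.centroids_[label])))
        return predictions


# Toy 2-dimensional vectors standing in for real fastText vectors.
toy_vectors = {"good": [1.0, 0.0], "great": [0.9, 0.1],
               "bad": [0.0, 1.0], "awful": [0.1, 0.9]}
clf = AveragedVectorCentroidClassifier(toy_vectors).fit(
    ["good movie", "great film", "bad movie", "awful film"],
    ["pos", "pos", "neg", "neg"])
print(clf.predict(["a great story", "an awful plot"]))  # ['pos', 'neg']
```

Because fit returns the estimator itself, construction and training can be chained exactly as with built-in scikit-learn estimators.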

Custom classifiers for fastText

Since we...

Embeddings

As you have seen, to work with text in machine learning, you need to convert the text into numerical values. The same logic applies to neural architectures, where this conversion is implemented with an embedding layer. All modern deep learning libraries provide an embedding API.
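Conceptually, an embedding layer is a trainable lookup table: a matrix with one row per token ID, from which the layer selects the rows for the IDs in its input. The NumPy sketch below (toy sizes, randomly initialized weights, all names illustrative) shows that this row selection gives the same result as multiplying one-hot vectors by the weight matrix, which is why libraries implement it as a lookup rather than as a matrix product.

```python
import numpy as np

# Hypothetical toy setup: a vocabulary of 5 tokens, 3-dimensional embeddings.
rng = np.random.default_rng(seed=0)
vocab_size, embedding_dim = 5, 3
# Random initialization, as in a freshly created embedding layer.
embedding_weights = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([4, 0, 2])  # one document as a sequence of token IDs

# What an embedding layer does: select one row per token ID.
looked_up = embedding_weights[token_ids]

# Equivalent but wasteful formulation: one-hot encode the IDs, then multiply.
one_hot = np.eye(vocab_size)[token_ids]   # shape (3, 5)
via_matmul = one_hot @ embedding_weights  # shape (3, 3)

print(looked_up.shape)                     # (3, 3)
print(np.allclose(looked_up, via_matmul))  # True
```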

The embedding layer is a versatile layer that can serve several purposes:

It can be used to learn word embeddings to be used in an application later
It can be used with a larger model where the embeddings are also tuned as part of the model
It can be used to load a pretrained word embedding

The third use is the focus of this section. The idea is to use fastText to create better embeddings, which can then be injected into your model through the embedding layer. Normally, the embedding layer is initialized with random weights...
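Before pretrained vectors can be injected, they have to be loaded. fastText's text .vec format starts with a header line giving the number of words and the vector dimension, followed by one line per word with its space-separated components. The hypothetical load_vec helper below is a minimal parser for that format, assuming UTF-8 text and optionally keeping only the words in your own vocabulary; an in-memory sample stands in for a real downloaded file.

```python
import io


def load_vec(lines, vocabulary=None):
    """Hypothetical parser for fastText's text .vec format.

    The first line holds "<number of words> <dimension>"; each following
    line holds a token and its space-separated float components. If
    vocabulary is given, only those tokens are kept to save memory.
    """
    iterator = iter(lines)
    _, dim = map(int, next(iterator).split())
    vectors = {}
    for line in iterator:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if vocabulary is not None and word not in vocabulary:
            continue
        if len(values) != dim:
            continue  # skip malformed lines instead of failing
        vectors[word] = [float(v) for v in values]
    return dim, vectors


# In-memory stand-in for a .vec file: 3 words, 4 dimensions.
sample = io.StringIO(
    "3 4\n"
    "cat 0.1 0.2 0.3 0.4\n"
    "dog 0.2 0.1 0.0 0.5\n"
    "the 0.0 0.0 0.1 0.0\n"
)
dim, vectors = load_vec(sample, vocabulary={"cat", "dog"})
print(dim, sorted(vectors))  # 4 ['cat', 'dog']
```

For a real file, you would pass an open file handle instead, for example with open(path, encoding="utf-8") as f, where path is a placeholder for the location of your .vec file.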

Keras

Keras is a popular high-level neural network API. It supports TensorFlow, CNTK, and Theano as backends. Because of its user-friendly API, many people use Keras in lieu of the underlying libraries.

Embedding layer in Keras

The embedding layer will be the first hidden layer of the Keras network, and you will need to specify three arguments: the input dimension, the output dimension, and the input length. Since we are using fastText to improve the model, we also need to pass the embedding matrix through the weights parameter and set trainable to False:

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM, 
                            weights=[embedding_matrix],
               ...
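The snippet above assumes that num_words, EMBEDDING_DIM, and embedding_matrix already exist. Here is a sketch of one common way to build the matrix, assuming a Keras-style word_index in which index 0 is reserved for padding and a dict of fastText vectors such as a .vec loader would return; the vocabulary cap MAX_NUM_WORDS and the toy data are hypothetical. Row i of the matrix holds the vector for the word with index i, so the layer's lookups line up with the tokenizer's IDs.

```python
import numpy as np

EMBEDDING_DIM = 4   # must match the dimension of the fastText vectors
MAX_NUM_WORDS = 10  # hypothetical cap on the vocabulary size

# Hypothetical inputs: a tokenizer's word -> index mapping (index 0 is
# reserved for padding) and pretrained fastText vectors.
word_index = {"the": 1, "cat": 2, "sat": 3}
fasttext_vectors = {
    "the": [0.0, 0.0, 0.1, 0.0],
    "cat": [0.1, 0.2, 0.3, 0.4],
}

num_words = min(MAX_NUM_WORDS, len(word_index) + 1)  # +1 for the padding row
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    if i >= num_words:
        continue  # word falls outside the capped vocabulary
    vector = fasttext_vectors.get(word)
    if vector is not None:
        embedding_matrix[i] = vector  # words without a vector keep a zero row

print(embedding_matrix.shape)  # (4, 4)
```

Leaving rows at zero for words without a pretrained vector is one choice among several; random initialization is another. If you load the binary .bin model with the fasttext Python package instead of the .vec file, model.get_word_vector(word) can also produce vectors for such words from their character n-grams.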

TensorFlow

TensorFlow is a numerical computation library developed by Google. It is widely used by companies to build their neural network models. The logic behind augmenting TensorFlow models with fastText is the same as what you have seen in Keras.

Word embeddings in TensorFlow

To use word embeddings in TensorFlow, you need an embedding matrix in which every token in your list of documents has a unique ID (its row index), so that each document becomes a vector of these IDs. Now, suppose you have an embedding in a NumPy array called word_embedding, with vocab_size rows and embedding_dim columns, and you want to create a tensor W from it. Taking a specific example, the sentence "I have a cat." can...
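The ID mapping and the lookup can be sketched in NumPy as follows. The tokenizer is deliberately naive, the random word_embedding array is a stand-in for fastText vectors, and every name other than word_embedding, vocab_size, and embedding_dim is hypothetical. In TensorFlow, the row selection in the last step corresponds to creating W = tf.constant(word_embedding) and calling tf.nn.embedding_lookup(W, ids).

```python
import numpy as np

documents = ["I have a cat.", "I have a dog."]


def tokenize(text):
    # Deliberately naive: lowercase, drop the trailing period, split on spaces.
    return text.lower().rstrip(".").split()


# Give every token in the corpus a unique integer ID, in order of appearance.
vocab = {}
for doc in documents:
    for token in tokenize(doc):
        vocab.setdefault(token, len(vocab))

# Each document becomes a vector of token IDs.
doc_ids = [np.array([vocab[t] for t in tokenize(doc)]) for doc in documents]

vocab_size, embedding_dim = len(vocab), 3
# Random stand-in for pretrained fastText vectors.
word_embedding = np.random.default_rng(0).normal(size=(vocab_size, embedding_dim))

# Row selection by ID, the NumPy analogue of tf.nn.embedding_lookup(W, ids).
first_doc_vectors = word_embedding[doc_ids[0]]

print(vocab)                    # {'i': 0, 'have': 1, 'a': 2, 'cat': 3, 'dog': 4}
print(doc_ids[0])               # [0 1 2 3]
print(first_doc_vectors.shape)  # (4, 3)
```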

PyTorch

Following the same logic as the previous two libraries, you can use the torch.nn.EmbeddingBag class to inject the pretrained embeddings. There is a small drawback, though. Keras and TensorFlow work directly with NumPy arrays, whereas PyTorch has its own tensor type, the torch tensor. Generally, this is not an issue, but it means that you will need to write your own text conversion and tokenization pipelines. To avoid all this rewriting and reinventing of the wheel, you can use the torchtext library.
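In mean mode, which is its default, EmbeddingBag looks up the vectors for each bag of token IDs and averages them in a single step, which broadly mirrors how fastText's own classifier averages word vectors into a document representation. The NumPy sketch below reproduces only that arithmetic, with a hypothetical embedding_bag_mean function, so that it runs without PyTorch installed. With PyTorch, the equivalent layer would be created with torch.nn.EmbeddingBag.from_pretrained(torch.from_numpy(weights).float(), freeze=True, mode="mean") and called as bag(flat_ids, offsets), where both arguments are integer (long) tensors.

```python
import numpy as np


def embedding_bag_mean(weights, flat_ids, offsets):
    """Hypothetical NumPy re-creation of torch.nn.EmbeddingBag(mode="mean").

    flat_ids holds the token IDs of all bags (documents) concatenated;
    offsets[k] is the position in flat_ids where bag k starts.
    """
    bounds = list(offsets) + [len(flat_ids)]
    rows = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end > start:
            # Look up every token vector in the bag and average them.
            rows.append(weights[flat_ids[start:end]].mean(axis=0))
        else:
            # Empty bags yield a zero vector, as EmbeddingBag does.
            rows.append(np.zeros(weights.shape[1]))
    return np.stack(rows)


# Toy pretrained matrix: 3 tokens, 2 dimensions.
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [2.0, 2.0]])
flat_ids = np.array([0, 1, 2])  # bag 0 = tokens [0, 1], bag 1 = token [2]
offsets = [0, 2]
print(embedding_bag_mean(weights, flat_ids, offsets))
```

The flat-IDs-plus-offsets layout lets documents of different lengths share one batch without padding, which is the main reason to prefer EmbeddingBag over a plain embedding followed by a mean.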

The torchtext library

torchtext is an excellent library that takes care of most of the preprocessing steps that you need to build...

Summary

In this chapter, we looked at how to integrate fastText word vectors into linear machine learning models and into deep learning models built with Keras, TensorFlow, and PyTorch. You also saw how easily word vectors can be incorporated into existing neural architectures that you might already use in your business application. If you currently initialize your embeddings with random values, I highly recommend that you try initializing them with fastText vectors instead and check whether your model's performance improves.



Joydeep Bhattacharjee is a Principal Engineer who works for Nineleaps Technology Solutions. After graduating from National Institute of Technology at Silchar, he started working in the software industry, where he stumbled upon Python. Through Python, he stumbled upon machine learning. Now he primarily develops intelligent systems that can parse and process data to solve challenging problems at work. He believes in sharing knowledge and loves mentoring in machine learning. He also maintains a machine learning blog on Medium.
