fastText Quick Start Guide

Product type Book
Published in Jul 2018
Publisher Packt
ISBN-13 9781789130997
Pages 194 pages
Edition 1st Edition
Author: Joydeep Bhattacharjee

Preface

FastText is a state-of-the-art tool for performing text classification and building efficient word representations. It is open source and was developed at the Facebook Artificial Intelligence Research (FAIR) lab. It is written in C++, and Python wrappers are also available.

This book has the ambitious goal of covering all the techniques and know-how that you need to build NLP applications in the real world. It also covers the algorithms on which fastText is built, so that you will clearly understand the contexts in which you can expect the best results from fastText.

Who this book is for

This book will benefit you if you are a software developer or machine learning engineer trying to understand the state of the art in NLP. A large part of the book deals with real-life problems and considerations for creating an NLP pipeline. If you are an NLP researcher, there is a lot of value here because you will learn about the internal algorithms and the design considerations behind the fastText software. All the code examples are written in Jupyter Notebooks. I highly recommend that you type them out, change them, and tinker with them. Keep the code handy so that you can use it later in your actual projects.

What this book covers

Chapter 1, Introducing FastText, introduces fastText and the NLP context in which this library is useful. It maps out the motivations behind building the library, its intended usage, and the benefits that its creators wanted to bring to NLP and the field of computational linguistics. There are also specific instructions explaining how to install fastText on your work machine. Upon completion of this chapter, you will have fastText installed and running on your computer.

Chapter 2, Creating Models Using the FastText Command Line, discusses the rich command line that the fastText library provides. This chapter describes the default command-line options and shows how to use them to create models. If you are only interested in a brief introduction to fastText, reading up to this chapter should be enough.

Chapter 3, Word Representations in FastText, explains how unsupervised word embeddings are created in fastText.

Chapter 4, Sentence Classification in FastText, introduces the algorithms that power sentence classification in fastText. You will also learn how fastText compresses big models into smaller models that can be deployed to low-memory devices.

Chapter 5, FastText in Python, is about creating models in Python by either using the official Python bindings for fastText or by using the gensim library, which is a popular Python library for NLP.

Chapter 6, Machine Learning and Deep Learning Models, explains how to integrate fastText into your NLP pipeline if you have pre-built pipelines that use either statistical machine learning paradigms or deep learning paradigms. In the case of statistical machine learning, this chapter makes use of the scikit-learn library; and in the case of deep learning, Keras, TensorFlow, and PyTorch are taken into account.

Chapter 7, Deploying Models to Mobile and the Web, is mainly about deployment and how to integrate fastText models in live production-grade customer applications.

To get the most out of this book

Ideally, you should have a basic knowledge of how Python code is written and structured. If you are not familiar with Python or are not clear on how programming languages work in general, then please take a look at a book on Python. A book dealing with Python from a data science perspective would be ideal for you.

If you already have a basic idea of NLP and machine learning in general, this book should be easy for you to grasp. If you are just starting out in NLP, that should not be much of an issue, provided you are willing to dive deep into the mathematics covered. I have taken care to explain the mathematical concepts in this book, but if they still seem too difficult, please write to us and let us know.

A willingness on the part of the reader to dive deep and try out all the code is assumed.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/fastText-Quick-Start-Guide. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Commands such as cat, grep, sed, and awk are quite old and their behavior is well-documented on the internet."

A block of code is set as follows:

import csv
import sys
w = csv.writer(sys.stdout)
for row in csv.DictReader(sys.stdin):
    w.writerow([row['stars'], row['text'].replace('\n', '')])

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

import csv
import sys
w = csv.writer(sys.stdout)
for row in csv.DictReader(sys.stdin):
    w.writerow([row['stars'], row['text'].replace('\n', '')])

Any command-line input or output is written as follows:

$ cat data/yelp/yelp_review.csv | \
python parse_yelp_dataset.py \
> data/yelp/yelp_review.v1.csv
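
To make the sample script and command above easier to try without the Yelp data, here is a minimal sketch of the same transformation applied to an in-memory CSV. The function name parse_reviews, the review_id column, and the two sample rows are assumptions made for illustration; only the stars and text columns come from the script shown above.

```python
import csv
import io


def parse_reviews(source, sink):
    """Write the 'stars' and 'text' columns of source to sink,
    removing newlines embedded in the review text."""
    writer = csv.writer(sink)
    for row in csv.DictReader(source):
        writer.writerow([row["stars"], row["text"].replace("\n", "")])


# Hypothetical sample input; the first review contains an embedded newline.
sample = io.StringIO(
    "review_id,stars,text\n"
    'r1,5,"Great food.\nWill come back."\n'
    "r2,2,Slow service\n"
)

out = io.StringIO()
parse_reviews(sample, out)
result = out.getvalue()
print(result)
```

Passing sys.stdin and sys.stdout as source and sink should give the same behavior as the piped command, with each review ending up on a single output line.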

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email feedback@packtpub.com and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.
