You're reading from fastText Quick Start Guide

Product type: Book
Published in: Jul 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789130997
Edition: 1st Edition
Author (1)
Joydeep Bhattacharjee

Joydeep Bhattacharjee is a Principal Engineer who works for Nineleaps Technology Solutions. After graduating from National Institute of Technology at Silchar, he started working in the software industry, where he stumbled upon Python. Through Python, he stumbled upon machine learning. Now he primarily develops intelligent systems that can parse and process data to solve challenging problems at work. He believes in sharing knowledge and loves mentoring in machine learning. He also maintains a machine learning blog on Medium.

Word Representations in FastText

Now that you have created models from the command line, you might be wondering how fastText creates those word representations. In this chapter, you will learn what happens behind the scenes and which algorithms power fastText.

We will cover the following topics in this chapter:

  • Word-to-vector representations
  • Types of word representations
  • Getting vector representations from text
  • Model architecture in fastText
  • The unsupervised model
  • fastText skipgram implementation
  • CBOW (Continuous bag of words)
  • Comparison between skipgram and CBOW
  • Loss functions and optimizations
  • Softmax
  • Context definitions

Word-to-vector representations

Almost all machine learning and deep learning algorithms manipulate vectors and matrices. They work because their underlying mathematics is heavily rooted in linear algebra. So, for both supervised and unsupervised learning, you need to create matrices of numbers. In other domains, this is not an issue, as information is generally captured as numbers. For example, in retail, sales information such as how many units were sold or how much revenue the store is making in the current month is all numeric. Even in a more abstract field such as computer vision, an image is typically stored as the pixel intensities of the three basic colors: red, green, and blue. With 8 bits per channel, 0 for a particular color means no intensity and 255 means the highest possible intensity for the screen. Similarly, in the case of sound, it is stored as power spectral density...
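To make the contrast concrete, here is a minimal Python sketch. It is an illustration only, not how fastText represents words: the 2x2 image, the toy sentence, and the `one_hot` helper are all hypothetical, and one-hot encoding is used simply because it is one of the most basic ways to turn words into numbers.

```python
# A tiny 2x2 RGB image: each pixel holds three 8-bit intensities (0-255).
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

# The image is already numeric, so flattening it into a vector is trivial.
image_vector = [channel for row in image for pixel in row for channel in pixel]

# Words are not numeric, so we must choose an encoding ourselves.
# Assumption: a toy vocabulary built from a single sentence.
sentence = "the cat sat on the mat"
vocabulary = sorted(set(sentence.split()))
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's vocabulary index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector


print(len(image_vector))  # 2 rows x 2 pixels x 3 channels
print(vocabulary)
print(one_hot("cat"))
```

The image needs no extra design decisions to become a vector, while the text needs a vocabulary and an encoding scheme before any algorithm can use it. One-hot vectors grow with the vocabulary size and carry no notion of similarity between words.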

Summary

In this chapter, you have taken a look at unsupervised learning in fastText, as well as the algorithms and methods that enable it.

The next chapter covers how fastText approaches supervised learning and how model quantization works in fastText.
