You're reading from fastText Quick Start Guide

Product type: Book
Published in: Jul 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789130997
Edition: 1st Edition
Author: Joydeep Bhattacharjee

Joydeep Bhattacharjee is a Principal Engineer who works for Nineleaps Technology Solutions. After graduating from National Institute of Technology at Silchar, he started working in the software industry, where he stumbled upon Python. Through Python, he stumbled upon machine learning. Now he primarily develops intelligent systems that can parse and process data to solve challenging problems at work. He believes in sharing knowledge and loves mentoring in machine learning. He also maintains a machine learning blog on Medium.

Notes for the Readers

Windows and Linux

We suggest that you use PowerShell as your Windows command line, as it is more powerful than the plain cmd prompt.

Task                  Windows                            Linux/macOS
Creating a directory  mkdir                              mkdir
Change directory      cd                                 cd
Move files            move                               mv
Unzip files           GUI and double click               unzip
Top of the file       get-content                        head
Contents of the file  type                               cat
Piping                this pipes objects                 this pipes text
Bottom of the file    -wait parameter with get-content   tail

Python and Perl commands work the same way on Windows as they do in bash, so you can use those files, and especially Perl one-liners, in a similar way.
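Because Python behaves the same way on both platforms, the "top of the file" and "bottom of the file" rows of the table can also be covered by a small script. The following is a minimal sketch, not something from the book: the helper names head and tail are hypothetical, and the file is assumed to be UTF-8 text.

```python
import io
from collections import deque
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file (hypothetical helper)."""
    with io.open(path, encoding="utf-8") as handle:
        # islice stops reading after n lines, so large files stay cheap
        return [line.rstrip("\n") for line in islice(handle, n)]


def tail(path, n=10):
    """Return the last n lines of a text file (hypothetical helper)."""
    with io.open(path, encoding="utf-8") as handle:
        # a deque with maxlen keeps only the most recent n lines in memory
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

io.open is used instead of the built-in open so that the same code accepts the encoding argument on both Python 2 and Python 3.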

Python 2 and Python 3

fastText works with both Python 2 and Python 3. There are a few differences, though, that you should keep in mind for your particular Python version.

  1. print is a statement in Python 2 and a function in Python 3. This means that if you are in a Jupyter notebook and want to see the changes in a variable, you need to use the form of print that matches your Python version.
  2. fastText handles text as Unicode. Python 3 also handles text as Unicode, so there is no additional overhead if you code in Python 3. If you are developing your models in Python 2, however, you cannot pass your data as a str instance; it must be a unicode instance. The following is an example of text as an instance of the str class and of the unicode class in Python 2.
>>> text1 = "some text" # this will not work for fastText
>>...
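One way to keep the same code working on both versions is to decode byte strings before handing text to fastText. The following is a minimal sketch under stated assumptions: to_unicode is a hypothetical helper, not part of fastText, and the data is assumed to be UTF-8 encoded.

```python
def to_unicode(text, encoding="utf-8"):
    """Return text as Unicode, decoding byte strings if necessary.

    Hypothetical helper, not part of fastText. In Python 2 a plain
    "..." literal is a byte string (str), so it is decoded here; in
    Python 3 it is already Unicode and is returned unchanged.
    """
    if isinstance(text, bytes):  # in Python 2, bytes is an alias of str
        return text.decode(encoding)
    return text


raw = b"some text"       # a byte string on both Python versions
clean = to_unicode(raw)  # Unicode text on both Python versions

# Calling print with a single parenthesized argument runs on both versions.
print(type(clean).__name__)
```

The printed type name is unicode on Python 2 and str on Python 3; in both cases the value is Unicode text.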

The fastText command line

The following is the list of commands that you can use with the fastText command line:

$ ./fasttext
usage: fasttext <command> <args>

The commands supported by fasttext are:

  supervised              train a supervised classifier
  quantize                quantize a model to reduce the memory usage
  test                    evaluate a supervised classifier
  predict                 predict most likely labels
  predict-prob            predict most likely labels with probabilities
  skipgram                train a skipgram model
  cbow                    train a cbow model
  print-word-vectors      print word vectors given a trained model
  print-sentence-vectors  print sentence vectors given a trained model
  print-ngrams            print ngrams given a trained model and word
  nn                      query for nearest neighbors
  analogies               query for analogies
  dump                    dump arguments,dictionary,input/output vectors

The supervised, skipgram, and cbow commands are for training a model. predict, predict-prob are...
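As an illustration of the three training commands, the sketch below assembles the argument list for a training run. It is not taken from the book: training_argv is a hypothetical helper, the binary is assumed to sit at ./fasttext, and train.txt and model are placeholder names. Only the -input and -output arguments are used.

```python
TRAINING_COMMANDS = ("supervised", "skipgram", "cbow")


def training_argv(command, input_path, output_prefix, binary="./fasttext"):
    """Build the argument list for a fastText training run.

    Hypothetical helper: it only assembles the arguments and does not
    start the process.
    """
    if command not in TRAINING_COMMANDS:
        raise ValueError("not a training command: %r" % (command,))
    return [binary, command, "-input", input_path, "-output", output_prefix]


# Placeholder file names, for illustration only.
argv = training_argv("supervised", "train.txt", "model")
print(" ".join(argv))  # ./fasttext supervised -input train.txt -output model
```

The resulting list can be passed as-is to subprocess.call, which avoids shell quoting differences between PowerShell and bash.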

Gensim fastText parameters

Gensim supports the same hyperparameters that are supported in the native implementation of fastText. You should be able to set them as follows:

  • sentences: This can be a list of lists of tokens. In general, a stream of tokens is recommended, such as LineSentence from the word2vec module, as you have seen earlier. In the Facebook fastText library, the equivalent is the path to the input file, given by the -input parameter.
  • sg: Either 1 or 0. 1 means to train a skip-gram model, and 0 means to train a CBOW model. In the Facebook fastText library, the equivalent is choosing between the skipgram and cbow commands.
  • size: The dimensionality of the word vectors, which must be an integer. In line with the original implementation, the default is 100. This corresponds to the -dim argument in the Facebook fastText implementation.
  • window: The window size that is considered...
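The correspondence described for sentences, sg, and size can be sketched as a small translation function. This is only an illustration: gensim_to_cli is a hypothetical helper, corpus.txt is a placeholder path, and the function covers just the three parameters whose fastText equivalents are spelled out above. It does not produce a complete command, since a real training run needs further arguments such as the output path.

```python
def gensim_to_cli(corpus_path, sg=0, size=100):
    """Translate three Gensim-style settings into fastText CLI arguments.

    Hypothetical helper covering only sentences (as a file path), sg,
    and size.
    """
    if sg not in (0, 1):
        raise ValueError("sg must be 0 (CBOW) or 1 (skip-gram)")
    if not isinstance(size, int):
        raise TypeError("size must be an integer")
    command = "skipgram" if sg == 1 else "cbow"
    return [command, "-input", corpus_path, "-dim", str(size)]


# Placeholder corpus path, for illustration only.
print(gensim_to_cli("corpus.txt", sg=1, size=300))
# ['skipgram', '-input', 'corpus.txt', '-dim', '300']
```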