You're reading from fastText Quick Start Guide
Windows and Linux
We suggest using PowerShell as your Windows command line, as it is more powerful than the plain cmd prompt.
Task | Windows | Linux/macOS |
Creating a directory | mkdir | mkdir |
Change directory | cd | cd |
Move files | move | mv |
Unzip files | GUI and double click | unzip |
Top of the file | get-content | head |
Contents of the file | type | cat |
Piping | this pipes objects | this pipes text |
Bottom of the file | -wait parameter with get-content | tail |
Python and Perl commands work the same way in Windows as they do in bash, so you can use the same scripts, and especially Perl one-liners, in a similar way.
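Because Python behaves the same on both platforms, the head and tail rows of the table can also be covered by one portable script. The following is a minimal sketch under stated assumptions: `head` and `tail` are hypothetical helper names, the demo file is created only for illustration, and UTF-8 is an assumed encoding.

```python
import io
import os
import tempfile
from collections import deque
from itertools import islice


def head(path, n=10, encoding="utf-8"):
    """Return the first n lines of a text file (hypothetical helper)."""
    with io.open(path, encoding=encoding) as handle:
        # islice stops reading after n lines, so large files are cheap.
        return [line.rstrip("\n") for line in islice(handle, n)]


def tail(path, n=10, encoding="utf-8"):
    """Return the last n lines of a text file (hypothetical helper)."""
    with io.open(path, encoding=encoding) as handle:
        # A bounded deque keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


# Demo on a throwaway five-line file; the file name is an assumption.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(u"\n".join(u"line {}".format(i) for i in range(1, 6)))

print(head(demo_path, 2))
print(tail(demo_path, 2))
```

The script reads the file as a stream, so it does not load the whole file into memory, which matters for large training corpora.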
Python 2 and Python 3
fastText works with both Python 2 and Python 3. There are a few differences, though, that you should keep in mind for your particular Python version.
- print is a statement in Python 2 and a function in Python 3. This means that if you are in a Jupyter notebook and want to inspect a variable, you will need to use the form of print that matches your Python version.
- fastText handles text as Unicode. Python 3 also handles text as Unicode, so there is no additional overhead if you code in Python 3. If you are developing your models in Python 2, however, your data cannot be an instance of str; it must be an instance of unicode. The following is an example of text as an instance of the str class and of the unicode class in Python 2.
>>> text1 = "some text" # this will not work for fastText
>>...
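One way to handle the Unicode requirement in a single place is a small conversion helper that behaves the same on both Python versions. This is a minimal sketch, not part of the fastText API: `to_text` is a hypothetical name, and UTF-8 is an assumed encoding that you should change to match your data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text on both Python 2 and Python 3.

    Hypothetical helper, not a fastText function. The UTF-8 default
    is an assumption about how the data is encoded.
    """
    # On Python 2, bytes is an alias of str, so plain string literals
    # are decoded to unicode. On Python 3, str is already Unicode and
    # only bytes objects need decoding.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


text = to_text(b"some text")
# A single-argument print(...) call works as a statement on Python 2
# and as a function on Python 3.
print("{}: {}".format(type(text).__name__, text))
```

Passing every input string through such a helper before it reaches fastText keeps the Python 2 and Python 3 code paths identical.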
The fastText command line
The following is the list of commands that you can use with the fastText command line:
$ ./fasttext
usage: fasttext <command> <args>
The commands supported by fasttext are:
  supervised              train a supervised classifier
  quantize                quantize a model to reduce the memory usage
  test                    evaluate a supervised classifier
  predict                 predict most likely labels
  predict-prob            predict most likely labels with probabilities
  skipgram                train a skipgram model
  cbow                    train a cbow model
  print-word-vectors      print word vectors given a trained model
  print-sentence-vectors  print sentence vectors given a trained model
  print-ngrams            print ngrams given a trained model and word
  nn                      query for nearest neighbors
  analogies               query for analogies
  dump                    dump arguments,dictionary,input/output vectors
The supervised, skipgram, and cbow commands are for training a model. predict, predict-prob are...
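The training commands can also be driven from a script, which helps when the same call must run on Windows and Linux. The sketch below assumes a compiled binary at `./fasttext` and a training file named `data.txt`; both paths, the `model` output prefix, and the `build_train_command` helper are hypothetical. It uses only the `-input` and `-output` arguments and skips the actual call when the files are missing.

```python
import os
import subprocess

TRAINING_COMMANDS = ("supervised", "skipgram", "cbow")


def build_train_command(binary, command, input_path, output_prefix):
    """Build the argument list for a fastText training command."""
    if command not in TRAINING_COMMANDS:
        raise ValueError("not a training command: {}".format(command))
    return [binary, command, "-input", input_path, "-output", output_prefix]


# Hypothetical paths: adjust them to your own build and data.
cmd = build_train_command("./fasttext", "skipgram", "data.txt", "model")
print(" ".join(cmd))

# Run the command only when the binary and the data are present.
if os.path.isfile(cmd[0]) and os.path.isfile("data.txt"):
    subprocess.check_call(cmd)
```

Passing the arguments as a list rather than a single string avoids shell quoting differences between PowerShell and bash.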
Gensim fastText parameters
Gensim supports the same hyperparameters that are supported in the native implementation of fastText. You should be able to set them as follows:
- sentences: This can be a list of lists of tokens. In general, a stream of tokens is recommended, such as LineSentence from the word2vec module, as you have seen earlier. In the Facebook fastText library, the equivalent is the path to the training file, passed with the -input parameter.
- sg: Either 1 or 0. 1 means to train a skip-gram model, and 0 means to train a CBOW model. In the Facebook fastText library, the equivalent is choosing between the skipgram and cbow commands.
- size: The dimension of the word vectors, which must be an integer. In line with the original implementation, the default is 100. This is equivalent to the -dim argument in the Facebook fastText implementation. Gensim 4.0 and later name this parameter vector_size.
- window: The window size that is considered...
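The parameters listed so far can be collected into keyword arguments for Gensim's FastText class. The following is a sketch under stated assumptions: `gensim_fasttext_kwargs` and `train` are hypothetical helpers, the default window of 5 is an assumption about typical settings, and the `gensim_major` switch exists because Gensim 4.0 renamed `size` to `vector_size`. Gensim itself is imported only inside `train`, so the helper can be inspected without it.

```python
def gensim_fasttext_kwargs(skipgram=True, dim=100, window=5, gensim_major=4):
    """Build keyword arguments for gensim.models.FastText.

    Hypothetical helper. The vector dimension is passed as `size`
    before Gensim 4.0 and as `vector_size` from 4.0 onwards.
    """
    size_key = "vector_size" if gensim_major >= 4 else "size"
    return {
        "sg": 1 if skipgram else 0,  # 1 = skip-gram, 0 = CBOW
        size_key: dim,               # counterpart of fastText's -dim
        "window": window,
    }


def train(sentences, **kwargs):
    """Train a model on a list of token lists; requires Gensim."""
    from gensim.models import FastText  # imported lazily on purpose
    return FastText(sentences=sentences, **kwargs)


print(gensim_fasttext_kwargs(skipgram=False, gensim_major=3))
```

With Gensim installed, a call such as `train(corpus, min_count=1, **gensim_fasttext_kwargs())` would train a skip-gram model, where `corpus` is your own list of token lists and `min_count=1` keeps rare words in a small corpus.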