Natural Language Processing with Flair

By Tadej Magajna
About this book
Flair is an easy-to-understand natural language processing (NLP) framework designed to facilitate training and distribution of state-of-the-art NLP models for named entity recognition, part-of-speech tagging, and text classification. Flair is also a text embedding library for combining different types of embeddings, such as document embeddings, Transformer embeddings, and the proposed Flair embeddings. Natural Language Processing with Flair takes a hands-on approach to explaining and solving real-world NLP problems. You'll begin by installing Flair and learning about the basic NLP concepts and terminology. You will explore Flair's extensive features, such as sequence tagging, text classification, and word embeddings, through practical exercises. As you advance, you will train your own sequence labeling and text classification models and learn how to use hyperparameter tuning in order to choose the right training parameters. You will learn about the idea behind one-shot and few-shot learning through a novel text classification technique TARS. Finally, you will solve several real-world NLP problems through hands-on exercises, as well as learn how to deploy Flair models to production. By the end of this Flair book, you'll have developed a thorough understanding of typical NLP problems and you’ll be able to solve them with Flair.
Publication date:
April 2022


Chapter 1: Introduction to Flair

There are few Natural Language Processing (NLP) frameworks out there as easy to learn and as easy to work with as Flair. Packed with pre-trained models, excellent documentation, and readable syntax, it provides a gentle learning curve for NLP researchers who are not necessarily skilled in coding; software engineers with poor theoretical foundations; students and graduates; as well as individuals with no prior knowledge simply interested in the topic. But before diving straight into coding, some background about the motivation behind Flair, the basic NLP concepts, and the different approaches to how you can set up your local environment may help you on your journey toward becoming a Flair NLP expert.

In Flair's official GitHub README, the framework is described as:

"A very simple framework for state-of-the-art Natural Language Processing"

This description will raise a few eyebrows. NLP researchers will immediately be interested in knowing what specific tasks the framework achieves its state-of-the-art results in. Engineers will be intrigued by the very simple label, but will wonder what steps are required to get up and running and what environments it can be used in. And those who are not knowledgeable in NLP will wonder whether they will be able to grasp the knowledge required to understand the problems Flair is trying to solve.

In this chapter, we will be answering all of these questions by covering the basic NLP concepts and terminology, providing an overview of Flair, and setting up our development environment with the help of the following sections:

  • A brief introduction to NLP
  • What is Flair?
  • Getting ready

Technical requirements

To get started, you will need a development environment with Python 3.6+. Platform-specific instructions for installing Python can be found at

You will not require a GPU-equipped development machine, though having one will significantly speed up some of the training-related exercises described later in the book.

You will require access to a command line. On Linux and macOS, simply start the Terminal application. On Windows, press Windows + R to open the Run box, type cmd and then click OK.

Flair's official GitHub repository is available via the following link:

In this chapter, we will install Flair version 0.11.

The code examples covered in this chapter are found in this book's official GitHub repository in the following Jupyter notebook:


A brief introduction to NLP

Before diving straight into what Flair is capable of and how to leverage its features, we will be going through a brief introduction to NLP to provide some context for readers who are not familiar with all the NLP techniques and tasks solved by Flair. NLP is a branch of artificial intelligence, linguistics, and software engineering that helps machines understand human language. When we humans read a sentence, our brains immediately make sense of many seemingly trivial problems such as the following:

  • Is the sentence written in a language I understand?
  • How can the sentence be split into words?
  • What is the relationship between the words?
  • What are the meanings of the individual words?
  • Is this a question or an answer?
  • Which part-of-speech categories are the words assigned to?
  • What is the abstract meaning of the sentence?

The human brain is excellent at solving these problems conjointly and often seamlessly, leaving us unaware that we made sense of all of these things simply by reading a sentence.

Even now, machines are still not as good as humans at solving all these problems at once. Therefore, to teach machines to understand human language, we have to split understanding of natural language into a set of smaller, machine-intelligible tasks that allow us to get answers to these questions one by one.

In this section, you will find a list of some important NLP tasks with emphasis on the tasks supported by Flair.


Tokenization

Tokenization is the process of breaking down a sentence or a document into meaningful units called tokens. A token can be a paragraph, a sentence, a collocation, or just a word.

For example, a word tokenizer would split the sentence Learning to use Flair into a list of tokens as ["Learning", "to", "use", "Flair"].

Tokenization has to adhere to language-specific rules and is rarely a trivial task to solve. For example, with unspaced languages where word boundaries aren't defined with spaces, it's very difficult to determine where one word ends and the next one starts. Well-defined token boundaries are a prerequisite for most NLP tasks that aim to process words, collocations, or sentences including the following tasks explained in this chapter.
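As a rough illustration of word tokenization, the following sketch uses only Python's standard library. The helper name simple_word_tokenize is hypothetical and this is not how Flair tokenizes text; it simply treats runs of word characters and individual punctuation marks as tokens, which is enough for the English example above but would not work for unspaced languages.

```python
import re


def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens.

    Hypothetical helper for illustration only; real tokenizers apply
    language-specific rules.
    """
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_word_tokenize("Learning to use Flair")
print(tokens)  # ['Learning', 'to', 'use', 'Flair']
```

Because the pattern relies on whitespace and punctuation to find boundaries, a sentence written without spaces would come back as a single token, which is exactly the difficulty described above.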

Text vectorization

Text vectorization is a process of transforming words, sentences, or documents in their written form into a numerical representation understandable to machines.

One of the simplest forms of text vectorization is one-hot encoding. It maps words to binary vectors of length equal to the number of words in the dictionary. All elements of the vector are 0 apart from the element that represents the word, which is set to 1 – hence the name one-hot.

For example, take the following dictionary:

  • Cat
  • Dog
  • Goat

The word cat would be the first word in our dictionary and its one-hot encoding would be [1, 0, 0]. The word dog would be the second word in our dictionary and its one-hot encoding would be [0, 1, 0]. And the word goat would be the third and last word in our dictionary and its one-hot encoding would be [0, 0, 1].
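The encoding just described can be sketched in a few lines of plain Python. The function name one_hot is an assumption made for this example, and the dictionary is the three-word one listed above, written in lowercase.

```python
def one_hot(word, dictionary):
    """Return the one-hot vector of `word` with respect to `dictionary`."""
    # Every position is 0 except the position that holds the word itself.
    return [1 if entry == word else 0 for entry in dictionary]


dictionary = ["cat", "dog", "goat"]
for word in dictionary:
    print(word, one_hot(word, dictionary))
```

Note that the vector length always equals the dictionary size, so a dictionary of 100,000 words would produce vectors with 100,000 elements.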

This approach, however, suffers from the problem of high dimensionality as the length of this vector grows linearly with the number of words in the dictionary. It also doesn't capture any semantic meaning of the word. To counter these problems, most modern state-of-the-art approaches use representations called word or document embeddings. Each embedding is usually a fixed-length vector consisting of real numbers. While the numbers will at first seem unintelligible to a human, in some cases, some vector dimensions may represent some abstract property of the word – for example, a dimension of a word-embedding vector could represent the general (positive or negative) sentiment of the word. Given two embeddings, we can compute how similar they are using a similarity measure such as cosine similarity. With many modern NLP solutions, including Flair, embeddings are used as the underlying input representation for higher-level NLP tasks such as named entity recognition.
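To make cosine similarity more concrete, here is a minimal sketch using only the standard library. The three-dimensional vectors are made-up numbers chosen for illustration and do not come from any real embedding model; real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings", chosen only to illustrate the computation
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # True
```

Vectors that point in a similar direction score close to 1, while vectors that are orthogonal score 0.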

One of the main problems with early word embedding approaches was that words with multiple meanings (polysemic words) were limited to a single and constant embedding representation. One of the solutions to this problem in Flair is the use of contextual string embeddings where words are contextualized by their surrounding text, meaning that they will have a different representation given a different surrounding text.

Named entity recognition

Named entity recognition (NER) is an NLP task or technique that identifies named entities in a text and tags them with their corresponding categories. Named entity categories include, but aren't limited to, places, person names, brands, time expressions, and monetary values.

The following figure illustrates NER using colored backgrounds and tags associated with the words:

Figure 1.1 – Visualization of NER tagging

In the previous example, we can see that three entities were identified and tagged. The first and third tags are particularly interesting because they both represent the same word, Berkeley, yet the first one clearly refers to an organization whereas the third one refers to a geographic location. The human brain is excellent at distinguishing between different entity types based on context and is able to do so almost seamlessly, whereas machines have struggled with it for decades. Recent advancements in contextual string embeddings, an essential part of Flair, made a huge leap forward in solving this problem.

Word-sense disambiguation

Word-Sense Disambiguation (WSD) is an NLP technique concerned with identifying the intended sense of a given word with multiple meanings.

For example, take the given sentence:

George tried to return to Berlin to return his hat.

WSD would aim to identify the sense of the first use of the word return, referring to the act of going back to the same place, and the sense of the second return, referring to the act of giving something back.

Part-of-speech tagging

Part-of-Speech (POS) tagging is a technique closely related to both WSD and NER that aims to tag the words as corresponding to a particular part of speech such as nouns, verbs, adjectives, adverbs, and so on.

Figure 1.2 – Visualization of POS tagging

Actual POS taggers provide a lot more information with the tags than simply associating the words with noun/verb/adjective categories. For example, the Penn Treebank Project corpus, one of the most widely used POS corpora, distinguishes between 36 different types of POS tags.


Chunking

Another NLP technique closely related to POS tagging is chunking. Unlike POS tagging, where we tag individual words, in chunking we identify complete short phrases such as noun phrases. In Figure 1.2, the phrase A lovely day can be considered a chunk as it is a noun phrase, and in its relationship to other words works the same way as a noun.

Stemming and lemmatization

Stemming and lemmatization are two closely related text normalization techniques used in NLP to reduce the words to their common base forms. For example, the word play is the base word of the words playing, played and plays.

The simpler of the two techniques, stemming, simply accomplishes this by cutting off the ends or beginnings of words. This simple solution often works, but is not foolproof. For example, the word ladies can never be transformed into the word lady by stemming only. We therefore need a technique that understands the POS category of a word and takes into account its context. This technique is called lemmatization. The process of lemmatization can be demonstrated using the following example.

Take the following sentence:

this meeting was exhausting

Lemmatization reduces the previous sentence to the following:

this meeting be exhaust

It reduces the word was to be and the word exhausting to exhaust. Also note that the word meeting is used as a noun and is therefore mapped to the same word, meeting, whereas if it were used as a verb, it would be reduced to meet.

A popular and easy-to-use library for performing lemmatization with Python is spaCy. Its models are trained on large corpora and are able to distinguish between different POS, yielding impressive results.
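The limitation of stemming described earlier in this section can be seen with a deliberately naive suffix-stripping function. This is a toy sketch written for this example, not an established stemming algorithm; the function name naive_stem and its suffix list are assumptions.

```python
def naive_stem(word, suffixes=("ing", "ed", "es", "s")):
    """Strip the first matching suffix; a toy stemmer, not a real algorithm."""
    for suffix in suffixes:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ["playing", "played", "plays", "ladies"]:
    print(word, "->", naive_stem(word))
```

The first three words are all reduced to play, but ladies becomes ladi rather than lady, because cutting characters off the end of a word cannot replace ies with y.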

Text classification

Text classification is an NLP technique used to assign a text or a document to one or more classes or document types. Practical uses for text classification include spam filtering, language identification, sentiment analysis, and programming language identification from syntax.

Having covered the basic NLP concepts and terminology, we can now move on to understanding what Flair is and how it manages to solve NLP tasks with state-of-the-art results.


Introducing Flair

Flair is a powerful NLP framework published as a Python package. It provides a simple interface that is friendly, easy to use, and caters to people from various backgrounds including those with little prior knowledge in programming. It is published under the MIT License, which is one of the most permissive free software licenses.

Flair as an NLP framework comes with a variety of tools and uses. It can be defined in the following ways:

  • It is an NLP framework used in NLP research for producing models that achieve state-of-the-art results across many NLP tasks such as POS tagging, NER, and chunking across several languages and datasets. In Flair's GitHub repository, you will find step-by-step instructions on how to reproduce these results.
  • It is a tool for training, validating, and distributing NER, POS tagging, chunking, word sense disambiguation, and text classification models. It features tools that help ease the training and validation processes such as the automatic corpora downloading tool, and tools that facilitate model tuning such as the hyperparameter optimization tool. It supports a growing number of languages.
  • It is a tool for downloading and using state-of-the-art pre-trained models. The models are downloaded seamlessly, meaning that they will be automatically downloaded the first time you use them and will remain stored for future use.
  • It is a platform for the proposed state-of-the-art Flair embeddings. The state-of-the-art results Flair achieves in many NLP tasks can by and large be attributed to its proposed Flair contextual string embeddings described in more detail in the paper Contextual String Embeddings for Sequence Labeling. The author refers to them as "the secret sauce" of Flair.
  • It is an NLP framework for working with biomedical data. A special section of Flair is dedicated solely to working with biomedical data and features a set of pretrained models that achieve state-of-the-art results, as well as a number of corpora and comprehensive documentation on how to train custom biomedical tagging models.
  • It is a great practical introduction to NLP. Flair's extensive online documentation, simple interface, inclusive support for a large number of languages, and its ability to perform a lot of the tasks on non-GPU-equipped machines all make it an excellent entry point for someone aiming to learn about NLP through practical hands-on experimentation.

Setting up the development environment

Now that you have a basic understanding of the features offered by the framework, as well as of the basic NLP concepts, you are ready to move on to the next step of setting up your development environment for Flair.

To be able to follow the instructions in this section, first make sure you have Python 3.6+ installed on your device as described in the Technical requirements section.

Creating the virtual environment

In Python, it's generally good practice to install packages in virtual environments so that the project dependencies you are currently working on will not affect your global Python dependencies or other projects you may work on in the future.

We will use the venv tool that is part of the Python Standard Library and requires no installation. To create a virtual environment, simply create a new directory, move into it, then run the following command:

$ python3 -m venv learning-flair

Then, to activate the virtual environment on Linux or macOS, run the following:

$ source learning-flair/bin/activate

If you are running Windows, run the following:

$ learning-flair\Scripts\activate.bat

Your command line should become prefixed with (learning-flair) $ and your virtual environment is now active.
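If you would like to confirm from within Python that the virtual environment is active, one possible check is to compare sys.prefix with sys.base_prefix. The helper name in_virtual_environment is hypothetical, and the check assumes an environment created with venv as described above.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Environment location:", sys.prefix)
```

When run with the learning-flair environment activated, the reported location should end with the learning-flair directory.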

Installing a published version of Flair in a virtual environment

You should now be ready to install Flair version 0.11 with this single command:

(learning-flair) $ pip install flair==0.11

The installation should now commence and finish within a minute or so depending on the speed of your internet connection.

You can verify the installation by running the following command, which will display a list of package properties including its version:

(learning-flair) $ pip show flair
Name: flair
Version: 0.11
Summary: A very simple framework for state-of-the-art NLP

A command output like the preceding indicates the package has been successfully installed in your virtual environment.
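The same check can also be sketched from Python using the standard library's importlib.metadata module. Note that this module requires Python 3.8 or newer, which is stricter than the Python 3.6+ requirement stated for this chapter, and that the helper name installed_version is an assumption made for this example.

```python
from importlib import metadata  # available in Python 3.8+


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print("flair version:", installed_version("flair"))
```

A result of None means the package is not installed in the environment the interpreter is running in, which usually indicates the virtual environment was not activated before running pip install.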

Installing directly from the GitHub repository (optional)

In some cases, the features we aim to make use of in Flair may already be implemented in a branch on GitHub, but those changes may not yet be released as part of a Python package published on PyPI. We can install Flair with those features directly from the Git repository branch.

For example, here is how you can install Flair from the master branch:

(learning-flair) $ git clone
(learning-flair) $ cd flair
(learning-flair) $ git checkout master
(learning-flair) $ pip install .

Important note

Installing code from non-reviewed branches can introduce unreliable or unsafe code. When installing Flair from development branches, make sure the code you are installing comes from a trusted source. Also note that future versions of Flair (versions later than 0.11) may not be compatible with the code snippets found in this book.

Replace the term master with any other branch name to install the package from a branch of your choice.

Running code that uses Flair

Running code that makes use of the Flair Python package is no different from running any other type of Python code.

The recommended way for you to run the code snippets in this book is to execute them as code cells in a Jupyter notebook, which you can install and run as follows:

(learning-flair) $ pip install notebook
(learning-flair) $ jupyter notebook

You can then create a new Python 3 notebook and run your first Flair script to verify the package is imported successfully:

import flair
print(flair.__version__)

After executing, the preceding code should print out the version of Flair you are currently using, indicating that the Flair package has been imported successfully and you are ready to start.



Summary

In this chapter, you became familiar with the basic NLP terminology and tasks. As you learn about Flair, you will often come across terms such as tokenization, NER, and POS, and the knowledge gained in this chapter will help you understand what they mean.

You also now understand where Flair sits in the NLP space, what problems it's solving and which fields it excels in. Finally, you've learned how to install Flair inside your virtual environment either from a PyPI package or a Git branch. You are now ready to start coding with Flair!

In the upcoming chapter, we will be covering basic syntax and the basic objects in Flair, known as base types.

About the Author
  • Tadej Magajna

    Tadej Magajna is a former lead machine learning engineer, former data scientist, master of Computer Science and now a software engineer at Microsoft. He currently works in a team responsible for language model training and building language packs for keyboards. He started his career as a 15-year-old at a local media company as a web developer and progressed towards more complex engineering and machine learning problems. He tackled problems like NLP market research, public transport bus and train capacity forecasting and finally language model training at his current role. Today, he is based in his hometown Ljubljana, Slovenia.
