Chapter 16. Python Integration

In this chapter, we will cover the following recipes:

  • Creating a client

  • Managing indices

  • Managing mappings

  • Managing documents

  • Executing a standard search

  • Executing a search with aggregations

Introduction


In the previous chapter, we saw how it is possible to use a native client to access the Elasticsearch server via Java. This chapter is dedicated to the Python language and how to manage common tasks via its clients.

Apart from Java, the Elasticsearch team supports official clients for Perl, PHP, Python, .NET, and Ruby. (See the announcement post on the Elasticsearch blog at http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/.) These clients have several advantages over other implementations, a few of which are as follows:

  • They are strongly tied to the Elasticsearch API:

                 "These clients are direct translations of the native Elasticsearch REST interface"                                                                                                     - The Elasticsearch team

  • They handle dynamic node detection and failover: they are built on a robust networking layer for communicating with the cluster.

  • They have full coverage of...

Creating a client


The official Elasticsearch clients are designed to handle many of the issues that arise when building a solid REST client, such as retrying on network failures, autodiscovering other nodes of the cluster, and converting data for communication over the HTTP layer.

In this recipe, we'll see how to instantiate a client with varying options.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

A Python 2.x or 3.x distribution should be installed. On Linux and macOS, it is already provided by the standard installation. To manage Python packages, pip (https://pypi.python.org/pypi/pip/) must also be installed.

The full code of this recipe is in the chapter_16/client_creation.py file.

How to do it...

To create a client, we will perform the following steps (a minimal installation and instantiation sketch follows the list):

  1. Before using the Python client, we need to install it (possibly in a Python virtual...
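The following is a minimal sketch of installing the official elasticsearch-py package and instantiating a client. The host, port, and sniffing options shown are assumptions for a local single-node setup, not values prescribed by the recipe:

        # Install the client first, ideally inside a virtualenv:
        #   pip install elasticsearch
        from elasticsearch import Elasticsearch

        # Default client: connects to localhost:9200
        es = Elasticsearch()

        # Client with explicit hosts and sniffing enabled, so other cluster
        # nodes are discovered automatically and used on connection failures
        es = Elasticsearch(
            hosts=["localhost:9200"],
            sniff_on_start=True,
            sniff_on_connection_fail=True,
            sniffer_timeout=60,
        )

        print(es.info())  # basic cluster information, verifies the connection

Enabling sniffing is optional; for a single local node, the default constructor is enough.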

Managing indices


In the previous recipe, we saw how to initialize a client to send calls to an Elasticsearch cluster. In this recipe, we will look at how to manage indices via client calls.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need the Python packages installed in the Creating a client recipe in this chapter.

The full code for this recipe can be found in the chapter_16/indices_management.py file.

How to do it…

In Python, managing the life cycle of your indices is very easy. We will perform the following steps (a fuller sketch of the index life cycle follows the list):

  1. We initialize a client:

            import elasticsearch 
            es = elasticsearch.Elasticsearch() 
            index_name = "my_index"
  2. We check whether the index exists and, if so, delete it:

            if es.indices.exists(index_name): 
                es.indices.delete(index_name) 
    
  3. All the indices methods are available...
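Building on the preceding steps, the following is a minimal sketch of a typical index life cycle; the settings values are assumptions chosen for illustration, not the ones used in the recipe file:

        import elasticsearch

        es = elasticsearch.Elasticsearch()
        index_name = "my_index"

        # Drop the index if it is left over from a previous run
        if es.indices.exists(index_name):
            es.indices.delete(index_name)

        # Create the index with explicit settings (values chosen for illustration)
        es.indices.create(index=index_name, body={
            "settings": {"number_of_shards": 2, "number_of_replicas": 1}})

        # Close and reopen the index (required for some settings changes)
        es.indices.close(index=index_name)
        es.indices.open(index=index_name)

        # Refresh the index to make recent changes searchable
        es.indices.refresh(index=index_name)

        # Finally, remove it
        es.indices.delete(index=index_name)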

Managing mappings


After creating an index, the next step is to add some type mappings to it. We have already seen how to include a mapping via the REST API in Chapter 4, Basic Operations.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need the Python packages installed in the Creating a client recipe in this chapter.

The code for this recipe is in the chapter_16/mapping_management.py file.

How to do it…

After initializing a client and creating an index, the steps for managing mappings are as follows (a consolidated sketch is shown after the numbered steps):

  1. Create a mapping.

  2. Retrieve a mapping.

These steps are easily managed with the following code:

  1. We initialize the client:

            import elasticsearch
            es = elasticsearch.Elasticsearch()
    
  2. We create an index:

            index_name = "my_index"
            type_name = "my_type"
            if es.indices.exists(index_name):
       ...
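The following is a consolidated sketch of creating and retrieving a type mapping; the field names and types are assumptions chosen for illustration:

        import elasticsearch

        es = elasticsearch.Elasticsearch()
        index_name = "my_index"
        type_name = "my_type"

        if es.indices.exists(index_name):
            es.indices.delete(index_name)
        es.indices.create(index=index_name)

        # Create (put) the mapping for the type
        es.indices.put_mapping(index=index_name, doc_type=type_name, body={
            type_name: {
                "properties": {
                    "uuid": {"type": "keyword"},
                    "title": {"type": "text"},
                    "date": {"type": "date"}
                }
            }
        })

        # Retrieve the mapping back and print it
        mapping = es.indices.get_mapping(index=index_name, doc_type=type_name)
        print(mapping)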

Managing documents


The APIs for managing a document (index, update, and delete) are the most important after the search ones. In this recipe, we will see how to use them in the standard way and in bulk actions to improve performance.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need the Python packages installed in the Creating a client recipe in this chapter.

The full code for this recipe can be found in the chapter_16/document_management.py file.

How to do it…

The three main operations to manage documents are as follows (a short usage sketch follows the list):

  • index: This operation stores a document in Elasticsearch. It is mapped on the index API call.

  • update: This allows updating some values in a document. Internally (at the Lucene level), this operation deletes the previous document and reindexes it with the new values. It is mapped to the update API call.

  • delete: This delete...
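The following is a minimal usage sketch of the index, update, and delete calls, plus the bulk helper; the document ids and field values are assumptions for illustration:

        import elasticsearch
        from elasticsearch.helpers import bulk

        es = elasticsearch.Elasticsearch()
        index_name = "my_index"
        type_name = "my_type"

        # index: store a document with an explicit id
        es.index(index=index_name, doc_type=type_name, id=1,
                 body={"name": "Joe Tester", "age": 30})

        # update: change some values of an existing document
        es.update(index=index_name, doc_type=type_name, id=1,
                  body={"doc": {"age": 31}})

        # delete: remove the document
        es.delete(index=index_name, doc_type=type_name, id=1)

        # bulk: index many documents in a single round trip
        actions = [{"_index": index_name, "_type": type_name, "_id": i,
                    "_source": {"name": "user %d" % i, "age": 20 + i}}
                   for i in range(10)]
        bulk(es, actions)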

Executing a search with aggregations


Searching for results is obviously the main activity of a search engine; aggregations are also very important because they often help to augment the results.

Aggregations are executed alongside the search, performing analytics on the matched results.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need the Python packages installed in the Creating a client recipe in this chapter.

The code of this recipe can be found in the chapter_16/aggregation.py file.

How to do it…

To extend a query with aggregations, you need to define an aggregation section, as we have already seen in Chapter 8, Aggregations. With the official Elasticsearch client, you can add the aggregation DSL to the search dictionary. We will perform the following steps (a consolidated sketch follows the steps):

  1. We initialize the client and populate the index:

         ...
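The following is a consolidated sketch of a search with a terms aggregation and a nested metric; the field names ("tag" as a keyword field and a numeric "price" field) are assumptions for illustration:

        import elasticsearch

        es = elasticsearch.Elasticsearch()
        index_name = "my_index"

        query = {
            "size": 0,  # we only want the aggregation results
            "aggs": {
                "tags": {
                    "terms": {"field": "tag"},
                    "aggs": {
                        "avg_price": {"avg": {"field": "price"}}
                    }
                }
            }
        }

        results = es.search(index=index_name, body=query)
        for bucket in results["aggregations"]["tags"]["buckets"]:
            print(bucket["key"], bucket["doc_count"], bucket["avg_price"]["value"])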