You're reading from ElasticSearch Cookbook

Product type: Book
Published in: Dec 2013
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781782166627
Edition: 1st Edition
Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Chapter 11. Python Integration

In this chapter, we will cover the following topics:

  • Creating a client

  • Managing indices

  • Managing mappings

  • Managing documents

  • Executing a standard search

  • Executing a facet search

Introduction


In the previous chapter, we saw how to use a native client to access the ElasticSearch server from Java. This chapter is dedicated to the Python language and to managing common tasks through its clients.

In addition to Java, the ElasticSearch team supports official clients for Perl, PHP, Python, and Ruby (refer to the announcement post on the ElasticSearch blog at http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/). They are fairly new, as their initial public release was in September 2013. These clients have the following advantages over other implementations:

  • They are strongly tied to the ElasticSearch API. In the words of the ElasticSearch team, "These clients are direct translations of the native ElasticSearch REST interface."

  • They handle dynamic node detection and failover. They are built with a strong networking base for communicating with the cluster.

  • They provide full coverage of the REST API.

  • They share the same application approach for every language in which they...
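
To make the node detection and failover point more concrete, the following is a conceptual sketch of round-robin host selection that skips nodes marked as dead. It is not the client's actual implementation: RoundRobinPool, perform, and the host names are hypothetical and exist only for illustration.

```python
import itertools


class RoundRobinPool:
    """Hypothetical pool: cycles through hosts and skips those marked dead."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.dead = set()
        self._cycle = itertools.cycle(self.hosts)

    def mark_dead(self, host):
        self.dead.add(host)

    def next_host(self):
        # Look at each host at most once per call.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if host not in self.dead:
                return host
        raise RuntimeError("no live hosts left")


def perform(pool, send):
    """Call send(host), moving on to the next live host on ConnectionError."""
    while True:
        host = pool.next_host()  # raises RuntimeError once every host is dead
        try:
            return send(host)
        except ConnectionError:
            pool.mark_dead(host)
```

Here, send stands for any callable that performs one request against a single host; the application only sees the final result or an error once no host is left.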

Creating a client


The official ElasticSearch clients are designed to work over several transport layers. They let you use the HTTP, Thrift, or Memcached protocol without changing your application code.

The Thrift and Memcached protocols are binary and, because of their structure, they are generally a bit faster than HTTP. They wrap the REST API and share the same behavior, so switching between protocols is very easy.
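
One way to picture this separation is to keep the transport choice in a single factory function and write the rest of the application against the client object only. This is a minimal sketch under assumptions: make_client and cluster_status are hypothetical helper names, and the connection_class keyword is assumed to be accepted by client releases from the period of this book (newer releases may not accept it).

```python
def make_client(hosts=None, connection_class=None):
    """Build a client; the transport is chosen here and nowhere else.

    Assumption: the installed client release accepts a ``connection_class``
    keyword. With no arguments, the client talks HTTP to localhost:9200.
    """
    import elasticsearch  # imported lazily so cluster_status needs no package

    kwargs = {}
    if connection_class is not None:
        kwargs["connection_class"] = connection_class
    return elasticsearch.Elasticsearch(hosts, **kwargs)


def cluster_status(es):
    """Application code: works with any client object, whatever its transport."""
    return es.cluster.health()["status"]
```

Because cluster_status only depends on the client object, swapping the protocol means changing the arguments passed to make_client and nothing else.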

In this recipe, we'll see how to instantiate a client with different protocols.

Getting ready

You need a working ElasticSearch cluster and the plugins for the extra protocols. The full code of this recipe is in the chapter_11/client_creation.py file.

How to do it...

To create a client, we need to perform the following steps:

  1. Before using the Python client, you need to install it (possibly in a Python virtual environment). The client is officially hosted on PyPI (http://pypi.python.org/) and is easy to install with the following pip command:

    pip install elasticsearch

    This standard...

Managing indices


In the previous recipe, we saw how to initialize a client to send calls to an ElasticSearch cluster. In this recipe, we will see how to manage indices via client calls.

Getting ready

You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.

The full code of this recipe is in the chapter_11/indices_management.py file.

How to do it...

In Python, managing the lifecycle of your indices is very easy. We need to perform the following steps:

  1. We initialize a client as follows:

    import elasticsearch
    es = elasticsearch.Elasticsearch()
    index_name = "my_index"
  2. All the index methods are available in the client.indices namespace. We can create an index and wait for its creation as follows:

    # Create the index, then wait until its primary shards are allocated.
    es.indices.create(index=index_name)
    es.cluster.health(wait_for_status="yellow")
  3. We can close/open an index as follows:

    es.indices.close(index=index_name)

    es.indices.open(index=index_name)
    es.cluster.health(wait_for_status="yellow")
  4. We can optimize an index as follows:

    es.indices...
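
The create, close, and open calls from the steps above can be gathered into one helper that takes the client as a parameter. This is a minimal sketch, not the recipe's own code: run_index_lifecycle is a hypothetical name, and the final indices.delete call is an addition that does not appear in the visible steps.

```python
def run_index_lifecycle(es, index_name):
    """Create an index, close and reopen it, then delete it again."""
    es.indices.create(index=index_name)
    # Wait until the primary shards are allocated (yellow) or better.
    es.cluster.health(wait_for_status="yellow")

    es.indices.close(index=index_name)
    es.indices.open(index=index_name)
    # A reopened index needs its shards allocated again before use.
    health = es.cluster.health(wait_for_status="yellow")

    # Addition for illustration: clean up so the helper can be rerun.
    es.indices.delete(index=index_name)
    return health["status"]
```

With a client created as in step 1, the call would be run_index_lifecycle(es, "my_index").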

Managing mappings


After creating an index, the next step is to add some mapping to it. We have already seen how to put a mapping via the REST API in Chapter 4, Standard Operations. In this recipe, we will see how to manage mappings via the official Python client and PyES.

Getting ready

You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.

The code of this recipe is in chapter_11/mapping_management.py and chapter_11/mapping_management_pyes.py.

How to do it...

After having initialized a client and created an index, the steps required for managing the mappings are as follows:

  • Create a mapping

  • Retrieve a mapping

  • Delete a mapping

These steps are easily managed with code as follows:

  1. We initialize the client as follows:

    import elasticsearch
    
    es = elasticsearch.Elasticsearch()
  2. We create an index as follows:

    index_name = "my_index"
    type_name = "my_type"
    es.indices.create(index=index_name)
    es.cluster.health(wait_for_status="yellow")
  3. We put the mapping as follows:

    es.indices...
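
The three operations listed above (create, retrieve, delete) might look as follows when written against the client object. This is a sketch under assumptions rather than the recipe's own code: build_mapping, manage_mapping, and the field names are hypothetical, and the doc_type and body parameters and the delete_mapping call are assumed to match client releases from the period of this book (mapping types were removed from later ElasticSearch versions).

```python
def build_mapping(type_name):
    """Return a mapping body for one type (field names are illustrative)."""
    return {
        type_name: {
            "properties": {
                "title": {"type": "string"},
                "created": {"type": "date"},
                "position": {"type": "integer"},
            }
        }
    }


def manage_mapping(es, index_name, type_name):
    """Create, retrieve, and delete a mapping on an existing index."""
    es.indices.put_mapping(index=index_name, doc_type=type_name,
                           body=build_mapping(type_name))
    mapping = es.indices.get_mapping(index=index_name, doc_type=type_name)
    # Assumption: delete_mapping is available in the installed release.
    es.indices.delete_mapping(index=index_name, doc_type=type_name)
    return mapping
```

Keeping the mapping body in its own function makes it easy to inspect or reuse without a running cluster.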

Managing documents


The APIs for managing documents (index, update, and delete) are the most important ones after the search APIs. In this recipe, we will see how to use them both individually and in bulk actions, which improve performance.
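
Bulk actions improve performance because many operations travel in a single request. As a rough sketch of what such a request contains, the following helper serializes documents into the newline-delimited layout used by the Bulk API: one action line followed by one source line per document. The helper name bulk_index_body and the sample documents are hypothetical, and the _type entry in the action metadata is assumed to match ElasticSearch versions from the period of this book.

```python
import json


def bulk_index_body(index_name, type_name, docs):
    """Serialize (id, source) pairs as newline-delimited bulk 'index' actions."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index_name,
                            "_type": type_name,
                            "_id": doc_id}}
        lines.append(json.dumps(action))  # what to do and where
        lines.append(json.dumps(source))  # the document itself
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

In client releases that accept a body parameter, the resulting string could be sent with es.bulk(body=bulk_index_body(...)).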

Getting ready

You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.

The full code of this recipe is in the chapter_11/document_management.py and chapter_11/document_management_pyes.py files.

How to do it...

The main operations to manage documents are as follows:

  • index: This stores a document in ElasticSearch. It maps to the Index API call.

  • update: This allows updating some values in a document. Internally (because of how Lucene works), this operation deletes the previous document and reindexes the document with the new values. It maps to the Update API call.

  • delete: This deletes a document from the index. It maps to the Delete API call.
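
The three operations can be sketched as one round trip against the client object. This is a minimal illustration under assumptions, not the recipe's own code: document_roundtrip and the sample field values are hypothetical, the doc_type and body parameters are assumed to match client releases from the period of this book, and the get call is added only to read the updated document back.

```python
def document_roundtrip(es, index_name, type_name, doc_id):
    """Index a document, change one field, read it back, then delete it."""
    es.index(index=index_name, doc_type=type_name, id=doc_id,
             body={"name": "Joe Tester", "position": 1})
    # Partial update: the fields under "doc" are merged into the document.
    es.update(index=index_name, doc_type=type_name, id=doc_id,
              body={"doc": {"position": 2}})
    # Addition for illustration: fetch the document to see the merged result.
    stored = es.get(index=index_name, doc_type=type_name, id=doc_id)
    es.delete(index=index_name, doc_type=type_name, id=doc_id)
    return stored["_source"]
```

The returned source should show the updated position next to the untouched name field, which reflects the delete-and-reindex behavior described for update.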

With the ElasticSearch Python client...
