You're reading from ElasticSearch Cookbook
In the previous chapter, we saw how to use a native client to access the ElasticSearch server via Java. This chapter is dedicated to the Python language and to managing common tasks via its clients.
In addition to Java, the ElasticSearch team supports official clients for Perl, PHP, Python, and Ruby (refer to the announcement post on the ElasticSearch blog at http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/). They are quite new, as their initial public release was in September 2013. These clients have the following advantages over other implementations:
They are strongly tied to the ElasticSearch API. The ElasticSearch team says: "These clients are direct translations of the native ElasticSearch REST interface."
They handle dynamic node detection and failover. They are built with a strong networking base for communicating with the cluster.
They have full coverage of the REST API.
They share the same application approach for every language in which they...
The official ElasticSearch clients are designed to use several transport layers. They allow you to use the HTTP, Thrift, or memcached protocol without changing your application code.
The Thrift and memcached protocols are binary and, due to their structure, they are generally a bit faster than HTTP. They wrap the REST API and share the same behavior, so switching between protocols is very easy.
In this recipe, we'll see how to instantiate a client with different protocols.
You need a working ElasticSearch cluster and plugins for extra protocols. The full code of this recipe is in the chapter_11/client_creation.py file.
For creating a client, we need to perform the following steps:
Before using the Python client, you need to install it (possibly in a Python virtual environment). The client is officially hosted on PyPI (http://pypi.python.org/) and is easy to install with the following pip command:
pip install elasticsearch
This standard...
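The protocol choice can be sketched as follows. This is a minimal sketch, assuming the 1.x-era elasticsearch-py client contemporary with this chapter, whose elasticsearch.connection module provided the Urllib3HttpConnection, ThriftConnection, and MemcachedConnection classes; newer client releases dropped the Thrift and memcached connections. The helper names client_settings and create_client are hypothetical, and the ports are the usual defaults of the HTTP interface and of the Thrift and memcached transport plugins, so adjust them to your cluster.

```python
# Hypothetical helpers for picking a transport protocol.
# Assumption: 1.x-era elasticsearch-py, where elasticsearch.connection
# exposes Urllib3HttpConnection, ThriftConnection and MemcachedConnection.
# Ports are the usual defaults (HTTP, Thrift plugin, memcached plugin).

PROTOCOLS = {
    # protocol: (connection class name, default port)
    "http": ("Urllib3HttpConnection", 9200),
    "thrift": ("ThriftConnection", 9500),
    "memcached": ("MemcachedConnection", 11211),
}


def client_settings(protocol="http", host="localhost", port=None):
    """Return the host list and connection class name for a protocol."""
    if protocol not in PROTOCOLS:
        raise ValueError("unsupported protocol: %r" % protocol)
    class_name, default_port = PROTOCOLS[protocol]
    return {
        "hosts": [{"host": host, "port": port or default_port}],
        "connection_class": class_name,
    }


def create_client(protocol="http", host="localhost", port=None):
    """Instantiate the client; only this function needs elasticsearch-py."""
    import elasticsearch
    import elasticsearch.connection

    settings = client_settings(protocol, host, port)
    connection_class = getattr(elasticsearch.connection,
                               settings["connection_class"])
    return elasticsearch.Elasticsearch(settings["hosts"],
                                       connection_class=connection_class)
```

For example, create_client("thrift") would return a client that talks to localhost:9500 through the Thrift plugin, while the rest of the application code stays unchanged.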
In the previous recipe we saw how to initialize a client to send calls to an ElasticSearch cluster. In this recipe, we will see how to manage indices via client calls.
You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.
The full code of this recipe is in the chapter_11/indices_management.py file.
In Python, managing the lifecycle of your indices is very easy; we need to perform the following steps:
We initialize a client as follows:
import elasticsearch
es = elasticsearch.Elasticsearch()
index_name = "my_index"
All the indices methods are available in the client.indices namespace. We can create an index and wait for its creation as follows:
es.indices.create(index_name)
es.cluster.health(wait_for_status="yellow")
We can close/open an index as follows:
es.indices.close(index_name)
es.indices.open(index_name)
es.cluster.health(wait_for_status="yellow")
We can optimize an index as follows:
es.indices...
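The lifecycle steps above can be collected into one function. This is a sketch, assuming the 1.x-era elasticsearch-py namespaces used in this recipe (es.indices and es.cluster); indices.optimize was the 1.x name of the call and later client versions replaced it with indices.forcemerge. The function name is hypothetical, and the initial existence check and the final delete are added here for illustration.

```python
def manage_index_lifecycle(es, index_name="my_index"):
    """Create, close, reopen, optimize and finally delete an index.

    Assumes `es` is a 1.x-era elasticsearch.Elasticsearch instance
    (hypothetical helper, not the recipe's own code).
    """
    # Start from a clean state if a previous run left the index behind.
    if es.indices.exists(index_name):
        es.indices.delete(index_name)

    # Create the index and wait until its primary shards are allocated.
    es.indices.create(index_name)
    es.cluster.health(wait_for_status="yellow")

    # Closing frees resources; opening makes the index usable again.
    es.indices.close(index_name)
    es.indices.open(index_name)
    es.cluster.health(wait_for_status="yellow")

    # Merge segments (the 1.x "optimize" call) to speed up searches.
    es.indices.optimize(index_name)

    # Remove the index once it is no longer needed.
    es.indices.delete(index_name)
```

Passing the client in as a parameter keeps the function independent of how the client was created, so it works the same with any of the transport protocols from the previous recipe.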
After creating an index, the next step is to add some mappings to it. We have already seen how to put a mapping via the REST API in Chapter 4, Standard Operations. In this recipe, we will see how to manage mappings via the official Python client and PyES.
You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.
The code of this recipe is in chapter_11/mapping_management.py and chapter_11/mapping_management_pyes.py.
After initializing a client and creating an index, the steps required for managing a mapping are as follows:
Create a mapping
Retrieve a mapping
Delete a mapping
These steps are easily managed with code as follows:
We initialize the client as follows:
import elasticsearch
es = elasticsearch.Elasticsearch()
We create an index as follows:
index_name = "my_index"
type_name = "my_type"
es.indices.create(index_name)
es.cluster.health(wait_for_status="yellow")
We put the mapping as follows:
es.indices...
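The three mapping steps (create, retrieve, delete) can be sketched as below. This assumes the 1.x-era elasticsearch-py signatures (put_mapping, get_mapping, and delete_mapping taking index, doc_type, and body keywords); delete_mapping disappeared when ElasticSearch 2.0 removed mapping deletion. The field names in build_mapping and both helper names are hypothetical.

```python
def build_mapping(type_name):
    """Return a mapping body for a document type (hypothetical fields)."""
    return {
        type_name: {
            "properties": {
                # 1.x-style string field that is stored as a single token.
                "name": {"type": "string", "index": "not_analyzed"},
                "description": {"type": "string"},
                "age": {"type": "integer"},
                "date": {"type": "date"},
            }
        }
    }


def manage_mapping(es, index_name="my_index", type_name="my_type"):
    """Put a mapping, read it back, then delete it (1.x-era client)."""
    es.indices.put_mapping(index=index_name, doc_type=type_name,
                           body=build_mapping(type_name))
    mapping = es.indices.get_mapping(index=index_name, doc_type=type_name)
    es.indices.delete_mapping(index=index_name, doc_type=type_name)
    return mapping
```

Keeping the mapping body in a separate function makes it easy to inspect or reuse the same definition when the index is recreated.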
After the search APIs, the APIs for managing documents (index, update, and delete) are the most important ones. In this recipe, we will see how to use them in a standard way and in bulk actions to improve performance.
You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.
The full code of this recipe is in the chapter_11/document_management.py and chapter_11/document_management_pyes.py files.
The main operations to manage documents are as follows:
index: This stores a document in ElasticSearch. It is mapped to the Index API call.
update: This allows updating some values in a document. Internally (because of the Lucene nature of the index), this operation deletes the previous document and reindexes it with the new values. It is mapped to the Update API call.
delete: This deletes a document from the index. It is mapped to the Delete API call.
With the ElasticSearch Python client...
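The standard and bulk operations can be sketched as follows. This is a minimal sketch assuming the 1.x-era elasticsearch-py methods index, update, delete, and bulk with index, doc_type, id, and body keywords; the document contents, ids, and helper names are hypothetical. The bulk body alternates an action line and a source line, which the client serializes into the newline-delimited format of the Bulk API.

```python
def manage_documents(es, index_name="my_index", type_name="my_type"):
    """Index a document, update one field, then delete it (1.x-era client)."""
    es.index(index=index_name, doc_type=type_name, id="1",
             body={"name": "Joe Tester", "age": 21})
    # Partial update: only the fields inside "doc" change, but the server
    # still deletes and reindexes the whole document internally.
    es.update(index=index_name, doc_type=type_name, id="1",
              body={"doc": {"age": 22}})
    es.delete(index=index_name, doc_type=type_name, id="1")


def build_bulk_body(index_name, type_name, docs):
    """Turn (id, source) pairs into alternating action/source entries."""
    body = []
    for doc_id, source in docs:
        body.append({"index": {"_index": index_name, "_type": type_name,
                               "_id": doc_id}})
        body.append(source)
    return body


def bulk_index(es, index_name, type_name, docs):
    """Send many index operations in a single round trip."""
    return es.bulk(body=build_bulk_body(index_name, type_name, docs))
```

Batching many documents into one bulk call saves a network round trip per document, which is where most of the speedup of bulk indexing comes from.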
After inserting documents, the most commonly executed action in ElasticSearch is search. The official ElasticSearch client APIs for searching are similar to the REST ones.
You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.
The code of this recipe is in the chapter_11/searching.py and chapter_11/searching_pyes.py files.
To execute a standard query, call the client's search method, passing the query parameters as we saw in Chapter 5, Search, Queries, and Filters. The required parameters are at least the index name, the type name, and the query DSL. In the following example, I'll show how to call a match all query, a term query, and a filter query. We need to perform the following steps:
We will initialize the client and populate the index as follows:
import elasticsearch
from pprint import pprint
es = elasticsearch.Elasticsearch()
index_name = "my_index"
type_name ...
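The three query bodies mentioned above can be sketched as plain Python dictionaries. This assumes the 1.x-era query DSL (the "filtered" query was removed in ElasticSearch 5.0) and the 1.x-era es.search(index=..., doc_type=..., body=...) call; the helper names, field names, and values are hypothetical.

```python
def match_all_query():
    """Match every document in the index/type."""
    return {"query": {"match_all": {}}}


def term_query(field, value):
    """Match documents whose (not analyzed) field equals value exactly."""
    return {"query": {"term": {field: value}}}


def filtered_query(field, value):
    """Match all documents, restricted by a term filter (1.x DSL)."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


def run_search(es, index_name, type_name, body):
    """Execute a search and return the _source of each hit."""
    result = es.search(index=index_name, doc_type=type_name, body=body)
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

For example, run_search(es, index_name, type_name, term_query("uuid", "33333")) would return the sources of the documents whose hypothetical uuid field is exactly "33333".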
Searching for results is the main activity of a search engine, but facets are also very important because they often help to complete the results.
Faceting is executed along with the search, computing analytics on the search results.
You need a working ElasticSearch cluster and the packages required by the Creating a client recipe of this chapter.
The code of this recipe is in the chapter_11/faceting.py and chapter_11/faceting_pyes.py files.
To extend a query with the facet part, you need to define a facet section as we have already seen in Chapter 6, Facets. In the case of the official ElasticSearch client, you can add the facet DSL to the search dictionary to provide facets. We need to perform the following steps:
We need to initialize the client and populate the index as follows:
import elasticsearch
from pprint import pprint
es = elasticsearch.Elasticsearch()
index_name = "my_index"
type_name = "my_type"
from utils import create_and_add_mapping, populate...
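Adding the facet DSL to the search dictionary can be sketched as below. This assumes the facet syntax of ElasticSearch 0.90/1.x (facets were later replaced by aggregations and removed in 2.0), where a terms facet result lists its buckets under "terms" as term/count pairs. The facet names, field names, and helper names are hypothetical.

```python
def with_facets(query, tag_field="tag", number_field="age"):
    """Return a search body with a terms facet and a statistical facet."""
    return {
        "query": query,
        "facets": {
            # Most frequent values of tag_field among the matching documents.
            "tags": {"terms": {"field": tag_field}},
            # Count, min, max, mean, etc. of a numeric field.
            "age_stats": {"statistical": {"field": number_field}},
        },
    }


def term_counts(response, facet_name="tags"):
    """Extract (term, count) pairs from a terms facet in a search response."""
    entries = response["facets"][facet_name]["terms"]
    return [(entry["term"], entry["count"]) for entry in entries]
```

With a 1.x-era client, the body would be passed as es.search(index=index_name, doc_type=type_name, body=with_facets({"match_all": {}})), and term_counts would then be applied to the returned dictionary.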