
You're reading from  Elasticsearch 7.0 Cookbook. - Fourth Edition

Product type: Book
Published in: Apr 2019
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789956504
Edition: 4th Edition
Author (1)
Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Python Integration

In the previous chapter, we saw how it was possible to use a native client to access the Elasticsearch server via Java. This chapter is dedicated to the Python language and how to manage common tasks via its clients.

Apart from Java, the Elasticsearch team supports official clients for Perl, PHP, Python, .NET, and Ruby (see the announcement post on the Elasticsearch blog at http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/). These clients have a lot of advantages over other implementations. A few of them are given in the following list:

  • They are strongly tied to the Elasticsearch API. In the words of the Elasticsearch team, these clients are direct translations of the native Elasticsearch REST interface.
  • They handle dynamic node detection and failovers. They are built with a strong networking base for communicating with the cluster.
  • They have...

Creating a client

The official Elasticsearch clients are designed to handle many of the problems that a solid REST client must solve, such as retrying on network issues, autodiscovery of the other nodes in the cluster, and data conversion for communication over the HTTP layer.

In this recipe, we'll learn how to instantiate a client with varying options.
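As a minimal sketch of what "varying options" can mean, the following assumes the 7.x release of the official elasticsearch Python package, whose Elasticsearch class accepts retry and sniffing (autodiscovery) keyword arguments. The helper names client_options and make_client and the 127.0.0.1:9200 address are illustrative assumptions, not taken from the recipe.

```python
def client_options(hosts, sniff=False, max_retries=3):
    """Build keyword arguments for elasticsearch.Elasticsearch (7.x option names)."""
    options = {
        "hosts": list(hosts),
        "max_retries": max_retries,  # retries before an error is raised
        "retry_on_timeout": True,    # also retry when a request times out
    }
    if sniff:
        # Autodiscovery: fetch the node list at startup and again
        # whenever a connection to a node fails.
        options.update(sniff_on_start=True, sniff_on_connection_fail=True)
    return options


def make_client(options):
    """Create the client; imported lazily so the helper above needs no package."""
    from elasticsearch import Elasticsearch
    return Elasticsearch(**options)


# Hypothetical local node address, used only for illustration.
plain = client_options(["127.0.0.1:9200"])
sniffing = client_options(["127.0.0.1:9200"], sniff=True)
```

Calling make_client(sniffing) would then return a client that discovers the other cluster nodes by itself, provided the package is installed and a node is reachable at the given address.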

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

A Python 2.x or 3.x distribution should be installed. On Linux and Mac OS X systems, it's already provided in the standard installation...

Managing indices

In the previous recipe, we saw how to initialize a client to send calls to an Elasticsearch cluster. In this recipe, we will look at how to manage indices via client calls.
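To give a rough idea of what index management through the client can look like, here is a sketch that assumes a 7.x elasticsearch-py client, whose indices namespace provides exists, delete, and create. The helper names and the shard and replica values are assumptions made for illustration; the recipe's own code may differ.

```python
def index_body(shards=1, replicas=0):
    """Request body for index creation; the values are illustrative."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def recreate_index(client, name, body):
    """Drop the index if it exists, then create it again.

    `client` is assumed to be an elasticsearch.Elasticsearch (7.x) instance.
    """
    if client.indices.exists(index=name):
        client.indices.delete(index=name)
    return client.indices.create(index=name, body=body)


body = index_body(shards=2, replicas=1)
```

Keeping the request body in a plain function makes it easy to inspect or reuse before any call reaches the cluster.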

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

You also need the Python packages that we installed in the Creating a client recipe in this chapter.

The full code for this recipe can be found in the ch15/code/indices_management.py file.

How to do it…

In Python...

Managing mappings

After creating an index, the next step is to add some type mappings to it. We have already seen how to include a mapping via the REST API in Chapter 3, Basic Operations.
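A possible shape for this step, assuming a 7.x elasticsearch-py client (its indices namespace offers put_mapping and get_mapping), is sketched below. The field names and the helper functions are hypothetical and only serve the illustration.

```python
def build_mapping(fields):
    """Turn {field_name: field_type} into an Elasticsearch mapping body."""
    return {
        "properties": {name: {"type": ftype} for name, ftype in fields.items()}
    }


def apply_mapping(client, index, mapping):
    """Send the mapping to an existing index and read it back.

    `client` is assumed to be an elasticsearch.Elasticsearch (7.x) instance.
    """
    client.indices.put_mapping(index=index, body=mapping)
    return client.indices.get_mapping(index=index)


# Hypothetical fields for illustration.
mapping = build_mapping({"title": "text", "tag": "keyword", "position": "integer"})
```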

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

You also need the Python packages that we installed in the Creating a client recipe in this chapter.

The code for this recipe is in the ch15/code/mapping_management.py file.

How to do it…

...

Managing documents

The APIs for managing a document (index, update, and delete) are the most important after the search APIs. In this recipe, we will see how to use them in a standard way and use bulk actions to improve performance.
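The difference between the standard calls and bulk actions can be sketched as follows, assuming a 7.x elasticsearch-py client and its elasticsearch.helpers.bulk function. The index name, document IDs, and field names are hypothetical.

```python
def single_document_calls(client, index):
    """Index, partially update, and delete one document, one request each."""
    client.index(index=index, id="1", body={"name": "first", "value": 1})
    client.update(index=index, id="1", body={"doc": {"value": 2}})
    client.delete(index=index, id="1")


def bulk_actions(index, docs):
    """Yield one bulk "index" action per document in {doc_id: source}."""
    for doc_id, source in docs.items():
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": doc_id,
            "_source": source,
        }


def bulk_load(client, index, docs):
    """Send all documents in batched bulk requests instead of one call each."""
    from elasticsearch import helpers  # imported lazily; needs the package
    return helpers.bulk(client, bulk_actions(index, docs))


# Hypothetical documents for illustration.
actions = list(bulk_actions("test-index", {"1": {"name": "a"}, "2": {"name": "b"}}))
```

Because bulk_actions is a generator, a large collection can be streamed to the bulk helper without building the full list in memory.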

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

You also need the Python packages that we installed in the Creating a client recipe in this chapter.

The full code for this recipe can be found in the ch15/code/document_management.py file.

How to do it…

...

Executing a search with aggregations

Searching for results is the main activity of a search engine, and aggregations are important because they often help to augment those results.

Aggregations are executed along with the search by performing analytics on the results of the search.
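As an illustration of an aggregation travelling in the same request as the query, the sketch below builds a search body with a terms aggregation and reads the buckets out of a response. It assumes a 7.x elasticsearch-py client for the search call; the field names and the sample response are made up for the example, although the response follows the usual layout of a terms aggregation.

```python
def search_body(field, text, agg_field):
    """Match query plus a terms aggregation computed on the matching documents."""
    return {
        "query": {"match": {field: text}},
        "aggs": {"by_" + agg_field: {"terms": {"field": agg_field}}},
    }


def run_search(client, index, body):
    """`client` is assumed to be an elasticsearch.Elasticsearch (7.x) instance."""
    return client.search(index=index, body=body)


def bucket_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


body = search_body("title", "elasticsearch", "tag")

# Hand-written sample response (not real cluster output).
sample_response = {
    "aggregations": {
        "by_tag": {
            "buckets": [
                {"key": "python", "doc_count": 2},
                {"key": "java", "doc_count": 1},
            ]
        }
    }
}
counts = bucket_counts(sample_response, "by_tag")
```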

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

You also need the Python packages that we installed in the Creating a client recipe of this chapter.

The code for this recipe can be found in the ch15/code/aggregation.py file.

...

Integrating with NumPy and scikit-learn

Elasticsearch can be easily integrated with many Python machine learning libraries. One of the most widely used libraries for working with datasets is NumPy; a NumPy array is the basic building block for the datasets of many Python machine learning libraries. In this recipe, we will see how to use Elasticsearch as a dataset for the scikit-learn library (https://scikit-learn.org/).
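One way this integration could look is sketched below: search hits are flattened into numeric rows, which then become a NumPy array passed to scikit-learn's KMeans. The field names and the sample hits are assumptions made for illustration and are not taken from the recipe's code; the cluster function needs numpy and scikit-learn installed.

```python
# Assumed field names for the stored measurements (hypothetical).
FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def hits_to_rows(hits, fields=FIELDS):
    """Extract the numeric fields of each hit's _source as a list of floats."""
    return [[float(hit["_source"][name]) for name in fields] for hit in hits]


def cluster(rows, n_clusters=3):
    """Fit k-means on the rows and return one cluster label per row."""
    import numpy as np                  # imported lazily; needs numpy
    from sklearn.cluster import KMeans  # imported lazily; needs scikit-learn
    data = np.array(rows)
    return KMeans(n_clusters=n_clusters, n_init=10).fit(data).labels_


# Hand-written hits in the shape of response["hits"]["hits"] entries.
sample_hits = [
    {"_source": {"sepal_length": 5.1, "sepal_width": 3.5,
                 "petal_length": 1.4, "petal_width": 0.2}},
    {"_source": {"sepal_length": 7.0, "sepal_width": 3.2,
                 "petal_length": 4.7, "petal_width": 1.4}},
]
rows = hits_to_rows(sample_hits)
```

With a real result set, cluster(rows) would return an array of labels in the same order as the hits, as long as there are at least n_clusters rows.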

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.

The code for this recipe is in the ch15/code directory, and the file used in the following section is kmeans_example.py.

We will use the iris dataset...
