
You're reading from  Elasticsearch 5.x Cookbook - Third Edition

Product type: Book
Published in: Feb 2017
ISBN-13: 9781786465580
Edition: 3rd Edition
Author (1)
Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Indexing data via Apache Spark


With Apache Spark installed, we can configure it to work with Elasticsearch and write some data to it.

Getting ready

You need an up-and-running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need a working installation of Apache Spark.

How to do it...

To configure Apache Spark to communicate with Elasticsearch, we will perform the following steps:

  1. We need to download the Elasticsearch Hadoop archive, which contains the Elasticsearch Spark JAR, and unzip it:

            wget http://download.elastic.co/hadoop/elasticsearch-hadoop-5.1.1.zip
            unzip elasticsearch-hadoop-5.1.1.zip
    
  2. A quick way to access Elasticsearch from the Spark shell is to copy the required Elasticsearch Hadoop file into Spark's jars directory. The file that must be copied is elasticsearch-spark-20_2.11-5.1.1.jar.

    The version of Scala used by both Apache Spark and Elasticsearch Spark must match!
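
The copy step and the Scala version check can be sketched in shell as follows. This is a minimal sketch, not the book's own listing: the helper names scala_suffix and install_es_spark_jar are hypothetical, SPARK_HOME is assumed to point at your Spark installation, and the _2.11 part of the JAR name is read as the Scala binary version the JAR was built for.

```shell
#!/bin/sh
# Hypothetical helper: print the Scala binary version encoded in a connector
# JAR name of the form <artifact>_<scala-version>-<release>.jar, for example
# elasticsearch-spark-20_2.11-5.1.1.jar.
scala_suffix() {
    # Drop everything up to the last underscore, leaving "2.11-5.1.1.jar".
    name=${1##*_}
    # Drop everything from the first hyphen onward, leaving "2.11".
    printf '%s\n' "${name%%-*}"
}

# Hypothetical helper: copy the connector JAR into Spark's jars directory.
# Assumes SPARK_HOME is set; the call aborts with a message if it is not.
install_es_spark_jar() {
    jar=$1
    cp "$jar" "${SPARK_HOME:?SPARK_HOME is not set}/jars/"
}
```

For example, assuming the archive unpacked the JAR under elasticsearch-hadoop-5.1.1/dist/ (check your unzipped layout), you could run install_es_spark_jar elasticsearch-hadoop-5.1.1/dist/elasticsearch-spark-20_2.11-5.1.1.jar and then compare the output of scala_suffix on that file with the Scala version your Spark build reports at startup.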

For storing data in Elasticsearch via Apache...
