
You're reading from ElasticSearch Cookbook

Product type: Book
Published in: Dec 2013
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781782166627
Edition: 1st Edition

Author: Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.

Chapter 8. Rivers

In this chapter, we will cover the following topics:

  • Managing a river

  • Using the CouchDB river

  • Using the MongoDB river

  • Using the RabbitMQ river

  • Using the JDBC river

  • Using the Twitter river

Introduction


There are two ways to get data into ElasticSearch. In the previous chapters, we saw the index API, which stores documents via PUT/POST requests or, for many documents at once, via the bulk API. The other way is to use a service that fetches data from an external source, either once or periodically, and puts it into the cluster.

ElasticSearch calls these services rivers, and the ElasticSearch community provides rivers for several data sources, including the following:

  • CouchDB

  • MongoDB

  • RabbitMQ

  • SQL DBMS (Oracle, MySQL, PostgreSQL and so on)

  • Redis

  • Twitter

  • Wikipedia

Rivers are distributed as external plugins.

In this chapter, we'll discuss how to manage a river (creating, checking, and deleting it) and how to configure the most common ones.

Managing a river


In ElasticSearch, managing a river involves two main actions:

  • Creating a river

  • Deleting a river

Getting ready

You need a working ElasticSearch cluster.

How to do it...

To manage a river, we perform the following steps:

  1. A river is uniquely defined by a name and a type. The type must be a river type provided by one of the loaded river plugins.

  2. Besides the name and the type, a river usually requires extra configuration, which is stored together with the type in the _meta document.

  3. To create a river, use the HTTP PUT method (POST also works):

    curl -XPUT 'http://127.0.0.1:9200/_river/my_river/_meta' -d '{
        "type" : "dummy"
    }'

    The dummy type is a "fake" river always installed in ElasticSearch.

  4. The result will be as follows:

    {"ok":true,"_index":"_river","_type":"my_river","_id":"_meta","_version":1}
  5. If you look at the ElasticSearch logs, you'll see new lines such as the following:

    [2013-08-03 20:48:39,206][INFO ][cluster.metadata         ] [Elsie-Dee] [_river] creating index...
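The same create and delete calls can also be issued from a program. The following Python sketch only builds the HTTP requests with the standard library's urllib, so they can be inspected without a running cluster. The node address http://127.0.0.1:9200 comes from the curl example; the helper names are hypothetical, and the DELETE /_river/<name>/ endpoint is assumed from the river API of this ElasticSearch generation.

```python
import json
import urllib.request

# Node address assumed from the curl example above.
ES_URL = "http://127.0.0.1:9200"


def create_river_request(name, config):
    """Build (without sending) the PUT that stores a river's _meta document."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/_river/{name}/_meta",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def delete_river_request(name):
    """Build (without sending) the DELETE that removes the river definition."""
    return urllib.request.Request(f"{ES_URL}/_river/{name}/", method="DELETE")


def send(request):
    """Send a request to a live node and decode its JSON answer."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


create = create_river_request("my_river", {"type": "dummy"})
print(create.get_method(), create.full_url)
# On a live node, send(create) returns an acknowledgement like the one in step 4.
```

Keeping request construction separate from `send` makes it easy to log or test the exact URL and body before touching the cluster.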

Using the CouchDB river


CouchDB is a NoSQL data store that stores data in the JSON format, similar to ElasticSearch. It is queried through map/reduce views, and it's RESTful, so every operation can be done via HTTP API calls.

Using ElasticSearch to search CouchDB data is very handy, as it extends the CouchDB data store with Lucene's search capabilities.

Getting ready

You need a working ElasticSearch cluster and a working CouchDB server to connect to.

How to do it...

To use the CouchDB river, we perform the following steps:

  1. First, we need to install the CouchDB river plugin, which is available on GitHub and maintained by the ElasticSearch company. We can install it as follows:

    bin/plugin -install elasticsearch/elasticsearch-river-couchdb/1.2.0

    Tip

    The CouchDB river plugin uses the attachment plugin and, in some configurations, the JavaScript scripting language, so it is good practice to install both.

  2. After restarting the node, we are able to create a configuration (config.json) for our CouchDB...
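As a hedged illustration of what a CouchDB river configuration can contain, the following Python sketch builds a _meta body. The field names (couchdb.host, couchdb.port, couchdb.db, couchdb.filter, index.index, index.type, index.bulk_size, index.bulk_timeout) are an assumption based on the plugin's README and should be checked against the installed version; the database name my_db is hypothetical.

```python
import json


def couchdb_river_config(db, host="localhost", port=5984, index=None, doc_type=None):
    """Return a CouchDB river _meta body (field names assumed from the plugin README)."""
    return {
        "type": "couchdb",
        "couchdb": {
            "host": host,    # CouchDB server; 5984 is CouchDB's default port
            "port": port,
            "db": db,        # database whose _changes feed the river follows
            "filter": None,  # optional CouchDB filter (serialized as JSON null)
        },
        "index": {
            "index": index or db,    # target index, defaulting to the db name
            "type": doc_type or db,  # target type, defaulting to the db name
            "bulk_size": "100",
            "bulk_timeout": "10ms",
        },
    }


config = couchdb_river_config("my_db")  # "my_db" is a hypothetical database
print(json.dumps(config, indent=2))
```

Saved as config.json, such a body could be sent with `curl -XPUT 'http://127.0.0.1:9200/_river/my_db/_meta' -d @config.json`, where the river name my_db is again only an example.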

Using the MongoDB river


MongoDB is a widely used NoSQL data store. One of its main drawbacks is that it was not designed for text search.

Although the latest MongoDB version provides full-text search, its completeness and functionality are far more limited than those of the current ElasticSearch version. So it's quite common to use MongoDB as the data store and ElasticSearch for searching. The MongoDB river, initially developed by me and now maintained by Richard Louapre, helps to bridge these two applications.

Getting ready

You need a working ElasticSearch cluster and a working MongoDB instance, installed on the same machine as ElasticSearch and configured as a replica set (http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ and http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/). You need to restore the sample data available in mongodb/data using the following command:

mongorestore -d escookbook escookbook

How to do it...

To use the MongoDB...
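The MongoDB river follows changes by tailing the MongoDB oplog, which only exists when MongoDB runs as a replica set; this is why the replica set setup is required. The following Python sketch builds a possible _meta body for the sample database. The layout (mongodb.servers, mongodb.db, mongodb.collection, index.name, index.type) is an assumption based on the plugin README, and the collection name items is hypothetical: use one of the collections restored by mongorestore.

```python
import json


def mongodb_river_config(db, collection, index_name, doc_type,
                         host="127.0.0.1", port=27017):
    """Return a MongoDB river _meta body (layout assumed from the plugin README)."""
    return {
        "type": "mongodb",
        "mongodb": {
            # Replica set member(s) to connect to; 27017 is MongoDB's default port.
            "servers": [{"host": host, "port": port}],
            "db": db,
            "collection": collection,
        },
        "index": {
            "name": index_name,  # ElasticSearch index receiving the documents
            "type": doc_type,    # document type inside that index
        },
    }


# "items" is a hypothetical collection of the restored escookbook database.
config = mongodb_river_config("escookbook", "items", "escookbook", "items")
print(json.dumps(config, indent=2))
```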

Using the RabbitMQ river


RabbitMQ is a fast message broker that can handle thousands of messages per second. Used in conjunction with ElasticSearch, it is very handy for bulk indexing records.

The RabbitMQ river plugin waits for messages containing bulk operations and indexes them.
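The body of each message uses the newline-delimited format of the bulk API: an action line followed, for index operations, by a source line, with every line terminated by a newline. The following Python sketch builds such a body. The index name test, the type name type1, and the documents are hypothetical, and publishing the body to the river's queue requires an AMQP client library, which is not shown.

```python
import json


def bulk_message(index, doc_type, docs):
    """Encode (id, source) pairs as a bulk body for the RabbitMQ river.

    Each document produces an "index" action line and a source line;
    the body ends with a newline after the last line.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_message("test", "type1", [
    ("1", {"field1": "value1"}),
    ("2", {"field1": "value2"}),
])
print(body, end="")
```

Because `json.dumps` never emits raw newlines, each action and source stays on a single line, which is what the bulk format requires.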

Getting ready

You need a working ElasticSearch cluster and a working RabbitMQ instance installed on the same machine as ElasticSearch.

How to do it...

To use the RabbitMQ river, we perform the following steps:

  1. First, we need to install the RabbitMQ river plugin, which is available on GitHub (https://github.com/elasticsearch/elasticsearch-river-rabbitmq). We can install it as follows:

    bin/plugin -install elasticsearch/elasticsearch-river-rabbitmq/1.6.0
  2. The result should be as follows:

    -> Installing elasticsearch/elasticsearch-river-rabbitmq/1.6.0...
    Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-rabbitmq/elasticsearch-river-rabbitmq...
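Once the plugin is installed and the node restarted, the river is created with a _meta document, like any other river. The following Python sketch builds one. The field names are an assumption based on the plugin README and should be verified against version 1.6.0; guest/guest, the / virtual host, and port 5672 are RabbitMQ's out-of-the-box defaults, while the queue, exchange, and routing key name elasticsearch are example values.

```python
import json


def rabbitmq_river_config(host="localhost", port=5672, queue="elasticsearch",
                          bulk_size=100, bulk_timeout="10ms"):
    """Return a RabbitMQ river _meta body (field names assumed from the plugin README)."""
    return {
        "type": "rabbitmq",
        "rabbitmq": {
            "host": host,
            "port": port,      # default AMQP port
            "user": "guest",   # RabbitMQ default credentials
            "pass": "guest",
            "vhost": "/",
            "queue": queue,    # queue the river consumes bulk messages from
            "exchange": "elasticsearch",
            "routing_key": "elasticsearch",
        },
        "index": {
            # Batching settings; see the plugin README for their exact semantics.
            "bulk_size": bulk_size,
            "bulk_timeout": bulk_timeout,
        },
    }


config = rabbitmq_river_config()
print(json.dumps(config, indent=2))
```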

Using the JDBC river


Application data is generally stored in a DBMS of some kind (Oracle, MySQL, PostgreSQL, Microsoft SQL Server, SQLite, and so on). To power up a traditional application with the advanced search capabilities of ElasticSearch and Lucene, all this data must be imported into ElasticSearch. The JDBC river by Jörg Prante connects to these DBMSs, executes queries, and indexes the results.

Getting ready

You need a working ElasticSearch cluster.

How to do it...

To use the JDBC river, we perform the following steps:

  1. First, we need to install the JDBC river plugin, which is available on GitHub (https://github.com/jprante/elasticsearch-river-jdbc). We can install it as follows:

    bin/plugin -url http://bit.ly/145e9Ly -install river-jdbc
  2. The result should be as follows:

    -> Installing river-jdbc...
    Trying http://bit.ly/145e9Ly...
    Downloading … .....DONE
    Installed river-jdbc into …/elasticsearch/plugins/river-jdbc

    Tip

    The JDBC river plugin does not bundle DBMS...
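After installation, the river is configured with the JDBC connection data and the SQL query whose results are indexed. The following Python sketch shows one plausible _meta body. The jdbc.driver, jdbc.url, jdbc.user, jdbc.password, and jdbc.sql fields are an assumption based on early releases of the plugin, whose configuration layout changed across versions. The MySQL URL, credentials, and orders table are hypothetical; com.mysql.jdbc.Driver is the driver class of MySQL Connector/J 5.x.

```python
import json


def jdbc_river_config(url, user, password, sql, driver="com.mysql.jdbc.Driver"):
    """Return a JDBC river _meta body (field layout assumed; varies by plugin version)."""
    return {
        "type": "jdbc",
        "jdbc": {
            "driver": driver,      # JDBC driver class; must be loadable by the node
            "url": url,            # JDBC connection string of the source DBMS
            "user": user,
            "password": password,
            "sql": sql,            # query whose rows are indexed as documents
        },
    }


# Hypothetical MySQL database "test" with an "orders" table.
config = jdbc_river_config(
    "jdbc:mysql://localhost:3306/test", "es_user", "secret", "select * from orders"
)
print(json.dumps(config, indent=2))
```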

Using the Twitter river


In the previous recipes, we saw rivers that fetch data from data stores, both SQL and NoSQL. In this recipe, we'll discuss how to use the Twitter river to collect tweets from Twitter and store them in ElasticSearch.

Getting ready

You need a working ElasticSearch cluster and a Twitter OAuth token. To obtain one, log in to Twitter (https://dev.twitter.com/apps/) and create a new app at https://dev.twitter.com/apps/new.

How to do it...

To use the Twitter river, we perform the following steps:

  1. First, we need to install the Twitter river plugin, which is available on GitHub (https://github.com/elasticsearch/elasticsearch-river-twitter). We can install it in the usual way:

    bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.4.0
  2. The result should be as follows:

    -> Installing elasticsearch/elasticsearch-river-twitter/1.4.0...
    Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-twitter/elasticsearch...
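The river then needs the four OAuth values generated for the Twitter app. The following Python sketch builds a _meta body that reads them from environment variables, so secrets stay out of source files. The variable names (TWITTER_CONSUMER_KEY and so on) are hypothetical, and the twitter.oauth and index fields are an assumption based on the plugin README, to be checked against version 1.4.0. The index name my_twitter_river and the type status are example values.

```python
import json
import os


def twitter_river_config(index="my_twitter_river", doc_type="status", bulk_size=100):
    """Return a Twitter river _meta body with OAuth credentials from the environment.

    The environment variable names are hypothetical; missing ones become "".
    """
    oauth = {
        key: os.environ.get("TWITTER_" + key.upper(), "")
        for key in ("consumer_key", "consumer_secret",
                    "access_token", "access_token_secret")
    }
    return {
        "type": "twitter",
        "twitter": {"oauth": oauth},
        "index": {"index": index, "type": doc_type, "bulk_size": bulk_size},
    }


config = twitter_river_config()
# Avoid printing secrets: show only which OAuth keys are present.
print(sorted(config["twitter"]["oauth"]))
print(json.dumps(config["index"]))
```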