You're reading from ElasticSearch Cookbook

Product type: Book

Published in: Dec 2013

Reading Level: Beginner

Publisher: Packt

ISBN-13: 9781782166627

Edition: 1st Edition

Languages: Java

Tools: Elasticsearch

Concepts: Enterprise Search

Author (1)

Alberto Paro

Chapter 8. Rivers

In this chapter, we will cover the following topics:

Managing a river
Using the CouchDB river
Using the MongoDB river
Using the RabbitMQ river
Using the JDBC river
Using the Twitter river

Introduction

There are two ways to insert your data into ElasticSearch. In the previous chapters, we saw the index API, which stores documents in ElasticSearch via PUT/POST calls or the bulk shortcut. The other way is to use a service that fetches the data from an external source (in one shot or periodically) and puts it into the cluster.

ElasticSearch calls these services rivers, and the ElasticSearch community provides several rivers to connect to the following data sources:

CouchDB
MongoDB
RabbitMQ
SQL DBMS (Oracle, MySQL, PostgreSQL and so on)
Redis
Twitter
Wikipedia

The rivers are available as external plugins.

In this chapter we'll discuss how to manage a river (creating, checking, and deleting) and how to configure the most common ones.

Managing a river

In ElasticSearch, the two main actions for setting up rivers are as follows:

Creating a river
Deleting a river

Getting ready

You need a working ElasticSearch cluster.

How to do it...

For managing a river, we need to perform the following steps:

A river is uniquely defined by a name and a type. The type of the river is the type name defined in the loaded river plugins.
After the name and the type parameters, a river usually requires extra configuration, which can be passed in the _meta property.
To create a river, the HTTP method is PUT (POST also works):
```
curl -XPUT 'http://127.0.0.1:9200/_river/my_river/_meta' -d '{
    "type" : "dummy"
}'
```
The dummy type is a "fake" river always installed in ElasticSearch.
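
The same call can be prepared from a script. The following is a minimal Python sketch, assuming the node address used in the curl example; the helper name `build_create_river_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # node address taken from the curl example


def build_create_river_request(name, river_type, settings=None):
    """Build (but do not send) the PUT request that registers a river.

    The river name is used as the type inside the _river index and the
    configuration is stored in the document with the _meta ID.
    """
    meta = {"type": river_type}
    # Extra, river-specific configuration is merged into the _meta body.
    meta.update(settings or {})
    return urllib.request.Request(
        url="%s/_river/%s/_meta" % (ES_URL, name),
        data=json.dumps(meta).encode("utf-8"),
        method="PUT",
    )


request = build_create_river_request("my_river", "dummy")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before touching a live cluster.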

The result will be as follows:

{"ok":true,"_index":"_river","_type":"my_river","_id":"_meta","_version":1}
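
A script that creates rivers may want to verify this acknowledgement. The following Python sketch checks the fields shown in the response above; the function name `river_created` is hypothetical, and the check assumes the response always has this shape.

```python
import json

# Response body copied from the example above.
raw = '{"ok":true,"_index":"_river","_type":"my_river","_id":"_meta","_version":1}'


def river_created(response_text, river_name):
    """Return True when the response acknowledges the _meta document of the river."""
    body = json.loads(response_text)
    return (
        body.get("ok") is True
        and body.get("_index") == "_river"
        and body.get("_type") == river_name
        and body.get("_id") == "_meta"
    )


print(river_created(raw, "my_river"))     # True
print(river_created(raw, "other_river"))  # False
```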

If you look at ElasticSearch logs, you'll see some new lines, which are as follows:

[2013-08-03 20:48:39,206][INFO ][cluster.metadata         ] [Elsie-Dee] [_river] creating index...
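
Because a river is stored as documents in the `_river` index, checking and deleting it can plausibly be expressed as ordinary document requests. The following Python sketch is an assumption, not something shown in this excerpt: it supposes that a river is removed with a DELETE on its name and that its state is kept in a `_status` document next to `_meta`. Both helper names are hypothetical, and the requests are built but not sent.

```python
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # same node as in the creation example


def build_delete_river_request(name):
    """Build (not send) a DELETE for everything stored under the river name."""
    return urllib.request.Request(url="%s/_river/%s/" % (ES_URL, name), method="DELETE")


def build_river_status_request(name):
    """Build (not send) a GET for the _status document (assumed to exist)."""
    return urllib.request.Request(url="%s/_river/%s/_status" % (ES_URL, name), method="GET")


for request in (build_river_status_request("my_river"), build_delete_river_request("my_river")):
    print(request.get_method(), request.full_url)
```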

Using the CouchDB river

CouchDB is a NoSQL data store that stores data in the JSON format, similar to ElasticSearch. It can be queried with map/reduce tasks and it is RESTful, so every operation can be done via HTTP API calls.

Using ElasticSearch to search CouchDB data is very handy, as it extends the CouchDB data store with Lucene search capabilities.

Getting ready

You need a working ElasticSearch cluster and a working CouchDB Server to connect to.

How to do it...

For using the CouchDB river, we need to perform the following steps:

Firstly, we need to install the CouchDB river plugin, which is available on GitHub and maintained by the ElasticSearch company. We can install the river plugin in the following way:
```
bin/plugin -install elasticsearch/elasticsearch-river-couchdb/1.2.0
```
Tip
The CouchDB river plugin uses the attachment plugin and sometimes the JavaScript scripting language, so it is good practice to install them.
After restarting the node, we are able to create a configuration (config.json) for our CouchDB...
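
A river configuration of this kind could be assembled as in the following Python sketch. The key names (`couchdb`, `host`, `port`, `db`, `index`) are assumptions about the plugin's layout, `my_db` is a placeholder database name, and `couchdb_river_meta` is a hypothetical helper; check the plugin documentation before relying on any of them.

```python
import json


def couchdb_river_meta(db, host="localhost", port=5984, index=None, doc_type=None):
    """Assemble a _meta body for a river of type couchdb (key names are assumptions)."""
    return {
        "type": "couchdb",
        "couchdb": {"host": host, "port": port, "db": db},
        # By default, mirror the database name as the index and type.
        "index": {"index": index or db, "type": doc_type or db},
    }


config = couchdb_river_meta("my_db")  # "my_db" is a placeholder database name
print(json.dumps(config, indent=2, sort_keys=True))
```

The resulting dictionary would be serialized to JSON and sent as the body of the `_meta` document when the river is created.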

Using the MongoDB river

MongoDB is a very common NoSQL tool used all over the world. One of its main drawbacks is that it was not designed for text searching.

Although the latest MongoDB version provides full-text search, its completeness and functionality are far more limited than those of the current ElasticSearch version. So it's quite common to use MongoDB as the data store and ElasticSearch for searching. The MongoDB river, which was initially developed by me and is now maintained by Richard Louapre, helps to create a bridge between these two applications.

Getting ready

You need a working ElasticSearch cluster and a working MongoDB instance, installed on the same machine as ElasticSearch and configured as a replica set (see http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ and http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/). You need to restore the sample data available in mongodb/data using the following command:

mongorestore -d escookbook escookbook

How to do it...

For using the MongoDB...

Using the RabbitMQ river

RabbitMQ is a fast message broker that can handle thousands of messages per second. It is very handy to use in conjunction with ElasticSearch to bulk index records.

The RabbitMQ river plugin is designed to wait for messages that contain bulk operations and index them.
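
To make the idea of a message that carries bulk operations concrete, the following Python sketch builds such a payload. It assumes the message body uses the same newline-delimited format as the bulk API (an action line followed by the document source); the index name `test`, the type `type1`, and the helper `bulk_message` are hypothetical. Publishing the message to RabbitMQ needs a client library and is not shown.

```python
import json


def bulk_message(index, doc_type, docs):
    """Serialize {id: source} pairs as newline-delimited bulk index operations."""
    lines = []
    for doc_id, source in docs.items():
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


message = bulk_message("test", "type1", {"1": {"field1": "value1"}})
print(message, end="")
```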

Getting ready

You need a working ElasticSearch cluster and a working RabbitMQ instance installed on the same machine as ElasticSearch.

How to do it...

For using the RabbitMQ river, we need to perform the following steps:

Firstly, we need to install the RabbitMQ river plugin, which is available on GitHub (https://github.com/elasticsearch/elasticsearch-river-rabbitmq). We can install the river plugin in the following way:
```
bin/plugin -install elasticsearch/elasticsearch-river-rabbitmq/1.6.0
```

The result should be as follows:

-> Installing elasticsearch/elasticsearch-river-rabbitmq/1.6.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-rabbitmq/elasticsearch-river-rabbitmq...

Using the JDBC river

Generally, application data is stored in a DBMS of some kind (Oracle, MySQL, PostgreSQL, Microsoft SQL Server, SQLite, and so on). To power up a traditional application with the advanced search capabilities of ElasticSearch and Lucene, all this data must be imported into ElasticSearch. The JDBC river by Jörg Prante connects to these DBMSs, executes queries, and indexes the results.
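
The query-then-index idea can be illustrated without the plugin. The following Python sketch is a conceptual stand-in, not the river's implementation: it uses an in-memory SQLite database instead of a JDBC connection, and the `orders` table, its columns, and the helper `rows_to_documents` are hypothetical. It shows how each row of a query result maps to one document that could then be indexed.

```python
import sqlite3


def rows_to_documents(connection, sql):
    """Run a query and turn every row into a dict keyed by column name."""
    cursor = connection.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Throwaway in-memory database with a hypothetical "orders" table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
connection.executemany(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    [(1, "alice", 12.5), (2, "bob", 7.0)],
)

documents = rows_to_documents(connection, "SELECT id, customer, total FROM orders ORDER BY id")
for document in documents:
    print(document)
connection.close()
```

Each dictionary has the column names as field names, which is the natural shape for a JSON document.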

Getting ready

You need a working ElasticSearch cluster.

How to do it...

For using the JDBC river, we need to perform the following steps:

Firstly, we need to install the JDBC river plugin, which is available on GitHub (https://github.com/jprante/elasticsearch-river-jdbc). We can install the river plugin in the following way:
```
bin/plugin -url http://bit.ly/145e9Ly -install river-jdbc
```

The result should be as follows:

-> Installing river-jdbc...
Trying http://bit.ly/145e9Ly...
Downloading … .....DONE
Installed river-jdbc into …/elasticsearch/plugins/river-jdbc

Tip

The JDBC river plugin does not bundle DBMS...

Using the Twitter river

In the previous recipes, we have seen rivers that fetch data from data stores, both SQL and NoSQL. In this recipe, we'll discuss how to use the Twitter river to collect tweets from Twitter and store them in ElasticSearch.

Getting ready

You need a working ElasticSearch cluster and an OAuth Twitter token. To obtain one, you need to log in to Twitter (https://dev.twitter.com/apps/) and create a new app at https://dev.twitter.com/apps/new.
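
Once the app exists, its credentials have to reach the river configuration. The following Python sketch shows one way to validate them before use. The key names (`twitter`, `oauth`, `consumer_key`, `consumer_secret`, `access_token`, `access_token_secret`) are assumptions about the plugin's layout, and `twitter_river_meta` is a hypothetical helper; the credential values are placeholders.

```python
REQUIRED_OAUTH_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def twitter_river_meta(oauth):
    """Assemble a _meta body for a river of type twitter (key names are assumptions)."""
    missing = [key for key in REQUIRED_OAUTH_KEYS if not oauth.get(key)]
    if missing:
        raise ValueError("missing OAuth values: " + ", ".join(missing))
    return {"type": "twitter", "twitter": {"oauth": dict(oauth)}}


# Placeholder credentials; real values come from the app created on Twitter.
credentials = {key: "<your %s>" % key for key in REQUIRED_OAUTH_KEYS}
config = twitter_river_meta(credentials)
print(sorted(config["twitter"]["oauth"]))
```

Failing early on a missing value is cheaper than discovering the problem in the node logs after the river has been created.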

How to do it...

For using the Twitter river, we need to perform the following steps:

Firstly, we need to install the Twitter river plugin, which is available on GitHub (https://github.com/elasticsearch/elasticsearch-river-twitter). We can install the river plugin in the usual way as follows:
```
bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.4.0
```

The result should be as follows:

-> Installing elasticsearch/elasticsearch-river-twitter/1.4.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-twitter/elasticsearch...
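
The plugin identifiers used with `bin/plugin -install` in this chapter appear to follow a `user/repository/version` pattern. The following Python sketch, with hypothetical helper names, splits such an identifier and rebuilds the install command as an argument list; it does not run the command.

```python
from collections import namedtuple

PluginSpec = namedtuple("PluginSpec", ["user", "repository", "version"])


def parse_plugin_spec(spec):
    """Split 'user/repository/version' into its three parts."""
    parts = spec.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected user/repository/version, got %r" % spec)
    return PluginSpec(*parts)


def install_command(spec):
    """Return the argument list for the plugin script, as used in this chapter."""
    parse_plugin_spec(spec)  # validate before building the command
    return ["bin/plugin", "-install", spec]


for spec in (
    "elasticsearch/elasticsearch-river-couchdb/1.2.0",
    "elasticsearch/elasticsearch-river-rabbitmq/1.6.0",
    "elasticsearch/elasticsearch-river-twitter/1.4.0",
):
    print(parse_plugin_spec(spec).version, " ".join(install_command(spec)))
```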


Author (1)

Alberto Paro

Alberto Paro is an engineer, manager, and software developer. He currently works as technology architecture delivery associate director of the Accenture Cloud First data and AI team in Italy. He loves to study emerging solutions and applications, mainly related to cloud and big data processing, NoSQL, Natural language processing (NLP), software development, and machine learning. In 2000, he graduated in computer science engineering from Politecnico di Milano. Then, he worked with many companies, mainly using Scala/Java and Python on knowledge management solutions and advanced data mining products, using state-of-the-art big data software. A lot of his time is spent teaching how to effectively use big data solutions, NoSQL data stores, and related technologies.