You're reading from Learning Couchbase

Product type Book

Published in Nov 2015

ISBN-13 9781785288593

Pages 248 pages

Edition 1st Edition

Concepts

Database Programming

Author (1):

Henry Potsangbam

Table of Contents (12) Chapters

Introduction to Couchbase

The Couchbase Administration Interface

Storing Documents in Couchbase Using Buckets

Designing a Document for Couchbase

Introducing Client SDK

Retrieving Documents without Keys Using Views

Understanding SQL-Like Queries – N1QL

Full Text Search Using ElasticSearch

Data Replication and Compaction

Administration, Tuning, and Monitoring

Case Study – An E-Commerce Application

Index

Chapter 8. Full Text Search Using ElasticSearch

In the earlier chapters, we discussed various ways to fetch documents from buckets. To recap, we covered retrieving documents by document ID, through views (using MapReduce functions written in JavaScript), and with N1QL (the Couchbase query language).

Refer to the previous chapters for the details of each approach. You might wonder why you need all three ways of retrieving data. They are best seen as an evolution: each one is a better way of fetching documents for a particular use case.

In this chapter, we will see how to integrate ElasticSearch with Couchbase so that we can perform full text searches. In the first section, we will take an overview of ElasticSearch, and then we will learn how to integrate ElasticSearch with the Couchbase cluster. After that, we will execute some queries...

Understanding content-driven applications

What is a content-driven application? Nowadays, most applications are driven by content. For instance, e-commerce applications hold a lot of content in the form of catalogs, item descriptions, feedback, and so on. I am pretty sure that you have browsed at least one e-commerce website to buy something. Let's say you are looking for a book on Couchbase, so you go to www.packtpub.com to search for one. You are at the home page. Then what do you do? You look for the search box, right? Luckily, you find it in the middle of the home page. You then enter Couchbase in the search box, which displays the titles that contain the word Couchbase. Out of these options, Learning Couchbase is the one you need to click on, as shown here:

An e-commerce site

It will display some details of the book. You can read the following details:

Book Details
Table of Contents
About This Book
Who This Book...

Full text search overview

Full text search enables searching of documents by their text content. It's applicable to a wide range of applications, such as e-business or even analytics (for example, performing sentiment analysis).

Let's add a description in our existing document, as follows:

{
    "book": "Learning Couchbase",
    "description": "If you are new to the NoSQL document system or have little or no experience in NoSQL development and administration and are planning to deploy Couchbase for your next project, then this book is for you. It would be helpful to have a bit of familiarity with Java."
}

If we want to search for the term "NoSQL" in any document, we need a full text search across the entire JSON body. Couchbase provides this capability by integrating with ElasticSearch.
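To make the idea of searching by text concrete, here is a minimal sketch of a toy inverted index in Java. It is an illustration only and not how ElasticSearch is implemented; the class name TinyTextIndex and the document IDs are hypothetical, and the tokenizer simply splits on non-alphanumeric characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lowercase token to the IDs of the documents
// that contain it. Illustration only; real analyzers do much more.
class TinyTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split the text on non-alphanumeric characters and record every token.
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the IDs of the documents containing the term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyTextIndex index = new TinyTextIndex();
        // Hypothetical document IDs; the text is taken from the description field.
        index.add("book::1", "If you are new to the NoSQL document system");
        index.add("book::2", "A primer on relational databases");
        System.out.println(index.search("NoSQL"));
    }
}
```

Searching for a term is then a single map lookup instead of a scan over every document, which is the basic reason a dedicated search engine is used for this kind of query.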

So what is this ElasticSearch?

ElasticSearch (ES) is a framework or tool that performs querying or searching of text in a JSON document and returns the documents that matched the search text, along with some statistical...

Configuration and query

In this section, we will configure ElasticSearch and integrate it with Couchbase.

The Couchbase and ES integration architecture (Courtesy - Couchbase Documentation)

Let me briefly explain the architecture and usage pattern of ElasticSearch with the Couchbase cluster. Couchbase can replicate documents to ElasticSearch using XDCR, which will be discussed in detail in the next chapter. ElasticSearch indexes each attribute of the documents replicated from the Couchbase cluster. The developer executes the search query in ES and uses the resultset returned by ES to get the actual documents from the Couchbase cluster, as shown in the preceding diagram. Why are we not storing the actual document in ES and fetching it from ES itself? To answer this question, remember that ES is good at searching text, while Couchbase is fast at data retrieval, since all documents are, by default, stored in memory. Here, we are trying to use the best features...
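The usage pattern described above (search in ES, then fetch by ID from Couchbase) can be sketched as follows. This is a minimal sketch under stated assumptions: the two systems are replaced by in-memory maps, and the names SearchThenFetch, searchThenFetch, fakeIndex, and fakeBucket are hypothetical stand-ins, not part of any SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SearchThenFetch {
    // Step 1: the search engine returns only the IDs of the matching documents.
    // Step 2: each ID is looked up in the key-value store that holds the full document.
    static List<String> searchThenFetch(Function<String, List<String>> searchIds,
                                        Function<String, String> getById,
                                        String query) {
        List<String> documents = new ArrayList<>();
        for (String id : searchIds.apply(query)) {
            String doc = getById.apply(id);
            if (doc != null) { // skip IDs that no longer exist in the store
                documents.add(doc);
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory stand-ins for the ES index and the Couchbase bucket.
        Map<String, List<String>> fakeIndex = new HashMap<>();
        fakeIndex.put("skills:Couchbase", Arrays.asList("user::henry"));
        Map<String, String> fakeBucket = new HashMap<>();
        fakeBucket.put("user::henry", "{\"name\": \"Henry P\", \"book\": \"Learning Couchbase\"}");

        List<String> docs = searchThenFetch(
                q -> fakeIndex.getOrDefault(q, Collections.emptyList()),
                fakeBucket::get,
                "skills:Couchbase");
        System.out.println(docs);
    }
}
```

In a real application, the first function would call the ES search API and the second would be a get-by-key call through the Couchbase client SDK.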

Using the ES query API

Let's understand some queries used for searching documents in ES. Here is a sample document from the LearningCouchbase bucket:

{
  "name": "Henry P",
  "book": "Learning Couchbase",
  "skills": [
    "Couchbase",
    "Cassandra",
    "MongoDB"
  ]
}

We want to find the IDs of the user documents that list Couchbase as a skill.

For this, you can query ElasticSearch using the following URL: http://localhost:9200/learningcouchbase/_search?pretty=true&q=skills:Couchbase.

If you want to find user documents with the Couchbase skillset and the name Henry, use http://localhost:9200/learningcouchbase/_search?pretty=true&q=skills:Couchbase+name:Henry&default_operator=AND.

An ES output

Here, the search criteria are provided as a simple query string in the q parameter, and we are searching only the learningcouchbase index for documents that match them.
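When such a URL is assembled in code rather than typed by hand, the query string should be URL-encoded. The following is a minimal sketch, assuming Java 10 or later; the class EsSearchUrl and its build method are hypothetical helpers, and the sketch does not escape characters that have a special meaning in the query syntax.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class EsSearchUrl {
    // Builds a URL of the form
    // <baseUrl>/<index>/_search?pretty=true&q=<field:value ...>&default_operator=AND
    static String build(String baseUrl, String index, Map<String, String> criteria) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, String> e : criteria.entrySet()) {
            query.add(e.getKey() + ":" + e.getValue());
        }
        // URLEncoder turns the separating space into '+' and ':' into "%3A".
        String encoded = URLEncoder.encode(query.toString(), StandardCharsets.UTF_8);
        return baseUrl + "/" + index + "/_search?pretty=true&q=" + encoded
                + "&default_operator=AND";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the criteria in insertion order.
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("skills", "Couchbase");
        criteria.put("name", "Henry");
        System.out.println(build("http://localhost:9200", "learningcouchbase", criteria));
    }
}
```

The encoded form differs from the hand-written URL only in that the colon appears as %3A; the server decodes it back to the same query string.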

This is just the tip of the iceberg. There are a lot of options in ElasticSearch. There are...

An API to connect to ES

We have configured ES with the Couchbase cluster and were able to retrieve documents using some simple ES queries. Now let's discuss the steps involved in querying ES using Java APIs.

Here, I am not showing all of the code, as it will become repetitive. You can download the full source code from the website. I am going to show the code relevant to ES and how to retrieve the resultset.

Querying ES and fetching documents involves two steps. First, we need to connect to ElasticSearch and search for the keywords using the REST API. This will return a resultset as a JSON document, as follows:

String url="http://localhost:9200/learningcouchbase/_search?pretty=true&q=skills:Couchbase+name:Henry";

static String fetchESQuery(String url) throws Exception {
    HttpGet request = new HttpGet(url);
    HttpClient client = HttpClientBuilder.create().build();
    HttpResponse response = client.execute(request);
    
    BufferedReader rd = new BufferedReader(new...
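Once the response body has been read into a string, the document IDs have to be extracted from it before the documents can be fetched from Couchbase. The following is a minimal sketch, not the book's code: it assumes that each hit in the response carries an "_id" field holding the document ID, and it uses a regular expression only to stay free of dependencies. The class EsHitIds and the sample response are hypothetical, and a real application should use a JSON parser, since this pattern would also match an "_id" field nested inside a document body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EsHitIds {
    // Matches "_id" : "<value>" and captures the value.
    private static final Pattern ID_FIELD =
            Pattern.compile("\"_id\"\\s*:\\s*\"([^\"]*)\"");

    // Collects every captured ID in the order in which it appears in the response.
    static List<String> extractIds(String responseJson) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID_FIELD.matcher(responseJson);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical, heavily trimmed search response.
        String sample = "{\"hits\":{\"total\":1,\"hits\":[{\"_index\":\"learningcouchbase\","
                + "\"_id\":\"user::henry\",\"_score\":1.0}]}}";
        System.out.println(extractIds(sample));
    }
}
```

Each extracted ID can then be passed to a get-by-key call on the Couchbase bucket to retrieve the full document.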

Summary

In this chapter, we discussed what full text search is and how to integrate ES with Couchbase. We went through the configuration steps of the ES integration and executed some queries using the ES DSL. Finally, we saw how to retrieve documents from Couchbase after searching in ES.

In the next chapter, we will discuss replication of data across clusters using XDCR for disaster recovery. You will understand how inter-cluster replication occurs in Couchbase, what its use cases are, and how to monitor it. In addition to XDCR, we will also cover the compaction process of Couchbase.