Mastering ElasticSearch

By Rafał Kuć, Marek Rogoziński

About this book

ElasticSearch is a fast, distributed, scalable search engine written in Java that leverages the capabilities of Apache Lucene, providing a new level of control over how you index and search even the largest sets of data.

"Mastering ElasticSearch" covers the intermediate and advanced functionalities of ElasticSearch, and will help you understand not only how ElasticSearch works, but will also guide you through its internals, such as the caches, the Apache Lucene library, the monitoring capabilities, and the Java API. In addition to that, you'll see the practical usage of ElasticSearch configuration parameters and the monitoring API, along with easy-to-use and extendable examples of how to extend ElasticSearch by writing your own plugins.

"Mastering ElasticSearch" starts by showing you how Apache Lucene works and what the ElasticSearch architecture looks like. It covers advanced querying capabilities, index configuration control, index distribution, ElasticSearch administration, and troubleshooting. Finally, you'll see how to improve the user's search experience, use the provided Java API, and develop your own custom plugins.

It will help you learn how Apache Lucene works, both in terms of querying and indexing. You'll also learn how to use different scoring models, rescore documents using other queries, alter how the index is written by using custom postings formats, and what segments merging is and how to configure it to your needs. You'll optimize your queries by modifying them to use filters, and you'll see why that is important. The book describes in detail how to use the shard allocation mechanisms present in ElasticSearch, such as forced awareness.

"Mastering ElasticSearch" will open your eyes to the practical use of the statistics and information APIs available at the index, node, and cluster level, so you are not surprised by what your ElasticSearch cluster does while you are not looking. You'll also see how to troubleshoot by understanding how the Java garbage collector works, how to control I/O throttling, and how to see what threads are being executed at any given moment. If user spelling mistakes are making you lose sleep at night, don't worry anymore; the book will show you how to configure and use the ElasticSearch spellchecker and improve the relevance of your queries. Last, but not least, you'll see how to use the ElasticSearch Java API to work with an ElasticSearch cluster from your JVM-based application, and you'll extend ElasticSearch by writing your own custom plugins.

If you are looking for a book that will allow you to easily extend your basic knowledge of ElasticSearch, or if you want to go deeper into the world of full text search using ElasticSearch, then this book is for you.

 

Publication date: October 2013
Publisher: Packt
Pages: 386
ISBN: 9781783281435

 

Chapter 1. Introduction to ElasticSearch

We hope that you are reading this book because you want to extend and build on your basic ElasticSearch knowledge. We have assumed that you already know how to index data into ElasticSearch using single requests as well as bulk indexing. You should also know how to send queries to get the documents you are interested in, how to narrow down the results of your queries by using filtering, and how to calculate statistics for your data with the use of the faceting/aggregation mechanism. However, before getting to the exciting functionality that ElasticSearch offers, we think we should start with a quick tour of Apache Lucene, the full text search library that ElasticSearch uses to build and search its indices, as well as the basic concepts that ElasticSearch is built on. In order to move forward and extend our learning, we need to ensure we don't forget the basics, which is easy to do. We also need to make sure that we understand Lucene correctly, as mastering ElasticSearch requires this understanding. By the end of this chapter we will have covered:

  • What Apache Lucene is

  • What the overall Lucene architecture looks like

  • How the analysis process is done

  • What the Apache Lucene query language is and how to use it

  • What the basic concepts of ElasticSearch are

  • How ElasticSearch communicates internally

 

Introducing Apache Lucene


In order to fully understand how ElasticSearch works, especially when it comes to indexing and query processing, it is crucial to understand how the Apache Lucene library works. Under the hood, ElasticSearch uses Lucene to handle document indexing. The same library is also used to perform searches against the indexed documents. In the next few pages we will try to show you the basics of Apache Lucene, just in case you've never used it.

Getting familiar with Lucene

You may wonder why the ElasticSearch creators decided to use Apache Lucene instead of developing their own functionality. We don't know for sure, because we were not the ones who made that decision, but we assume that it was because Lucene is mature, highly performant, scalable, light, and yet, very powerful. Its core comes as a single Java library file with no dependencies, and allows you to index documents and search them with its out-of-the-box full text search capabilities. Of course, there are extensions to Apache Lucene that allow handling of different languages, enable spellchecking, highlighting, and much more; but if you don't need those features, you can download a single file and use it in your application.

Overall architecture

Although we would like to jump straight into the Apache Lucene architecture, there are some things we need to know first in order to fully understand it, and those are:

  • Document: It is the main data carrier used during indexing and searching, containing one or more fields, which hold the data we put into and get from Lucene

  • Field: It is a section of the document, built of two parts: the name and the value

  • Term: It is a unit of search representing a word from the text

  • Token: It is an occurrence of a term in the text of the field. It consists of the term text, start and end offsets, and a type

Apache Lucene writes all the information to a structure called the inverted index. It is a data structure that maps the terms in the index to the documents, not the other way around as a relational database does. You can think of the inverted index as a data structure where data is term-oriented rather than document-oriented. Let's see how a simple inverted index might look. For example, let's assume that we have documents with only a title field to be indexed, and they look as follows:

  • ElasticSearch Server (document 1)

  • Mastering ElasticSearch (document 2)

  • Apache Solr 4 Cookbook (document 3)

So the index (in a very simple way) could be visualized as follows:
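
A simplified, illustrative rendering (assuming that the terms are lowercased during analysis) might be:

4             (1)  <doc 3>
apache        (1)  <doc 3>
cookbook      (1)  <doc 3>
elasticsearch (2)  <doc 1> <doc 2>
mastering     (1)  <doc 2>
server        (1)  <doc 1>
solr          (1)  <doc 3>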

As you can see, each term points to the number of documents it is present in. This allows for very efficient and fast searching, such as term-based queries. In addition to that, each term has a number connected to it, the count, telling Lucene how often the term occurs.

Of course, the actual index created by Lucene is much more complicated and advanced, because term vectors (a small inverted index for a single field, which allows getting all tokens for that particular field) can be stored, the original values of the fields can be stored, markers about deleted documents can be written, and so on. But all you need to know is how the data is organized, not what exactly is stored.

Each index is divided into multiple write-once, read-many-times segments. When indexing, after a single segment is written to disk, it can't be updated. For example, the information about deleted documents is stored in a separate file, but the segment itself is not updated.

However, multiple segments can be merged together in a process called segments merge. After forcing a segments merge, or after Lucene decides it is time for merging to be performed, segments are merged together by Lucene to create larger ones. This can be I/O demanding; however, it is needed to clean up some information, because during that time information that is not needed anymore is deleted, for example, the deleted documents. In addition to this, searching with the use of one larger segment is faster than searching against multiple smaller ones holding the same data. However, once again, remember that a segments merge is an I/O demanding operation and you shouldn't force merging; just configure your merge policy carefully.

Note

If you want to know what files are building the segments and what information is stored inside them, please take a look at Apache Lucene documentation available at http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/codecs/lucene45/package-summary.html.

Analyzing your data

Of course, the question arises: how is the data passed in the documents transformed into the inverted index, and how is the query text changed into terms to allow searching? The process of transforming this data is called analysis.

Analysis is done by the analyzer, which is built of a tokenizer and zero or more filters, and can also have zero or more character mappers.

A tokenizer in Lucene is used to divide the text into tokens, which are basically terms with additional information, such as their position in the original text and their length. The result of the tokenizer's work is a so-called token stream, where the tokens are put one by one and are ready to be processed by the filters.

Apart from the tokenizer, the Lucene analyzer is built of zero or more filters that are used to process tokens in the token stream. For example, a filter can remove tokens from the stream, change them, or even produce new ones. There are numerous filters and you can easily create new ones. Some examples of filters are:

  • Lowercase filter: It makes all the tokens lowercased

  • ASCII folding filter: It removes the non-ASCII parts from tokens

  • Synonyms filter: It is responsible for changing one token to another on the basis of synonym rules

  • Multiple language stemming filters: These are responsible for reducing tokens (actually the text part that they provide) into their root or base forms, the stem

Filters are processed one after another, so we have almost unlimited analysis possibilities by adding multiple filters one after another.

The last thing is the character mapper, which is used before the tokenizer and is responsible for processing the text before any tokenization is done. One example of a character mapper is the HTML tags removal process.
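
To see how these pieces fit together in ElasticSearch, here is a minimal sketch of an index created with a custom analyzer that mirrors the chain described above: an HTML-stripping character mapper, a tokenizer, and two filters. The index name test and the analyzer name custom_title_analyzer are only examples, not something required by ElasticSearch.

# create a test index with a custom analysis chain: html_strip -> standard tokenizer -> lowercase, asciifolding
curl -XPUT 'http://localhost:9200/test' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_title_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}'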

Indexing and querying

We may wonder how all of that affects indexing and querying when using Lucene and all the software that is built on top of it. During indexing, Lucene will use an analyzer of your choice to process the contents of your document; of course, a different analyzer can be used for different fields, so the title field of your document can be analyzed differently compared to the description field.

During query time, if you use one of the provided query parsers, your query will be analyzed. However, you can also choose the other path and not analyze your queries. This is crucial to remember, because some of the ElasticSearch queries are analyzed and some are not. For example, the prefix query is not analyzed, while the match query is analyzed.

What you should remember about indexing and querying analysis is that the terms produced during indexing need to be matched by the terms produced from the query. If they don't match, Lucene won't return the desired documents. For example, if you are using stemming and lowercasing during indexing, you need to be sure that the terms in the query are also lowercased and stemmed, or your queries will return no results at all.
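
A quick way to check what terms a given analysis chain actually produces is the _analyze API exposed by ElasticSearch; a minimal sketch (the analyzer name is just an example) could look like this:

# inspect the tokens produced by the standard analyzer for the given text
curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' -d 'Mastering ElasticSearch'

The response lists the tokens produced, which makes it easy to verify that indexing and query analysis produce matching terms.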

Lucene query language

Some of the query types provided by ElasticSearch support the Apache Lucene query parser syntax. Because of that, let's go deeper into the Lucene query language and describe it.

Understanding the basics

A query is divided by Apache Lucene into terms and operators. A term, in Lucene, can be a single word or a phrase (group of words surrounded by double quote characters). If the query is set to be analyzed, the defined analyzer will be used on each of the terms that form the query.

A query can also contain Boolean operators that connect terms to each other forming clauses. The list of Boolean operators is as follows:

  • AND: It means that the given two terms (left and right operand) need to match in order for the clause to be matched. For example, we would run a query, such as apache AND lucene, to match documents with both apache and lucene terms in a document.

  • OR: It means that any of the given terms may match in order for the clause to be matched. For example, we would run a query, such as apache OR lucene, to match documents with apache or lucene (or both) terms in a document.

  • NOT: It means that in order for the document to be considered a match, the term appearing after the NOT operator must not match. For example, we would run a query lucene NOT elasticsearch to match documents that contain lucene term, but not elasticsearch term in the document.

In addition to that, we may use the following operators:

  • +: It means that the given term needs to be matched in order for the document to be considered as a match. For example, in order to find documents that match lucene term and may match apache term, we would run a query, such as +lucene apache.

  • -: It means that the given term can't be matched in order for the document to be considered a match. For example, in order to find document with lucene term, but not elasticsearch term we would run a query, such as +lucene -elasticsearch.

When none of the previous operators is specified, the default OR operator will be used.

In addition to all these, there is one more thing: you can use parentheses to group clauses together, for example, with something like this:

elasticsearch AND (mastering OR book)
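
As a reminder of how this syntax surfaces in ElasticSearch, the query_string query passes its query text through the Lucene query parser; a minimal sketch of such a request (the blog index name is only an example) could look as follows:

# run a Lucene query parser expression through the query_string query
curl -XGET 'http://localhost:9200/blog/_search?pretty' -d '{
  "query": {
    "query_string": {
      "query": "elasticsearch AND (mastering OR book)"
    }
  }
}'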

Querying fields

Of course, just like in ElasticSearch, in Lucene all your data is stored in fields that build the document. In order to run a query against a field, you need to provide the field name, add the colon character, and provide the clause that should be run against that field. For example, if you would like to match documents with the term elasticsearch in the title field, you would run a query like this:

title:elasticsearch

You can also group multiple clauses to a single field. For example, if you would like your query to match all the documents having the elasticsearch term and the mastering book phrase in the title field, you could run a query like this:

title:(+elasticsearch +"mastering book")

Of course, the previous query can also be expressed in the following way:

+title:elasticsearch +title:"mastering book"

Term modifiers

In addition to the standard field query with a simple term or clause, Lucene allows us to modify the terms we pass in the query with modifiers. The most common modifiers, which you are surely familiar with, are wildcards. There are two wildcards supported by Lucene: ? and *. The first one will match any single character, and the second one will match any sequence of characters.

Note

Please note that, by default, these wildcard characters can't be used as the first character in a term because of performance reasons.

In addition to that, Lucene supports fuzzy and proximity searches with the use of the ~ character and an integer following it. When used with a single word term, it means that we want to search for terms that are similar to the one we've modified (the so-called fuzzy search). The integer after the ~ character specifies the maximum number of edits that can be done to consider the term similar. For example, if we ran a query, such as writer~2, both the terms writer and writers would be considered a match.

When the ~ character is used on a phrase, the integer number we provide is telling Lucene how much distance between words is acceptable. For example, let's take the following query:

title:"mastering elasticsearch"

It would match the document with the title field containing mastering elasticsearch, but not mastering book elasticsearch. However, if we ran a query, such as title:"mastering elasticsearch"~2, it would result in both example documents being matched.

In addition to that, we can use boosting in order to increase the importance of a term by using the ^ character and providing a float number. A boost lower than one decreases the importance, a boost higher than one increases it, and the default boost value is 1. Please refer to the Default Apache Lucene scoring explained section in Chapter 2, Power User Query DSL, for further information on what boosting is and how it is taken into consideration during document scoring.

In addition to all these, we can use square and curly brackets to allow range searching. For example, if we would like to run a range search on a numeric field we could run the following query:

price:[10.00 TO 15.00]

The above query would result in all documents with the price field between 10.00 and 15.00 inclusive.

In the case of string-based fields, we can also run a range query, for example:

name:[Adam TO Adria]

The previous query would result in all documents containing terms between Adam and Adria in the name field, including those bounds.

If you would like your range bound or bounds to be exclusive, use curly brackets instead of the square ones. For example, in order to find documents with the price field between 10.00 inclusive and 15.00 exclusive, we would run the following query:

price:[10.00 TO 15.00}

Handling special characters

In case you want to search for one of the special characters (which are +, -, &&, ||, !, (, ), { }, [ ], ^, ", ~, *, ?, :, \, /), you need to escape it with the use of the backslash (\) character. For example, to search for the abc"efg term you need to do something like this:

abc\"efg
 

Introducing ElasticSearch


If you hold this book in your hands, you are probably familiar with ElasticSearch, at least the core concepts and basic usage. However, in order to fully understand how this search engine works, let's discuss it briefly.

As you probably know, ElasticSearch is production-ready software for building search-oriented applications. It was originally started by Shay Banon and published in February 2010. Since then, it has rapidly gained popularity within just a few years and has become an important alternative to other open source and commercial solutions. It is one of the most downloaded open source projects, hitting more than 200,000 downloads a month.

Basic concepts

Let's go through the basic concepts of ElasticSearch and its features.

Index

ElasticSearch stores its data in one or more indices. Using analogies from the SQL world, an index is something similar to a database. It is used to store documents and read them back. As we already mentioned, under the hood, ElasticSearch uses the Apache Lucene library to write and read the data from the index. What one should remember is that a single ElasticSearch index may be built of more than a single Apache Lucene index, by using shards and replicas.

Document

A document is the main entity in the ElasticSearch world (and also in the Lucene world). In the end, all use cases of ElasticSearch can be brought to a point where it is all about searching for documents. A document consists of fields, and each field has a name and one or more values (in that case, the field is called multi-valued). Each document may have a different set of fields; there is no schema or imposed structure. This should look familiar (these are the same rules as for Lucene documents). In fact, ElasticSearch documents are stored as Lucene documents. From the client's point of view, a document is a JSON object (see more about the JSON format at http://en.wikipedia.org/wiki/JSON).

Mapping

As you already read in the Introducing Apache Lucene section, all documents are analyzed before being stored. We can configure how the input text is divided into tokens, which tokens should be filtered out, or what additional processing, such as removing HTML tags, is needed. In addition, various features offered by ElasticSearch, such as sorting, need information about the contents of fields. This is where mapping comes into play: it holds all of this information. Even though ElasticSearch can automatically discover a field's type by looking at its value, sometimes (in fact, almost always) we will want to configure the mappings ourselves to avoid unpleasant surprises.
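
For illustration, a minimal mapping for a hypothetical article type could be defined as follows (the index name, type name, fields, and analyzer choice are only examples, and the index is assumed to already exist):

# define field types and analysis for the article type
curl -XPUT 'http://localhost:9200/blog/article/_mapping' -d '{
  "article": {
    "properties": {
      "title":     { "type": "string", "analyzer": "standard" },
      "tags":      { "type": "string", "index": "not_analyzed" },
      "published": { "type": "date" }
    }
  }
}'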

Type

Each document in ElasticSearch has its type defined. This allows us to store various document types in one index and have different mappings for different document types.

Node

A single instance of the ElasticSearch server is called a node. A single-node ElasticSearch deployment can be sufficient for many simple use cases, but when you have to think about fault tolerance, or when you have lots of data that cannot fit in a single server, you should think about a multi-node ElasticSearch cluster.

Cluster

A cluster is a set of ElasticSearch nodes that work together to handle a load bigger than a single instance can handle (both in terms of handling queries and documents). This is also the solution that allows for uninterrupted work of the application, even if several machines (nodes) are not available due to outages or administration tasks such as upgrades. ElasticSearch provides clustering almost seamlessly. In our opinion, this is one of its major advantages over the competition; setting up a cluster in the ElasticSearch world is really easy.

Shard

As we said previously, clustering allows us to store information volumes that exceed the abilities of a single server. To achieve this, ElasticSearch spreads the data across several physical Lucene indices. Those Lucene indices are called shards, and the process of this spreading is called sharding. ElasticSearch can do this automatically, and all parts of the index (shards) are visible to the user as one big index. Note that besides this automation, it is crucial to tune this mechanism for a particular use case, because the number of shards an index is built of is configured during index creation and cannot be changed later, at least currently.

Replica

Sharding allows us to push more data into ElasticSearch than is possible for a single node to handle. Replicas can help where the load increases and a single node is not able to handle all the requests. The idea is simple: create an additional copy of a shard, which can be used for queries just like the original, primary shard. Note that we get safety for free. If the server with the shard is gone, ElasticSearch can use a replica and no data is lost. Replicas can be added and removed at any time, so you can adjust their number when needed.
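
To tie shards and replicas together, here is a minimal sketch of creating an index with an explicit number of shards and replicas (the index name and the numbers are arbitrary examples):

# create an index built of three shards, each with one replica
curl -XPUT 'http://localhost:9200/blog' -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'

As mentioned earlier, the number of shards cannot be changed after the index is created, while the number of replicas can be adjusted later through the index settings API.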

Gateway

During its work, ElasticSearch collects various information about cluster state, indices settings, and so on. This data is persisted in the gateway.

Key concepts behind ElasticSearch architecture

ElasticSearch was built with a few concepts in mind. The development team wanted to make it easy to use and scalable, and these core features are visible in every corner of ElasticSearch. From the architectural perspective, the main features are:

  • Reasonable default values that allow users to start using ElasticSearch just after installing it, without any additional tuning. This includes built-in discovery (for example, of field types) and automatic configuration.

  • Working in distributed mode by default. Nodes assume that they are, or will be, a part of a cluster, and during setup nodes try to automatically join the cluster.

  • Peer-to-peer architecture without single point of failure (SPOF). Nodes automatically connect to other machines in the cluster for data interchange and mutual monitoring. This covers automatic replication of shards.

  • Easily scalable, both in terms of capacity and the amount of data, by adding new nodes to the cluster.

  • ElasticSearch does not impose restrictions on the organization of data in the index. This allows users to adjust to their existing data model. As we noted in the type description, ElasticSearch supports multiple data types in a single index, and adjustment to the business model includes handling relationships between documents (although this functionality is rather limited).

  • Near Real Time (NRT) searching and versioning. Because of the distributed nature of ElasticSearch, it is impossible to avoid delays and temporary differences between the data located on different nodes. ElasticSearch tries to reduce these issues and provides additional mechanisms such as versioning.

Working of ElasticSearch

Let's now discuss briefly how ElasticSearch works.

The bootstrap process

When an ElasticSearch node starts, it uses multicast (or unicast, if configured) to find the other nodes in the same cluster (the key here is the cluster name defined in the configuration) and connect to them. You can see the process illustrated in the following figure:

In the cluster, one of the nodes is elected as the master node. This node is responsible for managing the cluster state and the process of assigning shards to nodes in reaction to changes in the cluster topology.

Note

Note that a master node in ElasticSearch has no special importance from the user perspective, which is different from other systems available (such as databases). In practice, you do not need to know which node is the master node; all operations can be sent to any node, and internally ElasticSearch will do all the magic. If necessary, any node can send subqueries in parallel to other nodes and merge the responses to return the full response to the user. All of this is done without accessing the master node (nodes operate in a peer-to-peer architecture).

The master node reads the cluster state and, if necessary, goes into the recovery process. During this state, it checks which shards are available and decides which shards will be the primary shards. After this, the whole cluster enters a yellow state.

This means that the cluster is able to run queries, but full throughput and all possibilities are not achieved yet (it basically means that all primary shards are allocated, but the replicas are not). The next thing to do is to find duplicated shards and treat them as replicas. When a shard has too few replicas, the master node decides where to put the missing shards, and additional replicas are created based on the primary shard. If everything goes well, the cluster enters a green state (which means that all primary shards and their replicas are allocated).
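
You can check which of these states your cluster is currently in by using the cluster health API; a minimal example could look like this:

# return the cluster status (green, yellow, or red) and basic shard counters
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

The response contains a status field set to green, yellow, or red, along with counters such as the number of nodes, active primary shards, and unassigned shards.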

Failure detection

During normal cluster work, the master node monitors all the available nodes and checks whether they are working. If any of them are not available for a configured amount of time, the node is treated as broken and the process of handling the failure starts. This may mean rebalancing the cluster: the shards that were present on the broken node are gone, and for each such shard, other nodes have to take responsibility. In other words, for every lost primary shard, a new primary shard should be elected from the remaining replicas of this shard. The whole process of placing new shards and replicas can (and usually should) be configured to match our needs. More information about it can be found in Chapter 4, Index Distribution Architecture.

Just to illustrate how it works, let's take the example of a three-node cluster: there will be a single master node and two data nodes. The master node will send ping requests to the other nodes and wait for the responses. If a response doesn't come (actually, how many ping requests may fail depends on the configuration), such a node will be removed from the cluster.

Communicating with ElasticSearch

We talked about how ElasticSearch is built, but after all, the most important part for us is how to feed it with data and how to build your queries. In order to do that ElasticSearch exposes a sophisticated API. The primary API is REST based (see http://en.wikipedia.org/wiki/Representational_state_transfer) and is easy to integrate with practically any system that can send HTTP requests.

ElasticSearch assumes that data is sent in the URL or in the request body as a JSON document (http://en.wikipedia.org/wiki/JSON). If you use Java or a language based on the JVM, you should look at the Java API, which, in addition to everything that is offered by the REST API, has built-in cluster discovery.

It is worth mentioning that the Java API is also used internally by ElasticSearch itself to do all the node-to-node communication. You will find more about the Java API in Chapter 8, ElasticSearch Java APIs, but for now let's briefly look at the possibilities and functionality exposed by this API. Note that we treat this as a little reminder (this book assumes that you have used these elements already). If not, we strongly suggest reading about them; for example, our ElasticSearch Server book covers all this information.

Indexing data

ElasticSearch has four ways of indexing data. The easiest way is using the index API, which allows you to send one document to a particular index. For example, by using the curl tool (see http://curl.haxx.se/), we can create a new document by using the following command:

curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "New
version of Elastic Search released!", "content": "...", "tags":
["announce", "elasticsearch", "release"] }'

The second and third ways allow us to send many documents using the bulk API and the UDP bulk API. The difference between these methods is the connection type: the common bulk command sends documents over the HTTP protocol, while UDP bulk sends them using the connectionless datagram protocol. The latter is faster but not as reliable. The last method uses plugins called rivers. A river runs on an ElasticSearch node and is able to fetch data from external systems.
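
As a reminder, a bulk request is a set of newline-delimited JSON lines, where an action line is followed by a document line; a minimal sketch (the file name documents.json and the document contents are only examples) could look like this:

# send a file with bulk actions; every line, including the last one, must end with a newline
curl -XPOST 'http://localhost:9200/_bulk' --data-binary @documents.json

where documents.json contains:

{ "index": { "_index": "blog", "_type": "article", "_id": "2" } }
{ "title": "Second article", "tags": ["announce"] }
{ "index": { "_index": "blog", "_type": "article", "_id": "3" } }
{ "title": "Third article", "tags": ["release"] }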

One thing to remember is that the indexing only takes place on the primary shard, not on the replicas. If an indexing request is sent to a node which doesn't hold the correct shard, or which holds only a replica, the request will be forwarded to the primary shard.

Querying data

The query API is a big part of the ElasticSearch API. Using the Query DSL (a JSON-based language for building complex queries), we can:

  • Use various query types, including simple term queries, phrase, range, Boolean, fuzzy, span, wildcard, spatial, and other queries

  • Build complex queries with the use of simple queries combined together

  • Filter documents, throwing away the ones which do not match the selected criteria, without influencing the scoring (a minimal example follows this list)

  • Find documents similar to given document

  • Find suggestions and corrections of a given phrase

  • Build dynamic navigation and calculate statistics using faceting

  • Use prospective search and find queries matching a given document
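
For example, a query that combines full text matching with a filter (the index name, field names, and values are purely illustrative) could be sketched as follows:

# search the blog index, filtering out documents without the release tag
curl -XGET 'http://localhost:9200/blog/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query":  { "match": { "title": "elasticsearch" } },
      "filter": { "term":  { "tags": "release" } }
    }
  }
}'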

When talking about querying, the important thing is that a query is not a simple, single-stage process. In general, the process can be divided into two phases: the scatter phase and the gather phase. The scatter phase is about querying all the relevant shards of your index. The gather phase is about gathering the results from the relevant shards, combining them, sorting, processing, and returning them to the client.

Note

You can control the scatter and gather phases by setting the search type to one of the six values currently exposed by ElasticSearch. We've talked about query scope in our previous book, ElasticSearch Server, by Packt Publishing.

Index configuration

We have already talked about the automatic index configuration and the ability to guess document field types and structure. Of course, ElasticSearch gives us the possibility to alter this behavior. We may, for example, configure our own document structure with the use of mappings, set the number of shards and replicas the index will be built of, configure the analysis process, and so on.

Administration and monitoring

The administration and monitoring part of the API allows us to change the cluster settings, for example, to tune the discovery mechanism or change the index placement strategy. You can find various pieces of information about the cluster state or statistics regarding each node and index. The API for cluster monitoring is very comprehensive, and example usage will be discussed in Chapter 5, ElasticSearch Administration.
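
A few illustrative calls of this kind (the blog index name is just an example; the responses will be discussed in detail in Chapter 5) could look like this:

# overall cluster state, including routing and metadata
curl -XGET 'http://localhost:9200/_cluster/state?pretty'
# per-node statistics, such as JVM, indices, and operating system information
curl -XGET 'http://localhost:9200/_nodes/stats?pretty'
# statistics for a single index
curl -XGET 'http://localhost:9200/blog/_stats?pretty'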

 

Summary


In this chapter we've looked at the general architecture of Apache Lucene, how it works, how the analysis process is done, and how to use Apache Lucene query language. In addition to that we've discussed the basic concepts of ElasticSearch, its architecture, and internal communication.

In the next chapter you'll learn about the default scoring formula Apache Lucene uses, what the query rewrite process is, and how it works. In addition to that we'll discuss some of the ElasticSearch functionality, such as query rescore, multi near real-time get, and bulk search operations. We'll also see how to use the update API to partially update our documents, how to sort our data, and how to use filtering to improve performance of our queries. Finally, we'll see how we can leverage the use of filters and scopes in the faceting mechanism.

About the Authors

  • Rafał Kuć

    Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains—from banking software to e–commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days.

    Rafał began his journey with Lucene in 2002; however, it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest.

    Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

  • Marek Rogoziński

    Marek Rogoziński is a software architect and consultant with more than 10 years of experience. His specialization concerns solutions based on open source search engines, such as Solr and Elasticsearch, and the software stack for big data analytics including Hadoop, Hbase, and Twitter Storm.

    He is also a cofounder of the solr.pl site, which publishes information and tutorials about Solr and Lucene libraries. He is the coauthor of ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

    He is currently the chief technology officer and lead architect at ZenCard, a company that processes and analyzes large quantities of payment transactions in real time, allowing automatic and anonymous identification of retail customers on all retailer channels (m-commerce/e-commerce/brick&mortar) and giving retailers a customer retention and loyalty tool.
