Querying the Data Grid in Coherence 3.5: Obtaining Query Results and Using Indexes

The easiest way to obtain query results is to invoke one of the QueryMap.entrySet methods:

Filter filter = ...;
Set<Map.Entry> results = cache.entrySet(filter);

This will return a set of Map.Entry instances representing both the key and the value of a cache entry, which is likely not what you want. More often than not you need only values, so you will need to iterate over the results and extract the value from each Map.Entry instance:

List values = new ArrayList(results.size());
for (Map.Entry entry : results) {
    values.add(entry.getValue());
}

After doing this a couple of times you will probably want to create a utility method for this task. Because all the queries should be encapsulated within various repository implementations, we can simply add the following utility methods to our AbstractCoherenceRepository class:

public abstract class AbstractCoherenceRepository<K, V extends Entity<K>> {

    // ... other members of the class, including getCache(), are not shown

    protected Collection<V> queryForValues(Filter filter) {
        Set<Map.Entry<K, V>> entries = getCache().entrySet(filter);
        return extractValues(entries);
    }

    protected Collection<V> queryForValues(Filter filter,
                                           Comparator comparator) {
        Set<Map.Entry<K, V>> entries =
                getCache().entrySet(filter, comparator);
        return extractValues(entries);
    }

    // Copies the value of each entry into a list, in iteration order
    private Collection<V> extractValues(Set<Map.Entry<K, V>> entries) {
        List<V> values = new ArrayList<V>(entries.size());
        for (Map.Entry<K, V> entry : entries) {
            values.add(entry.getValue());
        }
        return values;
    }
}

What happened to the QueryMap.values() method?
Obviously, things would be a bit simpler if the QueryMap interface also had an overloaded version of the values method that accepts a filter and, optionally, a comparator as arguments.
I'm not sure why this functionality is missing from the API, but I hope it will be added in one of the future releases. In the meantime, a simple utility method is all it takes to provide the missing functionality, so I am not going to complain too much.

Controlling query scope using data affinity

Data affinity can provide a significant performance boost because it allows Coherence to optimize the query for related objects. Instead of executing the query in parallel across all the nodes and aggregating the results, Coherence can simply execute it on a single node, because data affinity guarantees that all the results will be on that particular node. This effectively reduces the number of objects searched to approximately C/N, where C is the total number of objects in the cache the query is executed against, and N is the number of partitions in the cluster.

However, this optimization is not automatic—you have to target the partition to search explicitly, using KeyAssociatedFilter:

Filter query = ...;
Filter filter = new KeyAssociatedFilter(query, key);

In the previous example, we create a KeyAssociatedFilter that wraps the query we want to execute. The second argument to its constructor is the cache key that determines the partition to search.
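To illustrate why targeting a single partition shrinks the search space, here is a minimal plain-Java model. It does not use the Coherence API: the class name AffinitySketch, its methods, and the hash-modulo placement rule are all assumptions made for illustration and are not how Coherence actually assigns entries to partitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not the Coherence API: all names here are hypothetical.
final class AffinitySketch {
    // One map per partition: transaction id -> {accountId, amount}
    private final List<Map<String, long[]>> partitions = new ArrayList<>();

    AffinitySketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed placement rule: hash of the associated key modulo partition count
    int partitionFor(long accountId) {
        return Math.floorMod(Long.hashCode(accountId), partitions.size());
    }

    // A transaction is stored in the partition of the account it belongs to
    void put(String txId, long accountId, long amount) {
        partitions.get(partitionFor(accountId)).put(txId, new long[] {accountId, amount});
    }

    // Evaluates the criteria against the owning partition only
    List<String> findTransactions(long accountId, long minAmount) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, long[]> e : partitions.get(partitionFor(accountId)).entrySet()) {
            long[] tx = e.getValue();
            if (tx[0] == accountId && tx[1] >= minAmount) {
                ids.add(e.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }

    // Number of entries a targeted query has to look at
    int entriesScanned(long accountId) {
        return partitions.get(partitionFor(accountId)).size();
    }

    public static void main(String[] args) {
        AffinitySketch grid = new AffinitySketch(4);
        grid.put("t1", 1L, 100L);
        grid.put("t2", 1L, 5L);
        grid.put("t3", 2L, 200L);
        System.out.println(grid.findTransactions(1L, 50L)); // prints [t1]
        System.out.println(grid.entriesScanned(1L));        // prints 2
    }
}
```

In this model the query for account 1 inspects two entries rather than all three, because the transaction that belongs to account 2 lives in a different partition.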

To make all of this more concrete, let's look at the final implementation of the code for our sample application that returns account transactions for a specific period. First, we need to add the getTransactions method to our Account class:

public Collection<Transaction> getTransactions(Date from, Date to) {
    return getTransactionRepository().findTransactions(m_id, from, to);
}

Finally, we need to implement the findTransactions method within the CoherenceTransactionRepository:

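public Collection<Transaction> findTransactions(
        Long accountId, Date from, Date to) {
    Filter filter = new FilterBuilder()
            .equals("id.accountId", accountId)
            .between("time", from, to)
            .toAnd();  // terminating call is an assumption; this line is missing from the excerpt

    // Target the partition that owns the account and sort within the cluster
    return queryForValues(
            new KeyAssociatedFilter(filter, accountId),
            new DefaultTransactionComparator());
}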

As you can see, we target the query using the account identifier, which ensures that Coherence looks for transactions only within the partition that the account with the specified id belongs to. We also ensure that the results are sorted by transaction number by passing DefaultTransactionComparator to the queryForValues helper method we implemented earlier.

Querying near cache

One situation where a direct query using the entrySet method might not be appropriate is when you need to query a near cache.

Because there is no way for Coherence to determine if all the results are already in the front cache, it will always execute the query against the back cache and return all the results over the network, even if some or all of them are already present in the front cache. Obviously, this is a waste of network bandwidth.

What you can do in order to optimize the query is to obtain the keys first and then retrieve the entries by calling the CacheMap.getAll method:

Filter filter = ...;
Set keys = cache.keySet(filter);
Map results = cache.getAll(keys);

The getAll method will try to satisfy as many results as possible from the front cache and delegate to the back cache to retrieve only the missing ones. This will ensure that we move the bare minimum of data across the wire when executing queries, which will improve the throughput.

However, keep in mind that this approach might increase latency, as you are making two network roundtrips instead of one, unless all results are already in the front cache. In general, if the expected result set is relatively small, it might make more sense to move all the results over the network using a single entrySet call.

Another potential problem with the idiom used for near cache queries is that it could return invalid results. There is a possibility that some of the entries might change between the calls to keySet and getAll. If that happens, getAll might return entries that do not satisfy the filter anymore, so you should only use this approach if you know that this cannot happen (for example, if objects in the cache you are querying, or at least the attributes that the query is based on, are immutable).
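The two-step idiom and its hazard can be modeled in plain Java. The sketch below is not Coherence code: the class TwoStepQuerySketch and its methods are hypothetical stand-ins for keySet(filter) and getAll(keys), and re-applying the filter on the client after the second step is one possible mitigation that I am adding here for illustration, not something the approach above does for you.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java model of the keySet + getAll idiom; all names are hypothetical.
final class TwoStepQuerySketch {

    // Step 1: return only the keys of matching entries
    static <K, V> Set<K> keySet(Map<K, V> cache, Predicate<V> filter) {
        Set<K> keys = new HashSet<>();
        for (Map.Entry<K, V> e : cache.entrySet()) {
            if (filter.test(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Step 2: fetch the entries by key, then re-apply the filter to drop
    // entries that were removed or changed between the two calls
    static <K, V> Map<K, V> getAllMatching(Map<K, V> cache, Set<K> keys,
                                           Predicate<V> filter) {
        Map<K, V> results = new HashMap<>();
        for (K key : keys) {
            V value = cache.get(key);
            if (value != null && filter.test(value)) {
                results.put(key, value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new HashMap<>();
        cache.put("a", 10);
        cache.put("b", 20);
        cache.put("c", 5);
        Predicate<Integer> filter = v -> v > 8;

        Set<String> keys = keySet(cache, filter);
        cache.put("a", 1); // the entry changes between the two calls
        System.out.println(getAllMatching(cache, keys, filter)); // prints {b=20}
    }
}
```

Without the second filter pass, the entry for "a" would be returned even though it no longer satisfies the criteria. Note that the re-check only removes stale matches; it cannot add entries that started matching after the first call.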

Sorting the results

We have already seen that the entrySet method allows you to pass a Comparator as a second argument, which will be used to sort the results. If your objects implement the Comparable interface you can also specify null as a second argument and the results will be sorted based on their natural ordering. For example, if we defined the natural sort order for transactions by implementing Comparable within our Transaction class, we could've simply passed null instead of a DefaultTransactionComparator instance within the findTransactions implementation shown earlier.

On the other hand, if you use the near cache query idiom, you will have to sort the results yourself. This is again an opportunity to add utility methods to our base repository class that allow you to query a near cache and, optionally, sort the results. However, there is a lot more to cover in this article, so I will leave this as an exercise for the reader.
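As a starting point for client-side sorting, here is a minimal sketch in plain Java. It assumes the results arrive as an ordinary java.util.Map, as a getAll call would return them; the class and method names are hypothetical, and treating a null comparator as "natural ordering" simply mirrors the entrySet convention described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; the class and method names are hypothetical.
final class ClientSideSortSketch {

    // Sorts the values of a getAll-style result map on the client.
    // A null comparator sorts by natural ordering, so the values must
    // implement Comparable in that case.
    static <V> List<V> sortedValues(Map<?, V> results,
                                    Comparator<? super V> comparator) {
        List<V> values = new ArrayList<>(results.values());
        values.sort(comparator);
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, String> results = new HashMap<>();
        results.put(1, "b");
        results.put(2, "a");
        results.put(3, "c");
        System.out.println(sortedValues(results, null)); // prints [a, b, c]
    }
}
```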

Paging over query results

The LimitFilter is somewhat special and deserves a separate discussion. Unlike other filters, which are used to compose query criteria, the LimitFilter is used to control how many result items are returned at a time. Basically, it allows you to page through query results n items at a time.

This also implies that unlike other filters, which are constructed, executed, and discarded, an instance of a LimitFilter is something you might need to hold on to for an extended period of time, as it is a mutable object that keeps track of the current page number, top and bottom anchor objects, and other state that is necessary to support paging.

Let's look at a simple example to better demonstrate the proper usage of a LimitFilter:

NamedCache countries = CacheFactory.getCache("countries");
LimitFilter filter = new LimitFilter(
        new LikeFilter("getName", "B%"), 5);

Set<Map.Entry> entries = countries.entrySet(filter, null);
// contains countries 1-5 whose name starts with a letter 'B'

filter.nextPage();
entries = countries.entrySet(filter, null);
// contains countries 6-10 whose name starts with a letter 'B'

filter.setPage(4);  // page numbers are zero-based
entries = countries.entrySet(filter, null);
// contains countries 21-25 whose name starts with 'B'

As you can see, you can page through the result by executing the same query over and over again and modifying the current page of the LimitFilter between the query executions by calling the nextPage, previousPage, or setPage method.

The LimitFilter is extremely powerful as it allows you to execute the main query only once and then obtain the results in chunks of the size you specify. It maps very nicely to a common requirement for results paging within a web application, allowing you to bring the web server only the data it needs to generate the current page, thus reducing network traffic and improving application performance and scalability. You can safely store an instance of a LimitFilter within a user's HTTP session and reuse it later when the user navigates to another page of the results.

One thing to note in the preceding example is that we are using the entrySet method to retrieve the results, contrary to what we have discussed in the previous section. The reason for that is that we want to return countries sorted by name (natural order), and as I mentioned earlier, if we need to support paging over the sorted results we have no other option but to sort them within the cluster using an overload of the entrySet method that accepts a comparator.

However, this is really not an issue, as the amount of data sent over the wire will be naturally limited by the LimitFilter itself and will typically be very small, so we don't need to optimize the query for the near caching scenario.
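The paging state described in this section can be modeled with a small mutable object. The following is a plain-Java sketch, not the LimitFilter implementation: PagerSketch and its apply method are hypothetical, and it slices an in-memory list where the real filter limits what is returned from the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stateful paging filter.
final class PagerSketch {
    private final int pageSize;
    private int page; // zero-based current page

    PagerSketch(int pageSize) {
        this.pageSize = pageSize;
    }

    void nextPage() {
        page++;
    }

    void previousPage() {
        if (page > 0) {
            page--;
        }
    }

    void setPage(int page) {
        this.page = page;
    }

    // Returns the slice of an already sorted result list for the current page
    <T> List<T> apply(List<T> sortedResults) {
        int from = Math.min(page * pageSize, sortedResults.size());
        int to = Math.min(from + pageSize, sortedResults.size());
        return new ArrayList<>(sortedResults.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            results.add(i);
        }
        PagerSketch pager = new PagerSketch(5);
        System.out.println(pager.apply(results)); // prints [1, 2, 3, 4, 5]
        pager.nextPage();
        System.out.println(pager.apply(results)); // prints [6, 7, 8, 9, 10]
        pager.setPage(4);
        System.out.println(pager.apply(results)); // prints [21, 22, 23, 24, 25]
    }
}
```

Because the pager is the only thing that changes between calls, it is the object you would keep around between requests, which is the same reason a LimitFilter instance is worth holding on to.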

Using indexes to improve query performance

Just as you can use indexes to improve query performance against a relational database, you can use them to improve the performance of a Coherence query. That is not to say that Coherence indexes are the same as database indexes—in fact, they are very different, and we'll discuss how indexes are implemented in Coherence shortly.

However, they are similar in the way they work, as they allow the query processor to optimize queries by:

  1. Limiting the number of entries that have to be evaluated by the filter
  2. Avoiding the need for object deserialization by providing the necessary information within the index itself

Both of these features are very important and can have a significant impact on query performance. For that reason, it is recommended that you always create indexes for the attributes that you query on.

Anatomy of an index

A Coherence index is an instance of a class that implements the com.tangosol.util.MapIndex interface:

public interface MapIndex {
    ValueExtractor getValueExtractor();
    boolean isOrdered();
    Map getIndexContents();
    Object get(Object key);
}

The getValueExtractor method returns the value extractor used to extract the attribute that should be indexed from an object, while the isOrdered method returns whether the index is sorted or not.

The get method allows us to obtain a value of the indexed attribute for the specified key directly from an index, which avoids object deserialization and repeat value extraction.

Finally, the getIndexContents method returns the actual index contents. This is a map that uses the value extracted from the indexed attribute as a key, while the value for each index entry is a set of cache keys corresponding to that attribute value.

Looking at an example should make the previous paragraph much easier to understand.

Let's assume that we have the following entries in the cache:

Key   Value
1     Person(firstName = 'Aleksandar', lastName = 'Seovic')
2     Person(firstName = 'Marija', lastName = 'Seovic')
3     Person(firstName = 'Ana Maria', lastName = 'Seovic')
4     Person(firstName = 'Novak', lastName = 'Seovic')
5     Person(firstName = 'Aleksandar', lastName = 'Jevic')

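If we create an index on the lastName attribute, our index contents will look like this (cache keys 1 through 5 refer to the entries above, in the order listed):

Index key     Cache keys
Jevic         { 5 }
Seovic        { 1, 2, 3, 4 }

An index on the firstName attribute will look like this:

Index key     Cache keys
Aleksandar    { 1, 5 }
Ana Maria     { 3 }
Marija        { 2 }
Novak         { 4 }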

Index internals
Keep in mind that while I'm showing the actual values in the previous examples, the Coherence index actually stores both keys and values in their internal binary format.
For the most part you shouldn't care about this fact, but it is good to know if you end up accessing index contents directly.

The previous example should also make it obvious why indexes have such a profound effect on query performance.

If we wanted to obtain a list of keys for all people in the cache that have last name 'Seovic', without an index Coherence would have to deserialize each cache entry, extract the lastName attribute and perform a comparison, and if the comparison matches then it would add the cache entry key to the resulting list of keys.

With an index, Coherence doesn't need to do any of this—it will simply look up an index entry with the key Seovic and return the set of keys from that index entry.
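The shape of the index contents is easy to reproduce in plain Java. The sketch below is not the Coherence implementation (which, as noted, stores binary keys and values): InvertedIndexSketch, its Person class, and the build method are hypothetical, and the data is the five-person example from this section.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Plain-Java model of index contents: attribute value -> set of cache keys.
final class InvertedIndexSketch {

    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Extracts the attribute from every value once and groups the cache keys
    static <K, V, A> Map<A, Set<K>> build(Map<K, V> cache,
                                          Function<V, A> extractor) {
        Map<A, Set<K>> index = new HashMap<>();
        for (Map.Entry<K, V> e : cache.entrySet()) {
            A attribute = extractor.apply(e.getValue());
            index.computeIfAbsent(attribute, a -> new TreeSet<>()).add(e.getKey());
        }
        return index;
    }

    static Map<Integer, Person> sampleCache() {
        Map<Integer, Person> cache = new TreeMap<>();
        cache.put(1, new Person("Aleksandar", "Seovic"));
        cache.put(2, new Person("Marija", "Seovic"));
        cache.put(3, new Person("Ana Maria", "Seovic"));
        cache.put(4, new Person("Novak", "Seovic"));
        cache.put(5, new Person("Aleksandar", "Jevic"));
        return cache;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> byLastName = build(sampleCache(), p -> p.lastName);
        // A query on lastName is now a single map lookup
        System.out.println(byLastName.get("Seovic")); // prints [1, 2, 3, 4]
    }
}
```

The work of extracting the attribute happens once, when the index is built or updated, rather than on every query.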

Creating indexes

Now that we know how Coherence indexes are structured and why we should use them, let's look at how we can create them.

At the beginning of this article, we showed an incomplete definition of a QueryMap interface. What we omitted from it are the two methods that allow us to create and remove cache indexes:

public interface QueryMap extends Map {
    Set keySet(Filter filter);
    Set entrySet(Filter filter);
    Set entrySet(Filter filter, Comparator comparator);

    void addIndex(ValueExtractor extractor,
                  boolean isOrdered,
                  Comparator comparator);
    void removeIndex(ValueExtractor extractor);
}

As you can see, in order to create an index you need to specify three things:

  • The value extractor that should be used to retrieve attribute value to use as an index key
  • The flag specifying whether the index should be ordered or not
  • Finally, the comparator to use for ordering

The first argument is by far the most important, as it determines index contents. It is also used as the index identifier, which is why you need to ensure that all value extractors you create implement the equals and hashCode methods properly.

If you decide to create an ordered index, index entries will be stored within a SortedMap instance, which introduces some overhead on index updates. Because of that, you should only order indexes for attributes that are likely to be used for sorting query results or in range queries, such as greater, less, between, and so on.

The last argument allows you to specify a comparator to use for index ordering, but you can specify null if the attribute you are indexing implements the Comparable interface and you want the index to use natural ordering. Of course, if an index is not ordered, you should always specify null for this argument.
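To see why an ordered index helps with range queries, consider this plain-Java sketch built on java.util.TreeMap. It is only a model: OrderedIndexSketch and its methods are hypothetical, and it indexes an integer attribute for simplicity.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of an ordered index: sorted attribute value -> cache keys.
final class OrderedIndexSketch {
    private final TreeMap<Integer, Set<String>> index = new TreeMap<>();

    // Keeping the map sorted is the extra cost paid on every update
    void add(String key, int attributeValue) {
        index.computeIfAbsent(attributeValue, v -> new TreeSet<>()).add(key);
    }

    // Returns keys whose attribute lies in [from, to] by walking only the
    // matching range of the sorted map; requires from <= to
    Set<String> between(int from, int to) {
        Set<String> keys = new TreeSet<>();
        for (Set<String> bucket : index.subMap(from, true, to, true).values()) {
            keys.addAll(bucket);
        }
        return keys;
    }

    public static void main(String[] args) {
        OrderedIndexSketch index = new OrderedIndexSketch();
        index.add("a", 10);
        index.add("b", 20);
        index.add("c", 30);
        index.add("d", 20);
        System.out.println(index.between(15, 30)); // prints [b, c, d]
    }
}
```

With an unordered map, the same range query would have to visit every index entry, which is why ordering is worth its update overhead only for attributes used in range queries or sorting.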

Now that we know all of this, let's see what the code to define indexes on firstName and lastName attributes from the previous example should look like:

NamedCache people = CacheFactory.getCache("people");
people.addIndex(new PropertyExtractor("firstName"), false, null);
people.addIndex(new PropertyExtractor("lastName"), true, null);

As you can see, adding indexes to a cache is very simple. In this case we have created an unordered index on the firstName attribute and an ordered index using natural string ordering on the lastName attribute.

The last thing you should know is that the call to the addIndex method is treated by Coherence as a hint that an index should be created. What this means in practice is that you can safely create the same set of indexes on each Coherence node, even if another node has already created those same indexes. If an index for the specified extractor already exists, Coherence will simply ignore all subsequent requests for its creation.
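The "hint" behavior, and the reason extractors need proper equals and hashCode implementations, can be illustrated with a small registry keyed by extractor. This is a plain-Java sketch under my own assumptions, not a description of Coherence internals: IndexRegistrySketch and PropertyNameExtractor are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of an index registry keyed by extractor.
final class IndexRegistrySketch {

    // Hypothetical extractor, identified by the property name it reads
    static final class PropertyNameExtractor {
        private final String property;

        PropertyNameExtractor(String property) {
            this.property = property;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof PropertyNameExtractor
                    && property.equals(((PropertyNameExtractor) o).property);
        }

        @Override
        public int hashCode() {
            return property.hashCode();
        }
    }

    // extractor -> ordered flag; the extractor acts as the index identifier
    private final Map<PropertyNameExtractor, Boolean> indexes = new HashMap<>();

    // Returns true if an index was registered, false if the request was ignored
    boolean addIndex(PropertyNameExtractor extractor, boolean ordered) {
        return indexes.putIfAbsent(extractor, ordered) == null;
    }

    int indexCount() {
        return indexes.size();
    }

    public static void main(String[] args) {
        IndexRegistrySketch registry = new IndexRegistrySketch();
        System.out.println(registry.addIndex(new PropertyNameExtractor("lastName"), true)); // prints true
        System.out.println(registry.addIndex(new PropertyNameExtractor("lastName"), true)); // prints false
        System.out.println(registry.indexCount()); // prints 1
    }
}
```

If PropertyNameExtractor relied on the default identity-based equals and hashCode, the second request would be treated as a different index and the registry would end up with two entries for the same attribute.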

Coherence query limitations

You have probably noticed by now that all the filters we have used as examples are evaluated against a single cache. This is one of the limitations of the Coherence query mechanism—it is not possible to perform an equivalent of a table join and execute the query against it.

However, while the ability to execute queries across multiple caches would come in handy occasionally, this is not too big a problem in practice. In most cases, you can perform any necessary table joins before loading the data into the cache, so you end up with all the information you need to query on in a single cache. Remember, the purpose of Coherence is to bring data closer to the application in a format that is easily consumable by the application, so transforming data from multiple tables into instances of a single aggregate is definitely something that you will be doing often.

That said, there will still be cases when you don't have all the data you need in a single cache, and you really, really need that join. In those cases the solution is to execute the query directly against the backend database and obtain a list of identifiers that you can use to retrieve objects from a cache.

Another important limitation you should be aware of is that Coherence queries only take into account objects that are already in the cache; they will not load any data from the database into the cache automatically. Because partial results are typically not what you want, this implies that you need to preload all the data into the cache before you start executing queries against it.

Alternatively, you can choose not to use Coherence queries at all and adopt the same approach as in the previous case by querying the database directly in order to obtain identifiers for the objects in the result, and using those identifiers to look up objects in the cache. Of course, this assumes that your cache is configured to automatically load the missing objects from the database on gets.
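The lookup-by-identifier approach relies on the cache loading missing objects on gets. Here is a plain-Java sketch of that behavior; ReadThroughLookupSketch is a hypothetical class, the loader function stands in for a database read, and the list of identifiers is assumed to come from a database query that is not shown.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of a cache that loads missing objects on gets.
final class ReadThroughLookupSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private final Function<Long, String> loader; // stands in for a database read
    private int loads;

    ReadThroughLookupSketch(Function<Long, String> loader) {
        this.loader = loader;
    }

    // Returns the objects for the given identifiers, loading and caching
    // any that are not in the cache yet
    Map<Long, String> getAll(Collection<Long> ids) {
        Map<Long, String> result = new LinkedHashMap<>();
        for (Long id : ids) {
            String value = cache.get(id);
            if (value == null) {
                value = loader.apply(id);
                loads++;
                if (value != null) {
                    cache.put(id, value);
                }
            }
            if (value != null) {
                result.put(id, value);
            }
        }
        return result;
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ReadThroughLookupSketch accounts =
                new ReadThroughLookupSketch(id -> "account-" + id);
        // Identifiers as they might come back from a database query
        System.out.println(accounts.getAll(Arrays.asList(1L, 2L))); // prints {1=account-1, 2=account-2}
        accounts.getAll(Arrays.asList(2L, 3L));
        System.out.println(accounts.loadCount()); // prints 3
    }
}
```

The second call loads only the object for identifier 3, because the object for identifier 2 was cached by the first call.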


In this article, you have learned how to obtain query results and how to use indexes. I cannot stress enough how important indexes are: make sure that you always use them and you will avoid a lot of potential performance problems.

In the next article we will cover Coherence Aggregators.

You've been reading an excerpt of:

Oracle Coherence 3.5