You're reading from RavenDB 2.x Beginner's Guide

Product type: Book
Published in: Sep 2013
Publisher: Packt
ISBN-13: 9781783283798
Edition: 1st Edition
Author: Khaled Tannir

Khaled Tannir has been working with computers since 1980. He began programming with the legendary Sinclair ZX81 and later with Commodore home computer products (Vic 20, Commodore 64, Commodore 128D, and Amiga 500). He has a Bachelor's degree in Electronics, a Master's degree in System Information Architectures, in which he graduated with a professional thesis, and completed his education with a Master of Research degree. He is a Microsoft Certified Solution Developer (MCSD) and has more than 20 years of technical experience leading the development and implementation of software solutions and giving technical presentations. He now works as an independent IT consultant and has worked as an infrastructure engineer, senior developer, and enterprise/solution architect for many companies in France and Canada. With significant experience in Microsoft .NET, Microsoft Server Systems, and Oracle Java technologies, he has extensive skills in online/offline applications design, system conversions, and multilingual applications in both domains: Internet and Desktops. He is always researching new technologies, learning about them, and looking for new adventures in France, North America, and the Middle East. He owns an IT and electronics laboratory with many servers, monitors, open electronic boards such as Arduino, Netduino, Raspberry Pi, and .NET Gadgeteer, and some smartphone devices based on Windows Phone, Android, and iOS operating systems. In 2012, he contributed to the EGC 2012 (International Complex Data Mining forum at Bordeaux University, France) and presented, in a workshop session, his work on "how to optimize data distribution in a cloud computing environment". This work aims to define an approach to optimize the use of data mining algorithms such as k-means and Apriori in a cloud computing environment. He is the author of RavenDB 2.x Beginner's Guide, Packt Publishing.
He aims to get a PhD in Cloud Computing and Big Data and wants to learn more and more about these technologies. He enjoys taking landscape and night time photos, travelling, playing video games, creating funny electronic gadgets with Arduino/.NET Gadgeteer, and of course, spending time with his wife and family. You can reach him at contact@khaledtannir.net.

Chapter 4. RavenDB Indexes and Queries

Whenever you use a database, you need queries with search criteria to retrieve your data. This chapter moves on to querying data in RavenDB.

In this chapter, we will learn how RavenDB indexes work and why we need them. Then, we will cover the different types of indexes and the problem that RavenDB indexes aim to solve.

You will learn about Map/Reduce and how RavenDB indexes implement this paradigm and use it to retrieve data from the server. With a step-by-step approach, we will create some indexes and learn how to query them.

In this chapter, we will cover:

  • RavenDB Map/Reduce implementation

  • RavenDB dynamic indexes

  • RavenDB static indexes

  • RavenDB stale indexes

The RavenDB indexes


Most storage systems use indexes to find data quickly when a query is processed. In a database system, an index is a data structure that speeds up data retrieval operations. Creating a proper index can therefore drastically improve the performance of an application.

An index in a relational database is very similar to an index in the back of a book. When a database server has no index to use for searching, the result is similar to the reader who looks at every page in a book to find a word. The database engine needs to visit every row in the table. In relational database terminology, we call this behavior a table scan, which becomes slower and more expensive as a table grows to thousands or millions of rows.
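The book analogy can be made concrete with a small Python sketch. This is not RavenDB or relational database code. The sample rows and the names `table_scan`, `build_index`, and `index_lookup` are invented for illustration. The sketch only contrasts visiting every row with looking a key up in a structure built ahead of time.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"id": 1, "country_code": "FR", "name": "Paris"},
    {"id": 2, "country_code": "CA", "name": "Montreal"},
    {"id": 3, "country_code": "FR", "name": "Lyon"},
]

def table_scan(rows, country_code):
    """Visit every row, like reading every page of a book."""
    return [r["name"] for r in rows if r["country_code"] == country_code]

def build_index(rows, field):
    """Build the index once: field value -> list of matching rows."""
    index = defaultdict(list)
    for r in rows:
        index[r[field]].append(r)
    return index

def index_lookup(index, key):
    """Jump straight to the matching rows without visiting the others."""
    return [r["name"] for r in index.get(key, [])]

by_country = build_index(rows, "country_code")
```

Both `table_scan(rows, "FR")` and `index_lookup(by_country, "FR")` return the same names, but the scan touches every row on every query, while the lookup pays the cost of building the index once.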

RavenDB indexes are used to retrieve data from the server, but they do not work the same way as relational database indexes. The main difference is that relational database indexes are schema-based, whereas RavenDB is a schema-less document-oriented database, which means...

RavenDB Map/Reduce implementation


Map/Reduce is a programming model and an associated implementation for processing and generating large datasets. RavenDB indexes are Map/Reduce implementations and allow you to perform aggregations over multiple documents. Indexes use a Map function to specify what to retrieve from the server and optionally use Reduce and Transform functions to specify which results will be returned to the client.

The developer specifies one or more Map functions that process a collection of documents to generate a set of intermediate key-value pairs. The intermediate key-value pairs produced by the Map function are buffered in memory.

The Reduce function is optional. An index may have zero or one Reduce function. The Reduce function reads all intermediate key-value pairs generated by the Map function(s) and aggregates the values associated with the same intermediate key. After successful completion, the output of the Reduce function execution is available to the caller...
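As a rough illustration of the Map and Reduce roles described above, here is a minimal Python sketch. It is a conceptual model only, not how RavenDB implements its indexes. The city documents, their field names, and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical city documents; field names are assumptions for this sketch.
cities = [
    {"name": "Paris", "country_code": "FR", "population": 2100000},
    {"name": "Lyon", "country_code": "FR", "population": 500000},
    {"name": "Montreal", "country_code": "CA", "population": 1700000},
]

def map_city(doc):
    """Map: emit one intermediate (key, value) pair per document."""
    yield doc["country_code"], doc["population"]

def reduce_population(pairs):
    """Reduce: aggregate the values that share the same intermediate key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Run the Map step over every document, then feed the pairs to Reduce.
intermediate = [pair for doc in cities for pair in map_city(doc)]
result = reduce_population(intermediate)
```

Here `intermediate` holds one pair per city, and `result` holds one aggregated value per country code. Leaving out the Reduce step gives a Map-only index whose entries are simply the intermediate pairs.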

RavenDB dynamic indexes


When you query RavenDB, the query optimizer first searches for an existing index that matches the query. If no matching index is found, RavenDB automatically creates a temporary index for the query. RavenDB optimizes itself based on the actual requests coming in: when a query is performed often, it can decide to promote the temporary index to a permanent one.

Note

Dynamic indexes are Map/Reduce indexes without a Reduce function. They consist only of mapping functions, which allow RavenDB to answer queries by knowing how to traverse the document.

Querying dynamic indexes

Dynamic indexes are created automatically on the fly by RavenDB. When you query the server and no matching index exists for the query, RavenDB creates a new temporary index and uses it to query the data on the server.

Time for action – querying a dynamic index


We will query the World database and retrieve all Countries for which the Area field is greater than or equal to 1000000. Before creating this query, you will import the Countries.csv file into the World database. After that, you will visualize how RavenDB will perform this task and you will look at the RavenDB logs in the prompt window.

  1. In Management Studio, import the Countries.csv file into the World database.

  2. Create a new Visual Studio project and name it RavenDB_Ch04.

  3. Add a new class, name it City, and complete it as follows:

  4. Add a new class named Country and make it look as follows:

  5. Add the RavenDB DocumentStore initialization to the Main() method using the following code snippet:

    Note

    The World database was created in Chapter 2, RavenDB Management Studio.

  6. Add this query to the Main() method in the Program class using the following code snippet:

  7. Save all the files and build the solution.

  8. Open the RavenDB installation folder and launch RavenDB server...

Time for action – querying a temporary index


To illustrate how RavenDB uses existing temporary indexes, we will reuse the Query<Country>() method we created in the previous section. We will also create another query using the same parameter, which will use the same temporary index. Then you will analyze the RavenDB logs in the RavenDB prompt window and look at the Management Studio Indexes screen.

  1. Open RavenDB Management Studio, select the Indexes tab of the World database and ensure that Temp/Countries/ByArea_RangeSortByArea index exists. If not, follow all the steps of the previous section to create the temporary index.

  2. Open the RavenDB_Ch04 solution.

  3. Add the following code snippet to the Main() method in the Program class:

  4. Save all the files, build and run the solution.

  5. Open the RavenDB prompt window and analyze the RavenDB logs.

  6. In Management Studio, select the World database and click on the Indexes tab to display the Indexes screen and verify that there are no new temporary indexes...

Time for action – managing temporary indexes


RavenDB allows you to manage temporary indexes using options in the server configuration file. RavenDB also optimizes itself by deleting temporary indexes that have not been used for a given time, or by promoting them to permanent indexes if they have been used enough.

The following steps summarize the temporary index management process in RavenDB:

  1. RavenDB looks for an appropriate index to use for the query.

  2. If one is found, it returns the most appropriate index.

  3. If none is found, it creates an index that can handle the query.

  4. It returns that index as a temporary index.

  5. If that index is used enough, it is promoted to an Auto index.

    Note

    Temporary index behavior is controlled by these configuration settings: Raven/TempIndexPromotionThreshold and Raven/TempIndexPromotionMinimumQueryCount.

    By default, the number of times a temporary index has to be queried before becoming a permanent index is 100. You can change these settings by changing the value of the Raven/TempIndexPromotionMinimumQueryCount...
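The lifecycle summarized in the steps above can be modeled with a short Python sketch. This is a toy model, not RavenDB's actual code: the `IndexCatalog` class, its method, and the generated index names are hypothetical, and the `promotion_min_query_count` parameter only mirrors the idea of a minimum query count. It ignores the time-based threshold and the cleanup of unused temporary indexes.

```python
class IndexCatalog:
    """Toy model of the temporary-index lifecycle (not RavenDB's actual code)."""

    def __init__(self, promotion_min_query_count=100):
        self.promotion_min_query_count = promotion_min_query_count
        self.indexes = {}  # query signature -> {"name", "kind", "hits"}

    def index_for(self, collection, field):
        """Return the index that serves a query on collection.field."""
        signature = (collection, field)
        # Steps 1-2: look for an existing index that matches the query.
        index = self.indexes.get(signature)
        if index is None:
            # Steps 3-4: no match, so create an index and mark it temporary.
            index = {"name": f"Temp/{collection}/By{field}", "kind": "temp", "hits": 0}
            self.indexes[signature] = index
        index["hits"] += 1
        # Step 5: promote a temporary index once it has been used enough.
        if index["kind"] == "temp" and index["hits"] >= self.promotion_min_query_count:
            index["kind"] = "auto"
            index["name"] = f"Auto/{collection}/By{field}"
        return index
```

With `IndexCatalog(promotion_min_query_count=3)`, the first two calls to `index_for("Countries", "Area")` reuse one temporary index, and the third call promotes it. No second index is created along the way.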

RavenDB static indexes


RavenDB allows users to create and use indexes manually. These explicitly created indexes are called static indexes or named indexes. A static index allows the use of one or more Map functions. It may include a Reduce function and/or a Transform function. These functions specify what to retrieve from the server and are defined using regular Linq expressions.

Static indexes are more efficient than dynamic indexes. Because a dynamic index is created on the fly, as a temporary index, the first time a user runs a query, that first run may be slow. Static indexes also expose more functionality, such as custom sorting, boosting, Full Text Search, Live Projections, spatial search support, and more.

So far we have created some queries to retrieve data from the RavenDB server using Linq expressions. Linq can be used the same way to sort or aggregate data and to query specific fields in a document. When using indexes to aggregate data, there...

Time for action – defining a Map function for an index


We will create a new static index and add it to the World database (created in Chapter 2, RavenDB Management Studio). You will create this index using the PutIndex() method. Then you will analyze the RavenDB logs and open the index in Management Studio to view it and execute it.

  1. Open the RavenDB_Ch04 solution in Visual Studio.

  2. Add the following code to the Main() method to create the static index:

  3. Add the following code to the Main() method to query the Cities/CountryCode index:

  4. Save all the files, build and run RavenDB_Ch04.

  5. Select the RavenDB prompt window in Windows Explorer and analyze the RavenDB logs to understand how RavenDB created the index.

  6. In Management Studio, select the World database, click on the Indexes tab and open the Cities/CountryCode index in edit mode and look at the Map function code.

  7. In Management Studio, execute the Cities/CountryCode index and observe the result.
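The general idea behind a Map-only index can be sketched in Python as follows. This is a conceptual model and not the book's listing or RavenDB's implementation. It assumes, purely for illustration, that the index maps a country code field of each city document. The document ids, field names, and function names are hypothetical.

```python
# Hypothetical document store: document id -> document (field names assumed).
documents = {
    "cities/1": {"name": "Paris", "country_code": "FR"},
    "cities/2": {"name": "Montreal", "country_code": "CA"},
    "cities/3": {"name": "Lyon", "country_code": "FR"},
}

def map_country_code(doc):
    """Map function: choose which field of the document gets indexed."""
    return {"country_code": doc["country_code"]}

# Index entries hold only the mapped fields plus the id of the source document.
index_entries = [
    (doc_id, map_country_code(doc)) for doc_id, doc in documents.items()
]

def query_index(country_code):
    """Search the index entries, then load the matching documents by id."""
    ids = [doc_id for doc_id, entry in index_entries
           if entry["country_code"] == country_code]
    return [documents[doc_id] for doc_id in ids]
```

The point of the sketch is the separation of concerns: the Map function decides what goes into the index, and a query searches the index entries before loading whole documents.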

What just happened?

You just created your first static...

Time for action – adding a Reduce function to an index


We will create a new static index and will define its Map and Reduce functions. You will add this index to the World database using the PutIndex() method. Then you will modify the RavenDB_Ch04 project and add a new call to the PutIndex() method. After that you will open the index in Management Studio to view it and execute it.

  1. Open the RavenDB_Ch04 solution in Visual Studio.

  2. Add the following code snippet to the Main() method to create the index:

    Tip

    While writing the Map and Reduce functions, press Ctrl + Space to display all methods you can set.

  3. Add the following code snippet to the Main() method to query the index:

  4. Save all the files, build and run RavenDB_Ch04.

  5. In Management Studio, open the Cities/CountryPopulation index in edit mode and look at its Map and Reduce functions code.

  6. In Management Studio, execute the Cities/CountryPopulation index and observe the result.

What just happened?

You created the Cities/CountryPopulation static index...

Time for action – adding a TransformResults to the index


We will modify the CountryPopulation index and define its Map, Reduce, and TransformResults functions. This new version of the index will aggregate the Population for each Country and transform the query result into a new format that has the same shape as the CountryPopulation class, which you will also create. Then you will open the index in Management Studio to execute it and view its result.

  1. Open the RavenDB_Ch04 project in Visual Studio.

  2. Add a new class to the RavenDB_Ch04 project, name it CountryPopulation and make it look like the following code snippet:

  3. Add the new CountryPopulation index definition to the Main() method using the following code:

  4. Add the following code to the Main() method to query the Cities/CountryPopulation index:

  5. Save all the files, build and run the solution.

  6. In Management Studio, execute the Cities/CountryPopulation index and observe the result when the Skip Transform option is checked and when it is not...
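To show what a transform step adds on top of Map and Reduce, here is a minimal Python sketch. It is not the book's code. The shape of `CountryPopulation`, the reduced entries, and the `skip_transform` flag are assumptions that loosely mimic the idea of running a query with and without the transform applied.

```python
from dataclasses import dataclass

@dataclass
class CountryPopulation:
    # Assumed shape; the book's actual class definition is not shown here.
    country_code: str
    population: int

# Hypothetical Reduce output: one aggregated entry per country.
reduced = [
    {"country_code": "FR", "total": 2600000},
    {"country_code": "CA", "total": 1700000},
]

def transform_results(entries):
    """Transform step: reshape each reduced entry into the client-facing type."""
    return [CountryPopulation(e["country_code"], e["total"]) for e in entries]

def run_query(skip_transform=False):
    """Return raw reduced entries, or transformed ones when the transform runs."""
    return reduced if skip_transform else transform_results(reduced)
```

Calling `run_query()` yields `CountryPopulation` objects, while `run_query(skip_transform=True)` yields the raw aggregated entries. In this model the transform changes only the shape of the results, not the aggregation itself.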

RavenDB stale indexes


RavenDB indexes can be stale. They are eventually consistent, and "eventually" here usually means in under a second. When you query RavenDB to retrieve some data, it returns the data whether or not it has finished indexing that data in the background. RavenDB lets the user know if query results are stale, and it can also be told to wait until non-stale results are available. This allows new indexes to be introduced on the fly. Live index rebuilds are a rare feature.

Note

Waiting for a non-stale index is not a recommended practice for production systems.

In RavenDB, whenever data is inserted or updated, a background process indexes it. This improves the server response time, but it also means that you may query a stale index. In a lot of situations, a stale index isn't a problem, and as expressed on the RavenDB site:

Better stale than offline.
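The trade-off can be illustrated with a small, single-threaded Python model. It is only a sketch of the idea, not RavenDB's implementation: the `ToyStore` class and its methods are invented, and the "background" indexer is run by hand so the behavior stays deterministic.

```python
from collections import deque

class ToyStore:
    """Toy model of background indexing (not RavenDB's implementation)."""

    def __init__(self):
        self.documents = []
        self.indexed = []        # what the index currently knows about
        self.pending = deque()   # documents written but not yet indexed

    def save(self, doc):
        # Writes return immediately; indexing is deferred.
        self.documents.append(doc)
        self.pending.append(doc)

    def run_indexer(self):
        # Stands in for the background indexing process.
        while self.pending:
            self.indexed.append(self.pending.popleft())

    def query(self):
        # Answer from the index as it is now and report whether it is stale.
        return list(self.indexed), bool(self.pending)
```

A query issued right after `save()` returns quickly but reports stale results that miss the new document. Once `run_indexer()` has caught up, the same query returns the document and reports a non-stale result.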

When you call the SaveChanges() method on the session object to persist changes on some objects, and try...

Time for action – checking for stale index results


You will add a new code snippet to the Main() method of the RavenDB_Ch04 project to check whether an index result is stale.

  1. Open the RavenDB_Ch04 project in Visual Studio.

  2. Add the following code to the Main() method:

  3. Save all the files, build and run the solution.

  4. Check the output window for the stale status of the index result.

What just happened?

You added the necessary code to the Main() method to check for a stale index result.

To check whether an index result is stale, you first declare a new RavenQueryStatistics variable that holds statistics about the query and the index, such as the IndexName and the TotalResults, which indicates the total number of documents in the query results (line 190).

Then, you query the index using the Query() method on the Session instance object and specify the name of the index to check (line 191). To get back the query statistics, you call the Statistics() method (line 192).

In this query you don't need to retrieve...

Time for action – explicitly waiting for a non-stale index result


You will add a new code snippet to the Main() method of the RavenDB_Ch04 project to tell RavenDB to wait for a non-stale result.

  1. Open the RavenDB_Ch04 project in Visual Studio.

  2. Add the following code to the Main() method:

  3. Save all the files, build and run the solution.

What just happened?

You added the necessary code to the Main() method to instruct RavenDB to explicitly wait for a non-stale index result.

To do that, you call the Customize() method on the Query object and call the WaitForNonStaleResultsAsOfNow() function within a lambda expression. This function takes a TimeSpan parameter that specifies the time-out delay. In this code snippet, we specified 5 seconds as the time-out delay.

Note

Waiting for a non-stale index result is intended only for testing and learning purposes. It is strongly discouraged in a production environment.
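The general pattern of waiting for non-stale results with a time-out can be sketched in Python as below. This is a generic polling loop written for illustration, not the RavenDB client's implementation. The function names, the polling interval, and the fake query are all assumptions.

```python
import time

def wait_for_non_stale(run_query, timeout_seconds=5.0, poll_interval=0.01):
    """Re-run a query until it reports non-stale results or the time-out expires.

    run_query is any callable returning (results, is_stale).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        results, is_stale = run_query()
        if not is_stale:
            return results
        if time.monotonic() >= deadline:
            raise TimeoutError("index still stale after time-out")
        time.sleep(poll_interval)

def make_fake_query(stale_calls):
    """Hypothetical query that stays stale for the first `stale_calls` calls."""
    state = {"calls": 0}

    def run_query():
        state["calls"] += 1
        return ["doc"], state["calls"] <= stale_calls

    return run_query
```

The sketch also makes the cost visible: the caller is blocked for as long as indexing lags behind, up to the full time-out, which is one reason this approach is a poor fit for production code paths.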

Have a go hero – display all index names

Add a new method to the Program class...

Summary


In this chapter, we have learned about RavenDB indexes, how they work, and their different types. We specifically covered RavenDB's dynamic and static indexes and how to query each of these index types using Linq.

Afterward, we continued to discover how RavenDB uses Map/Reduce in static indexes, and how you can best implement it to take advantage of this programming model.

Throughout this chapter, we manually created indexes using the .NET Client API and implemented Map/Reduce/TransformResults functions. Then we finished with a sample method to learn how to manage stale indexes.

In the next chapter, we will put our newly learned skills to work and use them to learn another preferred and recommended way to create static indexes in RavenDB. Keep reading!
