RavenDB High Performance

By Brian Ritchie

About this book

RavenDB is an exciting technology that challenges developers to reconsider their old ways of thinking about databases. In this day and age, Internet-scale applications require this fresh perspective. RavenDB High Performance moves beyond the basics and guides you through building scalable applications using the rich features and extensibility of RavenDB.

RavenDB High Performance cuts through the noise and focuses on the key information you need to build scalable applications on the RavenDB document database. The book discusses every aspect of building a high performance system, from modeling your data to deploying it in a clustered environment. Examples are provided to make this information easy to apply to your specific application scenario.

Beginning with the NoSQL movement, RavenDB High Performance delves into the forces pushing developers beyond the traditional relational database solutions. From there, the book focuses on the design and development of web-based applications on RavenDB. It gives clear advice and examples to guide the reader through this new and exciting technology. Data modeling through documents is discussed in detail. This understanding is critical for building clean code and scalable applications. Once this foundation is established, the author focuses on key APIs that optimize data access and give end users great experiences. Scaling out and high availability techniques are also discussed in detail.

RavenDB High Performance brings together the resources you need for building scalable applications on RavenDB in an easy to understand and use format. Advice, diagrams, and code will help you quickly understand the concepts that you will apply to your next application.

Publication date:
August 2013


Chapter 1. A Different Kind of Database

When most people talk of a database, they mean a relational database. Relational databases have been the foundation of enterprise applications for the past 30 years. First defined in June 1970 by Edgar Codd of IBM's San Jose Research Laboratory, relational databases store data in now-familiar tables made up of rows and columns.

Relational databases have served us well for many years, so why do we need a different kind of database? Most developers have experience of building applications with relational databases and access to great tooling. However, relational databases do have their limits. As our systems grow, it becomes more difficult and expensive to scale a traditional relational database.

To understand how we got here, let's take a quick trip back into the recent past. Relational databases were created when big iron ruled the world. These centralized mainframes provided the foundation of the first relational database systems. As we moved into the client/server era, these databases moved onto lower priced servers. But fundamentally, they are still running on one central machine.


Explosive growth

Relational databases worked well when systems were serving hundreds or even thousands of users, but the Internet has changed all of that. The number of users and the volume of data are growing exponentially. A variety of social applications have proved that applications can quickly attract millions of users. Relational databases were never built to handle this level of concurrent access.


Semi-structured data

In addition to the staggering growth, data is no longer simple rows and columns. Semi-structured data is everywhere. Extensible Markup Language (XML) and JavaScript Object Notation (JSON) are the lingua franca of our distributed applications. These formats allow complex relationships to be modeled through hierarchy and nesting. Relational databases struggle to represent these data patterns effectively. Due to this impedance mismatch, our applications are littered with additional complexity. Object-relational mapping (ORM) tools have helped but have not solved this problem.

With the growth of Software as a Service (SaaS) and cloud-based applications, the need for flexible schemas has increased. Each tenant is hosted on a unified infrastructure but they must retain the flexibility to customize their data model to meet their unique business needs. In these multi-tenant environments, a rigid schema structure imposed by a relational database does not work.


Architecture changes

While data is still king, how we architect our data-dependent systems has changed significantly over the past few decades. In many systems, the database acted as the integration point for different parts of the application. This required the data to be stored in a uniform way since the database was acting as a form of API. The following diagram shows the architectural transitions:

With the move to Service Oriented Architectures (SOA), how data is stored for a given component has become less important. The application interfaces with the service, not the database. The application has a dependency on the service contract, not on the database schema. This shift has opened up the possibilities to store data based on the needs of the service.


Rethinking the database

The factors we have been discussing have led many in our industry to rethink the idea of a database. Engineers wrestled with the limitations of the relational database and set out to build modern web-scale databases. The term NoSQL was coined to label this category of databases. Originally, the term stood for No SQL but has evolved to mean Not Only SQL. To confuse matters further, some NoSQL databases support a dialect of SQL. However, none of them are relational databases.

While the NoSQL landscape continues to expand with more projects and companies getting in on the action, there are four basic categories that these databases fall into:

  • Document (CouchDB, MongoDB, RavenDB)

  • Graph (Neo4j, Sones)

  • Key/Value (Cassandra, SimpleDB, Dynamo, Voldemort)

  • Tabular/Wide Column (BigTable, Apache HBase)

Document databases

Document databases are made up of semi-structured, schema-free data structures known as documents. In this case, the term document does not refer to a PDF or Word document. Rather, it refers to a rich data structure that can represent related data from the simple to the complex. In document databases, documents are usually represented in JavaScript Object Notation (JSON). A document can contain any number of fields of any length. Fields can also contain multiple pieces of data. Each document is independent and contains all of the data elements required by the entity.

The following is an example of a simple document:

{
  Name: "Alexander Graham Bell", 
  BornIn: "Edinburgh, United Kingdom", 
  Spouse: "Mabel Gardiner Hubbard"
}

And the following is an example of a more complex document:

{
  Name: "Galileo Galilei", 
  BornIn: "Pisa, Italy", 
  YearBorn: "1564",
  Children: [
    { Name: "Virginia", YearBorn: "1600" },
    { Name: "Vincenzo", YearBorn: "1606" }
  ]
}

Since documents are JSON-based, the impedance mismatch that exists between the object-oriented and relational database worlds is gone. An object graph is simply serialized into JSON for storage. Now, the complexity of the entity has a small impact on the performance. Entire object graphs can be read and written in one database operation. There is no need to perform a series of select statements or create complex stored procedures to read the related objects.
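To make the single-operation idea concrete, here is a minimal sketch in Python rather than .NET. It uses only the standard library and does not touch RavenDB; the `Person` and `Child` classes are hypothetical stand-ins modeled on the Galileo document above, and the capitalized field names simply mirror that document.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Child:
    Name: str
    YearBorn: str


@dataclass
class Person:
    Name: str
    BornIn: str
    YearBorn: str
    Children: List[Child] = field(default_factory=list)


galileo = Person(
    Name="Galileo Galilei",
    BornIn="Pisa, Italy",
    YearBorn="1564",
    Children=[Child("Virginia", "1600"), Child("Vincenzo", "1606")],
)

# One write: the whole graph, children included, becomes a single JSON document.
document = json.dumps(asdict(galileo))

# One read: the nested structure comes back intact, with no joins to reassemble it.
restored = json.loads(document)
```

A relational design would typically split this into a person table and a children table and rejoin them on every read; here the children travel with their parent.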

JSON documents also add flexibility due to their schema free design. This allows for evolving systems without forcing the existing data to be restructured. The schema free nature simplifies data structure evolution and customization. However, care must be given to the evolving data structure. If the evolution is a breaking change, documents must be migrated or additional intelligence needs to be built into the application.
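One way to build that "additional intelligence" into the application is to upgrade old documents as they are read. The sketch below is an illustration in Python, not a RavenDB feature: the `SchemaVersion` field, the `Spouses` list, and the `upgrade_person` helper are all assumptions invented for this example.

```python
def upgrade_person(doc):
    """Return a copy of doc in the newest shape (hypothetical versions 1 and 2)."""
    upgraded = dict(doc)
    # Documents written before versioning existed are treated as version 1.
    version = upgraded.get("SchemaVersion", 1)
    if version < 2:
        # Assumed breaking change: version 2 replaces the single "Spouse"
        # string with a "Spouses" list.
        spouse = upgraded.pop("Spouse", None)
        upgraded["Spouses"] = [spouse] if spouse else []
        upgraded["SchemaVersion"] = 2
    return upgraded


old_doc = {
    "Name": "Alexander Graham Bell",
    "BornIn": "Edinburgh, United Kingdom",
    "Spouse": "Mabel Gardiner Hubbard",
}
new_doc = {
    "SchemaVersion": 2,
    "Name": "Galileo Galilei",
    "BornIn": "Pisa, Italy",
    "Spouses": [],
}

# Both shapes can sit side by side in storage; the application sees only one.
people = [upgrade_person(doc) for doc in (old_doc, new_doc)]
```

The alternative is a one-time migration that rewrites every stored document, which keeps the application code simpler at the cost of touching all existing data.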


A document database for the .NET platform

Prior to RavenDB, document databases such as CouchDB treated .NET as an afterthought. In 2010, Oren Eini from Hibernating Rhinos decided to bring a powerful document database to the .NET ecosystem. According to his blog:

Raven is an OSS (with a commercial option) document database for the .NET/Windows platform. While there are other document databases around, such as CouchDB or MongoDB, there really isn't anything that a .NET developer can pick up and use without a significant amount of friction. Those projects are excellent in what they do, but they aren't targeting the .NET ecosystem.

RavenDB is built to be a first-class citizen on the .NET platform offering developers the ability to easily extend and embed the database in their applications. A few of the key features that make RavenDB compelling to .NET developers are as follows:

  • RavenDB comes with a fully functional .NET client API, which implements unit of work, change tracking, read and write optimizations, and much more. It also has a REST-based API, so you can access it directly from JavaScript.

  • It allows developers to define indexes using LINQ (Language Integrated Query). It also supports map/reduce operations on top of your documents using LINQ.

  • It supports System.Transactions and can take part in distributed transactions.

  • The server can be easily extended by adding a custom .NET assembly.
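For readers unfamiliar with map/reduce, the idea behind the second feature can be sketched in a few lines. This is plain Python over in-memory dictionaries, not RavenDB's LINQ index syntax; the sample documents and the function names `map_person` and `reduce_counts` are made up for illustration.

```python
from collections import defaultdict

people = [
    {"Name": "Alexander Graham Bell", "BornIn": "Edinburgh, United Kingdom"},
    {"Name": "Galileo Galilei", "BornIn": "Pisa, Italy"},
    {"Name": "Leonardo Fibonacci", "BornIn": "Pisa, Italy"},
]


def map_person(doc):
    # Map: emit one (key, value) pair per document.
    yield doc["BornIn"], 1


def reduce_counts(pairs):
    # Reduce: combine all values that share the same key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


pairs = [pair for doc in people for pair in map_person(doc)]
people_per_birthplace = reduce_counts(pairs)
```

The map step looks at each document in isolation and the reduce step aggregates by key, which is what allows such computations to be maintained incrementally as documents change.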

RavenDB architecture

RavenDB leverages ESENT, an existing storage engine that is known to scale to very large databases. ESENT is the storage engine utilized by Microsoft Exchange and Active Directory. The storage engine provides the transactional data store for the documents. RavenDB also utilizes another proven technology called Lucene.NET for its high-speed indexing engine. Lucene.NET is an open source Apache project used to power applications such as Autodesk, Stack Overflow, Orchard, Umbraco, and many more.

The following diagram shows the primary components of the RavenDB architecture:

Storing documents

When a document is inserted or updated, RavenDB performs the following:

  1. A document change comes in and is stored in ESENT. Documents are immediately available to load by ID, but won't appear in searches until they are indexed.

  2. An asynchronous indexing task takes work from the queue and updates the Lucene index. The index can be created manually or dynamically based on the queries executed by the application.

  3. The document now appears in queries. Typically, index updates have an average latency of 20 milliseconds. RavenDB provides an API to wait for updates to be indexed if needed.
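The three steps above can be modeled with a toy in-memory store. This is a sketch in Python under heavy simplifying assumptions: dictionaries stand in for ESENT and Lucene, the class and method names are hypothetical, and the indexing task is invoked by hand instead of running in the background so that the behavior is easy to follow.

```python
from collections import deque


class ToyDocumentStore:
    """In-memory stand-in for the write path described above (not RavenDB code)."""

    def __init__(self):
        self.documents = {}      # plays the role of the transactional document store
        self.index = {}          # plays the role of the search index: birthplace -> ids
        self.indexed_under = {}  # remembers which key each document is indexed under
        self.pending = deque()   # work queue for the indexing task

    def put(self, doc_id, doc):
        # Step 1: store the document and queue it for indexing.
        self.documents[doc_id] = doc
        self.pending.append(doc_id)

    def load(self, doc_id):
        # Loading by ID reads the document store directly and never waits on the index.
        return self.documents.get(doc_id)

    def run_indexing_task(self):
        # Step 2: drain the queue and bring the index up to date.
        while self.pending:
            doc_id = self.pending.popleft()
            old_key = self.indexed_under.pop(doc_id, None)
            if old_key is not None:
                self.index[old_key].discard(doc_id)
            new_key = self.documents[doc_id]["BornIn"]
            self.index.setdefault(new_key, set()).add(doc_id)
            self.indexed_under[doc_id] = new_key

    def query_by_birthplace(self, born_in):
        # Step 3: queries consult only the index and report whether it is stale.
        ids = sorted(self.index.get(born_in, set()))
        return [self.documents[i] for i in ids], bool(self.pending)


store = ToyDocumentStore()
store.put("people/1", {"Name": "Galileo Galilei", "BornIn": "Pisa, Italy"})

loaded = store.load("people/1")                                   # available at once
stale_results, is_stale = store.query_by_birthplace("Pisa, Italy")  # not indexed yet

store.run_indexing_task()
fresh_results, still_stale = store.query_by_birthplace("Pisa, Italy")
```

Returning a staleness flag alongside the results lets the caller decide whether slightly old answers are acceptable or whether it should wait and query again.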

Searching and retrieving documents

When a request for a document comes in with a document ID, the server is able to pull the document directly from the RavenDB database. All searches and other inquiries hit the Lucene index. These methods provide near instant access, regardless of the database size.

A key difference between RavenDB and a relational database is the way index consistency is handled. A relational database ties index updates to data modifications. The insert, update, or delete only completes once the indexes have been updated. This provides users a consistent view of the data but can quickly degrade when the system is under heavy load.

RavenDB, on the other hand, uses a model for indexes known as eventual consistency. Indexes are updated asynchronously from the document storage. This means that a change is not always visible in an index immediately after the document is written. By queuing the indexing operation on a background thread, the system is able to continue servicing reads while the indexing operation catches up. Eventual consistency is a bit counter-intuitive. We do not want the user to view stale data. However, in a multiuser system our users view stale data all the time. Once the data is displayed on the screen, it becomes stale and may have been modified by another user.

The following diagram illustrates stale data in a multiuser system:

In many cases, this staleness does not matter. Consider a blog post. When you publish a new article, does it really matter if the article becomes visible to the entire world that nanosecond? Will users on the Internet really know if it wasn't? What typically matters is providing feedback to the user who made the change. Either let them know when the change becomes available or pause briefly while the indexing catches up. If a user did not initiate the data change, then it is even easier. The change will simply become available when it enters the index. This provides a mechanism to give each user personal consistency. The user making the change can wait for their own changes to take effect while other users don't need to wait.
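Personal consistency can be sketched with a background indexer and a ticket that only the writer waits on. This is a Python illustration and not RavenDB's API: the `ToyBlog` class, its methods, and the 50 millisecond delay are assumptions chosen to make the effect visible.

```python
import queue
import threading
import time


class ToyBlog:
    """Writes are accepted immediately; a background thread indexes them later."""

    def __init__(self, indexing_delay=0.05):
        self._work = queue.Queue()
        self._cond = threading.Condition()
        self._written_count = 0
        self._indexed_count = 0
        self._visible_titles = []
        self._delay = indexing_delay
        threading.Thread(target=self._indexer, daemon=True).start()

    def publish(self, title):
        # Accept the write and hand back a ticket that identifies it.
        with self._cond:
            self._written_count += 1
            self._work.put(title)  # queued under the lock to keep ticket order
            return self._written_count

    def _indexer(self):
        while True:
            title = self._work.get()
            time.sleep(self._delay)  # simulated indexing latency
            with self._cond:
                self._visible_titles.append(title)
                self._indexed_count += 1
                self._cond.notify_all()

    def search(self):
        # Any reader: return what the index holds right now, without blocking.
        with self._cond:
            return list(self._visible_titles)

    def wait_for(self, ticket, timeout=1.0):
        # The writer only: block until the index has caught up to this ticket.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._indexed_count >= ticket, timeout
            )


blog = ToyBlog()
ticket = blog.publish("Hello, RavenDB")

other_reader_view = blog.search()  # may not include the new post yet
caught_up = blog.wait_for(ticket)  # the author pauses briefly
writer_view = blog.search()        # the author now sees their own post
```

Only the caller holding the ticket pays the waiting cost, and the timeout keeps even that caller from blocking indefinitely if indexing falls behind.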

Eventual consistency is a tradeoff between application responsiveness for all users and consistency between indexes and documents. When used appropriately, this tradeoff becomes a tool for increasing the scalability of a system.



As you can see, RavenDB is truly a different kind of database. It makes fundamental changes to what we expect from a database. It requires us to approach problems from a fresh perspective. It requires us to think differently. Over the following chapters, we will explore how these design differences and unique capabilities help us build high performance applications.

About the Author

  • Brian Ritchie

    Brian Ritchie is a software architect with a track record of developing large-scale enterprise systems and leading development teams through difficult scalability challenges. Brian has 20 years of experience in software development and is currently Chief Architect at PaySpan, Inc., which provides innovative reimbursement solutions to healthcare payers and providers. Brian is active in the development community, giving presentations to local user groups and code camps. He has also contributed to Mono (the open source version of .NET) and various other open source projects. You can follow him on Twitter at http://twitter.com/brian_ritchie and read his blog at http://weblogs.asp.net/britchie.
