About this book

Relational databases have been used for decades, and in the last few years NoSQL has become a growing choice for large-scale web applications. Non-relational databases provide the scale and speed that you may need for your application. To switch, you must know the options available, their advantages and drawbacks, the scenarios NoSQL suits best, and the scenarios where it should be avoided at all costs.

Getting Started with NoSQL is a from-the-ground-up guide that takes you from the very first steps to a real-world NoSQL application. It provides a step-by-step approach to designing and implementing a NoSQL application that will help you make clear decisions on database choices and database model choices. The book is suited to developers, architects, and CTOs alike.

This book is a comprehensive guide to working with NoSQL. You will learn to make key decisions, and to design and implement NoSQL applications. You will learn about NoSQL jargon, data models, and the databases on the market. The case studies and comparisons presented will help you decide whether or not to use NoSQL and, if so, which model and product to use. This book is an indispensable resource to have in your library. You will learn what you need to know about understanding and working with NoSQL, and how to implement an application with the NoSQL database that is right for you.

Publication date: March 2013
Publisher: Packt
Pages: 142
ISBN: 9781849694988

 

Chapter 1. An Overview of NoSQL

Now that you have this book in your hands, you must be both excited and anxious about NoSQL. In this chapter, we get a head start on:

  • What NoSQL is

  • What NoSQL is not

  • Why NoSQL

  • A list of NoSQL databases

For decades, relational databases have been used to store what we know as structured data. The data is subdivided into groups, referred to as tables. The tables store well-defined units of data in terms of type, size, and other constraints. Each unit of data is known as a column, while each unit of the group is known as a row. The columns may have relationships defined across them, for example parent-child, hence the name relational databases. And because consistency is one of the critical factors, scaling horizontally is a challenging task, if not impossible.
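The terms above can be made concrete with a minimal sketch using Python's standard-library sqlite3 module and an in-memory database. The table and column names (author, book, author_id) are hypothetical and chosen only for illustration; they do not come from the book.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each table stores well-defined columns with a type and constraints.
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"  # parent-child relationship
)

conn.execute("INSERT INTO author (id, name) VALUES (1, 'A. Writer')")
conn.execute("INSERT INTO book (id, title, author_id) VALUES (10, 'First Book', 1)")

# A row that points at a non-existent parent violates the declared relationship.
try:
    conn.execute("INSERT INTO book (id, title, author_id) VALUES (11, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The relationship lets us collate rows from both tables.
rows = conn.execute(
    "SELECT book.title, author.name FROM book JOIN author ON book.author_id = author.id"
).fetchall()
```

Here the database itself rejects the orphaned row, which is the kind of consistency guarantee that becomes hard to keep once the data is spread across many machines.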

About a decade ago, with the rise of large web applications, research poured into handling data at scale. One of the outputs of this research is the non-relational database, generally referred to as a NoSQL database. One of the main problems that a NoSQL database solves is scale.

 

Defining NoSQL


According to Wikipedia:

In computing, NoSQL (mostly interpreted as "not only SQL") is a broad class of database management systems identified by its non-adherence to the widely used relational database management system model; that is, NoSQL databases are not primarily built on tables, and as a result, generally do not use SQL for data manipulation.

The NoSQL movement began in the early years of the 21st century when the world started its deep focus on creating web-scale databases. By web-scale, I mean a scale that caters to hundreds of millions of users and is now growing to billions of connected devices including, but not limited to, mobiles, smartphones, internet TV, in-car devices, and many more.

Although Wikipedia treats it as "not only SQL", NoSQL originally started off as a simple combination of two words, No and SQL, clearly and completely visible in the new term. No acronym. What it literally means is, "I do not want to use SQL". To elaborate, "I want to access the database without using any SQL syntax". Why? We shall explore this in a while.

Whatever the root phrase, NoSQL today is the term used to refer to the class of databases that do not follow relational database management system (RDBMS) principles, specifically their ACID nature, and that are designed to handle the speed and scale of the likes of Google, Facebook, Yahoo, Twitter, and many more.

History

Before we take a deep dive into it, let us set our context right by exploring some key landmarks in history that led to the birth of NoSQL.

From Inktomi, probably the first true search engine, to Google, the present world leader, computer scientists have long recognized the limitations of the traditional and widely used RDBMS, specifically in scalability, parallelization, and cost. They also noted that the data set in such systems is minimally cross-referenced compared to the chunked, transactional data that is mostly fed to an RDBMS.

Specifically, take the case of Google, which gets billions of requests a month across applications that may be totally unrelated in what they do but related in how they deliver. The problem of scalability has to be solved at each layer, right from data access to final delivery. Google, therefore, had to work innovatively and gave birth to a new computing ecosystem comprising:

  • GFS: Distributed filesystem

  • Chubby: Distributed coordination system

  • MapReduce: Parallel execution system

  • BigTable: Column-oriented database

These systems were initially described in papers released from 2003 to 2006.

These and other papers led to a spike in activity, especially in open source, around large-scale distributed computing, and some of the most amazing products were born.

 

What NoSQL is and what it is not


Now that we have a fair idea of how this side of the world evolved, let us examine what NoSQL is and what it is not.

NoSQL is a generic term used to refer to any data store that does not follow the traditional RDBMS model; specifically, the data is non-relational and it does not use SQL as the query language. It is used to refer to databases that attempt to solve the problems of scalability and availability at the expense of atomicity or consistency.

NoSQL is not a database. It is not even a type of database. In fact, it is a term used to filter out (read: reject) a set of databases from the ecosystem. There are several distinct family trees available. In Chapter 4, Advantages and Drawbacks, we explore various types of data models (or simply, database types) available under this umbrella.

Traditional RDBMS applications have focused on ACID transactions:

  • Atomicity: Either everything in a transaction succeeds, or the whole transaction is rolled back.

  • Consistency: A transaction cannot leave the database in an inconsistent state.

  • Isolation: One transaction cannot interfere with another.

  • Durability: A completed transaction persists, even after applications restart.
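Atomicity in particular is easy to see in a small experiment. The following is a minimal sketch, assuming Python's standard-library sqlite3 module and a hypothetical inventory/orders schema invented for this illustration. An order is recorded and the stock is decremented in one transaction; when the second step fails, the first is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: stock may never go below zero.
conn.execute(
    "CREATE TABLE inventory ("
    " item TEXT PRIMARY KEY,"
    " stock INTEGER NOT NULL CHECK (stock >= 0))"
)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("INSERT INTO inventory VALUES ('book', 1)")
conn.commit()


def place_order(conn, item):
    """Record an order and decrement stock as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
            conn.execute(
                "UPDATE inventory SET stock = stock - 1 WHERE item = ?", (item,)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the order row inserted above is rolled back too.
        return False


first = place_order(conn, "book")   # takes the last copy
second = place_order(conn, "book")  # would drive stock to -1, so the whole unit fails

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stock = conn.execute("SELECT stock FROM inventory WHERE item = 'book'").fetchone()[0]
```

After both calls there is exactly one order and the stock is zero: the failed attempt leaves no partial trace behind.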

However indispensable these qualities may seem, they are quite incompatible with availability and performance in web-scale applications. For example, if a company like Amazon were to use a system like this, imagine how slow it would be. If I proceed to buy a book and a transaction is in progress, it will lock a part of the database, specifically the inventory, and every other person in the world will have to wait until I complete my transaction. This just doesn’t work!

Amazon may use cached data or even unlocked records, resulting in inconsistency. In an extreme case, you and I may both end up buying the last copy of a book in the store, with one of us finally receiving an apology mail. (Well, Amazon definitely has a much better system than this.)

The point I am trying to make here is that we may have to look beyond ACID to something called BASE, coined by Eric Brewer:

  • Basically available: Each request is guaranteed a response, whether the execution succeeded or failed.

  • Soft state: The state of the system may change over time, at times without any input (for eventual consistency).

  • Eventual consistency: The database may be momentarily inconsistent but will be consistent eventually.
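The three properties above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the Replica class, the sync function, and the integer version numbers are hypothetical, and conflicts are resolved with a simple last-write-wins rule, which is one of several strategies real systems use.

```python
class Replica:
    """A toy replica mapping key -> (version, value). Hypothetical, for illustration."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what we hold (last write wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        # Always answers immediately, even if the answer may be out of date.
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries both ways so each replica ends up with the newest version."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("stock:book", 5, version=1)
sync(r1, r2)

# A new write lands on r1 only; r2 has not heard about it yet.
r1.write("stock:book", 4, version=2)
stale = r2.read("stock:book")   # momentarily inconsistent
fresh = r1.read("stock:book")

# Background synchronization changes r2's state without any client input.
sync(r1, r2)
converged = r2.read("stock:book")
```

Between the second write and the second sync, the two replicas disagree (5 versus 4); after the sync they agree again. That window of disagreement is the price paid for answering every request without waiting for the other replica.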

Eric Brewer also noted that it is impossible for a distributed computer system to provide consistency, availability and partition tolerance simultaneously. This is more commonly referred to as the CAP theorem.

Note, however, that in cases like stock exchanges or banking where transactions are critical, cached or stale data will just not work. So NoSQL is definitely not a solution to all database-related problems.

 

Why NoSQL?


Looking at what we have explored so far, does it mean that we should look at NoSQL only when we start reaching the problems of scale? No.

NoSQL databases have a lot more to offer than just solving the problems of scale:

  • Schemaless data representation: Almost all NoSQL implementations offer schemaless data representation. This means that you don’t have to think too far ahead to define a structure, and you can continue to evolve it over time, including adding new fields or even nesting the data, for example, in the case of a JSON representation.

  • Development time: I have heard stories about reduced development time because one doesn’t have to deal with complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to create your final view?

  • Speed: Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of milliseconds, especially over mobile and other intermittently connected devices, you have a much higher probability of winning users over.

  • Plan ahead for scalability: You read it right. Why fall into the ditch and then try to get out of it? Why not just plan ahead so that you never fall into one? In other words, your application can be quite elastic: it can handle sudden spikes of load. Of course, you win users over straightaway.
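The schemaless point in the list above can be illustrated with a minimal Python sketch. It does not use any particular NoSQL product; a plain list of dictionaries stands in for a document collection, and the field names (name, email, devices) are hypothetical.

```python
import json

# A hypothetical in-memory "collection": a list of JSON-like documents.
users = []

# The initial shape of a document.
users.append({"id": 1, "name": "Asha"})

# Later documents gain a new field and a nested structure; no migration is run
# and the older document is left untouched.
users.append({
    "id": 2,
    "name": "Ravi",
    "email": "ravi@example.com",
    "devices": [{"type": "mobile", "os": "android"}],
})


def device_types(doc):
    """Read an optional nested field, tolerating documents that lack it."""
    return [device["type"] for device in doc.get("devices", [])]


types_by_user = {doc["id"]: device_types(doc) for doc in users}

# Documents round-trip through JSON unchanged, whatever fields they carry.
restored = json.loads(json.dumps(users))
```

The flexibility moves some responsibility into the application: code that reads the documents has to cope with fields that older documents do not have, as device_types does here.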

 

List of NoSQL Databases


The buzz around NoSQL still hasn’t reached its peak, at least to date. We see more offerings in the market over time. The following table lists some of the more mature, popular, and powerful NoSQL databases, grouped by the data model used:

Document   | Key-Value  | XML   | Column         | Graph
MongoDB    | Redis      | BaseX | BigTable       | Neo4J
CouchDB    | Membase    | eXist | Hadoop / HBase | FlockDB
RavenDB    | Voldemort  |       | Cassandra      | InfiniteGraph
Terrastore | MemcacheDB |       | SimpleDB       |
           |            |       | Cloudera       |

This list is by no means comprehensive, nor does it claim to be. One of the positive points about this list is that most of the databases in the list are open source and community driven.

Chapter 4, Advantages and Drawbacks, provides an in-depth study of the various popular data models used in NoSQL databases.

Chapter 6, Case Study, does an exhaustive comparison of some of these databases along various key parameters including, but not limited to, data model, language, performance, license, price, community, resources, extensibility, and many more.

 

Summary


In this chapter, we learned about the fundamentals of NoSQL: what it is all about and, more critically, what it is not. We took a dip into its history to appreciate the reasons and science behind it. You are encouraged to explore the web for historical events around NoSQL to appreciate it in more depth.

NoSQL is not a solution for each and every application. It is worth noting that most of the products do throw away the traditional ACID nature, giving way to a BASE infrastructure. Having said that, some products stand out: CouchDB and Neo4j, for example, are ACID-compliant NoSQL databases.

Adopting NoSQL is not only a technological change but also a change in mindset, behaviour, and thought process. This means that if you plan to hire a developer to work with NoSQL, he or she must understand the new models.

In the next chapter, we will have a quick look at the taxonomy and build up our vocabulary before we dive deeply into NoSQL.

About the Author

  • Gaurav Vaish

    Gaurav Vaish works as principal engineer with Yahoo! India. He works primarily in three domains: cloud, web, and devices including mobile, connected TV, and the like. His expertise lies in the design and architecture of applications for these domains.

    Gaurav started his career in 2002 with Adobe Systems India working in their engineering solutions group.

    In 2005, he started his own company, Edujini Labs, focusing on corporate training and collaborative learning.

    He holds a B. Tech. in Electrical Engineering with a specialization in Speech Signal Processing from IIT Kanpur.

    He runs his personal blog at http://www.mastergaurav.com and http://www.m10v.com.
