CouchDB and PHP Web Development Beginner's Guide

2 (1 reviews total)
By Tim Juravich
    Advance your knowledge in tech with a Packt subscription

  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Introduction to CouchDB

About this book

CouchDB is a NoSQL database which is making waves in the development world. It’s the tool of choice for many PHP developers so they need to understand the robust features of CouchDB and the tools that are available to them.

CouchDB and PHP Web Development Beginner’s Guide will teach you the basics and fundamentals of using CouchDB within a project. You will learn how to build an application from beginning to end, learning the difference between the “quick way” to do things, and the “right way” by looking through a variety of code examples and real world scenarios.

You will start with a walkthrough of setting up a sound development environment and then learn to create a variety of documents manually and programmatically. You will also learn how to manage their source control with Git and keep track of their progress. With each new concept, such as adding users and posts to your application, the author will take you through code step-by-step and explain how to use CouchDB’s robust features. Finally, you will learn how to easily deploy your application and how to use simple replication to scale your application.

Publication date:
June 2012
Publisher
Packt
Pages
304
ISBN
9781849513586

 

Chapter 1. Introduction to CouchDB

Welcome to CouchDB and PHP Web Development Beginner's Guide. In this book, we will learn the ins and outs of building a simple but powerful website using CouchDB and PHP. For you to understand why we do certain things in CouchDB, it's first important for you to understand the history of NoSQL databases and learn CouchDB's place in database history.

In this chapter we will:

  • Go over a brief history of databases and their place in technology

  • Talk about how databases evolved into the concept of NoSQL

  • Define NoSQL databases by understanding different classifications of NoSQL databases, the CAP theorem and its avoidance of the ACID model

  • Look at the history of CouchDB and its main contributors

  • Talk about what makes CouchDB special

Let's start by looking at the evolution of databases and how NoSQL arrived on the scene.

The NoSQL database evolution

In the early 1960s, the term database was introduced to the world as a simple layer that would serve as the backbone behind information systems. The simple concept of separating applications from data was new and exciting, and it opened up possibilities for applications to become more robust. At this point, databases existed first as tape-based devices, but soon became more usable as system direct-access storage on disks.

In 1970, Edgar Codd proposed a more efficient way of storing data — the relational model. This model would also use SQL to allow the applications to find the data stored within its tables. This relational model is nearly identical to what we know as traditional relational databases today. While this model was widely accepted, it wasn't until the mid 1980s that there was hardware that could actually make effective use of it. By 1990, hardware finally caught up, and the relational model became the dominant method for storing data.

Just as in any area of technology, competition arose with Relational Database Management Systems (RDBMS) . Some examples of popular RDMBS systems are Oracle, Microsoft SQL Server, MySQL, and PostgreSQL.

As we moved past the year 2000, applications began to produce incredible amounts of data through more complex applications. Social networks entered the scene. Companies wanted to make sense of the vast amounts of data that were available. This shift brought up some serious concerns about the datastructure, scalability, and availability of data that the relational model didn't seem to handle. With the uncertainty of how to manage this large amount of ever-changing data, the term NoSQL emerged.

The term NoSQL isn't short for "no SQL;" it actually stands for "not only SQL". NoSQL databases are a group of persistent solutions, which do not follow the relational model and do not use SQL for querying. On top of that, NoSQL wasn't introduced to replace relational databases. It was introduced to complement relational databases where they fell short.

What makes NoSQL different

Besides the fact that NoSQL databases do not use SQL to query data, there are a few key characteristics of NoSQL databases. In order to understand these characteristics, we'll need to cover a lot of terminology and definitions. It's not important that you memorize or remember everything here, but it's important for you to know exactly what makes up a NoSQL database.

The first thing that makes NoSQL databases different is their data structure. There are a variety of different ways in which NoSQL databases are classified.

Classification of NoSQL databases

NoSQL databases (for the most part) fit into four main data structures:

  • Key-value stores: They save data with a unique key and a value. Their simplicity allow them to be incredibly fast and scale to enormous sizes.

  • Column stores: They are similar to relational databases, but instead of storing records, they store all of the values for a column together in a stream.

  • Document stores: They save data without it being structured in a schema, with buckets of key-value pairs inside a self-contained object. This datastructure is reminiscent of an associative array in PHP. This is where CouchDB lands on the playing field. We'll go much deeper into this topic in Chapter 3, Getting Started with CouchDB and Futon.

  • Graph databases: They store data in a flexible graph model that contains a node for each object. Nodes have properties and relationships to other nodes.

We won't go too deeply into examples of each of these types of databases, but it's important to look at the different options that are out there. By looking at databases at this level, it's relatively easy for us to see (in general) how the data will scale to size and complexity, by looking at the following screenshot:

If you look at this diagram, you'll see that I've placed a Typical Relational Database with a crude performance line. This performance line gives you a simple idea of how a database might scale in size and complexity. How is it possible that NoSQL databases perform so much better in regards to high size and complexity of data?

For the most part, NoSQL databases are scalable because they rely on distributed systems and ignore the ACID model. Let's talk through what we gain and what we give up through a distributed system, and then define the ACID model.

When talking about any distributed system (not just storage or databases), there is a concept that defines the limitations of what you can do. This is known as the CAP theorem.

CAP theorem

Eric Brewer introduced the CAP theorem in the year 2000. It states that in any distributed environment, it is impossible for it to provide three guarantees.

  • Consistency: All the servers in the system will have the same data. So, anyone using the system will get the latest data, regardless of which node they talk to in the distributed system.

  • Availability: All of the servers will always return data.

  • Partition-tolerance: The system continues to operate as a whole, even if an individual server fails or cannot be reached.

By looking at these choices, you can tell that it would definitely be ideal to have all three of these things guaranteed, but it's theoretically impossible. In the real world, each NoSQL database picks two of the three options, and usually develops some kind of process to mitigate the impact of the third, unhandled property.

We'll talk about which approach CouchDB takes shortly, but there is still a bit to learn about another concept that NoSQL databases avoid: ACID.

ACID

ACID is a set of properties that apply to database transactions, which are the core of traditional relational databases. While transactions are incredibly powerful, they are also one of the things that make reading and writing quite a bit slower in relational databases.

ACID is made up of four main properties:

  • Atomicity: This is an all or nothing approach to dealing with data. Everything in the transaction must happen successfully, or none of the changes are committed. This is a key property whenever money or currency is handled in a system, and requires a system of checks and balances.

  • Consistency: Data will only be accepted if it passes all of the validation in place on the database, such as triggers, data types, and constraints.

  • Isolation: Transactions will not affect other transactions that are occurring, and other users won't see partial results of a transaction in progress.

  • Durability: Once the data is saved, it is safe against errors, crashes, and other software malfunctions.

Again, as you read through the definition of ACID, you are probably thinking to yourself, "These are all must haves!" That may be the case, but keep in mind that most NoSQL databases do not fully employ ACID, because it's near impossible to have all of these restrictions and still have blazing fast writes to data.

So what does all of that mean?

I've given you a lot of definitions now, but let's try to wrap it together into a few simple lists. Let's talk through the advantages and disadvantages of NoSQL databases, when to use, and when to avoid NoSQL databases.

Advantages of NoSQL databases

With the introduction of NoSQL databases, there are lot of advantages:

  • You can do things that simply weren't possible with the processing and query power of traditional relational databases.

  • Your data is scalable and flexible, allowing it to scale to size and complexity faster, right out of the box.

  • There are new data models to consider. You don't have to force your data into a relational model if it doesn't make sense.

  • Writing data is blazing fast.

As you can see, there are some clear advantages of NoSQL databases, but as I mentioned before, there are still some negatives that we need to consider.

Negatives of NoSQL databases

However, along with the good, there's also some bad:

  • There are no common standards; each database does things just a little bit differently

  • Querying data does not involve the familiar SQL model to find records

  • NoSQL databases are still relatively immature and constantly evolving

  • There are new data models to consider; sometimes it can be confusing to make your data fit

  • Because a NoSQL database avoids the ACID model, there is no guarantee that all of your data will be successfully written

Some of those negatives may be pretty easy for you to stomach, except for NoSQL's avoidance of the ACID model.

When you should use NoSQL databases

Now that we have a good take on the advantages and disadvantages, let's talk about some great use cases for using NoSQL databases:

  • Applications that have a lot of writing

  • Applications where the schema and structure of the data might change

  • Large amount of unstructured or semi-structured data

  • Traditional relational databases feel restricting, and you want to try something new.

That list isn't exclusive, but there are no clear definitions on when you can use NoSQL databases. Really, you can use them for just about every project.

When you should avoid NoSQL databases

There are, however, some pretty clear areas that you should avoid when storing data in NoSQL.

  • Anything involving money or transactions. What happens if one record doesn't save correctly because of NoSQL avoidance of the ACID model or the data isn't 100 percent available because of the distributed system?

  • Business critical data or line of business applications, where missing one row of data could mean huge problems.

  • Heavily-structured data requiring functionality in a relational database.

For all of these use cases, you should really focus on using relational databases that will make sure that your data is safe and sound. Of course, you can always include NoSQL databases where it makes sense.

When choosing a database, it's important to remember that "There is no silver bullet." This phrase is used a lot when talking about technology, and it means that there is no one technology that will solve all of your problems without having any side effects or negative consequences. So choose wisely!

 

The NoSQL database evolution


In the early 1960s, the term database was introduced to the world as a simple layer that would serve as the backbone behind information systems. The simple concept of separating applications from data was new and exciting, and it opened up possibilities for applications to become more robust. At this point, databases existed first as tape-based devices, but soon became more usable as system direct-access storage on disks.

In 1970, Edgar Codd proposed a more efficient way of storing data — the relational model. This model would also use SQL to allow the applications to find the data stored within its tables. This relational model is nearly identical to what we know as traditional relational databases today. While this model was widely accepted, it wasn't until the mid 1980s that there was hardware that could actually make effective use of it. By 1990, hardware finally caught up, and the relational model became the dominant method for storing data.

Just as in any area of technology, competition arose with Relational Database Management Systems (RDBMS) . Some examples of popular RDMBS systems are Oracle, Microsoft SQL Server, MySQL, and PostgreSQL.

As we moved past the year 2000, applications began to produce incredible amounts of data through more complex applications. Social networks entered the scene. Companies wanted to make sense of the vast amounts of data that were available. This shift brought up some serious concerns about the datastructure, scalability, and availability of data that the relational model didn't seem to handle. With the uncertainty of how to manage this large amount of ever-changing data, the term NoSQL emerged.

The term NoSQL isn't short for "no SQL;" it actually stands for "not only SQL". NoSQL databases are a group of persistent solutions, which do not follow the relational model and do not use SQL for querying. On top of that, NoSQL wasn't introduced to replace relational databases. It was introduced to complement relational databases where they fell short.

What makes NoSQL different

Besides the fact that NoSQL databases do not use SQL to query data, there are a few key characteristics of NoSQL databases. In order to understand these characteristics, we'll need to cover a lot of terminology and definitions. It's not important that you memorize or remember everything here, but it's important for you to know exactly what makes up a NoSQL database.

The first thing that makes NoSQL databases different is their data structure. There are a variety of different ways in which NoSQL databases are classified.

Classification of NoSQL databases

NoSQL databases (for the most part) fit into four main data structures:

  • Key-value stores: They save data with a unique key and a value. Their simplicity allow them to be incredibly fast and scale to enormous sizes.

  • Column stores: They are similar to relational databases, but instead of storing records, they store all of the values for a column together in a stream.

  • Document stores: They save data without it being structured in a schema, with buckets of key-value pairs inside a self-contained object. This datastructure is reminiscent of an associative array in PHP. This is where CouchDB lands on the playing field. We'll go much deeper into this topic in Chapter 3, Getting Started with CouchDB and Futon.

  • Graph databases: They store data in a flexible graph model that contains a node for each object. Nodes have properties and relationships to other nodes.

We won't go too deeply into examples of each of these types of databases, but it's important to look at the different options that are out there. By looking at databases at this level, it's relatively easy for us to see (in general) how the data will scale to size and complexity, by looking at the following screenshot:

If you look at this diagram, you'll see that I've placed a Typical Relational Database with a crude performance line. This performance line gives you a simple idea of how a database might scale in size and complexity. How is it possible that NoSQL databases perform so much better in regards to high size and complexity of data?

For the most part, NoSQL databases are scalable because they rely on distributed systems and ignore the ACID model. Let's talk through what we gain and what we give up through a distributed system, and then define the ACID model.

When talking about any distributed system (not just storage or databases), there is a concept that defines the limitations of what you can do. This is known as the CAP theorem.

CAP theorem

Eric Brewer introduced the CAP theorem in the year 2000. It states that in any distributed environment, it is impossible for it to provide three guarantees.

  • Consistency: All the servers in the system will have the same data. So, anyone using the system will get the latest data, regardless of which node they talk to in the distributed system.

  • Availability: All of the servers will always return data.

  • Partition-tolerance: The system continues to operate as a whole, even if an individual server fails or cannot be reached.

By looking at these choices, you can tell that it would definitely be ideal to have all three of these things guaranteed, but it's theoretically impossible. In the real world, each NoSQL database picks two of the three options, and usually develops some kind of process to mitigate the impact of the third, unhandled property.

We'll talk about which approach CouchDB takes shortly, but there is still a bit to learn about another concept that NoSQL databases avoid: ACID.

ACID

ACID is a set of properties that apply to database transactions, which are the core of traditional relational databases. While transactions are incredibly powerful, they are also one of the things that make reading and writing quite a bit slower in relational databases.

ACID is made up of four main properties:

  • Atomicity: This is an all or nothing approach to dealing with data. Everything in the transaction must happen successfully, or none of the changes are committed. This is a key property whenever money or currency is handled in a system, and requires a system of checks and balances.

  • Consistency: Data will only be accepted if it passes all of the validation in place on the database, such as triggers, data types, and constraints.

  • Isolation: Transactions will not affect other transactions that are occurring, and other users won't see partial results of a transaction in progress.

  • Durability: Once the data is saved, it is safe against errors, crashes, and other software malfunctions.

Again, as you read through the definition of ACID, you are probably thinking to yourself, "These are all must haves!" That may be the case, but keep in mind that most NoSQL databases do not fully employ ACID, because it's near impossible to have all of these restrictions and still have blazing fast writes to data.

So what does all of that mean?

I've given you a lot of definitions now, but let's try to wrap it together into a few simple lists. Let's talk through the advantages and disadvantages of NoSQL databases, when to use, and when to avoid NoSQL databases.

Advantages of NoSQL databases

With the introduction of NoSQL databases, there are lot of advantages:

  • You can do things that simply weren't possible with the processing and query power of traditional relational databases.

  • Your data is scalable and flexible, allowing it to scale to size and complexity faster, right out of the box.

  • There are new data models to consider. You don't have to force your data into a relational model if it doesn't make sense.

  • Writing data is blazing fast.

As you can see, there are some clear advantages of NoSQL databases, but as I mentioned before, there are still some negatives that we need to consider.

Negatives of NoSQL databases

However, along with the good, there's also some bad:

  • There are no common standards; each database does things just a little bit differently

  • Querying data does not involve the familiar SQL model to find records

  • NoSQL databases are still relatively immature and constantly evolving

  • There are new data models to consider; sometimes it can be confusing to make your data fit

  • Because a NoSQL database avoids the ACID model, there is no guarantee that all of your data will be successfully written

Some of those negatives may be pretty easy for you to stomach, except for NoSQL's avoidance of the ACID model.

When you should use NoSQL databases

Now that we have a good take on the advantages and disadvantages, let's talk about some great use cases for using NoSQL databases:

  • Applications that have a lot of writing

  • Applications where the schema and structure of the data might change

  • Large amount of unstructured or semi-structured data

  • Traditional relational databases feel restricting, and you want to try something new.

That list isn't exclusive, but there are no clear definitions on when you can use NoSQL databases. Really, you can use them for just about every project.

When you should avoid NoSQL databases

There are, however, some pretty clear areas that you should avoid when storing data in NoSQL.

  • Anything involving money or transactions. What happens if one record doesn't save correctly because of NoSQL avoidance of the ACID model or the data isn't 100 percent available because of the distributed system?

  • Business critical data or line of business applications, where missing one row of data could mean huge problems.

  • Heavily-structured data requiring functionality in a relational database.

For all of these use cases, you should really focus on using relational databases that will make sure that your data is safe and sound. Of course, you can always include NoSQL databases where it makes sense.

When choosing a database, it's important to remember that "There is no silver bullet." This phrase is used a lot when talking about technology, and it means that there is no one technology that will solve all of your problems without having any side effects or negative consequences. So choose wisely!

 

Introduction to CouchDB


For this book and for a variety of my own projects and startups, I chose CouchDB. Let's take a historical look at CouchDB, then quickly touch on its approach to the CAP theorem, and its strengths and weaknesses.

The history of CouchDB

In April 2005, Damien Katz posted a blog entry about a new database engine he was working on, later to be called CouchDB, which is an acronym for Cluster Of Unreliable Commodity Hardware. Katz, a former Lotus Notes developer at IBM, was attempting to create a fault-tolerant document database in C++, but soon after, shifted to the Erlang OTP platform. As months went by, CouchDB started to evolve under the self-funding of Damien Katz, and in February 2008, it was introduced to the Apache Incubator project. Finally, in November 2008, it graduated as a top-level project.

Damien's team, CouchOne, merged with the Membase team in 2011 to form a new company called Couchbase. This company was formed to merge CouchDB and Membase into a new product, and increase the documentation and visibility for the product.

In early 2012, Couchbase announced that it would be shifting focus from facilitating CouchDB and moving to create Couchbase Server 2.0. This new database takes a different approach to the database, which meant that it would not be contributing to the CouchDB community anymore. This news was met with some distress in the CouchDB community until Cloudant stepped in.

Cloudant, the chief CouchDB hosting company and creator of BigCouch, a fault tolerant and horizontally scalable clustering frameworking built for CouchDB, announced that they would merge their changes back to CouchDB, and take on the role of continuing development of CouchDB.

In early 2012, at the time of writing, CouchDB's most major release was 1.1.1 in March 31, 2011. But CouchDB 1.2 is looking to be released just around the corner!

Defining CouchDB

According to http://couchdb.apache.org/, CouchDB can be defined as:

  • A document database server, accessible via a RESTful JSON API

  • Ad-hoc and schema-free with a flat address space

  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management

  • Query-able and index-able, featuring a table oriented reporting engine that uses JavaScript as a query language.

You might be able to read between the lines, but CouchDB chose availability and partial-tolerance from the CAP theorem, and focuses on eventual consistency using replication.

We could go really deep into what each of these bullet points mean, because it will take the rest of the book until we've touched on them in depth. In each chapter, we'll begin to build on top of our CouchDB knowledge until we have a fully operational application in the wild.

 

Summary


I hope you enjoyed this chapter and are ready to take a deep dive into really learning the ins and outs of CouchDB. Let's recap everything we learned in this chapter.

  • We talked about the history of databases and the emergence of NoSQL databases

  • We defined the advantages and disadvantages of using NoSQL

  • We looked at the definition and history of CouchDB

That's it for the history lesson. Fire up your computer. In the next chapter, we'll set everything up to develop web applications with CouchDB and PHP, and make sure that it's all set up correctly.

About the Author

  • Tim Juravich

    Tim Juravich is an experienced product, program, and technology leader that has spent nearly the past decade leading teams through a variety of projects in PHP, Ruby, and .NET. After gaining experience at several Fortune 500 companies, Tim discovered entrepreneurship, founded three of his own startups, and has helped dozens of other startups open their doors. Tim currently serves as the Director of Program Management for Thinktiv, a venture accelerator.

    Browse publications by this author

Latest Reviews

(1 reviews total)
I did not find the code 100% correct in the latter parts of the book, but the code was fairly complicated and it may have been just my ignorance.
CouchDB and PHP Web Development Beginner's Guide
Unlock this book and the full library for FREE
Start free trial