This chapter introduces the concept of buckets in detail. It explains how documents are stored in Couchbase and how they are maintained in a Couchbase cluster. We will explore the various types of buckets and their usage, and look closely at the parts of a document stored in a bucket. Besides buckets and documents, you will also learn about the internal mechanisms of Couchbase, including ejection, replication, warmup, and rebalancing.
You're reading from Learning Couchbase
We already came across the term bucket in the previous chapter. Now, let me explain this concept in detail, since it's the component that administrators and developers will be working with most of the time. In fact, I used to wonder why it is named "bucket". Perhaps it is because we can store anything in it, just as we do with a bucket in the physical world. In any database system, the main purpose is to store data, and the logical namespace for storing data is called a database. Likewise, in Couchbase, the namespace for storing data is called a bucket. In brief, it's a data container that stores data related to applications, either in RAM or on disk.
In fact, buckets help you to partition application data depending on an application's requirements. If you are hosting different types of applications in a cluster, say an e-commerce application and a data warehouse, you can partition them using buckets. You can create two buckets, one for the e-commerce application and another for the data...
By now, you should understand the concept of buckets, how they work, and how they are configured. Let's now look at the items that get stored in buckets. So, what is a document? A document is a piece of information or data that gets stored in a bucket. It's the smallest item that can be stored in a bucket. As a developer, you will always work with a bucket in terms of documents. A document is similar to a row in an RDBMS table but, in NoSQL terminology, it is referred to as a document. It's a way of thinking about and designing data objects: all information and data should be stored as a document, as if it were a physical document. NoSQL databases, including Couchbase, generally don't require a fixed schema to store documents or data in a particular bucket. These documents are represented in the form of JSON. Further information and design practices for a document, along with JSON, will be discussed in the next chapter. For the time being, let's try to...
Now you are able to create a bucket and store documents in it. So, before moving to the next chapter, let's try to understand another concept, the vBucket, which helps in replicating documents across the nodes in a cluster. In order to understand vBuckets, you need to understand the document ID, which we already discussed. It is a key, unique within a bucket, that is associated with each document. Whenever an application needs to store a document in a bucket, the document must be associated with a unique key, just as a row is identified by a primary key in an RDBMS table.
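To make the primary-key analogy concrete, here is a minimal sketch in Python. It is not the Couchbase SDK: `ToyBucket`, `DuplicateKeyError`, and the `customer::1001` ID are hypothetical names invented for illustration, and the bucket is just an in-memory dictionary keyed by document ID.

```python
import json


class DuplicateKeyError(Exception):
    """Raised when a document ID already exists in the bucket."""


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket (not the Couchbase SDK)."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # document ID -> JSON string

    def insert(self, doc_id, doc):
        # Like a primary key, a document ID may appear only once per bucket.
        if doc_id in self._docs:
            raise DuplicateKeyError(doc_id)
        self._docs[doc_id] = json.dumps(doc)

    def get(self, doc_id):
        # Look the document up by its ID and decode the stored JSON.
        return json.loads(self._docs[doc_id])


bucket = ToyBucket("ecommerce")
bucket.insert("customer::1001", {"name": "Alice", "city": "Pune"})
print(bucket.get("customer::1001"))
```

Inserting a second document under `customer::1001` raises `DuplicateKeyError` in this sketch, which mirrors the uniqueness rule described above.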
Depending on the document ID, documents are distributed across the nodes in a cluster. Each bucket is divided into 1024 logical partitions called vBuckets. Each partition is bound to a particular node in the cluster. This binding of vBuckets to server nodes is stored in a cluster map, which is a lookup structure. Each vBucket holds a subset of document IDs. This mechanism allows effective distribution and sharding of documents across the...
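The lookup path described here, from document ID to vBucket to node, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Couchbase's implementation: the hash (CRC32 modulo the partition count), the round-robin binding, and the node names are all assumptions made for the example. Only the figure of 1024 vBuckets and the idea of a cluster map come from the text.

```python
import zlib

NUM_VBUCKETS = 1024  # from the text: each bucket has 1024 logical partitions


def vbucket_for(doc_id: str) -> int:
    # Assumption: CRC32 of the ID modulo the partition count, for illustration.
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(nodes):
    # Hypothetical round-robin binding of each vBucket to a node.
    return [nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)]


def node_for(doc_id, cluster_map):
    # The cluster map is a lookup structure indexed by vBucket number.
    return cluster_map[vbucket_for(doc_id)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
cluster_map = build_cluster_map(nodes)
for doc_id in ("customer::1001", "order::42"):
    print(doc_id, "-> vBucket", vbucket_for(doc_id), "->", node_for(doc_id, cluster_map))
```

Because the vBucket number depends only on the document ID, any client holding the same cluster map resolves a given ID to the same node without asking a central coordinator.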
Let's understand some of the internal concepts of the Couchbase cluster. This will help you determine ideal values for the various parameters when we look at fine-tuning Couchbase in Chapter 10, Administration, Tuning, and Monitoring.
Before we conclude the chapter, let's understand some concepts about the internal workings of Couchbase. We will discuss how Couchbase delivers performance, the replication process, protocol usage, and so on.
As discussed earlier, Couchbase keeps the most frequently accessed data in RAM, which acts as an inbuilt caching layer and boosts performance, and eventually flushes data to disk for persistence. However, if all the data had to be stored only in RAM, the cluster would require a lot of memory. Thus, to hold a large amount of data, Couchbase removes documents from memory to accommodate incoming documents. This process flushes the document to the disk before removing...
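The idea of persisting a document before dropping it from memory can be sketched as a small cache in Python. This is a simplified model, not Couchbase's mechanism: `ToyCache` is a hypothetical class, a plain dictionary stands in for the disk, and the least-recently-used ordering is an assumption chosen for brevity rather than the policy Couchbase applies.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical RAM layer in front of a dict that stands in for disk."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of documents held in RAM
        self.ram = OrderedDict()   # doc ID -> document, oldest access first
        self.disk = {}             # stand-in for persistent storage
        self.dirty = set()         # IDs written to RAM but not yet persisted

    def set(self, doc_id, doc):
        self.ram[doc_id] = doc
        self.ram.move_to_end(doc_id)  # mark as most recently used
        self.dirty.add(doc_id)
        self._eject_if_needed()

    def get(self, doc_id):
        if doc_id in self.ram:
            self.ram.move_to_end(doc_id)
            return self.ram[doc_id]
        doc = self.disk[doc_id]       # cache miss: read back from disk
        self.ram[doc_id] = doc
        self._eject_if_needed()
        return doc

    def _eject_if_needed(self):
        while len(self.ram) > self.capacity:
            old_id, old_doc = self.ram.popitem(last=False)
            if old_id in self.dirty:
                # Flush to disk before the document leaves memory.
                self.disk[old_id] = old_doc
                self.dirty.discard(old_id)


cache = ToyCache(capacity=2)
for doc_id in ("a", "b", "c"):
    cache.set(doc_id, {"id": doc_id})
print("in RAM:", list(cache.ram), "on disk:", list(cache.disk))
```

In this sketch, a document that has been ejected is still readable: `get` falls back to the disk copy and brings it back into RAM, which may in turn eject another document.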
In this chapter, you learned how to create a bucket. We explored the concepts of documents and the mechanism of data storage in the Couchbase cluster. Next, we saw some internal mechanisms of Couchbase, such as ejection, replication, warmup, rebalancing, and so on.
In the next chapter, we will explore documents in detail and review some of the design considerations that need to be kept in mind while designing a document in Couchbase.