Introduction to MongoDB
MongoDB, the most popular document database, is a NoSQL, non-relational key-value store, a JSON database, and more. It is a robust, feature-rich developer data platform with various built-in features that you need for modern applications, such as machine learning and AI capabilities, streaming, functions, triggers, serverless, device sync, and full-text search.
Though MongoDB is non-relational, it can easily handle relational data. It offers courses, tutorials, and documentation at learn.mongodb.com on how to best model and handle that data, even while using the document format.
Who uses MongoDB
10 years ago, the use of MongoDB was somewhat niche. It was a young database with compelling features for developers like you. Now, in 2023, MongoDB is used by a myriad of varied industries, and its use cases span across all kinds of situations and types of data stored. Some of the largest banks, automakers, government agencies, and gaming companies in the world use MongoDB for their production applications. The most famous users of MongoDB are Coinbase, Epic Games, Morgan Stanley, Adobe, Tesla, Canva, Ulta Beauty, Cathay Pacific, Dongwha, and Vodafone.
The MongoDB Atlas platform has millions of users, all of whom trust that their data will be managed safely and effectively in the cloud. This popularity has taken MongoDB to great heights, not just in terms of its growth and value as a company but also in terms of its raw developer mindshare. As of 2023, one in four developers uses MongoDB extensively in production. This ratio is much larger in other developer communities such as Go and JavaScript (approximately 40%).
Why developers love MongoDB
Along with its versatality, and robust features, MongoDB is the preferred choice for several reasons, such as:
- Flexibility and schema-less: Unlike traditional relational databases, MongoDB allows you to store and retrieve data without strict schemas or predefined structures. This flexibility is particularly useful when data evolves over time or when you are dealing with unstructured or semi-structured data.
- Scalability and performance: MongoDB is highly scalable and performs exceptionally well, making it suitable for both large-scale applications and personal projects. MongoDB Atlas provides a free-forever tier for side projects.
- Rich query language: MongoDB offers a powerful query language and indexing capabilities, simplifying common operations such as
findOne
andupdateOne
. - Developer-friendly data format: Data in MongoDB closely resembles objects in popular programming languages, reducing data mapping complexities and expediting development.
- Simplicity and quick start: MongoDB's simplicity and hassle-free setup makes it easy to adapt. No complex sales processes or licensing hassles are involved.
What attracts most developers is the simplicity of working with MongoDB on a daily basis, in particular, the seamless experience of creating, updating, and interacting with data. For example, consider a Python developer attempting to insert a document, query that document, and receive a set of results, using the following code:
from pymongo import MongoClient # Connect to MongoDB client = MongoClient('mongodb://localhost:27017/') db = client['mydatabase'] # Specify the database name collection = db['mycollection'] # Specify the collection name # Create a document to be inserted document = { 'name': 'Jane Doe', 'age': 30, 'email': 'janedoe@example.com' } # Insert the document into the collection result = collection.insert_one(document) # Check if the insertion was successful if result.acknowledged: print('Document inserted successfully.') print('Inserted document ID:', result.inserted_id) else: print('Failed to insert document.')
Note that the developer creates a dictionary representing the document to be inserted. In this case, it contains the name
, age
, and email
details. The developer doesn't need to create an ID for the document, because MongoDB automatically creates a unique identifier on each document.
To retrieve this document, you can filter the query by using any of the document's field individually, or in combination. Let's see that in action:
from pymongo import MongoClient # Connect to MongoDB client = MongoClient('mongodb://localhost:27017/') db = client['mydatabase'] # Specify the database name collection = db['mycollection'] # Specify the collection name # Retrieve documents based on specific conditions query = { 'age': {'$gte': 29}, # Retrieve documents where age is greater than or equal to 29 } documents = collection.find(query) # Iterate over the retrieved documents for document in documents: print(document)
Pretty simple! The preceding example demonstrates how you can use a MongoDB query operator such as $gte
(greater than or equal to) to filter your query. But the real magic happens when the document is returned. When MongoDB returns a document, it will be represented as a Python dictionary. Each field in the document is a key-value pair within the dictionary, similar to the following example:
{ '_id': ObjectId('60f5c4c4543b5a2c7c4c73a2'), 'name': 'Jane Doe', 'age': 30, 'email': 'janedoe@example.com' }
MongoDB has a suite of language libraries and drivers that act as a translation layer between the client and server, intercepting each operation and translating it into MongoDB's query language. With this, you can interact with the data using your native programming language in a purely idiomatic way.
Alongside the other offerings of the MongoDB Atlas developer data platform, it truly abstracts away the difficulties of working with a database, and instead allows you to interact with data purely via your code and IDE. This is infinitely preferable while using a separate database shell, database UI, and other database-specific tools. Since MongoDB Atlas offers a completely managed MongoDB database, you can set up and register via a command-line interface.
At its heart, the mission of MongoDB is to be a powerful database for developers, and its features are tuned to the programming language communities and framework integrations, rather than to database administration tools. This will become more apparent in the subsequent chapters, where you'll learn more about MongoDB Atlas, Atlas Vector Search, full-text search, and features such as aggregation—all through the lens of a developer.
By the end of this book, you'll learn how effective these tools can be and make database management simpler!
Efficiency of the inherent complexity of MongoDB databases
The most interesting part of the modern database is understanding its architecture and why it's built that way. Fundamentally, MongoDB is a distributed system. The database server itself was originally built with the anticipation that most users would run it with a default configuration—replica set, sometimes also referred to as a cluster. When you explore this architecture in-depth, you'll notice the true complexities.
By default, replica set of MongoDB is a three-node configuration. All three nodes are data-bearing, which means that there is a complete copy of the database available on each node. Each database is hosted on a separate instance or host, which can be in the same availability zone, data center, or region. This default configuration is to ensure both redundancy and high availability. Chapter 2, The MongoDB Architecture will discuss replica sets in more detail.
If one of the instances becomes unresponsive or unavailable, a healthy node is promoted to become the primary node. This failover between members occurs automatically, and there's no impact on operations for the users of the database. This process considers many different factors, including node availability, data freshness, and responsiveness. This election process and protocol, while simple to understand at a high-level, is very nuanced. But since the operations continue without interruption, you hardly know or understand these details.
How is this possible?
Behind the scenes, write operations to MongoDB are propagated from the primary node to the secondary nodes via a process called replication. The best way to explain replication is with the example of a single write to the database. An inbound write from the client application (your app) will be first directed to the primary node. That primary node will apply the write to its copy of the database. Then, the write is recorded in the operations log (oplog), which is tailed by secondary nodes.
Replication in MongoDB is based on the RAFT consensus protocol. One particular example of how this implementation varies is leader elections. In the traditional RAFT protocol, leader and primary node election occurs through a combination of randomized election timeouts and message exchanges. In MongoDB, there are settings for node priority. This priority is considered along with data freshness and response time when electing a primary node.
It is often true that the write operation is not written simultaneously to all nodes—there is a lag heavily influenced by factors such as network latency, the distance between nodes, hardware configuration, and workload. If one of the mongod
nodes falls behind, it will catch up or resync itself when it is able to do so using the oplog to determine the gaps in its operations. The MongoDB system monitors the replication lag between nodes to track this metric and assess whether the delay between primary and secondary nodes is acceptable, and if not, takes necessary action. This process is unique among databases as well.
This default configuration of MongoDB is a replica set with three members, where replication of data between nodes and failover between nodes are all handled automatically. This configuration is both durable and highly available, which makes it easy to use. For developers who require larger, global deployments, MongoDB has a sharded cluster model. The first thing to understand is that a sharded cluster consists of replica sets. It is a way of further dividing your data into effectively replicated partitions.
Figure 1.1: Replicated partitions set with primary and secondary nodes
If you require a global deployment with multiple terabytes of data, get started with Chapter 2, The MongoDB Architecture. It will cover how to split data, how to migrate data between regions or shards, how to marry data from multiple regions for analytics, and the performance of sharded cluster architectures.
Summary
MongoDB is a simple yet powerful database. It abstracts away many of the more complicated implementation details so that you can focus on building applications. It's easy to get started with and offers a powerful, idiomatic developer experience that allows you to interact with the database, exclusively in the programming language of your choice, via your IDE.
The rest of the book details how powerful and flexible MongoDB is, the new features in MongoDB 7.0, and how you can use it to your advantage. Besides being a great database for web applications, transactional data, flexible schemas, and high-performance workloads, it is also a great database for learning through hands-on experience and building proofs of concept.
In the next chapter, you'll see how replication and sharding can help increase reliability and availability for your applications.