MongoDB for Java Developers

By Francesco Marchioni

About this book

The NoSQL movement is growing in relevance, attracting more and more developers. The MongoDB database is a well-recognized rising star in the NoSQL world. It is a document database that persists data in a nested state and allows you to query that data without schema constraints or complex joins between documents.

This book provides all the knowledge you need to make MongoDB fit into your application schema, to the best of its capabilities. It starts with a basic introduction to the driver, which can be used to perform low-level interaction with the storage. It then moves on to different patterns for abstracting the persistence layer in your applications, starting from the flexible Google JSON library, through the Hibernate OGM framework, and finally landing on the Spring Data framework.

By the end of this book, you will know everything you need to use MongoDB in your Java applications.

Publication date:
August 2015
Publisher
Packt
Pages
192
ISBN
9781785280276

 

Chapter 1. Introduction to MongoDB

In this book, you will learn how to develop Java applications using the MongoDB database, which is an open source document-oriented database, recognized as a rising star in the NoSQL world. In a nutshell, MongoDB is a document database, which allows data to persist in a nested state, and importantly, it can query that nested data in an ad hoc fashion. It enforces no schema, so documents can optionally contain fields or types that no other document in the collection contains.

The focus of this book is on application development; however, we will first gather all the resources needed to connect to MongoDB and add a quick introduction to the world of NoSQL databases. We will cover the following topics in more detail:

  • A bird's eye view of the NoSQL landscape

  • Installing MongoDB and client tools

  • Using the MongoDB shell

 

Getting into the NoSQL movement


NoSQL is a generic term used to refer to any data store that does not follow the traditional RDBMS model—specifically, the data is nonrelational and it generally does not use SQL as a query language. Most of the databases that are categorized as NoSQL focus on availability and scalability at the expense of atomicity or consistency.

This seems quite a generic definition of NoSQL databases; however, all databases that fall into this category have some characteristics in common such as:

  • Storing data in many formats: Almost all RDBMS databases are based on the storage of rows in tables. NoSQL databases, on the other hand, can use different formats such as document stores, graph databases, key-value stores, and even more.

  • Joinless: NoSQL databases are able to extract your data using simple document-oriented interfaces without using SQL joins.

  • Schemaless data representation: A characteristic of NoSQL implementations is that they are based on a schemaless data representation, with the notable exception of the Cassandra database (http://cassandra.apache.org/). The advantage of this approach is that you don't need to define a data structure beforehand, which can thus continue to change over time.

  • Ability to work with many machines: Most NoSQL systems give you the ability to store your database on multiple machines while maintaining high-speed performance. This brings the advantage of leveraging low-cost machines, each with its own RAM and disk, and also supports linear scalability.

On the other hand, all database developers and administrators know the ACID acronym, which says that database transactions should guarantee the following properties:

  • Atomicity: Everything in a transaction either succeeds or is rolled back

  • Consistency: Every transaction must leave the database in a consistent state

  • Isolation: Each transaction that is running cannot interfere with other transactions

  • Durability: A completed transaction gets persisted, even after applications restart

At first glance, these qualities seem vital. In practice, however, for many applications, they are incompatible with availability and performance in very large environments. As an example, let's suppose that you have developed an online book store and you want to display how many of each book you have in your inventory. Each time a user is in the process of buying a book, you need to lock part of the database until they finish, so that every visitor in the world sees the exact inventory numbers. That works just fine for a small homemade site, but not if you run Amazon.com. For this reason, when we talk about NoSQL databases, or, generally, when we are designing distributed systems, we might have to look beyond the traditional ACID properties. As stated by the CAP theorem, coined by Eric Brewer, the following set of requirements is truly essential when designing applications for distributed architectures:

  • Consistency: This means that the database remains adherent to its rules (constraints, triggers, and so on) after the execution of each operation, and that any future transaction will see the effects of earlier transactions once committed. For example, after executing an update, all clients see the same data.

  • Availability: Each operation is guaranteed a response—a successful or failed execution. This, in practice, means no downtime.

  • Partition tolerance: This means the system continues to function even if the communication among the servers is temporarily unreliable (for example, the servers involved in the transaction may be partitioned into multiple groups, which cannot communicate with one another).

In practice, as it is theoretically impossible to have all three requirements met, a combination of two must be chosen and this is usually the deciding factor in what technology is used, as shown in the following figure:

If you are designing a typical web application that uses a SQL database, most likely you are in the CA part of the diagram. This is because a traditional RDBMS is typically transaction-based (C) and can be highly available (A). However, it cannot be partition tolerant (P), because SQL databases tend to run on single nodes.

MongoDB, on the other hand, is consistent by default (C). This means if you perform a write on the database followed by a read, you will be able to read the same data (assuming that the write was successful).

Besides consistency, MongoDB leverages Partition Tolerance (P) by means of replica sets. In a replica set, there exists a single primary node that accepts writes, and asynchronously replicates a log of its operations to other secondary databases.

However, not all NoSQL databases are built with the same focus. An example of this is CouchDB. Just like MongoDB, it is document oriented and has been built to scale across multiple nodes easily; however, while MongoDB (CP) favors consistency, CouchDB (AP) favors availability at the expense of consistency. CouchDB uses a replication model called eventual consistency. In this model, clients can write data to one database node without waiting for acknowledgment from other nodes. The system takes care of copying document changes between nodes, so that they eventually end up in sync.

The following summarizes the most common NoSQL databases and their position relative to the CAP attributes:

  • Consistent, Partition-Tolerant (CP): BigTable, Hypertable, HBase, MongoDB, Terrastore, Redis, Scalaris, MemcacheDB, Berkeley DB

  • Available, Partition-Tolerant (AP): Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, CouchDB, SimpleDB, Riak

Comparing RDBMS and NoSQL databases

As you might guess, there is no absolute winner between traditional databases and the new NoSQL standard. However, we can identify a set of pros and cons related to each technology, which can lead to a better understanding of which one is the best fit for our scenarios. Let's start with traditional RDBMS:

RDBMS pros:

  • ACID transactions at the database level make development easier.

  • Fine-grained security on columns and rows using views prevents unauthorized users from viewing or changing data.

  • Most SQL code is portable to other SQL databases, including open source options.

  • Typed columns and constraints validate data before it is added to the database, increasing data quality.

  • Existing staff members are already familiar with entity-relational design and SQL.

  • There is a well-consolidated theoretical basis and set of design rules.

RDBMS cons:

  • The object-relational mapping layer can be complex.

  • RDBMS doesn't scale out when joins are required.

  • Sharding over many servers can be done but requires application code and is operationally inefficient.

  • Full-text search requires third-party tools.

  • Storing high-variability data in tables can be challenging.

The following are the advantages and disadvantages of NoSQL databases:

NoSQL pros:

  • It can store complex data types (such as documents) in a single item of storage.

  • It allows horizontal scalability, which does not require you to set up complex joins; data can be easily partitioned and processed in parallel.

  • It saves development time, as it is not required to design a fine-grained data model.

  • It is quite fast for inserting new data and for simple operations or queries.

  • It provides support for Map/Reduce, a simple paradigm that allows scaling computation across a cluster of computing nodes.

NoSQL cons:

  • There is a lack of server-side transactions; therefore, it is not fit for inherently transactional systems.

  • Document stores do not provide fine-grained security at the element level.

  • NoSQL systems are new to many staff members, and additional training may be required.

  • The document store has its own proprietary nonstandard query language, which prohibits portability.

  • There is an absence of standardization: no standard APIs or query languages. This means that migration to a solution from a different vendor is more costly. Also, there are no standard tools (for example, for reporting).

Living without transactions

As you can imagine, one of the most important factors when deciding to use MongoDB or traditional RDBMS is the need for transactions.

With an RDBMS, you can update the database in sophisticated ways using SQL and wrap multiple statements in a transaction to get atomicity and rollback. MongoDB doesn't support transactions. This is a deliberate tradeoff based on MongoDB's goal of being simple, fast, and scalable. MongoDB, however, supports a range of atomic update operations that can work on the internal structures of a complex document. So, for example, by including multiple structures within one document (such as arrays), you can achieve an atomic update in a single operation, much as you would with an ordinary transaction.
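As a rough illustration of how such update operators behave, here is a toy JavaScript sketch that applies `$set`, `$inc`, and `$push` to an in-memory object in one call. It only mimics the shape of MongoDB's update documents; it is not the server implementation, and the function name is ours:

```javascript
// Toy re-implementation of a few MongoDB update operators, applied to a
// plain in-memory object in a single call. This mimics the operator
// syntax only; it is NOT the real server logic.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;                        // $set: replace a field's value
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // $inc: add to a number
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    (result[field] = result[field] || []).push(item); // $push: append to an array
  }
  return result;
}

// One update touching several parts of the same document at once:
const book = { title: "MongoDB", stock: 5, tags: ["nosql"] };
const updated = applyUpdate(book, {
  $inc: { stock: -1 },
  $push: { tags: "java" },
  $set: { lastSold: "2015-08-01" }
});
console.log(updated.stock, updated.tags.join(",")); // prints: 4 nosql,java
```

Because the whole document is modified in one operation, a reader never observes the stock decremented without the tag appended; this is the intuition behind single-document atomicity.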

Note

As documents can grow in complexity and contain several nested documents, single-document atomicity can be used as a replacement for transactions in many scenarios.

On the other hand, operations that involve multiple documents (often referred to as multi-document transactions) are not atomic.

In such scenarios, when you need to synchronize multi-document updates, you can implement a 2PC (two-phase commit) pattern in your application to provision these kinds of multi-document updates. Discussing this pattern, however, is out of the scope of this book; if you are eager to know more, see http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/.

So, to sum it up, if your application's requirements can be met via document updates (also by using nested documents to provide an atomic update), then this is a perfect use case for MongoDB, which will allow a much easier horizontal scaling of your application.

On the other hand, if strict transaction semantics are required (as in, for example, a banking application), then nothing beats a relational database. In some scenarios, you can combine both approaches (RDBMS and MongoDB) to get the best of both worlds, at the price of a more complex infrastructure to maintain. Such hybrid solutions are quite common; you can see them in production apps such as the New York Times website.

Managing read-write concurrency

In an RDBMS, managing the execution of concurrent units of work is a fundamental concept. Behind the scenes, the underlying implementation of each database uses locks or multiversion concurrency control to provide the isolation of each unit of work. MongoDB, on the other hand, uses reader/writer locks that allow concurrent readers shared access to a resource, such as a database or collection, but give exclusive access to a single write operation. In more detail, here is how MongoDB handles read and write locks:

  • There can be an unlimited number of simultaneous readers on a database

  • There can only be one writer at a time on any collection in any one database

  • Writers block out readers: once a write request comes in, all readers are blocked until the write completes (this behavior is also known as writer-greedy)

Since version 2.2 of MongoDB, it is possible to restrict the scope of the lock to just the database the read or write operation is working with. If you are using MongoDB 3.0 or later, the scope of the lock is pulled in further than ever before: when a write occurs, only the documents involved in the write operation are locked. In order to manage locks, MongoDB relies on a storage engine, the part of the database responsible for managing how data is stored on disk. In particular, MongoDB 3.0 comes with two storage engines:

  • MMAPv1: This is the default storage engine, which uses collection-level locking

  • WiredTiger: This is the new storage engine, which ships with document-level locking and compression (only available for the 64-bit version)

Note

By using the WiredTiger storage engine, all write operations happen within the context of a document-level lock. As a result, multiple clients can modify different documents of a single collection at the same time. Thanks to this granular concurrency control, MongoDB can more effectively support workloads with reads, writes, and updates, as well as high-throughput concurrent workloads.

 

MongoDB core elements


In order to understand the capabilities of MongoDB, you need to learn the core elements the database is composed of. Actually, MongoDB is organized with a set of building blocks, which include the following:

  • Database: Just as in an RDBMS, this is the top-level element. However, a relational database contains (mostly) tables and views, whereas a Mongo database is a physical container for a structure called a collection. Each database has its own set of files on the filesystem. A single MongoDB server typically has multiple databases.

  • Collection: This is a set of MongoDB documents; a collection is the equivalent of an RDBMS table. A collection name must be unique within its database, but obviously multiple collections can coexist in a database. Typically, the collections contained in a database are related, although they do not enforce a schema the way RDBMS tables do.

  • Documents: This is the most basic unit of data in MongoDB. Basically, it is composed of a set of key-value pairs. Unlike database records, documents have a dynamic schema, which means documents that are part of the same collection do not need to have the same set of fields. In the same way, the fields contained in a document may hold different data types.

The following diagram summarizes the concepts we just discussed:

The heart of MongoDB – the document

At the heart of MongoDB is the document, an ordered set of keys with associated values. The representation of a document varies by the programming language, but most languages have a data structure that is a natural fit, such as a map, hash, or dictionary. Here is a very basic example of a document, which is understood by MongoDB:

{"name" : "Francesco",
 "age" : 44,
 "phone":"123-567-890"}

Most documents will be more complex than this simple one and will often contain embedded data within them. These denormalized data models allow applications to retrieve and manipulate related data in a single database operation:

{"name" : "Francesco",
 "age" : 44,
 "contact" : {
    "phone":"123-567-890"
  }
}

As you can see from the preceding example, we have included the contact information within the same document by using an embedded document with a single key named contact.

The keys contained in a document are strings, and each key must be unique within its document. Any UTF-8 character can be included in a key, with a few exceptions:

  • You cannot include the character \0 (also known as the null character) in a key. This character is used to indicate the end of a key.

  • The . and $ characters are internally used by the database so they should be used only in limited cases. As a general rule, it is better to completely avoid using these characters as most MongoDB drivers can generate exceptions when they are used inappropriately.

Finally, you need to be aware that MongoDB is both type-sensitive and case-sensitive. For example, these documents are distinct:

{"age" : 18}
{"age" : "18"}

The same applies to the following documents:

{"age" : 18}
{"Age" : 18}
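Since MongoDB documents are JSON-like, you can observe the same distinctions in plain JavaScript; the sketch below (ordinary Node.js, no MongoDB involved) shows that the number 18 and the string "18" have different types, and that "age" and "Age" are simply different keys:

```javascript
// Plain JavaScript illustration of type- and case-sensitivity:
// the documents below are distinct for MongoDB as well.
const byType = [JSON.parse('{"age": 18}'), JSON.parse('{"age": "18"}')];
const byCase = [JSON.parse('{"age": 18}'), JSON.parse('{"Age": 18}')];

// typeof distinguishes the number 18 from the string "18"
console.log(typeof byType[0].age); // prints: number
console.log(typeof byType[1].age); // prints: string

// the keys "age" and "Age" are simply different keys
console.log("age" in byCase[1]); // prints: false
console.log("Age" in byCase[1]); // prints: true
```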

Understanding how MongoDB stores data

The sample documents you have seen so far should look familiar if you have ever heard of JavaScript Object Notation (JSON). JSON is a human- and machine-readable open standard that simplifies data interchange and is, along with XML, one of the most used formats for data interchange between applications. JSON is able to deal with all the basic data types used by applications, such as strings, numbers, and Boolean values, as well as arrays and hashes. MongoDB stores records as JSON-style documents in its collections. Let's see an example of a JSON document:

{ 
  "_id":1,
  "name":{ 
    "first":"Dennis",
    "last":"Ritchie"
  },
  "contribs":[ 
    "Altran",
    "B",
    "C",
    "Unix"
  ],
  "awards":[ 
    { 
      "award":"Turing Award",
      "year":1983
    },
    { 
      "award":"National Medal of Technology",
      "year":1999
    }
  ]
}

A JSON-based database returns a set of data that can be easily parsed by most programming languages such as Java, Python, JavaScript, and others, reducing the amount of code you need to build into your application layer.

Behind the scenes, MongoDB represents JSON documents using a binary-encoded format called BSON. Documents encoded with BSON enhance the JSON data model to provide additional data types and efficiency when encoding/decoding data within different languages.

MongoDB uses a fast and lightweight BSON implementation, which is highly traversable and supports complex structures such as embedded objects and arrays.
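While BSON itself is a binary format handled by the drivers, the logical document model is plain JSON. The following Node.js snippet shows the kind of lossless encode/decode round trip a driver performs; text JSON is used here purely for illustration, since real BSON is binary and carries extra types:

```javascript
// Round-trip a document through a serialized form and back.
// Drivers do the same with BSON (binary, with more types); plain JSON
// is used here only to illustrate the idea.
const doc = {
  _id: 1,
  name: { first: "Dennis", last: "Ritchie" },
  contribs: ["Altran", "B", "C", "Unix"],
  awards: [{ award: "Turing Award", year: 1983 }]
};

const encoded = JSON.stringify(doc);  // document -> wire format
const decoded = JSON.parse(encoded);  // wire format -> document

// Nested structures survive the round trip unchanged
console.log(decoded.name.last);       // prints: Ritchie
console.log(decoded.awards[0].year);  // prints: 1983
```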

Data types accepted in documents

So far, we have used just two basic data types, String and Integer. MongoDB offers a wide choice of data types, which can be used in your documents:

  • String: This is the most common data type as it contains a string of text (such as: "name": "John").

  • Integer (32-bit and 64-bit): This type is used to store a numerical value (for example, "age" : 40). Note that an integer value is written without quotes.

  • Boolean: This data type can be used to store either a TRUE or a FALSE value.

  • Double: This data type is used to store floating-point values.

  • Min/Max keys: This data type is used to compare a value against the lowest and highest BSON elements, respectively.

  • Arrays: This type is used to store an array or list of multiple values under one key (for example, ["John, Smith","Mark, Spencer"]).

  • Timestamp: This data type is used to store a timestamp. This can be useful to store when a document has been last modified or created.

  • Object: This data type is used for storing embedded documents.

  • Null: This data type is used for a null value.

  • Symbol: This data type stores characters just like String; however, it is generally used by languages that have a specific symbol type.

  • Date: This data type allows storing the current date or time in the Unix time format (POSIX time).

  • Object ID: This data type is used to store the document's ID.

  • Binary data: This data type is used to store a binary set of data.

  • Regular expression: This data type is used for regular expressions. All options are represented by specific characters provided in alphabetical order. You will learn more about regular expressions later in this book.

  • JavaScript code: This data type is used for JavaScript code.
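To see several of these types side by side, here is a single JavaScript document mixing strings, numbers, Booleans, doubles, arrays, an embedded document, null, and a date; how each value maps to a concrete BSON type is ultimately up to the driver:

```javascript
// One document mixing several of the types listed above.
// The mapping to concrete BSON types is handled by the driver.
const user = {
  name: "John",                 // String
  age: 40,                      // Integer
  active: true,                 // Boolean
  score: 9.5,                   // Double
  nicknames: ["johnny", "j"],   // Array
  contact: { phone: "123" },    // Object (embedded document)
  middleName: null,             // Null
  createdAt: new Date(0)        // Date (Unix epoch here)
};

console.log(typeof user.age, Array.isArray(user.nicknames)); // prints: number true
console.log(user.createdAt instanceof Date);                 // prints: true
```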

 

Installing and starting MongoDB


Installing MongoDB is much easier than installing most RDBMS software, as it's just a matter of unzipping the archived database and, if necessary, configuring a new path for data storage. Let's look at the installation for different operating system architectures.

Installing MongoDB on Windows

For installing MongoDB on Windows, perform the following steps:

  1. Download the latest stable release of MongoDB from http://www.mongodb.org/downloads. (At the time of writing, the latest stable release is 3.0.3, which is available as Microsoft Installer or as a ZIP file). Ensure you download the correct version of MongoDB for your Windows system.

  2. Execute the MSI installer, or if you have downloaded MongoDB as a ZIP file, simply extract the downloaded file to the C:\ drive or any other location.

MongoDB requires a data directory to store its files; the default location for the MongoDB data folder on Windows is c:\data\db. In this example, we will instead create a data folder inside the MongoDB installation directory. Execute the following command from the command prompt to create the folder:

C:\mongodb-win32-x86_64-3.0.3>md data

In Command Prompt, navigate to the bin directory inside the MongoDB installation folder and start mongod.exe, pointing it to the folder where data is stored:

C:\mongodb-win32-x86_64-3.0.3\bin> mongod.exe  --dbpath "C:\mongodb-win32-x86_64-3.0.3\data"

This will show a waiting for connections message on the console output, which indicates that the mongod.exe process is running successfully.

Installing MongoDB on Linux

The installation on Linux can be different depending on your Linux distribution. Here is a general-purpose installation process:

  1. Download the latest MongoDB distribution, which is appropriate for your OS architecture:

    curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.0.3.tgz
    
  2. Extract the downloaded files:

    tar -zxvf mongodb-linux-x86_64-3.0.3.tgz
    
  3. Copy files to a target directory:

    mkdir -p mongodb
    cp -R -n mongodb-linux-x86_64-3.0.3/ mongodb
    
  4. Include MongoDB scripts in the system's PATH variable:

    export PATH=<mongodb-install-directory>/bin:$PATH
    
  5. Just like we did for Windows, we will create the data folder:

    mkdir -p /data/db
    
  6. Now, you can start MongoDB much the same way as with Windows:

    mongod --dbpath /data/db
    

MongoDB startup options

The list of startup options that can be applied to the mongod server is quite large and is detailed at http://docs.mongodb.org/manual/reference/program/mongod/.

The following summarizes the most common options for a handy reference:

  • --help, -h: This returns the information on the options and use of mongod.

  • --version: This returns the mongod release number.

  • --config <filename>: This specifies the configuration file to be used by mongod.

  • --port <port>: This specifies the TCP port on which MongoDB listens (the default is 27017).

  • --bind_ip <ip address>: This specifies the IP address that mongod binds to in order to listen for connections from applications (the default is all interfaces).

  • --logpath <path>: This sends all diagnostic logging information to a log file instead of to standard output or to the host's syslog system.

  • --logappend: This appends new entries to the end of the log file rather than overwriting the content of the log when the mongod instance restarts.

  • --httpinterface: This enables the HTTP interface. Enabling the interface can increase network exposure.

  • --fork: This enables a daemon mode that runs the mongod process in the background. By default, mongod does not run as a daemon.

  • --auth: This enables authorization to control the user's access to database resources and operations. When authorization is enabled, MongoDB requires all clients to authenticate themselves first in order to determine the access for the client.

  • --noauth: This disables authentication. It is currently the default and exists for future compatibility and clarity.

  • --rest: This enables the simple REST API. Enabling the REST API enables the HTTP interface, even if the HTTP interface option is disabled, and as a result can increase network exposure.

  • --profile <level>: This changes the level of database profiling (0: off, no profiling; 1: on, only slow operations; 2: on, all operations).

  • --shutdown: This safely terminates the mongod process. It is available only on Linux systems.

In addition, the following options can be used to vary the storage of the database:

  • --dbpath <path>: This is the directory where the mongod instance stores its data. The default is /data/db on Linux and OS X and C:\data\db on Windows.

  • --storageEngine <string>: This specifies the storage engine for the mongod database. The valid options are mmapv1 and wiredTiger. The default is mmapv1.

  • --directoryperdb: This stores each database's files in its own folder in the data directory. When applied to an existing system, the --directoryperdb option alters the storage pattern of the data directory.

Troubleshooting MongoDB installation

On startup, the server prints some version and system information and then begins waiting for connections. By default, MongoDB listens for connections on port 27017. The server process will fail to start if the port is already in use by another process—the most common cause being that another instance of MongoDB is already running on your machine.

Note

You can stop mongod by typing Ctrl + C in the shell that is running the server. In a clean shutdown, the mongod process completes all running operations, flushes all data to files, and closes all data files. In the Securing database access section of this chapter, we will show how to shut down the database from the Mongo shell.

The mongod command also launches a basic HTTP server that listens, by default, on port 28017. This web server can be used to serve REST requests (see http://docs.mongodb.org/ecosystem/tools/http-interfaces/) and to query administrative information about your database by pointing your web browser to http://localhost:28017.

Note

You need to start mongod with the --rest option in order to enable the web administration console.

The following screenshot depicts the web administration GUI when executed from the browser:

 

Mongo tools


MongoDB ships with a set of command-line tools, which can be useful to administer your server. We will shortly provide a description of each command, so that you can get an initial introduction to server administration:

  • bsondump: This displays BSON files in a human-readable format

  • mongoimport: This imports data from JSON, TSV, or CSV files into a collection

  • mongoexport: This exports an existing collection to the CSV or JSON format

  • mongodump/mongorestore: This dumps MongoDB data to disk using the BSON format (mongodump), or restores them (mongorestore) to a live database

  • mongostat: This monitors running MongoDB servers, replica sets, or clusters

  • mongofiles: This reads, writes, deletes, or updates files in GridFS

  • mongooplog: This replays oplog entries between MongoDB servers

  • mongotop: This monitors data reading/writing on a running Mongo server

Here is an example of how to use the mongoimport tool to import CSV-formatted data contained in /var/data/users.csv into the users collection of the sample database, on the MongoDB instance running on localhost, port 27017:

mongoimport --db sample --collection users --type csv --headerline --file /var/data/users.csv

In the preceding example, mongoimport determines the names of the fields using the first line of the CSV file, because of --headerline.
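The effect of --headerline can be sketched in a few lines of JavaScript: the first CSV line supplies the field names, and every following line becomes one document. This is a simplified sketch of our own (the `csvToDocs` name is made up) that ignores quoting and type conversion, which the real tool handles:

```javascript
// Simplified sketch of what --headerline does: use the first CSV line
// as field names and turn each remaining line into a document.
// Real mongoimport also handles quoting, escaping, and data types.
function csvToDocs(csv) {
  const [header, ...rows] = csv.trim().split("\n");
  const fields = header.split(",");
  return rows.map(row => {
    const doc = {};
    row.split(",").forEach((value, i) => { doc[fields[i]] = value; });
    return doc;
  });
}

const docs = csvToDocs("name,age,phone\nfrancesco,44,123-567-890\nowen,32,555-444-333");
console.log(docs[0].name, docs[1].age); // prints: francesco 32
```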

If you want to export MongoDB documents, you can use the mongoexport tool. Let's look at an example of how to export the users collection (part of the sampledb database), limited to the first 100 records:

mongoexport --db sampledb --collection users --limit 100 --out export.json

As part of your daily backup strategy, you should consider using the mongodump tool, which is a utility for creating a binary export of the contents of a database.

Note

mongodump does not provide a backup of the local database.

The following command creates a database dump for the collection named users contained in the database named sampledb. In this case, the database is running on the local interface on port 27017:

mongodump  --db sampledb --collection users

The preceding command will create a BSON binary file named users.bson and a JSON file named users.metadata.json containing the documents. The files will be created under dump/[database-name].

Finally, the mongorestore program loads binary data from a database dump created by mongodump to a MongoDB instance. mongorestore can both create a new database and add data to an existing database:

mongorestore --collection users --db sampledb dump/sampledb/users.bson
 

Introduction to the MongoDB shell


MongoDB ships with a JavaScript shell that allows interaction with a MongoDB instance from the command line. The shell is the bread-and-butter tool for performing administrative functions, monitoring a running instance, or just inserting documents.

To start the shell, run the mongo executable:

$ mongo
MongoDB shell version: 3.0.3
connecting to: test

The shell automatically attempts to connect to a running MongoDB server on startup, so make sure you start mongod before starting the shell.

If no other database is specified on startup, the shell selects a default database called test. As a way of keeping all the subsequent tutorial exercises under the same namespace, let's start by switching to the sampledb database:

> use sampledb
switched to db sampledb

If you are coming from an RDBMS background, you might be surprised that we can switch to a new database without formally creating it. The point is that creating the database is not required in MongoDB: databases and collections are first created when documents are actually inserted. Hence, individual collections and databases can be created at runtime, as soon as the structure of a document is known.

If you want to check the list of available databases, then you can use the show dbs command:

>show dbs
local     0.78125GB
test      0.23012GB

As you can see, the database we created (sampledb) is not present in the list. To display the database, you need to insert at least one document into it. The next section will show you how to do it.

Inserting documents

As we said, MongoDB documents can be specified in the JSON format. For example, let's recall the simple document that we have already introduced:

{"name" : "francesco",
 "age" : 44,
 "phone":"123-567-890"
}

In order to insert this document, you need to choose a collection where the document will be stored. Here's how you can do it with the Mongo shell:

db.users.insert({"name": "francesco","age": 44, "phone": "123-567-890"})

As with databases, collections can be created dynamically by specifying them in the insert statement. Congratulations, you've just saved your first document!

Note

MongoDB supports a special kind of collection named capped collections: fixed-size collections that support high-throughput operations in which documents are inserted and retrieved in insertion order. Unlike ordinary collections, capped collections must be created explicitly before use. We will show you how to use capped collections in the next chapter, using the Java driver.
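For reference, a capped collection is created with the createCollection helper in the mongo shell; here is a minimal sketch (the collection name and the size in bytes are illustrative):

```
> db.createCollection("log", { capped: true, size: 100000 })
{ "ok" : 1 }
```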

Querying documents

The find method is used to perform queries in MongoDB. If no argument is given to the find method, it will return all the documents contained in the collection as in the following statement:

> db.users.find()

The response will look something like this:

{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }

You may have noticed that the _id field has been added to the document. This is a special key that works like a primary key: every MongoDB document requires a unique identifier, and if you don't provide one in your document, a MongoDB ObjectId is generated and added to the document automatically.

Now, let's include another user in our collections so that we can refine our searches:

> db.users.insert({"name": "owen","age": 32, "phone": "555-444-333"})

Your collection should now include two documents, as verified by the count function:

> db.users.count()
2

Note

As you can see from the preceding insert command, document keys are specified with quotes. This is not mandatory but generally a good practice as it makes queries more readable.

Now that we have two documents in our collection, let's learn how to add a query selector to the find statement so that we can filter users based on a key value. For example, here is how to find a user whose name is owen:

> db.users.find({"name": "owen"})
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

Multiple conditions can be specified within a query, just like you would do with a WHERE – AND construct in SQL:

> db.users.find({"name": "owen", "age": 32})

{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

Choosing the keys to return

The queries mentioned earlier are equivalent to a SELECT * statement in SQL terms. You can use a projection to select a subset of fields to return from each document in a query result set. This can be especially useful when you are selecting large documents, as it will reduce the costs of network latency and deserialization.

Projections are commonly expressed by means of the binary operators 0 and 1: 0 means that the key must not be included in the result, whereas 1 means that it must be included. Here is an example of how to include the name and age keys in the fields to be returned (along with the _id field, which is always included by default):

> db.users.find({}, {"name": 1,"age": 1})
{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44  }
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 32 }

By setting the projection values for the name and age to 0, the phone number is returned instead:

> db.users.find({}, {"name": 0,"age": 0})
{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "phone" : "123-456-789" }
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "phone" : "555-444-333" }

Note

Note that you cannot have a mix of inclusions and exclusions in your projection. The exception to the rule is the _id field. In fact, {_id: 0, name: 1, age: 1} works but any inclusion/exclusion combination of other fields does not.
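The inclusion/exclusion rules above can be sketched in plain JavaScript. This is an illustrative toy re-implementation of the projection semantics described here, not the real server logic:

```javascript
// A toy sketch of MongoDB's projection rules: a projection is either
// inclusive (fields marked 1) or exclusive (fields marked 0), and _id
// is always returned unless explicitly excluded.
function project(doc, projection) {
  const fields = Object.keys(projection).filter((k) => k !== "_id");
  const inclusive = fields.some((k) => projection[k] === 1);
  const result = {};
  for (const key of Object.keys(doc)) {
    if (key === "_id") {
      // _id is kept unless the projection says { _id: 0 }
      if (projection._id !== 0) result[key] = doc[key];
    } else if (inclusive ? projection[key] === 1 : projection[key] !== 0) {
      result[key] = doc[key];
    }
  }
  return result;
}

const user = { _id: 1, name: "owen", age: 32, phone: "555-444-333" };
console.log(project(user, { name: 1, age: 1 }));
// { _id: 1, name: 'owen', age: 32 }
console.log(project(user, { name: 0, age: 0 }));
// { _id: 1, phone: '555-444-333' }
```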

Using ranges in your queries

Quite commonly, your queries will restrict the range of the returned data, which is done in most SQL dialects with the >, >=, <, and <= operators.

The equivalent operators in MongoDB terms are $gt, $gte, $lt, and $lte. Here is how to find users whose age is greater than 40 using the $gt operator:

> db.users.find({ age: { $gt: 40 } })

{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }

The $gte operator, on the other hand, is able to select keys that are greater than or equal (>=) to the one specified:

> db.users.find({ age: { $gte: 32 } })

{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

The $lt and $lte operators, on the other hand, allow you to select keys that are less than (<) or less than or equal (<=) to the value specified.
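The behavior of the four range operators on a single numeric field can be sketched as follows; this is an illustrative simulation in plain JavaScript, not the server implementation:

```javascript
// Each MongoDB range operator maps to an ordinary comparison.
const comparators = {
  $gt:  (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt:  (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// Return the documents whose field satisfies every condition,
// mimicking a query like { age: { $gte: 32, $lt: 40 } }.
function findByRange(docs, field, condition) {
  return docs.filter((doc) =>
    Object.entries(condition).every(([op, value]) =>
      comparators[op](doc[field], value)
    )
  );
}

const users = [
  { name: "francesco", age: 44 },
  { name: "owen", age: 32 },
];
console.log(findByRange(users, "age", { $gt: 40 }).map((u) => u.name));
// [ 'francesco' ]
console.log(findByRange(users, "age", { $gte: 32 }).length); // 2
```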

Using logical operators to query data

You cannot think of a scripting language without logical operators and MongoDB is no exception. The most common logical operators are named $or, $and, and $not in MongoDB.

We will not go into the basics of logical operators; rather, let's see a concrete example of the $or operator:

db.users.find( { $or: [ { "age": { $lt: 35 } }, { "name": "john" } ] } )

In the preceding query, we are selecting users whose age is less than 35 or whose name is john. Since the first condition matches the user owen, one document is returned:

{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

With the $and operator, on the other hand, no users are returned, since no single document satisfies both conditions:

db.users.find( { $and: [ { "age": { $lt: 35 } }, { "name": "john" } ] } )

By using the $not operator, you can invert the effect of a query expression and return documents that do not match it. For example, if you wanted to query for all users whose names do not begin with f, you could use $not as follows:

> db.users.find({"name": {$not: /^f/} })
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

Tip

Using LIKE in Mongo

Note the /expr/ syntax, which can be used as the equivalent of a SQL LIKE expression. For example, in its simplest form, you can use it to query for phone numbers that contain 444:

> db.users.find({"phone": /444/})
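The /expr/ form is an ordinary JavaScript regular expression, which the shell passes to the server as a regex match. A sketch of the same containment test in plain JavaScript:

```javascript
// /444/ matches any string containing "444", just like LIKE '%444%' in SQL.
const phones = ["123-456-789", "555-444-333"];
const matching = phones.filter((p) => /444/.test(p));
console.log(matching); // [ '555-444-333' ]

// Anchors work as usual: /^5/ would match only strings starting with 5.
```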

Updating documents

In order to update an existing document, you need to provide two arguments:

  • A query that selects the documents to update

  • A document describing how the selected documents should be modified

Let's see a practical example, supposing that you wanted to change the key age for the user owen to be 39:

> db.users.update({name: "owen"}, {$set: {"age": 39}})

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

The outcome of the statement informs us that the update matched one document which was modified. A find issued on the users collection reveals that the change has been applied:

> db.users.find()

{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 39, "phone" : "555-444-333" }

Note

Be aware that executing an update without the $set operator won't update individual fields but will replace the whole document, preserving only the _id field.

The update supports additional options that can be used to perform more complex logic. For example, what if you wanted to update the record if it exists, and create it if it doesn't? This is called an upsert and can be achieved by setting the upsert option to true, as in the following command:

> db.users.update({user: "frank"}, {age: 40},{ upsert: true} )

WriteResult({
        "nMatched" : 0,
        "nUpserted" : 1,
        "nModified" : 0,
        "_id" : ObjectId("55082f5ea30be312eb167fcb")
})

As you can see from the output, an upsert has been executed. Because we passed a replacement document rather than a $set modifier, the new document contains only the age key (plus the generated _id):

> db.users.find()

{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "age" : 39, "phone" : "555-444-333" }
{ "_id" : ObjectId("55082f5ea30be312eb167fcb"), "age" : 40 } 
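The upsert decision can be sketched as follows. This is an illustrative simulation of the matched-or-inserted logic in plain JavaScript (the _id generation is faked with a counter), not the server implementation:

```javascript
// Upsert: update the first document matching the query, or insert a new
// document built from the replacement if nothing matches.
let nextId = 1;
function upsert(collection, query, replacement) {
  const match = collection.find((doc) =>
    Object.entries(query).every(([k, v]) => doc[k] === v)
  );
  if (match) {
    // No $set modifier here, so replace every field except _id.
    Object.keys(match).forEach((k) => { if (k !== "_id") delete match[k]; });
    Object.assign(match, replacement);
    return { nMatched: 1, nUpserted: 0 };
  }
  collection.push(Object.assign({ _id: nextId++ }, replacement));
  return { nMatched: 0, nUpserted: 1 };
}

const users = [{ _id: 0, name: "owen", age: 39 }];
console.log(upsert(users, { user: "frank" }, { age: 40 }));
// { nMatched: 0, nUpserted: 1 }  -- a new document { _id: 1, age: 40 } was added
```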

Updating a document with MongoDB can also be done on a portion of a document; for example, you can remove a single key by using the $unset operator. In the following update, we remove the age key from all documents whose name key equals owen:

> db.users.update({name: "owen"}, {$unset : { "age" : 1} })

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Executing the find on our collection confirms the update:

> db.users.find()

{ "_id" : ObjectId("5506d5988d7bd8471669e675"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }
{ "_id" : ObjectId("5506eea18d7bd8471669e676"), "name" : "owen", "phone" : "555-444-333" }
{ "_id" : ObjectId("55082f5ea30be312eb167fcb"), "age" : 40 }

A related operator is $push, which appends a value to an array field, creating the array if the field does not exist. So here is how you can restore an age value for the user owen (note that this stores age as an array containing 39, not as a plain number):

> db.users.update({name: "owen"}, {$push : { "age" : 39} })
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Note

To restore age as a plain (scalar) value instead, use $set, which works whether or not the field is already present in the document.

Deleting data

As you have just seen, the update operator is quite flexible and allows trimming or pushing keys in your collections. If you need to delete a whole set of documents, you can use the remove operator. When used with an empty query document, it removes all the documents in a collection, much like a TRUNCATE command in SQL:

> db.users.remove({})

Most of the time, you need to be more selective when deleting documents as you might need to remove just a set of documents matching one or more conditions. For example, here is how to remove users older than 40:

> db.users.remove({ "age": { $gt: 40 } })
WriteResult({ "nRemoved" : 1 })

Just like the TRUNCATE statement in SQL, remove only deletes documents from a collection. If you want to delete the collection itself, you need to use the drop() method, which deletes the whole collection structure, including any associated indexes:

> db.users.drop()

Beyond basic data types

Although the basic data types we have used so far will be fine for most use cases, there are a couple of additional types that are crucial to most applications, especially when mapping MongoDB types to a language driver such as the MongoDB driver for Java.

Arrays

MongoDB has a rich query language that supports storing and accessing documents as arrays. One of the great things about arrays in documents is that MongoDB understands their structure and knows how to reach inside arrays to perform operations on their content. This allows us to query on arrays and build indexes using their content.

Let's start creating a couple of documents containing an array of items:

> db.restaurant.insert({"menu" : ["bread", "pizza", "coke"]})
WriteResult({ "nInserted" : 1 })

> db.restaurant.insert({"menu" : ["bread", "omelette", "sprite"]})
WriteResult({ "nInserted" : 1 })

We will now show you how to query on the array contents to find the menus that include pizza:

> db.restaurant.find({"menu" : "pizza"})

{ "_id" : ObjectId("550abbfe89ef057ee0671650"), "menu" : [ "bread","pizza", "coke" ] }

Should you need to match arrays on more than one element, you can use the $all operator, which matches documents whose array contains all the listed elements. For example, here is how to query the preceding collection by matching two items in the menu:

> db.restaurant.find({"menu" : {$all : ["pizza", "coke"]}})

{ "_id" : ObjectId("550abbfe89ef057ee0671650"), "menu" : [ "bread", "pizza", "coke" ] }
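The $all semantics can be sketched in plain JavaScript; this is an illustrative simulation, not the server code:

```javascript
// $all: a document matches if its array field contains every required element.
function matchAll(docs, field, required) {
  return docs.filter((doc) =>
    required.every((item) => doc[field].includes(item))
  );
}

const menus = [
  { menu: ["bread", "pizza", "coke"] },
  { menu: ["bread", "omelette", "sprite"] },
];
console.log(matchAll(menus, "menu", ["pizza", "coke"]).length); // 1
console.log(matchAll(menus, "menu", ["bread"]).length);         // 2
```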

Embedded documents

You can use a document as the value for a key; this is called an embedded document. Embedded documents can be used to organize data in a more natural way than a flat structure of key-value pairs, and they map well to most object-oriented languages, where an object holds references to other structures.

Let's start by defining a structure, which is assigned to a variable in the mongo shell:

x = {
   "_id": 1234,
   "owner": "Frank's Car",
   "cars": [
      {
         "year": 2011,
         "model": "Ferrari",
         "price": 250000
      },
      {
         "year": 2013,
         "model": "Porsche",
         "price": 250000
      }
   ]
}

Since the Mongo shell is a JavaScript interface, it is perfectly fine to write something like the preceding code and even use functions in order to enhance objects in the shell. Having defined our variable, we can insert it into the cars collection as follows:

> db.cars.insert(x);
WriteResult({ "nInserted" : 1 })

Alright. We have just inserted a document, which in turn contains an array of documents. We can query the subdocuments by using dot notation. For example, we can select the documents containing a car whose model is Ferrari by using the cars.model criteria:

> db.cars.find( { "cars.model": "Ferrari" }).pretty()
{
  "_id" : 1234,
  "owner" : "Frank's Car",
  "cars" : [
    {
      "year" : 2011,
      "model" : "Ferrari",
      "price" : 250000
    },
    {
      "year" : 2013,
      "model" : "Porsche",
      "price" : 250000
    }
  ]

}

Note

The pretty function formats the JSON response for readability.
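How dot notation reaches inside an array of subdocuments can be sketched as follows. This is an illustrative simulation (limited to one level of nesting), not the server logic:

```javascript
// "cars.model" matches a document if ANY element of its cars array has the
// given model; the WHOLE document is returned, as shown in the output above.
function matchDotted(docs, path, value) {
  const [arrayField, key] = path.split(".");
  return docs.filter((doc) =>
    doc[arrayField].some((sub) => sub[key] === value)
  );
}

const garage = [{
  _id: 1234,
  owner: "Frank's Car",
  cars: [
    { year: 2011, model: "Ferrari", price: 250000 },
    { year: 2013, model: "Porsche", price: 250000 },
  ],
}];
console.log(matchDotted(garage, "cars.model", "Ferrari").length); // 1
```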

Some useful functions

We will complete our excursus on the Mongo shell with some handy functions, which can be used to achieve a more precise control over your queries. The ones we will cover in this section are the limit, sort, and skip functions.

You can use the limit function to specify the maximum number of documents returned by your query, passing the number of records to be returned as a parameter. Setting this parameter to 0 is equivalent to no limit, so all the documents will be returned:

> db.users.find().limit(10)

The sort function, on the other hand, can be used to sort the results returned from the query in ascending (1) or descending (-1) order. This function is pretty much equivalent to the ORDER BY statement in SQL. Here is a basic example of sort:

> db.users.find({}).sort({"name":1})

{ "_id" : ObjectId("5506d5708d7bd8471669e674"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }
{ "_id" : ObjectId("550ad3ef89ef057ee0671652"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

This example sorts the results on the name key in ascending order (1), which is the default sort order. If you want descending order, pass -1 to the sort operator instead.

Note

Note that sorting on the document's _id field roughly orders documents by creation time, since the default ObjectId value embeds a timestamp.

The next function in the list is skip, which skips the first n documents of the result set. For example, here is how to skip the first document in a search across the users collection:

> db.users.find().skip(1)
{ "_id" : ObjectId("550ad3ef89ef057ee0671652"), "name" : "owen", "age" : 32, "phone" : "555-444-333" }

All the preceding functions can also be chained to produce a powerful expression. For example, the preceding command will return a different user when combined with a sort in descending order:

> db.users.find().skip(1).sort({"name":-1})
{ "_id" : ObjectId("5506d5708d7bd8471669e674"), "name" : "francesco", "age" : 44, "phone" : "123-456-789" }
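The chained cursor functions compose like array operations; note that MongoDB applies sort before skip and limit regardless of the order in which the methods are chained. A sketch with plain arrays (an illustration, not the server implementation):

```javascript
// A toy cursor: sort first, then skip, then limit (limit 0 = no limit).
function query(docs, { sort, skip = 0, limit = 0 } = {}) {
  let result = [...docs];
  if (sort) {
    const [field, dir] = Object.entries(sort)[0]; // dir: 1 asc, -1 desc
    result.sort((a, b) =>
      a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0
    );
  }
  result = result.slice(skip);
  return limit > 0 ? result.slice(0, limit) : result;
}

const users = [
  { name: "francesco", age: 44 },
  { name: "owen", age: 32 },
];
console.log(query(users, { sort: { name: -1 }, skip: 1 }).map((u) => u.name));
// [ 'francesco' ]
```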
 

Securing database access


We will conclude this chapter with a note about database security. So far, we have started and used MongoDB without any authentication. In fact, starting mongod without any additional options exposes the database to any user who can reach the process.

We will show you how to set up secure access by means of the mongo shell. So, launch the mongo shell and switch to the admin database, which holds information about the users:

use admin

Now, let's use the createUser function to add a user named administrator with the password mypassword, granting unlimited privileges (the root role):

db.createUser(
    {
      user: "administrator",
      pwd: "mypassword",
      roles: [ "root" ]
    }
)

Now, shut down the server by using the following command:

db.shutdownServer()

We will restart the database using the --auth option, which forces user authentication:

mongod --dbpath "C:\mongodb-win32-x86_64-3.0.3\data" --auth

Now, the database is started in secure mode. You can connect from the mongo shell in two different ways. The first one should be used with caution on Linux/Unix systems, as it exposes the username and password in the process list:

mongo -u administrator -p mypassword --authenticationDatabase admin

As an alternative, you can start the mongo shell and authenticate at the beginning of the session (you need to select the admin database first, as the user credentials are stored there):

use admin
db.auth('administrator','mypassword')

use yourdb
. . . .
 

Summary


This chapter has provided a whistle-stop tour of the basics of MongoDB and NoSQL databases. We have gone through some of the advantages of choosing a NoSQL database and the trade-offs compared with a relational database.

Then, we took you through the installation of MongoDB and some startup options that can be used to customize your server. Finally, we explored the MongoDB shell and learned how to manipulate data using basic CRUD operations.

In the next chapter, we will show you how to connect to MongoDB using the Java driver and perform some equivalent actions using Java.

About the Author

  • Francesco Marchioni

    Francesco Marchioni is a Red Hat Certified JBoss Administrator (RHCJA) and Sun Certified Enterprise Architect working at Red Hat in Rome, Italy. He started learning Java in 1997, and since then he has followed the path to the newest Application Program Interfaces released by Sun. In 2000, he joined the JBoss community when the application server was running the 2.X release.

    He has spent years as a software consultant, where he has envisioned many successful software migrations from vendor platforms to open source products, such as JBoss AS, fulfilling the tight budget requirements of current times.

    Over the last 10 years, he has authored many technical articles for O'Reilly Media and has run an IT portal focused on JBoss products (http://www.mastertheboss.com).

