Ruby with MongoDB for Web Development

Creating documents

Let's first see how we can create documents in MongoDB. As we have briefly seen, MongoDB deals with collections and documents instead of tables and rows.

Time for action – creating our first document

Suppose we want to create the book object having the following schema:

book = {
  name: "Oliver Twist",
  author: "Charles Dickens",
  publisher: "Dover Publications",
  published_on: "December 30, 2002",
  category: ['Classics', 'Drama']
}


On the Mongo CLI, we can add this book object to our collection using the following command:

> db.books.insert(book)


Suppose we also add a shelf collection. A shelf object records details such as the floor, the row and the column the shelf is in, the book indexes it maintains, and so on. It has the following structure:

shelf : {
  name : 'Fiction',
  location : { row : 10, column : 3 },
  floor : 1,
  lex : { start : 'O', end : 'P' }
}


Remember, it's quite possible that a few years down the line, some shelf instances may become obsolete and we might want to maintain their record. Maybe we could have another shelf instance containing only books that are to be recycled or donated. What can we do? We can approach this as follows:

  • The SQL way: Add additional columns to the table and ensure that there is a default value set in them. This adds a lot of redundancy to the data. This also reduces the performance a little and considerably increases the storage. Sad but true!
  • The NoSQL way: Add the additional fields whenever you want. The following are the MongoDB schemaless object model instances:

>
{ "_id" : ObjectId("4e81e0c3eeef2ac76347a01c"), "name" : "Fiction", "location" : { "row" : 10, "column" : 3 }, "floor" : 1 }
{ "_id" : ObjectId("4e81e0fdeeef2ac76347a01d"), "name" : "Romance", "location" : { "row" : 8, "column" : 5 }, "state" : "window broken", "comments" : "keep away from children" }

What just happened?

You will notice that the second object has more fields, namely comments and state. When fetching objects, it's fine if you get extra data. That is the beauty of NoSQL. When the first document is fetched (the one with the name Fiction), it will not contain the state and comments fields, but the second document (the one with the name Romance) will have them.

Are you worried about what will happen if we try to access non-existing data from an object, for example, accessing comments from the first object fetched? This can be resolved logically: we can check for the existence of a key, default to a value in case it's not there, or ignore its absence. This is typically done anyway in code when we access objects.

Notice that when the schema changed, we did not have to add fields with default values to every object, as we do when using a SQL database. So there is no redundant information in our database. This keeps storage minimal, and in turn the object information fetched is concise. So there was no redundancy and no compromise on storage or performance. But wait! There's more.
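The three ways of dealing with a missing key can be sketched in plain Ruby. This is only an illustration: the two shelf documents are written as ordinary hashes, and shelf_summary is a hypothetical helper, not part of MongoDB or any driver.

```ruby
# The two shelf documents from the shell output, as plain Ruby hashes.
FICTION_SHELF = { "name" => "Fiction", "location" => { "row" => 10, "column" => 3 }, "floor" => 1 }
ROMANCE_SHELF = { "name" => "Romance", "location" => { "row" => 8, "column" => 5 },
                  "state" => "window broken", "comments" => "keep away from children" }

# Hypothetical helper: describes a shelf without assuming optional keys exist.
def shelf_summary(shelf)
  state = shelf.fetch("state", "ok")                            # default when the key is absent
  note  = shelf.key?("comments") ? " (#{shelf["comments"]})" : "" # check for existence
  "#{shelf["name"]}: #{state}#{note}"
end

puts shelf_summary(FICTION_SHELF)
puts shelf_summary(ROMANCE_SHELF)
```

Reading a missing key with shelf["comments"] simply returns nil, which covers the third option of ignoring its absence.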

NoSQL scores over SQL databases

The way many-to-many relations are managed shows how we can do things with MongoDB that cannot be done as simply in a relational database. The following is an example:

Each book can have reviews and votes given by customers. We should be able to see these reviews and votes and also maintain a list of top voted books.

If we had to do this in a relational database, this would be somewhat like the relationship diagram shown as follows: (get scared now!)

The vote_count and review_count fields are inside the books table and would need to be updated every time a user votes a book up or down or writes a review. So, to fetch a book along with its votes and reviews, we would need to fire three queries:

SELECT * FROM books WHERE id = 3;
SELECT * FROM reviews WHERE book_id = 3;
SELECT * FROM votes WHERE book_id = 3;

We could also use a join for this:

SELECT * FROM books JOIN reviews ON reviews.book_id = JOIN votes ON votes.book_id =;

In MongoDB, we can do this directly using embedded documents or relational documents.

Using MongoDB embedded documents

Embedded documents, as the name suggests, are documents that are embedded in other documents. This is one of the features of MongoDB and this cannot be done in relational databases. Ever heard of a table embedded inside another table?

Instead of four tables and a complex many-to-many relationship, we can say that reviews and votes are part of a book. So, when we fetch a book, the reviews and the votes automatically come along with the book.

Embedded documents are analogous to chapters inside a book. Chapters cannot be read unless you open the book. Similarly embedded documents cannot be accessed unless you access the document.

For the UML savvy, embedded documents are similar to the contains or composition relationship.

Time for action – embedding reviews and votes

In MongoDB, the embedded object physically resides inside the parent. So if we had to maintain reviews and votes we could model the object as follows:

book : {
  name: "Oliver Twist",
  reviews : [
    { user: "Gautam", comment: "Very interesting read" },
    { user: "Harry", comment: "Who is Oliver Twist?" }
  ],
  votes: [ "Gautam", "Tom", "Dick" ]
}

What just happened?

We now have reviews and votes inside the book. They cannot exist on their own. Did you notice that they look similar to JSON hashes and arrays? Indeed, they are an array of hashes. Embedded documents are just like hashes inside another object.

There is a subtle difference between hashes and embedded objects as we shall see later on in the book.

Have a go hero – adding more embedded objects to the book

Try to add more embedded objects such as orders inside the book document. It works!

order = {
  name: "Toby Jones",
  type: "lease",
  units: 1,
  cost: 40
}

Fetching embedded objects

We can fetch a book along with the reviews and the votes with it. This can be done by executing the following command:

> var book = db.books.findOne({name : 'Oliver Twist'})
>
2
> book.votes.length
3
>
[ { user: "Gautam", comment: "Very interesting read" }, { user: "Harry", comment: "Who is Oliver Twist?" } ]
> book.votes
[ "Gautam", "Tom", "Dick"]

This does indeed look simple, doesn't it? By fetching a single object, we are able to get the review and vote count along with the data.
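From Ruby's point of view, a document with embedded reviews and votes behaves like a nested hash. The following sketch uses a plain hash rather than a database connection, and the helper names are made up for illustration:

```ruby
# The book document with embedded reviews and votes, as a plain Ruby hash.
EMBEDDED_BOOK = {
  "name"    => "Oliver Twist",
  "reviews" => [
    { "user" => "Gautam", "comment" => "Very interesting read" },
    { "user" => "Harry",  "comment" => "Who is Oliver Twist?" }
  ],
  "votes"   => ["Gautam", "Tom", "Dick"]
}

# Hypothetical helpers: everything is read from the single fetched object.
def review_count(book)
  book.fetch("reviews", []).length
end

def vote_count(book)
  book.fetch("votes", []).length
end

def reviewers(book)
  book.fetch("reviews", []).map { |review| review["user"] }
end

puts "#{EMBEDDED_BOOK["name"]}: #{review_count(EMBEDDED_BOOK)} reviews, #{vote_count(EMBEDDED_BOOK)} votes"
```

No second lookup is needed, because the reviews and the votes travel with the book.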

Use embedded documents only if you really have to! Embedded documents increase the size of the object. So, if we have a large number of embedded documents, it could adversely impact performance. Even to get the name of the book, the reviews and the votes are fetched.

Using MongoDB document relationships

Just like we have embedded documents, we can also set up relationships between different documents.

Time for action – creating document relations

The following is another way to create the same relationship between books, users, reviews, and votes. This is more like the SQL way.

book: {
  _id: ObjectId("4e81b95ffed0eb0c23000002"),
  name: "Oliver Twist",
  author: "Charles Dickens",
  publisher: "Dover Publications",
  published_on: "December 30, 2002",
  category: ['Classics', 'Drama']
}

Every document that is created in MongoDB has an object ID associated with it. In the next chapter, we shall soon learn about object IDs in MongoDB. By using these object IDs we can easily identify different documents. They can be considered as primary keys. So, we can also create the reviews collection and the votes collection as follows:

users: [
  { _id: ObjectId("8d83b612fed0eb0bee000702"), name: "Gautam" },
  { _id : ObjectId("ab93b612fed0eb0bee000883"), name: "Harry" }
]

reviews: [
  {
    _id: ObjectId("5e85b612fed0eb0bee000001"),
    user_id: ObjectId("8d83b612fed0eb0bee000702"),
    book_id: ObjectId("4e81b95ffed0eb0c23000002"),
    comment: "Very interesting read"
  },
  {
    _id: ObjectId("4585b612fed0eb0bee000003"),
    user_id : ObjectId("ab93b612fed0eb0bee000883"),
    book_id: ObjectId("4e81b95ffed0eb0c23000002"),
    comment: "Who is Oliver Twist?"
  }
]

votes: [
  {
    _id: ObjectId("6e95b612fed0eb0bee000123"),
    user_id : ObjectId("8d83b612fed0eb0bee000702"),
    book_id: ObjectId("4e81b95ffed0eb0c23000002")
  },
  {
    _id: ObjectId("4585b612fed0eb0bee000003"),
    user_id : ObjectId("ab93b612fed0eb0bee000883")
  }
]

What just happened?

Hmm!! Not very interesting, is it? It doesn't even seem right. That's because it isn't the right choice in this context. It's very important to know how to choose between nesting documents and relating them.

In your object model, if you will never search by the nested document (that is, look up the parent from the child), embed it.

Just in case you are not sure about whether you would need to search by an embedded document, don't worry too much – it does not mean that you cannot search among embedded objects. You can use Map/Reduce to gather the information.
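To make the trade-off concrete, here is a rough sketch of what reading related (rather than embedded) documents involves. It uses in-memory arrays as stand-ins for collections and short strings as stand-ins for object IDs; all names here are assumptions for illustration only.

```ruby
# In-memory stand-ins for three collections; "b1", "u1", and so on replace real object IDs.
REL_BOOKS   = [{ "_id" => "b1", "name" => "Oliver Twist" }]
REL_USERS   = [{ "_id" => "u1", "name" => "Gautam" }, { "_id" => "u2", "name" => "Harry" }]
REL_REVIEWS = [
  { "_id" => "r1", "user_id" => "u1", "book_id" => "b1", "comment" => "Very interesting read" },
  { "_id" => "r2", "user_id" => "u2", "book_id" => "b1", "comment" => "Who is Oliver Twist?" }
]

# Hypothetical helper: gathers the reviews of a book by following the IDs.
def reviews_for(book_name)
  book = REL_BOOKS.find { |b| b["name"] == book_name }               # lookup 1: the book
  return [] unless book

  reviews = REL_REVIEWS.select { |r| r["book_id"] == book["_id"] }   # lookup 2: its reviews
  reviews.map do |review|
    user = REL_USERS.find { |u| u["_id"] == review["user_id"] }      # lookup 3: one per review
    { "user" => user ? user["name"] : nil, "comment" => review["comment"] }
  end
end

reviews_for("Oliver Twist").each { |r| puts "#{r["user"]}: #{r["comment"]}" }
```

With embedded reviews, the same information arrives with the book in a single fetch, which is why embedding suits data that is only ever reached through its parent.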

Comparing MongoDB versus SQL syntax

This is a good time to sit back and evaluate the similarities and dissimilarities between the MongoDB syntax and the SQL syntax. Let's map them together:

SQL commands | NoSQL (MongoDB) equivalent
SELECT * FROM books WHERE id = 3; | db.books.find( { id : 3 } )
SELECT * FROM books WHERE name LIKE 'Oliver%' | db.books.find( { name : /^Oliver/ } )
SELECT * FROM books WHERE name like '%Oliver%' | db.books.find( { name : /Oliver/ } )
SELECT * FROM books WHERE publisher = 'Dover Publications' AND published_date = "2011-8-01" | db.books.find( { publisher : "Dover Publications", published_date : ISODate("2011-8-01") } )
SELECT * FROM books WHERE published_date > "2011-8-01" | db.books.find ( { published_date : { $gt : ISODate("2011-8-01") } } )
SELECT name FROM books ORDER BY published_date | db.books.find( {}, { name : 1 } ).sort( { published_date : 1 } )
SELECT name FROM books ORDER BY published_date DESC | db.books.find( {}, { name : 1 } ).sort( { published_date : -1 } )
SELECT from books JOIN votes where votes.book_id = | db.books.find( { votes : { $exists : 1 } }, { : 1 } )

Some more notable comparisons between MongoDB and relational databases are:

  • MongoDB does not support joins. Instead it fires multiple queries or uses
    Map/Reduce. We shall soon see why the NoSQL faction does not favor joins.
  • SQL has stored procedures. MongoDB supports JavaScript functions.
  • MongoDB has indexes similar to SQL.
  • MongoDB also supports Map/Reduce functionality.
  • MongoDB supports atomic updates like SQL databases.
  • Embedded or related objects are used sometimes instead of a SQL join.
  • MongoDB collections are analogous to SQL tables.
  • MongoDB documents are analogous to SQL rows.

Using Map/Reduce instead of join

We have seen this mentioned a few times earlier—it's worth jumping into it, at least briefly.

Map/Reduce is a concept that was introduced by Google in 2004. It's a way of distributed task processing. We "map" tasks to workers and then "reduce" the results.

Understanding functional programming

Functional programming is a programming paradigm that has its roots in lambda calculus. If that sounds intimidating, remember that JavaScript could be considered a functional language. The following is a snippet of functional programming:

$(document).ready( function () {
  $('#element').click( function () {
    // do something here
  });
  $('#element2').change( function () {
    // do something here
  });
});

We can have functions inside functions. Higher-level languages (such as Java and Ruby) support anonymous functions and closures but are still procedural languages. Functional programs rely on the results of one function being chained to other functions.
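Ruby can express the same ideas. The following is a small sketch (the names DOUBLE, adder, and sum_of_doubled_evens are invented for this example) showing an anonymous function, a closure, and the result of one function being chained into the next:

```ruby
# An anonymous function (lambda) stored in a constant.
DOUBLE = ->(x) { x * 2 }

# Returns a closure: the lambda remembers the value of n.
def adder(n)
  ->(x) { x + n }
end

# Chaining: the output of select feeds map, whose output feeds reduce.
def sum_of_doubled_evens(numbers)
  numbers.select(&:even?).map(&DOUBLE).reduce(0) { |sum, x| sum + x }
end

puts adder(3).call(4)
puts sum_of_doubled_evens([1, 2, 3, 4])
```

The select, map, and reduce calls here are ordinary Ruby Enumerable methods; they are related in spirit to, but not the same thing as, MongoDB's Map/Reduce.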

Building the map function

The map function processes a chunk of data. Data that is fed to this function could be accessed across a distributed filesystem, multiple databases, the Internet, or even any mathematical computation series!

function map(void) -> void

The map function "emits" information that is collected by the "mystical super gigantic computer program" and feeds that to the reducer functions as input. MongoDB as a database supports this paradigm making it "the all powerful" (of course I am joking, but it does indeed make MongoDB very powerful).

Time for action – writing the map function for calculating vote statistics

Let's assume we have a document structure as follows:

{
  name: "Oliver Twist",
  votes: ['Gautam', 'Harry'],
  published_on: "December 30, 2002"
}

The map function for such a structure could be as follows:

function() { emit(, {votes : this.votes} ); }

What just happened?

The emit function emits the data. Notice that the data is emitted as a (key, value) structure.

  • Key: This is the parameter over which we want to gather information. Typically it would be some primary key, or some key that helps identify the information.
  • For the SQL savvy, typically the key is the field we use in the GROUP BY clause.

  • Value: This is a JSON object. This can have multiple values and this is the data that is processed by the reduce function.

We can call emit more than once in the map function. This would mean we are processing data multiple times for the same object.
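To get a feel for what emitting (key, value) pairs produces, here is a plain Ruby simulation of the map phase. It is a sketch under stated assumptions: the key is assumed to be the book name, the value is assumed to carry the number of votes, and the second book is hypothetical. MongoDB itself runs JavaScript map functions, not this code.

```ruby
# Sample documents; the second book is a hypothetical addition.
MAP_INPUT = [
  { "name" => "Oliver Twist", "votes" => ["Gautam", "Harry"] },
  { "name" => "Great Expectations", "votes" => ["Tom"] }
]

# Stand-in for a map function: collects every (key, value) pair it emits.
def map_book(book)
  emitted = []
  emit = ->(key, value) { emitted << [key, value] }
  # Assumption: key is the book name, value holds the vote count.
  emit.call(book["name"], { "votes" => book["votes"].length })
  emitted
end

# Runs map over every document and groups the emitted values by key,
# which is the shape in which a reducer receives them.
def map_phase(books)
  books.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |book, grouped|
    map_book(book).each { |key, value| grouped[key] << value }
  end
end

map_phase(MAP_INPUT).each { |key, values| puts "#{key} => #{values.inspect}" }
```

Each key ends up with an array of values, which is why the reduce step always has to work on an array.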

Building the reduce function

The reduce functions are the consumer functions that process the information emitted from the map functions and emit the results to be aggregated. For each piece of data emitted from the map function, a reduce function emits the result. MongoDB collects and collates the results. This turns collection and processing into a massively parallel processing system, which is what gives MongoDB its power.

The reduce functions have the following signature:

function reduce(key, values_array) -> value

Time for action – writing the reduce function to process emitted information

This could be the reduce function for the previous example:

function(key, values) {
  var result = {votes: 0};
  values.forEach(function(value) {
    result.votes += value.votes;
  });
  return result;
}

What just happened?

reduce takes an array of values – so it is important to process an array every time. There are various options to Map/Reduce that help us process data.

Let's analyze this function in more detail:

var result = {votes: 0};

The variable result has a structure similar to what was emitted from the map function. This is important, as we want the results from every document in the same format. If we need to process the results further, we can use the finalize function (more on that later). The reduce function then iterates over the values:

values.forEach(function(value) {
  result.votes += value.votes;
});

The values are always passed as arrays. It's important that we iterate the array, as there could be multiple values emitted from different map functions with the same key. So, we processed the array to ensure that we don't overwrite the results and collate them.
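The reduce step can be mirrored in plain Ruby to check this reasoning. The sketch below assumes each emitted value carries a numeric votes count; reduce_votes is a hypothetical Ruby counterpart of the JavaScript function, not something MongoDB executes.

```ruby
# Ruby counterpart of the reduce function: sums the votes of every value for a key.
def reduce_votes(_key, values)
  result = { "votes" => 0 }          # same shape as the emitted values
  values.each { |value| result["votes"] += value["votes"] }
  result
end

# Values emitted for one key by different map calls (sample numbers).
EMITTED_VALUES = [{ "votes" => 2 }, { "votes" => 1 }, { "votes" => 4 }]

all_at_once = reduce_votes("Oliver Twist", EMITTED_VALUES)

# Because the result has the same shape as the input values, a partial
# result can be fed back in together with the remaining values.
partial    = reduce_votes("Oliver Twist", EMITTED_VALUES.first(2))
in_batches = reduce_votes("Oliver Twist", [partial, EMITTED_VALUES.last])

puts all_at_once.inspect
puts in_batches.inspect
```

Both calls give the same total, which is the reason the result must keep the same format as the emitted values.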

Understanding the Ruby perspective

Until now we have just been playing around with MongoDB. Now let's have a look at this from Ruby. Aaahhh… bliss!

For this example, we shall write some basic classes in Ruby. We are using Rails 3 and the Mongoid wrapper for MongoDB. (We shall see more about MongoDB wrappers later in the book.)

Setting up Rails and MongoDB

To set up a Rails project, we first need to install the Rails gem. We shall also install the Bundler gem that goes hand-in-hand with Rails.

Time for action – creating the project

First we shall create the sample Rails project. Assuming you have installed Ruby already, we need to install Rails. The following command shows how to install Rails and Bundler.

$ gem install rails
$ gem install bundler

What just happened?

The preceding commands will install Rails and Bundler. For the sake of this example, I am working with Rails 3.2.0 (that is, the current latest version) but I recommend that you should use the latest version of Rails available.

Understanding the Rails basics

Rails is a web framework written in Ruby. It was released publicly in 2005 and it has gathered a lot of steam since then. It is interesting to note that until Rails 2.x, the framework was a tightly coupled one. This was when other loosely coupled web frameworks made their way into the developer market. The most popular among them were Merb and Sinatra. These frameworks leveraged Ruby to its full potential but were competing against each other.

Around 2008-2009, the Rails core team (David Heinemeier Hansson and team) met the makers of Merb (Yehuda Katz and team) and they got together and discussed a strategy that has changed the face of web development. Rails 3 emerged with a bang; it was a brand new framework with Metal and Rack, loosely coupled components, and very customizable middleware. This has made Rails extremely popular today.

Using Bundler

Bundler is another awesome gem by "Carlhuda" (Yehuda Katz and Carl Lerche) that manages gem dependencies in Ruby applications.

Why do we need the Bundler

In the "olden" days, when everything was a system installation, things would be running smoothly till somebody upgraded a system library or a gem... and then Kaboom! — the application crashed for no apparent reason and no code change. Some libraries break compatibility, which in turn requires us to install the new gems. So, even if a system administrator upgraded the system (as a routine maintenance activity), our Ruby application was prone to crashes.

A bigger problem arose when we were required to install multiple Ruby applications on the same system. Ruby version, Rails version, gem versions, and system libraries all could potentially clash to make development and deployment a nightmare!

One solution was to freeze gems and the Ruby version. This required us to ship everything in our application bundle. Not only was this inefficient, but it also increased the size of the bundle.

Then came along Bundler and, as the name suggests, it keeps track of dependencies in a Ruby application. Java has a similar package called Maven. But wait! Bundler has more in store. We can now package gems (via a Gemfile) and specify environments with it. So, if we require some gems only for testing, they can be specified as part of only the "test" group.

If that has not sold you on using Bundler, we can specify the source of the gem files too: GitHub, SourceForge, or even a gem in our local file system.

Bundler generates Gemfile.lock, which manages the gem dependencies for the application. It uses the system-installed gems, so we don't have to freeze gems or Ruby versions with each application.
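As an illustration of groups and gem sources, a Gemfile might look like the following sketch. The gem names my_gem and my_local_gem, the repository URL, and the local path are placeholders invented for this example; only the source, gem, and group directives and the git and path options are standard Bundler features.

```ruby
# Gemfile (illustrative sketch, not the Gemfile used in this chapter)
source 'https://rubygems.org'

gem 'rails'
gem 'mongoid'

# Gems needed only when running tests
group :test do
  gem 'rspec-rails'
end

# A gem fetched from a git repository (placeholder URL)
gem 'my_gem', git: 'https://github.com/example/my_gem.git'

# A gem taken from the local file system (placeholder path)
gem 'my_local_gem', path: '../my_local_gem'
```

This file is read by Bundler when bundle install runs; it is not meant to be executed directly with the ruby command.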

Setting up Sodibee

Now that we have installed Rails and Bundler, it's time to set up the Sodibee project.

Time for action – start your engines

Now we shall create the Sodibee project in Rails 3. It can be done using the following command:

$ rails new sodibee -JO

In the previous command, -J means skip-prototype (and use jQuery instead) and -O means skip-activerecord. This is important, as we want to use MongoDB.

Add the following to Gemfile:

gem 'mongoid'
gem 'bson'
gem 'bson_ext'

Now on command line, type the following:

$ bundle install

In Rails 3.2.1, a lot of automation has been added. bundle install is now part of the process of creating a project.

What just happened?

The previous command, bundle install, fetches missing gems and their dependencies and installs them. It then generates Gemfile.lock. After bundle install is complete, you will see the following on the screen:

$ bundle install
Fetching source index for
Using rake (0.9.2)
Using abstract (1.0.0)
Using activesupport (3.2.0)
Using builder (2.1.2)
Using i18n (0.5.0)
Using activemodel (3.2.0)
Using erubis (2.6.6)
Using rack (1.2.4)
Using rack-mount (0.6.14)
Using rack-test (0.5.7)
Installing tzinfo (0.3.30)
Using actionpack (3.2.0)
Using mime-types (1.16)
Using polyglot (0.3.2)
Using treetop (1.4.10)
Using mail (2.2.19)
Using actionmailer (3.2.0)
Using arel (2.0.10)
Using activerecord (3.2.0)
Using activeresource (3.2.0)
Using bson (1.4.0)
Using bundler (1.0.10)
Using mongo (1.3.1)
Installing mongoid (2.2.1)
Using rdoc (3.9.4)
Using thor (0.14.6)
Using railties (3.2.0)
Using rails (3.2.0)
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.

Setting up Mongoid

Now that the Rails application is set up, let's configure Mongoid. Mongoid is an Object Document Mapper (ODM) tool that maps Ruby objects to MongoDB documents. For now, we shall simply issue the command to configure Mongoid.

Time for action – configuring Mongoid

The Mongoid gem has a Rails generator command to configure Mongoid.

A Rails generator, as the name suggests, sets up files. Generators are used frequently in gems to set up config files with default settings. g can be used as a shorthand for generate.

$ rails g mongoid:config

What just happened?

This command created a config/mongoid.yml file that is used to connect to MongoDB. The file would look like the following code snippet:

development:
  host: localhost
  database: sodibee_development

test:
  host: localhost
  database: sodibee_test

# set these environment variables on your prod server
production:
  host: <%= ENV['MONGOID_HOST'] %>
  port: <%= ENV['MONGOID_PORT'] %>
  username: <%= ENV['MONGOID_USERNAME'] %>
  password: <%= ENV['MONGOID_PASSWORD'] %>
  database: <%= ENV['MONGOID_DATABASE'] %>
  # slaves:
  #   - host: slave1.local
  #     port: 27018
  #   - host: slave2.local
  #     port: 27019

Notice that there are now three environments to work with—development, test, and production. By default, Rails will pick up the development environment. We do not need to explicitly create the database in MongoDB. The first call to the database will create the database for us.

The previous command also configures the config/application.rb to ensure that ActiveRecord is disabled. ActiveRecord is the default Rails ORM (Object Relational Mapper). As we are using Mongoid, we need to disable ActiveRecord.

Building the models

Now that we have the project set up, it's time we create the models. Each model will autocreate collections in MongoDB. To create a model, all we need to do is create a file in the app/models folder.

Time for action – planning the object schema

Here we shall build the different models and add their relations.

Building the book model

The app/models/book.rb file would contain the following code:

class Book
  include Mongoid::Document

  field :title, type: String
  field :publisher, type: String
  field :published_on, type: Date
  field :votes, type: Array

  belongs_to :author
  has_and_belongs_to_many :categories
  embeds_many :reviews
end

What just happened?

Let's study the previous code snippet in more detail:

class Book
  include Mongoid::Document

The preceding code includes the Mongoid module to save the documents in MongoDB.

include is the Ruby way of adding methods to a Ruby class by including modules. This is called a module mixin. We can include as many modules in a class as we want. Modules make the class richer by adding all the module methods as instance methods.

extend is the Ruby way of adding class methods to a Ruby class by extending it with modules. All the methods from the modules used with extend become class methods.
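The difference between include and extend is easy to see in a few lines of plain Ruby. The module and class names below are invented for this sketch and have nothing to do with Mongoid itself.

```ruby
# A module with one method that relies on a `name` method being available.
module Describable
  def describe
    "I am #{name}"
  end
end

class IncludedBook
  include Describable        # describe becomes an instance method

  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class ExtendedBook
  extend Describable         # describe becomes a class method
  # Every class already responds to `name`, returning the class name.
end

puts IncludedBook.new("Oliver Twist").describe
puts ExtendedBook.describe
```

Calling IncludedBook.describe or ExtendedBook.new.describe would raise NoMethodError, since each class received the method at only one level.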

Let's have a look at the previous snippet again:

  field :title, type: String
  field :publisher, type: String
  field :published_on, type: Date
  field :votes, type: Array

The previous code configures the name and the type of the fields for a document.

Notice the Ruby 1.9 syntax for a hash. No more hash rockets (=>). Instead, we use a JSON-like notation directly. Remember it's type: String and not type : String. You must keep the key and the colon (:) together.
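A quick check in plain Ruby confirms that the two notations build the same hash. The constant names and the default option are made up for this illustration.

```ruby
# Hash rocket notation (works in every Ruby version).
OLD_STYLE_OPTIONS = { :type => String, :default => "untitled" }

# Ruby 1.9+ notation: the colon sticks to the key, and the key is still a symbol.
NEW_STYLE_OPTIONS = { type: String, default: "untitled" }

puts OLD_STYLE_OPTIONS == NEW_STYLE_OPTIONS
puts NEW_STYLE_OPTIONS.keys.inspect
```

The newer notation only applies to symbol keys; string keys, such as "name" => "Fiction", still need the hash rocket.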

Let's have a look at the snippet again:

  belongs_to :author

The previous snippet is a relational document. This means that the document has a reference to the author document.

Let's have a look at the snippet for the second time:

  has_and_belongs_to_many :categories

The previous snippet is a many-to-many relationship between books and categories.

Let's have a look at the snippet a third time:

  embeds_many :reviews

The previous snippet is an example of nested or embedded documents. All the review documents will be embedded into the books.

Have a go hero – building the remaining models

We need the Author, Category, and Review models. Here is how we can do this. The app/models/author.rb file contains the following code:

class Author
  include Mongoid::Document

  field :name, type: String

  has_many :books
end

The app/models/category.rb file contains the following code:

class Category
  include Mongoid::Document

  field :name, type: String

  has_and_belongs_to_many :books
end

Note that the category and books have a many-to-many relationship. The app/models/review.rb file contains the following code:

class Review
  include Mongoid::Document

  field :comment, type: String
  field :username, type: String

  embedded_in :book
end

It's very important that the inverse relation, that is, embedded_in, is mentioned in reviews. This tells Mongoid how to store the embedded object. If this is not written, objects will not get embedded.

Testing from the Rails console

Nothing is ever complete without testing. The Rails community is almost fanatical about integrating tests into the project. We shall learn about testing soon, but for now let's test our code from the Rails console.

Time for action – putting it all together

Now we shall test these models to see if they indeed work as expected. We shall create different objects and their relations. The fun begins! Let's start the Rails console and create our first book object:

$ rails console

The Rails console is an interactive command prompt that loads the Rails environment and the models. It's the best way to check and test if our data models are correct.

Let's create a book now. We can do that using the following code:

> b = "Oliver Twist", publisher: "Dover Publications", published_on: Date.parse("2002-12-30") )
=> #

Here, we have populated the basic title, publisher, and published_on fields. Now let's work with the relations. Let's create an author, which can be done as follows:

> Author.create(name: "Charles Dickens") => # => #

Let's create a couple of categories too. This can be done as follows:

> Category.create(name: "Fiction")
=> #
> Category.create(name: "Drama")
=> #

Now, let's add an author and some categories to our book. This can be done as follows:

> = Author.where(name: "Charles Dickens").first
=> #
> b.categories << Category.first
=> []
> b.categories << Category.last
=> []
> b
=> #
>
=> true

Remember to save the object!

save returns true if the object was saved successfully, otherwise it returns false. save! will raise an exception if the save was unsuccessful.

What just happened?

We have just created books, authors, and categories.

Hmm... category and books have a many-to-many relationship. So does this mean that category objects should also be updated? Let's check:

> Category.first
=> #
> Category.last
=> #

Yeah! We are in good shape.

Let's check what MongoDB has stored. Start the Mongo CLI and see the books.

We can do this as follows:

$ mongo
MongoDB shell version: 1.8.3
connecting to: test
> use sodibee_development
switched to db sodibee_development
> db.books.findOne()
{
  "_id" : ObjectId("4e86e45efed0eb0be0000010"),
  "category_ids" : [
    ObjectId("4e86e4cbfed0eb0be0000012"),
    ObjectId("4e86e4d9fed0eb0be0000013")
  ],
  "name" : "Oliver Twist",
  "publisher" : "Dover Publications",
  "published_on" : ISODate("2002-12-30T00:00:00Z"),
  "author_id" : ObjectId("4e86e4b6fed0eb0be0000011")
}
>

And let's see the categories and author objects too.

> db.categories.findOne()
{
  "_id" : ObjectId("4e86e4cbfed0eb0be0000012"),
  "book_ids" : [
    ObjectId("4e86e45efed0eb0be0000010")
  ],
  "name" : "Fiction"
}
> db.categories.findOne({name: "Drama"})
{
  "_id" : ObjectId("4e86e4d9fed0eb0be0000013"),
  "book_ids" : [
    ObjectId("4e86e45efed0eb0be0000010")
  ],
  "name" : "Drama"
}
> db.authors.findOne()
{
  "_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
  "name" : "Charles Dickens"
}
>

All is well!

Have a go hero – adding more books, authors, and categories

Let's get creative (and funny) by adding the following:

  • Adventures of Banana Man by Willie Slip in the Adventure category
  • World's craziest Moments and Dizzying moments by Mary Go Round in the Travel category
  • Procrastinate and Laziness Personified by Toby D Cided in the Self-help category

Understanding many-to-many relationships in MongoDB

In a SQL database, a many-to-many relationship is implemented using an intermediate table. For example, the many-to-many relationship we have mentioned previously between books and categories would be achieved in the following manner in a SQL database:

As MongoDB is a schemaless database, we do not need any additional temporary collections. The following is what the book object stores:

> db.books.findOne()
{
  "_id" : ObjectId("4e86e45efed0eb0be0000010"),
  "category_ids" : [
    ObjectId("4e86e4cbfed0eb0be0000012"),
    ObjectId("4e86e4d9fed0eb0be0000013")
  ],
  "name" : "Oliver Twist",
  "publisher" : "Dover Publications",
  "published_on" : ISODate("2002-12-30T00:00:00Z"),
  "author_id" : ObjectId("4e86e4b6fed0eb0be0000011")
}
>

The following is what the category object stores:

> db.categories.findOne()
{
  "_id" : ObjectId("4e86e4cbfed0eb0be0000012"),
  "book_ids" : [
    ObjectId("4e86e45efed0eb0be0000010")
  ],
  "name" : "Fiction"
}

No intermediate collections needed!
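The idea of each side holding an array of the other side's IDs can be sketched in plain Ruby. The hashes below stand in for the two collections, short strings such as "b1" and "c1" stand in for object IDs, and the helper names are hypothetical.

```ruby
# Stand-ins for the books and categories collections, keyed by ID.
M2M_BOOKS = {
  "b1" => { "name" => "Oliver Twist", "category_ids" => ["c1", "c2"] }
}
M2M_CATEGORIES = {
  "c1" => { "name" => "Fiction", "book_ids" => ["b1"] },
  "c2" => { "name" => "Drama",   "book_ids" => ["b1"] }
}

# From a book to its categories: follow the IDs stored on the book.
def category_names_for(book_id)
  M2M_BOOKS.fetch(book_id)["category_ids"].map { |id| M2M_CATEGORIES.fetch(id)["name"] }
end

# From a category to its books: follow the IDs stored on the category.
def book_names_in(category_id)
  M2M_CATEGORIES.fetch(category_id)["book_ids"].map { |id| M2M_BOOKS.fetch(id)["name"] }
end

puts category_names_for("b1").inspect
puts book_names_in("c2").inspect
```

Both directions can be traversed without a join table, at the cost of keeping the two ID arrays in step whenever a relation changes.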

Using embedded documents

When we built the models, we embedded reviews in the book model. An example would be ideal to explain this.

Time for action – adding reviews to books

Let's start the Rails console again and add reviews to books. This is done as follows:

> b = Book.where(title: "Oliver Twist").first
=> #
> "Fast paced book!", username: "Gautam")
=> #
> "Excellent literature", username: "Tom")
=> #

What just happened?

That's it—we just created reviews for books. Let's fetch them and check: => [#, #]

Let's look at the following code to see what was stored in MongoDB:

> db.books.findOne()
{
  "_id" : ObjectId("4e86e45efed0eb0be0000010"),
  "author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
  "category_ids" : [
    ObjectId("4e86e4cbfed0eb0be0000012"),
    ObjectId("4e86e4d9fed0eb0be0000013")
  ],
  "name" : "Oliver Twist",
  "published_on" : ISODate("2002-12-30T00:00:00Z"),
  "publisher" : "Dover Publications",
  "reviews" : [
    {
      "comment" : "Fast paced book!",
      "username" : "Gautam",
      "_id" : ObjectId("4e86f68bfed0eb0be0000018")
    },
    {
      "comment" : "Excellent literature",
      "username" : "Tom",
      "_id" : ObjectId("4e86f6fffed0eb0be000001a")
    }
  ]
}
>

Notice that the reviews are embedded inside the book object. Now when we fetch the book object, we will automatically get all the reviews too.

Choosing whether to embed or not to embed

Suppose we want to prepare orders for a book. The book can be leased or purchased. If we want to maintain an order history in terms of lease and purchase, how do we build the Lease, Purchase, and Order models?

Time for action – embedding Lease and Purchase models

We have three model files: Order, Lease, and Purchase. The Order model is as follows:

# app/models/order.rb
class Order
  include Mongoid::Document

  field :created_at, type: DateTime
  field :type, type: String # Lease, Purchase

  belongs_to :book
  embeds_one :lease
  embeds_one :purchase
end

Now, depending on the type field, we can determine which embedded object to pick up: the lease or the purchase. You can design the Lease and Purchase models as shown in the following code:

# app/models/lease.rb
class Lease
  include Mongoid::Document

  field :from, type: DateTime
  field :till, type: DateTime

  embedded_in :order
end

# app/models/purchase.rb
class Purchase
  include Mongoid::Document

  field :quantity, type: Integer
  field :price, type: Float

  embedded_in :order
end
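Picking the embedded object based on the type field could look roughly like the following. This is a sketch using plain Ruby Structs as stand-ins for the Mongoid models, so it runs without a database; the details method is a hypothetical helper and not something Mongoid provides.

```ruby
# Plain-Ruby stand-ins for the Mongoid models (no database involved).
Lease    = Struct.new(:from, :till)
Purchase = Struct.new(:quantity, :price)

Order = Struct.new(:type, :lease, :purchase) do
  # Hypothetical helper: return whichever embedded object matches type.
  def details
    case type
    when "Lease"    then lease
    when "Purchase" then purchase
    else raise ArgumentError, "unknown order type: #{type.inspect}"
    end
  end
end

order = Order.new("Purchase", nil, Purchase.new(2, 9.99))
puts order.details.quantity
```

Raising on an unknown type is one possible design choice; returning nil would also work if callers are prepared to handle it.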

Working with Map/Reduce

To see an example of how Map/Reduce works, let's now add votes to books. The following shows how we can add votes:

{ "username" : "Dick", "rating" : 5 }

Ratings are on a scale of 1 to 10, with 10 being the best. Every user can rate a book. Our aim is to add up the ratings given by each user. We shall save each vote as a hash in the votes array in the book object. This should not be confused with an embedded object (as it does not have an object ID).

We have not yet looked closely at MongoDB-specific data types such as ObjectId and ISODate. All the usual data types, such as integer, float, string, hash, and array, are also supported.

The following is how we save this information as a hash in the votes array in the book object:

> db.books.findOne() { "_id" : ObjectId("4e86e45efed0eb0be0000010"), "author_id" : ObjectId("4e86e4b6fed0eb0be0000011"), "category_ids" : [ ObjectId("4e86e4cbfed0eb0be0000012"), ObjectId("4e86e4d9fed0eb0be0000013") ], "name" : "Oliver Twist", "published_on" : ISODate("2002-12-30T00:00:00Z"), "publisher" : "Dover Publications", "reviews" : [ { "comment" : "Fast paced book!", "username" : "Gautam", "_id" : ObjectId("4e86f68bfed0eb0be0000018") }, { "comment" : "Excellent literature", "username" : "Tom", "_id" : ObjectId("4e86f6fffed0eb0be000001a") } ], "votes" : [ { "username" : "Gautam", "rating" : 3 } ] }

Before we see the example of Map/Reduce, it would be fun to add more books and votes, so that the Map/Reduce results make more sense. This is done as shown next:

> Book.create(name: "Great Expectations", author: Author.first)
=> #<Book ...>
> Book.create(name: "A tale of two cities", author: Author.first)
=> #<Book ...>

Now let's add votes for all three books.

First, for Oliver Twist (for example, one vote by Gautam):

> b = Book.first
=> #<Book ...>
> b.votes = []
=> []
> b.votes << {username: "Gautam", rating: 3}
=> [{:username=>"Gautam", :rating=>3}]
> b.save
=> true

Note that we first set b.votes = [], that is, an empty array. This is because MongoDB does not add fields to a document until they are populated, so by default b.votes would return nil. Hence, it's important to initialize it the first time.
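The same initialize-on-first-use idea can be expressed with Ruby's ||= operator. The following is a minimal sketch on a plain hash rather than a Mongoid object; add_vote is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical helper: append a vote to a document hash, creating the
# votes array on first use instead of failing on nil.
def add_vote(doc, username, rating)
  doc["votes"] ||= []   # assigns [] only when the key is missing or nil
  doc["votes"] << { "username" => username, "rating" => rating }
  doc
end

book = { "name" => "Oliver Twist" }   # no votes field yet
add_vote(book, "Gautam", 3)
puts book["votes"].inspect
```

Because ||= leaves an existing array untouched, calling the helper repeatedly keeps appending to the same array.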

Now, for Great Expectations (for example, three votes, one each by Gautam, Tom, and Dick):

> b = Book.where(name: "Great Expectations").first
=> #<Book ...>
> b.votes = []
=> []
> b.votes << {username: "Gautam", rating: 9}
=> [{:username=>"Gautam", :rating=>9}]
> b.votes << {username: "Tom", rating: 3}
=> [{:username=>"Gautam", :rating=>9}, {:username=>"Tom", :rating=>3}]
> b.votes << {username: "Dick", rating: 7}
=> [{:username=>"Gautam", :rating=>9}, {:username=>"Tom", :rating=>3}, {:username=>"Dick", :rating=>7}]
> b.save
=> true

Finally, for A Tale of Two Cities (for example, two votes, one each by Gautam and Dick):

> c = Book.where(name: /cities/).first
=> #<Book ...>
> c.votes = []
=> []
> c.votes << {username: "Gautam", rating: 9}
=> [{:username=>"Gautam", :rating=>9}]
> c.votes << {username: "Dick", rating: 5}
=> [{:username=>"Gautam", :rating=>9}, {:username=>"Dick", :rating=>5}]
> c.save
=> true

If we want to collect all the votes and add up the rating for each user, it can be a pretty cumbersome task to iterate over all of these objects. This is where Map/Reduce helps us.

One alternative to Map/Reduce in this particular example would be to maintain the vote count per book by incrementing a counter whenever a vote or review is inserted. However, we shall use Map/Reduce here so that we understand how it works.

Time for action – writing the map function to calculate ratings

This is how we can write the map function. As we have seen earlier, this function emits information; in our case, the key is the username and the value is the rating:

function() { this.votes.forEach(function(x) { emit(x.username, {rating: x.rating}); }); }

What just happened?

This is a JavaScript function; MongoDB runs the map and reduce functions in its built-in JavaScript engine. Every time emit() is called, some data is emitted for the reduce function to process. In the preceding code, this refers to the book document currently being processed.

What we want to do is emit a rating for each element in the votes array of every book. The emit() function takes a key and a value as parameters, so we are emitting each user's votes for the reduce function to process. It's also important to remember the data structure we are emitting as the value, as it should be consistent for all objects. In our case, it is {rating: x.rating}.
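To see what the map step produces, the following plain-Ruby sketch simulates it on in-memory data. It is not how MongoDB executes the function; BOOKS and map_phase are hypothetical names, and the sample votes mirror the ones added on the console earlier.

```ruby
# Plain-Ruby simulation of the map phase (no MongoDB involved).
BOOKS = [
  { "name" => "Oliver Twist",
    "votes" => [{ "username" => "Gautam", "rating" => 3 }] },
  { "name" => "Great Expectations",
    "votes" => [{ "username" => "Gautam", "rating" => 9 },
                { "username" => "Tom",    "rating" => 3 },
                { "username" => "Dick",   "rating" => 7 }] },
  { "name" => "A tale of two cities",
    "votes" => [{ "username" => "Gautam", "rating" => 9 },
                { "username" => "Dick",   "rating" => 5 }] }
]

# Hypothetical stand-in for running the map function on every document:
# returns the [key, value] pairs that emit() would produce.
# Unlike the JavaScript version, it tolerates books without a votes array.
def map_phase(books)
  books.flat_map do |book|
    (book["votes"] || []).map do |vote|
      [vote["username"], { "rating" => vote["rating"] }]
    end
  end
end

map_phase(BOOKS).each { |key, value| puts "#{key} -> #{value}" }
```

Each pair carries the same value structure, which is what the reduce step relies on.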

Time for action – writing the reduce function to process the emitted results

Now let's write the reduce function. This takes a key and an array of values, shown as follows:

function(key, values) { var result = {rating: 0}; values.forEach(function(value) { result.rating += value.rating; }); return result; }

What just happened?

The reduce function is the one which processes the values that were emitted from the map function.

Remember that the values parameter is always an array. The map function could emit results for the same key multiple times, so we should be sure to process the value as an array and accumulate results. The return structure should be the same as what was emitted.
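One reason the return structure matters is that MongoDB may call reduce more than once for the same key and pass an earlier result back in as one of the values. The sketch below ports the reduce function to plain Ruby (reduce_ratings is a hypothetical name) and checks that reducing in steps gives the same answer as reducing everything at once.

```ruby
# Plain-Ruby port of the reduce function (a sketch, not MongoDB code).
def reduce_ratings(_key, values)
  result = { "rating" => 0 }
  values.each { |value| result["rating"] += value["rating"] }
  result
end

emitted = [{ "rating" => 3 }, { "rating" => 9 }, { "rating" => 9 }]

# Reducing everything in one go ...
all_at_once = reduce_ratings("Gautam", emitted)

# ... should match reducing a partial batch first and feeding that
# partial result back in together with the remaining value.
partial  = reduce_ratings("Gautam", emitted.first(2))
in_steps = reduce_ratings("Gautam", [partial, emitted.last])

puts all_at_once == in_steps
```

If reduce returned a bare number instead of {rating: ...}, the second call would fail when it tried to read the rating of that number.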

MongoDB supports Map/Reduce and, in a sharded setup, can run the map and reduce functions on the shards in parallel. This gives it power over standard SQL databases. The closest a SQL database comes to this is a GROUP BY query; depending on the indexes and the query fired, it can get us results similar to Map/Reduce.

Using Map/Reduce together

As MongoDB requires JavaScript functions, the trick here is to pass the JavaScript functions to the MongoDB engine via a string on the Rails console. So, we create two strings for the map and reduce functions.

Time for action – working with Map/Reduce using Ruby

We shall now create two strings in Ruby for these functions:

> map = %q{function() { this.votes.forEach(function(x) { emit(x.username, {rating: x.rating}); }); } }
> reduce = %q{function(key, values) { var result = {rating: 0}; values.forEach(function(value) { result.rating += value.rating; }); return result; } }

%q creates a single-quoted string literal with delimiters of our choice, which makes it a clean way of writing multiline strings that contain quotes in Ruby!
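A quick way to see what %q does is to compare it with %Q. This is a small illustrative sketch; the constant names are made up for the example.

```ruby
LANG_NAME = "Ruby"

# %q: single-quoted semantics. Balanced braces may nest, quotes need no
# escaping, and #{...} is kept literally instead of being interpolated.
RAW = %q{function() { return "#{LANG_NAME}'s"; }}

# %Q: double-quoted semantics, so #{...} is interpolated.
COOKED = %Q{function() { return "#{LANG_NAME}'s"; }}

puts RAW
puts COOKED
```

For JavaScript source passed to MongoDB, the %q form is the safer choice because nothing in the function body is rewritten by Ruby.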

Remember that we are now in the MongoDB realm, so we should not work on Ruby objects but only on the MongoDB collection. So, we call map_reduce on the book collection, as follows:

> results = Book.collection.map_reduce(map, reduce, out: "vr")
=> #<Mongo::Collection ...>

The output shown previously is the MongoDB collection that holds the Map/Reduce result. Let's fetch the full results now. The following command does it for us:

> results.find().to_a
=> [{"_id"=>"Dick", "value"=>{"rating"=>12.0}}, {"_id"=>"Gautam", "value"=>{"rating"=>21.0}}, {"_id"=>"Tom", "value"=>{"rating"=>3.0}}]

What just happened?

Voila! This shows that we have the following result:

  • Dick has a total rating of 12
  • Gautam has a total rating of 21
  • Tom has a total rating of 3

Tally these ratings manually with the preceding code and verify.

What would you have to do if you did not have Map/Reduce? Iterate over all book objects and collect the votes array. Then keep a temporary hash of usernames and keep aggregating the ratings. Lots of work indeed!
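The manual approach just described can be sketched in plain Ruby, which is also a handy way to verify the totals. The data is copied from the votes added on the console; VOTES_BY_BOOK and tally_ratings are hypothetical names and no MongoDB access is involved.

```ruby
# Votes per book as [username, rating] pairs (plain Ruby, no MongoDB).
VOTES_BY_BOOK = {
  "Oliver Twist"         => [["Gautam", 3]],
  "Great Expectations"   => [["Gautam", 9], ["Tom", 3], ["Dick", 7]],
  "A tale of two cities" => [["Gautam", 9], ["Dick", 5]]
}

# Hypothetical helper: walk every book and keep a running total per username.
def tally_ratings(votes_by_book)
  totals = Hash.new(0)   # missing usernames start at 0
  votes_by_book.each_value do |votes|
    votes.each { |username, rating| totals[username] += rating }
  end
  totals
end

tally_ratings(VOTES_BY_BOOK).sort.each do |username, total|
  puts "#{username}: #{total}"
end
```

The totals come out as 12 for Dick, 21 for Gautam, and 3 for Tom, matching the Map/Reduce result.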

Don't always jump into using Map/Reduce. Sometimes it's just easier to query properly. Suppose we want to find all the books that have votes or reviews. What do we do?

  • Do we iterate every book object and check the length of the votes array or the reviews array?
  • Do we run Map/Reduce for this?
  • Is there a direct query for this?

We can directly fire a query from the Rails console, as follows:

irb> Book.any_of({:reviews.exists => true}, {:votes.exists => true})

If we want to search directly on the mongo console, we have to execute the following command:

mongo> db.books.find({"$or":[{reviews:{"$exists" : true}}, {votes : {"$exists": true}}]})
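To make the meaning of the $or and $exists combination concrete, here is a plain-Ruby sketch that applies the same test to in-memory hashes. It is only an analogy for what the query matches; SAMPLE_BOOKS and with_any_field are hypothetical, and the third sample book is invented for the example.

```ruby
# Hypothetical in-memory stand-in for the books collection.
SAMPLE_BOOKS = [
  { "name" => "Oliver Twist",
    "reviews" => [{ "comment" => "Fast paced book!" }] },
  { "name" => "Great Expectations",
    "votes" => [{ "username" => "Tom", "rating" => 3 }] },
  { "name" => "Hard Times" }   # invented entry with neither field
]

# Keep documents where at least one of the given fields is present,
# mirroring {"$or" => [{field => {"$exists" => true}}, ...]}.
def with_any_field(docs, *fields)
  docs.select { |doc| fields.any? { |field| doc.key?(field) } }
end

matches = with_any_field(SAMPLE_BOOKS, "reviews", "votes")
puts matches.map { |doc| doc["name"] }.inspect
```

The check is on the presence of the key, not on whether the array has any elements.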

Remember, we should use Map/Reduce only when we have to process data and return results (for example, when it's mostly statistical data). For most cases, there would be a query (or multiple queries) that would get us our results.


Here we really jumped into Ruby and MongoDB, didn't we? We saw how to create objects in MongoDB directly and then via Ruby using Mongoid. We saw how to set up a Rails project, configure Mongoid, and build models. We even went the distance to see how Map/Reduce would work in MongoDB.

We saw a lot of new things too, which require explanation, for example, the various data types that are supported in MongoDB, such as ObjectId and ISODate.

You've been reading an excerpt of:

Ruby and MongoDB Web Development Beginner's Guide
