The focus of this chapter will be on a feature that's unique to MongoDB called the Aggregation Framework (https://www.mongodb.com/presentations/aggregation-framework-0?jmp=docs&_ga=2.166048830.1278448947.1531711178-137143613.1528093145), which is vital when forming complex queries. This feature allows database developers, or DBAs, to return subsets of data that are grouped, sorted, and filtered. We will start our discussion by forming a simple aggregation using single-purpose methods (https://docs.mongodb.com/manual/aggregation/#single-purpose-aggregation-operations). After that, we will get into the more complex topics of forming an aggregation pipeline (https://docs.mongodb.com/manual/aggregation/#aggregation-pipeline) and making use of the map-reduce (https://docs.mongodb.com/manual/aggregation/#map-reduce) function. We will...
You're reading from MongoDB 4 Quick Start Guide
An overview of aggregation
Before diving into the specifics, it's important to lay some groundwork. The first question that comes to mind is: what is aggregation? The logical follow-up is: why use it?
What is aggregation?
The main purpose of aggregation operations is to refine query results by grouping together field values from multiple documents, and then performing one or more transformations. Aggregation in MongoDB can be as simple as presenting the results of a query as a set of one or more fields, or as complex as performing a multistage query, breaking the output into buckets, and performing operations on each result set. A more advanced usage would be to manipulate complex fields within...
Using single-purpose aggregation
Single-purpose aggregation operators are available so that you can operate on a collection or a cursor. The following table summarizes the operators that can operate on a collection:
db.collection.count() | Wraps the count command (not the $count pipeline stage) to produce the number of documents in a collection. |
db.collection.distinct() | Wrapper for the distinct command (https://docs.mongodb.com/manual/reference/command/distinct/#distinct). Produces distinct values for document fields across a collection. |
The following table summarizes single-purpose aggregation operations that can be performed on a cursor (such as the one returned by db.collection.find()):
cursor.count() | Equivalent to db.collection.count() (see prior table) |
cursor.limit() | Limits the number of documents in the final result |
cursor.sort() | Returns the results in... |
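To make the effect of these operations concrete, the following sketch reproduces them on a plain in-memory array. The products collection and its fields (sku, category, price) are assumptions made up for illustration; the comments show the mongo shell call each line stands in for.

```javascript
// Hypothetical sample data standing in for a "products" collection
const products = [
  { sku: "A1", category: "cake", price: 4.5 },
  { sku: "B2", category: "candy", price: 1.25 },
  { sku: "C3", category: "cake", price: 6.0 },
  { sku: "D4", category: "cookie", price: 2.75 },
];

// db.products.count(): number of documents in the collection
const total = products.length;

// db.products.distinct("category"): unique values of a single field
const categories = [...new Set(products.map((p) => p.category))];

// db.products.find().sort({ price: -1 }).limit(2):
// order by price, highest first, then keep the first two documents
const topTwo = [...products].sort((a, b) => b.price - a.price).slice(0, 2);

console.log(total, categories, topTwo.map((p) => p.sku));
```

Each operation answers exactly one question about the data, which is why these methods cannot group or reshape documents on their own.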
Using the aggregation pipeline
The MongoDB aggregation pipeline framework consists of the aggregate() collection method, and a sequence of operations referred to as stages (https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/#aggregation-pipeline-stages). This sequence is referred to as a pipeline.
For illustration, let's assume that there's a collection called purchases, where each purchase document holds details of the purchase as well as embedded customer and product objects.
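A purchase document of this shape might look like the following sketch. Every field name and value here (date, quantity, amount, and the contents of customer and product) is an assumption chosen for illustration, not the actual structure of the collection.

```javascript
// Hypothetical purchase document; all field names and values are assumptions
const purchase = {
  date: "2018-07-16",
  quantity: 3,
  amount: 13.5,
  // Embedded customer object
  customer: { name: "Pat Example", country: "AU" },
  // Embedded product object
  product: { sku: "A1", title: "Chocolate Cake", price: 4.5 },
};

// In a MongoDB query, embedded fields are addressed with dot notation,
// for example { "customer.country": "AU" }. In plain JavaScript:
console.log(purchase.customer.country);
```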
We wish to generate a report on the total sales for each customer from Australia. A simple db.collection.find() command will not suffice as it is incapable of grouping the customers. The problem is further compounded by the fact that country information is embedded in the customer object within each purchase. In order to generate this report, we will first need to address stages.
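As a rough sketch of where this is heading, a pipeline for such a report could be assembled from three stages: $match, $group, and $sort. The field names customer.country, customer.name, and amount, and the value "AU", are assumptions about the document structure rather than details taken from the collection.

```javascript
// A pipeline is an ordered array of stage documents.
// Field names and the "AU" value are assumptions for illustration.
const pipeline = [
  // Stage 1: keep only purchases whose embedded customer is in Australia
  { $match: { "customer.country": "AU" } },
  // Stage 2: group by customer and add up the purchase amounts
  { $group: { _id: "$customer.name", total: { $sum: "$amount" } } },
  // Stage 3: list the largest totals first
  { $sort: { total: -1 } },
];

// In the mongo shell, the pipeline would be run as:
//   db.purchases.aggregate(pipeline);
console.log(JSON.stringify(pipeline, null, 2));
```

The output of each stage becomes the input of the next, so filtering with $match first keeps the grouping stage from processing documents that would be discarded anyway.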
Using map-reduce
The db.collection.mapReduce() method delivers results similar to those of the aggregation pipeline. The main difference is that rather than performing operations in stages, map-reduce uses JavaScript functions to produce results. This gives you access to the full programming power that's available in JavaScript. Because it operates outside of the aggregation framework, however, performance is generally worse. If your query involves a high degree of complexity, this feature might be worth considering. Otherwise, the MongoDB documentation recommends using the aggregation pipeline framework.
To demonstrate map-reduce functionality, we will use the same purchases collection that we described previously. Here is the general structure of a mapReduce() command:
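The general shape can be sketched as follows, under stated assumptions: the sample documents, the field names, the "AU" country value, and the customer_totals output collection are all hypothetical. The map and reduce functions are written the way the mongo shell expects them, and a small local stand-in for the server's emit() lets them be tried in Node.js without a database.

```javascript
// Hypothetical sample documents (field names are assumptions)
const purchases = [
  { amount: 10, customer: { name: "Pat", country: "AU" } },
  { amount: 5, customer: { name: "Lee", country: "US" } },
  { amount: 7, customer: { name: "Pat", country: "AU" } },
  { amount: 3, customer: { name: "Kim", country: "AU" } },
];

// map: runs once per document, with `this` bound to that document
function mapFn() {
  emit(this.customer.name, this.amount);
}

// reduce: combines all of the values emitted for one key
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// In the mongo shell, the call would take this general shape:
//   db.purchases.mapReduce(mapFn, reduceFn, {
//     query: { "customer.country": "AU" },
//     out: "customer_totals",
//   });

// Local stand-in for the server, so the two functions can be tried here
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

purchases
  .filter((doc) => doc.customer.country === "AU") // plays the role of `query`
  .forEach((doc) => mapFn.call(doc));

const totals = {};
for (const [key, values] of emitted) {
  totals[key] = reduceFn(key, values);
}
console.log(totals);
```

In the shell, emit() is supplied by MongoDB itself, so only mapFn, reduceFn, and the options document would be passed to mapReduce().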
When we run this method, we get operational metadata. Unlike the aggregate() method, the output...
Using the MongoDB Compass aggregation pipeline builder
The MongoDB Compass tool, introduced in Chapter 1, Introducing MongoDB, has an extremely useful feature which assists you in developing complex aggregation pipeline queries. To use Compass to build an aggregation pipeline query, you first need to start Compass and connect to MongoDB. You will then need to select the database and collection upon which you wish to perform an aggregation.
In the following example, we select the sweetscomplete database and the purchases collection. From the horizontal menu, we then select Aggregations. Here is how the screen appears so far:
We then turn our attention to the dialog box in the bottom left. Clicking on Select, we add our first stage, $match. You can then begin typing the desired expression. The following table summarizes possible initial actions:
If You Type .... |
Summary
In this chapter, you learned how to conduct complex queries using the aggregation pipeline framework. You learned about stages, expression operators, and how to accumulate information such as sum, average, and so on. One of the most important aspects of the aggregation pipeline framework that you learned about in this chapter is the ability to access embedded objects or arrays.
You also learned about single-purpose aggregation (for example, sort and limit), as well as how to use map-reduce. You learned that, although map-reduce gives you flexibility in that JavaScript functions can be used, the aggregation framework is preferred as it uses native MongoDB methods and offers better performance.
In the next chapter, you will learn about how to maintain MongoDB performance.