
You're reading from MongoDB Fundamentals

Product type: Book
Published in: Dec 2020
Publisher: Packt
ISBN-13: 9781839210648
Edition: 1st Edition
Authors (4):
Amit Phaltankar

Amit Phaltankar is a software developer and a blogger experienced in building lightweight and efficient software components. He specializes in writing web-based applications and handling large-scale data sets using traditional SQL, NoSQL, and big data technologies. He is experienced in many technology stacks and loves learning and adapting to newer technology trends. Amit is passionate about improving his skill set and loves guiding and grooming his peers and contributing to blogs. He is also an author of MongoDB Fundamentals.

Juned Ahsan
Juned Ahsan

Juned Ahsan is a software professional with more than 14 years of experience. He has built software products and services for companies and clients such as Cisco, Nuamedia, IBM, Nokia, Telstra, Optus, Pizza Hut, AT&T, Hughes, Altran, and others. Juned has vast experience in building software products and architecting platforms of different sizes from scratch. He loves to help and mentor others and is a top 1% contributor on Stack Overflow. He is passionate about cognitive CX, cloud computing, artificial intelligence, and NoSQL databases.

Michael Harrison
Michael Harrison

Michael Harrison started his career at the Australian telecommunications leader Telstra, where he worked across their networks, big data, and automation teams. He is now a lead software developer and a founding member of Southbank Software, a Melbourne-based startup that builds tools for the next generation of database technologies.

Liviu Nedov
Liviu Nedov

Liviu Nedov is a senior consultant with more than 20 years of experience in database technologies. He has provided professional and consulting services to customers in Australia and Europe. Throughout his career, he has designed and implemented large enterprise projects for customers such as Wotif Group, Xstrata Copper/Glencore, the University of Newcastle, and Energy Queensland. He is currently working at Data Intensity, the largest multi-cloud service provider for applications, databases, and business intelligence. In recent years, he has been actively involved in MongoDB NoSQL database projects, database migrations, and cloud DBaaS (Database as a Service) projects.


7. Data Aggregation

Overview

This chapter introduces you to the concept of aggregation and its implementation in MongoDB. You will learn how to identify the parameters and structure of the aggregate command, combine and manipulate data using the primary aggregation stages, work with large datasets using advanced aggregation stages, and optimize and configure your aggregation to get the best performance out of your queries.

Introduction

In the previous chapters, we learned the fundamentals of interacting with MongoDB. With these basic operations (insert, update, and delete), we can now begin exploring and manipulating our data as we would with any other database. We also observed how, by fully leveraging the find command options, we can use operators to answer more specific questions about our data. We can also sort, limit, skip, and project on our query to create useful result sets.

In more straightforward situations, these result sets may be enough to answer your desired business question or satisfy a use case. However, more complex problems require more complex queries to answer. Solving such problems with just the find command would be highly challenging and would likely require multiple queries or some processing on the client side to organize or link the data.

A basic limitation arises when the data you need is spread across two separate collections. To find the correct data, you would have to run...

aggregate Is the New find

The aggregate command in MongoDB is similar to the find command. You can provide the criteria for your query in the form of JSON documents, and it outputs a cursor containing the search result. Sounds simple, right? That's because it is. Although aggregations can become very large and complex, at their core, they are relatively simple.

The key element in aggregation is called the pipeline. We will cover it in detail shortly, but at a high level, a pipeline is a series of instructions, where the input to each instruction is the output of the previous one. Simply put, aggregation is a method for taking a collection and, in a procedural way, filtering, transforming, and joining data from other collections to create new, meaningful datasets.
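To make the "output of one stage is the input of the next" idea concrete, here is a minimal sketch in plain JavaScript. This is only an analogy for the pipeline concept, not MongoDB's actual engine: the sample documents, the stage functions, and their names are invented for illustration.

```javascript
// Illustration only: a tiny in-memory model of the pipeline concept.
// Each "stage" is a function whose input is the previous stage's output,
// mirroring how MongoDB feeds documents through stages such as $match
// and $project.
const movies = [
  { title: "Blacksmith Scene", year: 1893, rating: 6.2 },
  { title: "The Great Train Robbery", year: 1903, rating: 7.4 },
  { title: "A Trip to the Moon", year: 1902, rating: 8.2 },
];

// Stage 1: keep only highly rated movies (like a $match stage).
const matchStage = (docs) => docs.filter((d) => d.rating >= 7);

// Stage 2: reshape each document (like a $project stage).
const projectStage = (docs) => docs.map((d) => ({ title: d.title, year: d.year }));

// The pipeline: each stage consumes the previous stage's output.
const pipeline = [matchStage, projectStage];
const result = pipeline.reduce((docs, stage) => stage(docs), movies);

console.log(result);
// [ { title: 'The Great Train Robbery', year: 1903 },
//   { title: 'A Trip to the Moon', year: 1902 } ]
```

The order of the stages matters: filtering first means the reshaping stage only processes the documents that survive the filter, which is also a key performance principle in real MongoDB pipelines.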

Aggregate Syntax

The aggregate command operates on a collection like the other Create, Read, Update, Delete (CRUD) commands, like so:

use sample_mflix;
var pipeline = [] // The pipeline is an array of stages...
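Filling that array out, a complete call might look like the sketch below. The stages and field names are assumptions based on the sample_mflix dataset used in this chapter (for example, `imdb.rating`); adjust them to your own schema.

```javascript
// A sketch of the full aggregate syntax. In the mongo shell you would run:
//   db.movies.aggregate(pipeline, options);
const pipeline = [
  { $match: { year: { $gte: 2000 } } }, // stage 1: filter documents
  { $sort: { "imdb.rating": -1 } },     // stage 2: sort by rating, descending
  { $limit: 3 },                        // stage 3: keep only the top three
];

// Optional second argument: aggregation options, such as allowDiskUse
// for pipelines whose stages exceed the memory limit.
const options = { allowDiskUse: true };
```

Like `find`, the command returns a cursor over the resulting documents, so you can iterate it or append `.toArray()` in the shell.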

Manipulating Data

Most of our activities and examples can be reduced to the following: there are documents in a collection, and we want to return some or all of them in an easy-to-digest format. At their core, the find command and the aggregation pipeline are both about identifying and fetching the correct documents. However, the capability of the aggregation pipeline is much more robust and broader than that of the find command.

Using some of the more advanced stages and techniques in the pipeline allows us to transform our data, derive new data, and generate insights across a broader scope. This more extensive use of the aggregate command is more common than merely rewriting a find command as a pipeline. If you want to answer complex questions or extract the highest possible value from your data, you'll need to master the stages that do the actual aggregating in your aggregation pipelines.
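As a sketch of what "deriving new data" looks like, the hypothetical pipeline below groups movies by year and computes an average rating. The field names (`type`, `year`, `imdb.rating`) are assumptions based on the sample_mflix movies collection.

```javascript
// Hypothetical sketch: derive new data by grouping movies per year.
// In the mongo shell: db.movies.aggregate(groupPipeline);
const groupPipeline = [
  // Only consider documents that represent movies.
  { $match: { type: "movie" } },
  {
    $group: {
      _id: "$year",                            // group key: release year
      averageRating: { $avg: "$imdb.rating" }, // derived field per group
      count: { $sum: 1 },                      // movies released that year
    },
  },
  // Present the groups in chronological order.
  { $sort: { _id: 1 } },
];
```

Each output document here is something that exists in no single source document, which is exactly the kind of insight `find` alone cannot produce.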

After all, we haven't even begun to aggregate any data yet. In...

Working with Large Datasets

So far, we've been working with a relatively small number of documents. The movies collection has roughly 23,500 documents in it. This may be a considerable number for a human to work with, but for large production systems, you may be working on a scale of millions instead of thousands. So far, we have also been focusing strictly on a single collection at a time, but what if the scope of our aggregation grows to include multiple collections?

In the first topic, we briefly discussed how you could use the projection stage while developing your pipelines to create more readable output as well as simplify your results for debugging. However, we didn't cover how you can improve performance when working on much, much larger datasets, both while developing and for your final production-ready queries. In this topic, we'll discuss a few of the aggregation stages that you need to master when working with large, multi-collection datasets.
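Two stages worth knowing in this situation are `$sample`, for working on a random subset while developing, and `$lookup`, for joining a second collection. The sketch below assumes the sample_mflix schema (a `comments` collection whose `movie_id` field references `movies._id`); treat the names as placeholders for your own data.

```javascript
// Sketch of stages useful on large, multi-collection datasets.
// In the mongo shell: db.movies.aggregate(largeDataPipeline);
const largeDataPipeline = [
  // Develop against a random subset instead of the full collection.
  { $sample: { size: 100 } },
  // Join each movie with its comments from a second collection.
  {
    $lookup: {
      from: "comments",         // the other collection
      localField: "_id",        // field in movies
      foreignField: "movie_id", // matching field in comments
      as: "relatedComments",    // output array field on each movie
    },
  },
  // Trim the output down for readability while debugging.
  { $project: { title: 1, relatedComments: 1 } },
];
```

Once the pipeline behaves correctly on the sample, removing the `$sample` stage runs it against the full collection.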

Sampling...

Getting the Most from Your Aggregations

In the last three topics, we have learned about the structure of aggregation as well as the key stages required to build up complicated queries. We can search large multi-collection datasets with given criteria, manipulate that data to create new insights, and output our results into a new or existing collection.

These fundamentals will allow you to solve most of the problems you will encounter in an aggregation pipeline. However, there are several other stages and patterns for getting the most out of your aggregations. We won't cover them all in this book, but in this topic, we'll discuss a few odds and ends that will help you fine-tune your pipelines, as well as some topics we simply haven't covered so far. We'll also look at aggregation options and at using explain() to analyze your aggregations.
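As a brief sketch of what analyzing a pipeline looks like, the mongo shell accepts explain in two forms: an option on the aggregate call itself, or the `explain()` collection helper with a verbosity level. The pipeline below is a made-up example against the movies collection.

```javascript
// Sketch: requesting the query plan for a pipeline. In the mongo shell:
//   db.movies.aggregate(explainPipeline, { explain: true });
// or, with execution statistics:
//   db.movies.explain("executionStats").aggregate(explainPipeline);
const explainPipeline = [
  { $match: { year: 2015 } },    // which documents are scanned?
  { $count: "moviesIn2015" },    // single output document with the total
];
const options = { explain: true };
```

The resulting plan shows, among other things, whether the initial `$match` was able to use an index, which is usually the first thing to check when tuning.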

Tuning Your Pipelines

In an earlier topic, we timed the execution of our pipeline by outputting the time...

Summary

In this chapter, we have covered all the essential components that you need to write, understand, and improve MongoDB aggregations. This new functionality will help you to answer more complex and difficult questions about your data. By creating multi-stage pipelines that join multiple collections, you can increase the scope of your queries to the entire database instead of a single collection. We also looked at how to write the results into a new collection to enable further exploration or manipulation of the data.

In the final section, we covered the importance of ensuring that your pipelines are written with scalability, readability, and performance in mind. By focusing on these aspects, your pipelines will continue to deliver value in the future and can act as a basis for further aggregations.

However, what we have covered here is just the beginning of what you can accomplish with the aggregation feature. It is critical that you keep exploring, experimenting...

