
You're reading from MongoDB Fundamentals

Product type: Book
Published in: Dec 2020
Publisher: Packt
ISBN-13: 9781839210648
Edition: 1st Edition
Authors (4):
Amit Phaltankar

Amit Phaltankar is a software developer and a blogger experienced in building lightweight and efficient software components. He specializes in writing web-based applications and handling large-scale data sets using traditional SQL, NoSQL, and big data technologies. He is experienced in many technology stacks and loves learning and adapting to newer technology trends. Amit is passionate about improving his skill set and loves guiding and grooming his peers and contributing to blogs. He is also an author of MongoDB Fundamentals.

Juned Ahsan
Juned Ahsan

Juned Ahsan is a software professional with more than 14 years of experience. He has built software products and services for companies and clients such as Cisco, Nuamedia, IBM, Nokia, Telstra, Optus, Pizza Hut, AT&T, Hughes, Altran, and others. Juned has vast experience in building software products and architecting platforms of different sizes from scratch. He loves to help and mentor others and is a top 1% contributor on Stack Overflow. He is passionate about cognitive CX, cloud computing, artificial intelligence, and NoSQL databases.

Michael Harrison
Michael Harrison

Michael Harrison started his career at the Australian telecommunications leader Telstra, where he worked across their networks, big data, and automation teams. He is now a lead software developer and a founding member of Southbank Software, a Melbourne-based startup that builds tools for the next generation of database technologies.

Liviu Nedov
Liviu Nedov

Liviu Nedov is a senior consultant with more than 20 years of experience in database technologies. He has provided professional and consulting services to customers in Australia and Europe. Throughout his career, he has designed and implemented large enterprise projects for customers such as Wotif Group, Xstrata Copper/Glencore, the University of Newcastle, and Energy Queensland. He is currently working at Data Intensity, the largest multi-cloud service provider for applications, databases, and business intelligence. In recent years, he has been actively involved in MongoDB NoSQL database projects, database migrations, and cloud DBaaS (Database as a Service) projects.


7. Data Aggregation

Overview

This chapter introduces you to the concept of aggregation and its implementation in MongoDB. You will learn how to identify the parameters and structure of the aggregate command, combine and manipulate data using the primary aggregation stages, work with large datasets using advanced aggregation stages, and optimize and configure your aggregation to get the best performance out of your queries.

Introduction

In the previous chapters, we learned the fundamentals of interacting with MongoDB. With these basic operations (insert, update, and delete), we can now begin exploring and manipulating our data as we would with any other database. We also observed how, by fully leveraging the find command options, we can use operators to answer more specific questions about our data. We can also sort, limit, skip, and project on our query to create useful result sets.

In more straightforward situations, these result sets may be enough to answer your desired business question or satisfy a use case. However, more complex problems require more complex queries to answer. Solving such problems with just the find command would be highly challenging and would likely require multiple queries or some processing on the client side to organize or link the data.

A basic limitation arises when the data you need is spread across two separate collections. To find the correct data, you would have to run...

aggregate Is the New find

The aggregate command in MongoDB is similar to the find command. You can provide the criteria for your query in the form of JSON documents, and it outputs a cursor containing the search result. Sounds simple, right? That's because it is. Although aggregations can become very large and complex, at their core, they are relatively simple.

The key element in aggregation is called the pipeline. We will cover it in detail shortly, but at a high level, a pipeline is a series of instructions, where the input to each instruction is the output of the previous one. Simply put, aggregation is a method for taking a collection and, in a procedural way, filtering, transforming, and joining data from other collections to create new, meaningful datasets.
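To make the "output of one stage is the input of the next" idea concrete, here is a minimal sketch in plain JavaScript. This is only an analogy for the pipeline concept, not MongoDB's actual engine: the sample documents, the stage functions, and their names are invented for illustration.

```javascript
// Illustration only: a tiny in-memory model of the pipeline concept.
// Each "stage" is a function whose input is the previous stage's output,
// mirroring how MongoDB feeds documents through stages such as $match
// and $project.
const movies = [
  { title: "Blacksmith Scene", year: 1893, rating: 6.2 },
  { title: "The Great Train Robbery", year: 1903, rating: 7.4 },
  { title: "A Trip to the Moon", year: 1902, rating: 8.2 },
];

// Stage 1: keep only highly rated movies (like a $match stage).
const matchStage = (docs) => docs.filter((d) => d.rating >= 7);

// Stage 2: reshape each document (like a $project stage).
const projectStage = (docs) => docs.map((d) => ({ title: d.title, year: d.year }));

// The pipeline: each stage consumes the previous stage's output.
const pipeline = [matchStage, projectStage];
const result = pipeline.reduce((docs, stage) => stage(docs), movies);

console.log(result);
// [ { title: 'The Great Train Robbery', year: 1903 },
//   { title: 'A Trip to the Moon', year: 1902 } ]
```

The order of the stages matters: filtering first means the reshaping stage only processes the documents that survive the filter, which is also a key performance principle in real MongoDB pipelines.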

Aggregate Syntax

The aggregate command operates on a collection like the other Create, Read, Update, Delete (CRUD) commands, like so:

use sample_mflix;
var pipeline = [] // The pipeline is an array of stages...
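Filling that array out, a complete call might look like the sketch below. The stages and field names are assumptions based on the sample_mflix dataset used in this chapter (for example, `imdb.rating`); adjust them to your own schema.

```javascript
// A sketch of the full aggregate syntax. In the mongo shell you would run:
//   db.movies.aggregate(pipeline, options);
const pipeline = [
  { $match: { year: { $gte: 2000 } } }, // stage 1: filter documents
  { $sort: { "imdb.rating": -1 } },     // stage 2: sort by rating, descending
  { $limit: 3 },                        // stage 3: keep only the top three
];

// Optional second argument: aggregation options, such as allowDiskUse
// for pipelines whose stages exceed the memory limit.
const options = { allowDiskUse: true };
```

Like `find`, the command returns a cursor over the resulting documents, so you can iterate it or append `.toArray()` in the shell.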

Manipulating Data

Most of our activities and examples can be reduced to the following: there are documents in a collection, and we want to return some or all of them in an easy-to-digest format. At their core, the find command and the aggregation pipeline are both about identifying and fetching the correct documents. However, the capability of the aggregation pipeline is much more robust and broader than that of the find command.

Using some of the more advanced stages and techniques in the pipeline allows us to transform our data, derive new data, and generate insights across a broader scope. This more extensive use of the aggregate command is more common than merely rewriting a find command as a pipeline. If you want to answer complex questions or extract the highest possible value from your data, you'll need to master the stages that do the actual aggregating in your aggregation pipelines.
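As a sketch of what "deriving new data" looks like, the hypothetical pipeline below groups movies by year and computes an average rating. The field names (`type`, `year`, `imdb.rating`) are assumptions based on the sample_mflix movies collection.

```javascript
// Hypothetical sketch: derive new data by grouping movies per year.
// In the mongo shell: db.movies.aggregate(groupPipeline);
const groupPipeline = [
  // Only consider documents that represent movies.
  { $match: { type: "movie" } },
  {
    $group: {
      _id: "$year",                            // group key: release year
      averageRating: { $avg: "$imdb.rating" }, // derived field per group
      count: { $sum: 1 },                      // movies released that year
    },
  },
  // Present the groups in chronological order.
  { $sort: { _id: 1 } },
];
```

Each output document here is something that exists in no single source document, which is exactly the kind of insight `find` alone cannot produce.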

After all, we haven't even begun to aggregate any data yet. In...

Working with Large Datasets

So far, we've been working with a relatively small number of documents. The movies collection has roughly 23,500 documents in it. This may be a considerable number for a human to work with, but for large production systems, you may be working on a scale of millions instead of thousands. So far, we have also been focusing strictly on a single collection at a time, but what if the scope of our aggregation grows to include multiple collections?

In the first topic, we briefly discussed how you could use the projection stage while developing your pipelines to create more readable output as well as simplify your results for debugging. However, we didn't cover how you can improve performance when working on much, much larger datasets, both while developing and for your final production-ready queries. In this topic, we'll discuss a few of the aggregation stages that you need to master when working with large, multi-collection datasets.
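Two stages worth knowing in this situation are `$sample`, for working on a random subset while developing, and `$lookup`, for joining a second collection. The sketch below assumes the sample_mflix schema (a `comments` collection whose `movie_id` field references `movies._id`); treat the names as placeholders for your own data.

```javascript
// Sketch of stages useful on large, multi-collection datasets.
// In the mongo shell: db.movies.aggregate(largeDataPipeline);
const largeDataPipeline = [
  // Develop against a random subset instead of the full collection.
  { $sample: { size: 100 } },
  // Join each movie with its comments from a second collection.
  {
    $lookup: {
      from: "comments",         // the other collection
      localField: "_id",        // field in movies
      foreignField: "movie_id", // matching field in comments
      as: "relatedComments",    // output array field on each movie
    },
  },
  // Trim the output down for readability while debugging.
  { $project: { title: 1, relatedComments: 1 } },
];
```

Once the pipeline behaves correctly on the sample, removing the `$sample` stage runs it against the full collection.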

Sampling...

Getting the Most from Your Aggregations

In the last three topics, we have learned about the structure of aggregation as well as the key stages required to build up complicated queries. We can search large multi-collection datasets with given criteria, manipulate that data to create new insights, and output our results into a new or existing collection.

These fundamentals will allow you to solve most of the problems you will encounter in an aggregation pipeline. However, there are several other stages and patterns for getting the most out of your aggregations. We won't cover them all in this book, but in this topic, we'll discuss a few odds and ends that will help you fine-tune your pipelines, as well as some topics we simply haven't covered so far. We'll also look at aggregation options and at using explain() to analyze your aggregations.
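As a brief sketch of what analyzing a pipeline looks like, the mongo shell accepts explain in two forms: an option on the aggregate call itself, or the `explain()` collection helper with a verbosity level. The pipeline below is a made-up example against the movies collection.

```javascript
// Sketch: requesting the query plan for a pipeline. In the mongo shell:
//   db.movies.aggregate(explainPipeline, { explain: true });
// or, with execution statistics:
//   db.movies.explain("executionStats").aggregate(explainPipeline);
const explainPipeline = [
  { $match: { year: 2015 } },    // which documents are scanned?
  { $count: "moviesIn2015" },    // single output document with the total
];
const options = { explain: true };
```

The resulting plan shows, among other things, whether the initial `$match` was able to use an index, which is usually the first thing to check when tuning.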

Tuning Your Pipelines

In an earlier topic, we timed the execution of our pipeline by outputting the time...

Summary

In this chapter, we have covered all the essential components that you need to write, understand, and improve MongoDB aggregations. This new functionality will help you to answer more complex and difficult questions about your data. By creating multi-stage pipelines that join multiple collections, you can increase the scope of your queries to the entire database instead of a single collection. We also looked at how to write the results into a new collection to enable further exploration or manipulation of the data.

In the final section, we covered the importance of ensuring that your pipelines are written with scalability, readability, and performance in mind. By focusing on these aspects, your pipelines will continue to deliver value in the future and can act as a basis for further aggregations.

However, what we have covered here is just the beginning of what you can accomplish with the aggregation feature. It is critical that you keep exploring, experimenting...

