Reader small image

You're reading from  Practical MongoDB Aggregations

Product typeBook
Published inSep 2023
PublisherPackt
ISBN-139781835080641
Edition1st Edition
Tools
Right arrow
Author (1)
Paul Done
Paul Done
author image
Paul Done

Paul Done is a Field CTO at MongoDB Inc., having been a Solutions Architect for the past decade at MongoDB. He has previously held roles in various software disciplines, including engineering, consulting, and pre-sales, at companies like Oracle, Novell, and BEA Systems. Paul specializes in databases and middleware, focusing on resiliency, scalability, transactions, event processing, and applying evolvable data model approaches. He spent most of the early 2000s building Java EE (J2EE) transactional systems on WebLogic, integrated with relational databases like Oracle RAC and messaging systems like MQ Series.
Read more about Paul Done

Right arrow

Array Manipulation Examples

Storing and processing documents with fields that contain arrays of sub-documents is an essential aspect of MongoDB. The aggregation framework offers rich operators to process array contents, which is important to avoid performance penalties from needlessly unwinding and regrouping arrays. The aggregation pipeline's ability to handle nested operations makes even the most challenging array manipulation tasks feasible. However, these solutions can sometimes appear complicated. In this chapter, you will learn how to break down array manipulation problems into manageable pieces, streamlining your assembly of solutions.

This chapter will cover the following topics:

  • Generating summaries of array contents, such as highs and lows
  • Sorting and changing the orientation of the contents of arrays
  • Calculating averages and percentiles for arrays
  • Grouping elements of arrays by common keys
  • Joining elements from different array fields together...

Summarizing arrays for first, last, minimum, maximum, and average values

Financial time-series data, such as the values of currency pairs (e.g., euro-to-US dollar), fluctuates due to changing market factors. Financial institutions need rapid analytics to identify highs, lows, and averages. Even basic immediate insight such as these enable institutions to make fast, informed trading decisions, capitalizing on opportunities and avoiding potential risks. In this example, you will discover how to generate summary data for fluctuating currency-pair values.

Note

This example requires MongoDB version 4.4 or above. This is because you'll be using the $first and $last array operators, which were first introduced in version 4.4.

Scenario

You want to generate daily summaries for the exchange rates of foreign currency pairs. You need to analyze an array of persisted hourly rates for each currency pair for each day. You will output a daily summary of the open (first), close (last...

Pivoting array items by a key

In some scenarios, an array field within documents contains a sequence of elements where some of the array's elements logically relate to each other. In this example, you will explore how to construct a pipeline that restructures arrays to represent these inherent groupings.

Scenario

You have a set of geographically dispersed weather station zones where each zone has multiple sensor devices collecting readings such as temperature, humidity, and pressure. Each weather station assembles readings from its devices and once per hour transmits this set of measurements to a central database to store. The set of persisted readings is randomly ordered measurements for different devices in the zone. You need to take the mix of readings and group them by device, so the weather data is easier to consume by downstream dashboards and applications.

Note

This example's pipeline relies on some of the more difficult-to-understand array operator expressions...

Array sorting and percentiles

The need to sort arrays and calculate summary data, such as the 99th percentile, is common. Recent versions of the MongoDB aggregation framework offer enhanced capabilities in this area. This example will guide you through implementing this using both an earlier and a more recent version of MongoDB.

Scenario

You've conducted performance testing of an application with the results of each test run captured in a database. Each record contains a set of response times for the test run. You want to analyze the data from multiple runs to identify the slowest ones. You calculate the median (50th percentile) and 90th percentile response times for each test run and only keep results where the 90th percentile response time is greater than 100 milliseconds.

Note

For MongoDB version 5.0 and earlier, the example will use a macro function for inline sorting of arrays. Adopting this approach avoids the need for you to use the combination of the $unwind...

Array element grouping

Grouping data from documents in MongoDB is one of the most common tasks you will use the aggregation framework for. However, sometimes, you only need to group elements in an array field within each document in isolation rather than grouping data from different documents. Let's find out how we can do this efficiently in this example.

Scenario

You want to provide a report for your online game showing the total coin rewards each gaming user has accumulated. The challenge is that the source collection captures each time the game awards a user with a type of coin in a growing array field containing many elements. However, for each gamer, you want to show the total for each coin type in an array instead. An extra complication exists in that you cannot know ahead of time what all the possible coin types can be when developing the solution. For example, the game could introduce different coin types in the future (e.g., tungsten coins).

Populating the sample...

Array fields joining

Joining documents between two collections is an aggregation topic you explored extensively in Chapter 7, Joining Data Examples. However, sometimes, you only need to efficiently join two fields within the same document where at least one of the fields is an array. Let's look at how you can achieve this.

Scenario

You are developing a new dating website using a database to hold the profiles of all registered users. For each user profile, you will persist a set of the user's specified hobbies, each with a description of how the user says they conduct their pursuit. Each user's profile also captures what they prefer to do depending on their mood (e.g., happy, sad, chilling, etc.). When you show the user profiles on the website to a person searching for a date, you want to describe how each candidate user conducts their hobbies for each mood to help the person spot their ideal match.

Populating the sample data

First, drop any old versions of...

Comparison of two arrays

Frequently, when analyzing data and computing summary values for different time periods, you also want to compare what has changed in the data between those periods. Such last-mile comparisons can be left to the client application to perform using the programming language available there. However, as you will discover in this example, an aggregation can invariably perform this action for you.

Note

For this example, you require MongoDB version 4.4 or above. This is because you'll be using the $first array operator introduced in MongoDB version 4.4.

Scenario

You are an IT administrator managing virtual machine deployments in a data center to host a critical business application in a few environments (e.g., production and QA). A database collection captured the configuration state of each virtual machine across two days. You want to generate a report showing what configuration changes people made to the virtual machines (if any) between these...

Jagged array condensing

Some data is best represented as an array of arrays, where each sub-array has an uneven length. These are called jagged arrays. Sometimes, you will need to condense the array of arrays into a single array of averages instead.

Scenario

You are developing a healthcare IT system to track the results of clinical trials for different types of emerging medical drugs, where each clinical trial will involve multiple patients. In most cases, doctors will administer the medication to each patient over several sessions, measuring its effectiveness soon after each administered session to understand each patient's response. You want to capture the results of each clinical trial as a record in a database, including an array of participating patients. Each patient array element will contain an array of their administered drug sessions, showing the resulting effectiveness of each session. Essentially, you need to store a jagged array because some patients in the...

Summary

In this chapter, you explored examples of dealing with complex array manipulation challenges by breaking down array processing into manageable pieces for various use cases.

In the next chapter, you will learn how to build aggregation pipelines to perform full-text searches on the text fields of documents.

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Practical MongoDB Aggregations
Published in: Sep 2023Publisher: PacktISBN-13: 9781835080641
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Paul Done

Paul Done is a Field CTO at MongoDB Inc., having been a Solutions Architect for the past decade at MongoDB. He has previously held roles in various software disciplines, including engineering, consulting, and pre-sales, at companies like Oracle, Novell, and BEA Systems. Paul specializes in databases and middleware, focusing on resiliency, scalability, transactions, event processing, and applying evolvable data model approaches. He spent most of the early 2000s building Java EE (J2EE) transactional systems on WebLogic, integrated with relational databases like Oracle RAC and messaging systems like MQ Series.
Read more about Paul Done