You're reading from Practical MongoDB Aggregations

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781835884362
Edition: 1st Edition
Author: Paul Done

Paul Done is a Field CTO at MongoDB Inc., having been a Solutions Architect for the past decade at MongoDB. He has previously held roles in various software disciplines, including engineering, consulting, and pre-sales, at companies like Oracle, Novell, and BEA Systems. Paul specializes in databases and middleware, focusing on resiliency, scalability, transactions, event processing, and applying evolvable data model approaches. He spent most of the early 2000s building Java EE (J2EE) transactional systems on WebLogic, integrated with relational databases like Oracle RAC and messaging systems like MQ Series.

Fixing and Generating Data Examples

This chapter provides tools and techniques for cleansing the data in your collections. Sometimes, collections store number and boolean fields as strings, or date fields as text without essential details such as the applicable time zone. Without proper typing, it can be almost impossible for users to execute range-based queries or to sort query results correctly. You will also learn how to generate new sample data from scratch to help with your testing.
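To see why string-typed numbers cause trouble, consider how strings compare. The following minimal sketch uses plain JavaScript rather than a MongoDB query, and the sample values are made up for illustration. MongoDB's default string comparison also works character by character, so the same ordering problem appears when sorting or range-filtering on string fields.

```javascript
// Hypothetical order values that were imported as strings instead of numbers
const valuesAsStrings = ["9", "10", "100"];

// Default sort compares strings character by character, so "10" precedes "9"
const sortedAsStrings = [...valuesAsStrings].sort();

// After conversion to numbers, sorting behaves as expected
const valuesAsNumbers = valuesAsStrings.map(Number);
const sortedAsNumbers = [...valuesAsNumbers].sort((a, b) => a - b);

// Range filter "greater than 50": the string comparison keeps "9" and drops "100"
const over50AsStrings = valuesAsStrings.filter((v) => v > "50");
const over50AsNumbers = valuesAsNumbers.filter((v) => v > 50);

console.log(sortedAsStrings);
console.log(sortedAsNumbers);
console.log(over50AsStrings);
console.log(over50AsNumbers);
```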

This chapter covers the following:

  • Converting text fields to strongly typed fields
  • Fixing incomplete date strings
  • Generating new mock data

Strongly typed conversion

It's not uncommon for someone to import data into a MongoDB collection without applying strong typing, storing date, number, and boolean fields as strings instead. This situation is likely to cause friction for subsequent users of the data. This example will show you how to restore these fields to their proper types.

Scenario

A third party has imported a set of retail orders into a MongoDB collection but with all data typing lost (they have stored all field values as strings). You want to reestablish correct typing for all the documents and copy them into a new cleaned collection. You can incorporate such transformation logic in the aggregation pipeline because you know each field's type in the original record structure.

Note

Unlike most examples in this book, in this example, the aggregation pipeline writes its output to a collection rather than streaming the results back to the calling application.
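The conversion described in this scenario can be sketched as a two-stage pipeline. This is a minimal illustration under stated assumptions, not the chapter's own listing: the field names (`order_date`, `value`, `item_qty`, `reported`) and the collection names (`orders`, `orders_typed`) are hypothetical, while `$toDate`, `$toDecimal`, `$toInt`, `$toLower`, `$eq`, `$set`, and `$merge` are standard aggregation operators and stages.

```javascript
// Sketch only: field and collection names below are assumptions for illustration
const pipeline = [
  // Stage 1: overwrite each string field with a correctly typed value
  {
    $set: {
      // Works for strings in a format $toDate recognizes, such as ISO-8601
      order_date: { $toDate: "$order_date" },
      // Numeric string -> decimal128
      value: { $toDecimal: "$value" },
      // Numeric string -> 32-bit integer
      item_qty: { $toInt: "$item_qty" },
      // $toBool returns true for every string (even "false"),
      // so compare the lowercased text against "true" instead
      reported: { $eq: [{ $toLower: "$reported" }, "true"] },
    },
  },
  // Stage 2: write the converted documents to a separate, cleaned collection
  { $merge: { into: "orders_typed" } },
];

// In mongosh this could be run as: db.orders.aggregate(pipeline);
```

One design caveat in this sketch: a document that lacks the `reported` field ends up with `reported: false`. If the absence of a field must be preserved, a `$switch` with an explicit default is a more careful choice.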

Populating the sample...

Converting incomplete date strings

Sometimes, you will encounter datasets with dates stored as strings and lacking critical details such as the century and time zone. As with the prior example, this poses challenges for database users. The next example will demonstrate how to amend these dates and add the missing information.

Scenario

An application is ingesting payment documents into a MongoDB collection where each document's payment date field contains a string that only loosely resembles a date-time, such as "01-JAN-20 01.01.01.123000000". When aggregating the payments, you want to convert each payment date into a valid BSON date type. (BSON is a binary encoding of JSON data types that makes the data easier and more performant for MongoDB to process and supports more data types than the JSON standard.) However, the payment date fields contain only some of the information required to determine the exact date-time accurately. Therefore, you cannot use the date operator...
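One possible way to handle strings of this shape is to rebuild them into a complete, unambiguous form and then parse the result with `$dateFromString` and an explicit format. The following is a hedged sketch, not a definitive implementation. It assumes that the field is named `payment_date` (a hypothetical name), that every string has the same fixed layout as the example above, that all dates fall in the 21st century, and that the times are in UTC.

```javascript
// Sketch only. Assumptions: the field is named payment_date, every date falls
// in the 21st century, and the times are in UTC.
const MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"];

// One $switch branch per month name, mapping "JAN" -> "01" ... "DEC" -> "12"
const monthBranches = MONTHS.map((name, i) => ({
  case: { $eq: ["$$month", name] },
  then: String(i + 1).padStart(2, "0"),
}));

const pipeline = [
  {
    $set: {
      payment_date: {
        $let: {
          vars: {
            txt: "$payment_date",
            month: { $substrCP: ["$payment_date", 3, 3] }, // e.g. "JAN"
          },
          in: {
            $dateFromString: {
              format: "%d-%m-%Y %H.%M.%S.%L",
              timezone: "UTC", // assumed, because the source strings carry no zone
              dateString: {
                $concat: [
                  { $substrCP: ["$$txt", 0, 3] }, // day plus hyphen, e.g. "01-"
                  { $switch: { branches: monthBranches, default: "ERROR" } },
                  "-20", // hyphen plus the assumed century digits
                  { $substrCP: ["$$txt", 7, 15] }, // e.g. "20 01.01.01.123"
                ],
              },
            },
          },
        },
      },
    },
  },
];

// In mongosh this could be run as: db.payments.aggregate(pipeline);
```

For the example string, the `$concat` expression produces "01-01-2020 01.01.01.123", which matches the declared format. The trailing six digits are dropped because a BSON date holds only millisecond precision. An unrecognized month name yields the text "ERROR", which makes the parse fail visibly instead of silently producing a wrong date.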

Generating mock test data

The ability to generate test data is necessary for most IT projects, but this can be quite a tedious and time-consuming process. The MongoDB aggregation framework provides operators that a pipeline can include to make generating mock test data easy for certain types of test scenarios.

Note

For this example, you require MongoDB version 6.0 or above. This is because you'll be using the $densify and $fill stages introduced in version 6.0.

Scenario

You want to generate a large volume of sample data in a MongoDB collection so you can subsequently educate yourself by experimenting with the MongoDB Query Language and defining indexes to determine how to improve the response time of your test queries. You don't have much time, so you want a low-effort way to quickly produce a collection of half a million documents using an aggregation pipeline. The specific fields you want each sample document to have include the following:

  • A monotonically...
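As a rough illustration of how `$densify` and `$fill` can expand a couple of seed documents into a large collection, consider the sketch below. It is a minimal example under stated assumptions and does not reproduce the chapter's field list: the collection names (`source`, `mock_data`) and the fields `seq`, `progress`, and `score` are hypothetical. It requires MongoDB 6.0 or above.

```javascript
// Sketch only: the collection names and the fields seq, progress, and score
// are assumptions for illustration.
const TOTAL_DOCS = 500000;

// Two seed documents mark the start and end of the range to be filled in
const seedDocs = [
  { seq: 1, progress: 0 },
  { seq: TOTAL_DOCS, progress: 100 },
];

const pipeline = [
  // Create a document for every missing integer value of seq between the seeds
  { $densify: { field: "seq", range: { bounds: "full", step: 1 } } },
  // Interpolate progress linearly across the new documents, ordered by seq
  { $fill: { sortBy: { seq: 1 }, output: { progress: { method: "linear" } } } },
  // Add a random whole-number score from 0 to 99 to every document
  { $set: { score: { $floor: { $multiply: [{ $rand: {} }, 100] } } } },
  // Write the generated documents to a new collection
  { $merge: { into: "mock_data" } },
];

// In mongosh this could be run as:
//   db.source.insertMany(seedDocs);
//   db.source.aggregate(pipeline);
```

The two seeds carry the only hand-written values. `$densify` supplies the documents in between, `$fill` spreads `progress` evenly from 0 to 100, and `$rand` gives each document an independent `score`.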

Summary

In this chapter, you learned techniques for fixing existing data in your database: converting fields to strong types, filling in missing details, and generating mock data for testing.

In the next chapter, you will explore examples for analyzing datasets to pinpoint trends, categories, and relationships.
