Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Tableau Prep Cookbook

You're reading from  Tableau Prep Cookbook

Product type Book
Published in Mar 2021
Publisher Packt
ISBN-13 9781800563766
Pages 288 pages
Edition 1st Edition
Languages
Author (1):
Hendrik Kleine Hendrik Kleine
Profile icon Hendrik Kleine

Table of Contents (11) Chapters

Preface Chapter 1: Getting Started with Tableau Prep Chapter 2: Extract and Load Processes Chapter 3: Cleaning Transformations Chapter 4: Data Aggregation Chapter 5: Combining Data Chapter 6: Pivoting Data Chapter 7: Creating Powerful Calculations Chapter 8: Data Science in Tableau Prep Builder Chapter 9: Creating Prep Flows in Various Business Scenarios Other Books You May Enjoy

Chapter 5: Combining Data

Besides cleaning up your data inputs, Tableau Prep can be used to increase the value of your new dataset by augmenting it with complementary data. This can be done by extending the dataset vertically, adding more rows, or horizontally, by adding new data columns. Performing such data preparation tasks within Tableau Prep allows you to create a dataset that includes key data from multiple inputs, making the end result a comprehensive dataset for analysis.

In this chapter, you will learn how to combine different datasets by using a variety of different methods. Combining data is one of the most common actions in data preparation. Most organizations source data from multiple systems and combining that data into a holistic dataset allows more insightful analysis than looking at each dataset in isolation.

In this chapter, you'll find the following recipes to help you combine your data for analytics:

  • Combining data with Union
  • Combining data...

Technical requirements

To follow along with the recipes in this chapter, you will require Tableau Prep Builder. The recipes in this chapter use sample data files that you can download from the book's GitHub repository at https://github.com/PacktPublishing/Tableau-Prep-Cookbook.

Combining data with Union

Data is typically produced by multiple systems and certain systems may produce similar data that needs to be combined vertically. That is, the rows need to be stacked on top of one another. A use case we'll use in this recipe is combining sales data from two different sales systems, in order to get the total sales dataset prepared. This is a typical scenario you may encounter when your organization is operating multiple systems, is migrating from one system to another, is maintaining legacy systems, or is integrating systems from a partner or acquired company. In this recipe, we'll combine multiple datasets using Union.

Getting ready

To follow along with this recipe, download the Sample Files 5.1 folder from this book's GitHub repository.

How to do it…

Start by opening Tableau Prep and perform the following steps:

  1. Connect to the DataExport_NOV_Sales.csv text file in order to create a brand-new flow with a data connection...

Combining data ingest and Union actions

When you have multiple data sources that need to be vertically stacked using a union, you may opt to perform that action during ingestion. This avoids you having to create a separate Union step in Tableau Prep. Furthermore, you can use wildcards in your input, such that the input becomes dynamic, and new data files can be ingested as they are added. A typical scenario for this would be an automated process that exports data on a recurring schedule, which you then need to union with prior data exports.

In this recipe, we'll use a special type of union that is part of the input step, rather than a step by itself. Using the Union functionality during input allows you to ingest and union multiple input files simultaneously.

Getting ready

To follow along with this recipe, download the Sample Files 5.2 folder from this book's GitHub repository. This folder contains sales data that has been exported every month, and so we have one...

Combining datasets using an inner join

We often store in multiple different places for a variety of reasons, including different systems, storage optimization and efficiency, security, costs, and so forth. When analyzing data, however, we often want to bring data from many different areas together to create a richer dataset for analysis. Doing so may result in a better understanding of the data and provide valuable business insights. We can use the Join functionality to combine data horizontally, that is, widen the dataset by adding fields from two or more sources together. There are a variety of different join types, as you can see by the recipes in this chapter. In this recipe, we'll look at creating an inner join.

An inner join is the end result of joining two data sources together and retaining only those rows that overlap. For example, let's assume we have order data for a B2B seller of office supplies. Their data may be segmented into a database containing order...

Combining datasets using a left or right join

In the Combining datasets using an inner join recipe, we combined data that was complementary and complete, orders, and associated customer information. However you may find a use case where complementary data is present for some records, and not for others. An example of this that we'll use in this recipe involves two data sources, one with sales data from a department store, and another data source with information from customers who checked out with a loyalty card. Of course, not all customers may have a loyalty card, and so we cannot expect to match every row in the data. This is where a left join comes into play.

In a left join, we pass through all the data from the first dataset, that is, the left data source, and only those records from the second, right data source that we were able to match. This means that any sales records that did not involve a loyalty card will still pass through, but the additional fields from the...

Expanding datasets using a full outer join

In the Combining data ingest and Union actions recipe, we created an inner join to return rows from two data sources that had a commonality. In the Combining datasets using a left or right join recipe, we created a left join to return all rows from a data source and enrich that data with information from a second source, whenever there was additional information available, without dropping any rows from the original source.

In this recipe, we'll look at a variation of the join, which is named the full outer join. In this case, we'll want to retrieve all rows from both data sources involved in the join, that is, even if there's no overlap. It's essentially doing a left and right join at the same time; you won't lose any data from either data source.

In the example that follows, we'll use a use case where a company is running several projects and each project may have a number of people assigned to it. However...

Expanding datasets using a not inner join

In this chapter, we have assumed in all join-related recipes that there was an overlap between two data sources. However, for analysis purposes, you may be interested in what data is not overlapping, so that you can take action appropriately.

Using the same data as we've used in the Expanding datasets using a full outer join recipe, where we have a data source with projects, and another data source with project staff, we may change our use case to focus solely on data that does not overlap. That is, we are only interested in projects without staff assigned to them, or staff members not currently assigned to work on any project.

Getting ready

To follow along with this recipe, download the Sample Files 5.6 folder from this book's GitHub repository.

How to do it…

Start by opening Tableau Prep and perform the following steps to create a not inner join:

  1. Connect to the Projects.xlsx Excel file in order to create...
lock icon The rest of the chapter is locked
You have been reading a chapter from
Tableau Prep Cookbook
Published in: Mar 2021 Publisher: Packt ISBN-13: 9781800563766
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}