Reader small image

You're reading from  Tableau Prep Cookbook

Product typeBook
Published inMar 2021
PublisherPackt
ISBN-139781800563766
Edition1st Edition
Tools
Right arrow
Author (1)
Hendrik Kleine
Hendrik Kleine
author image
Hendrik Kleine

Hendrik Kleine is an advanced analytics leader with 15 years of experience in the analytics space, including in data architecture, engineering, and visualization. He specializes in translating vast amounts of data into easy-to-understand visual communications that provide actionable intelligence. He is an avid innovator and a listed author of multiple data-related inventions. Before COVID-19, he was a speaker at the most recent Tableau conference in San Francisco.
Read more about Hendrik Kleine

Right arrow

Chapter 3: Cleaning Transformations

All of Tableau Prep Builder's functionality is designed around the purpose of combining, shaping, and cleaning your data for downstream analysis. In this chapter, we'll look at the various transformations available to you to shape and clean your data.

In this chapter, you'll find the following recipes, which will help you transform your data:

  • Renaming columns
  • Filtering your dataset
  • Changing data types
  • Auto-validating data
  • Validating data with a custom reference list
  • Splitting fields with multiple values

Technical requirements

To follow along with the recipes in this chapter, you will require Tableau Prep Builder. We'll use sample Excel files supplied in the book's GitHub repository, so there's no need to connect to a database. In each recipe, however, you can replace the suggested sample input data source with any connection type that suits your scenario.

The recipes in this chapter use sample data files that you can download from the book's GitHub repository: https://github.com/PacktPublishing/Tableau-Prep-Cookbook.

Renaming columns

When it comes to cleaning data, one of the simplest yet most powerful actions might be simply renaming your fields to a more user-friendly format.

Tableau Prep steps can be categorized into three items: inputs, transformations, and outputs. During the first two, the input and transformation steps, we'll always have the ability to change any field name as desired.

Getting ready

To follow along with this recipe, download the Sample Files 3.1 folder from this book's GitHub repository.

How to do it…

Open up Tableau Prep Builder and connect to the December 2016 Sales.xlsx file:

  1. Drag the Sales_Data sheet onto the flow canvas:

    Figure 3.1 – Connecting to Sales_Data

    When you select the Sales_Data input step, the bottom pane will show a preview of all the fields in the source data, along with their data type and some sample values:

    Figure 3.2 – These field names are not user-friendly

  2. In order to change any of the field...

Filtering your dataset

When preparing your data for analysis, it's good practice to provide the least amount of data required to perform that analysis. Our data inputs frequently contain data that is not required and that you may want to remove immediately during an input step.

Alternatively, you may transform your data in a Tableau Prep flow, and as a result, a field may become redundant at some point after the input step itself.

In Tableau Prep, there are three methods you can use to filter your data. In this recipe, we'll perform filter actions using all three methods: calculation filters, selected values filters, and regular filters.

Getting ready

To follow along with this recipe, download the Sample Files 3.2 folder from this book's GitHub repository.

How to do it…

Open up Tableau Prep Builder and connect to the Superstore Sales.hyper extract file, then follow the steps:

  1. With an input step, the only method to filter our data is...

Changing data types

With Tableau Prep's ability to connect to an incredible number of different data sources comes the challenge of data type management. Every data source technology handles data types slightly differently, or stores values differently. The wrong data type may limit the number of functions you can perform with that field. For example, you cannot aggregate a number if its data type is text, nor can you filter for a date range if the data type is not a date.

Tableau Prep does a phenomenal job of automatically detecting the appropriate data type. Tableau Prep data types are listed as follows:

  • Number (decimal)
  • Number (whole)
  • Date & Time
  • Date
  • String

There are times when Tableau is unable to determine the correct data type, and times when it is unable to set it to your desired type because the values in the data are not compatible. We'll look at both cases in this recipe and how to address them.

Getting ready

To follow...

Auto-validating data

Data validation can be a time-consuming task where we have to determine whether a value is accurate or not. One of the most typical data validation issues relates to misspelling and labeling the same thing differently. For example, the city of New York might be present in your data more than once, with different labels:

  • New York
  • NY
  • NYC
  • New York, NY
  • New York, New York
  • New York, US
  • And so on…

To make the process of validating data easier, Tableau Prep uses data roles. A data role compares your data against a list of known values or specific patterns. This allows us to quickly identify problematic values in our data and take action to resolve them.

Getting ready

To follow along with this recipe, download the Sample Files 3.4 folder from this book's GitHub repository.

How to do it…

Open up Tableau Prep and connect to the User List.csv file from the Sample Files 3.4 folder and follow the steps:

...

Validating data with a custom reference list

Tableau Prep provides out-of-the-box data validation data roles for email addresses, URLs, and a variety of geographic fields. However, the real power of data roles comes from creating custom data roles specific to your environment and data. In this recipe, we'll create a custom data role and publish it to Tableau Server or Tableau Online. By leveraging your server, the custom data role can be made available to your colleagues, ensuring everyone is using a single reference list.

Getting ready

To follow along with this recipe, download the Sample Files 3.5 folder from this book's GitHub repository. In this recipe, we'll create a custom data role, for which you will need to be signed in to your instance of Tableau Server or Tableau Online.

How to do it…

Open up Tableau Prep and connect to the User List.csv file from the Sample Files 3.5 folder and follow the steps:

  1. To assign a data role to any field...

Splitting fields with multiple values

It's not uncommon for a single field to contain multiple values. For example, a name field may contain both a user's first and last name. Separating these can be done with Tableau Prep's Split Values function. Split Values facilitates the automatic creation of calculated fields with the necessary logic to split up a field based on your requirements.

Getting ready

To follow along with this recipe, download the Sample Files 3.6 folder from this book's GitHub repository.

How to do it…

Open up Tableau Prep and connect to the December 2016 Sales.xlsx file from the Sample Files 3.6 folder and follow the steps:

  1. Click the + icon on the input step and select Clean Step to add a cleaning step to your flow.

    In the profile pane, we can see that the field named Cashier contains an employee's name. We want to split that value into two separate fields, first name and last name:

    Figure 3.35 – The Cashier...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Tableau Prep Cookbook
Published in: Mar 2021Publisher: PacktISBN-13: 9781800563766
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Hendrik Kleine

Hendrik Kleine is an advanced analytics leader with 15 years of experience in the analytics space, including in data architecture, engineering, and visualization. He specializes in translating vast amounts of data into easy-to-understand visual communications that provide actionable intelligence. He is an avid innovator and a listed author of multiple data-related inventions. Before COVID-19, he was a speaker at the most recent Tableau conference in San Francisco.
Read more about Hendrik Kleine