Reader small image

You're reading from  Tableau 10 Business Intelligence Cookbook

Product typeBook
Published inNov 2016
PublisherPackt
ISBN-139781786465634
Edition1st Edition
Tools
Right arrow
Authors (2):
Donabel Santos
Donabel Santos
author image
Donabel Santos

Donabel Santos is a self-confessed data geek. She loves working with data, writing queries, and developing reports on her SQL Server databases, and exploring and visualizing data with Tableau. She is the principal and senior business intelligence architect at QueryWorks Solutions, a Tableau Learning and Alliance partner in Vancouver, BC, Canada, providing consulting and training services. She has spent years in consulting and has developed a variety of solutions for clients in different verticals—finance, manufacturing, healthcare, legal, higher education, and local government. Donabel is a multi-year Microsoft Data Platform MVP (previously known as SQL Server MVP) and has extensive experience in the SQL server in different areas, such as development, administration, data warehouse, reporting (SSRS), tuning, troubleshooting, XML, CLR, and integration with ERPs and CRMs using PowerShell, C#, SSIS, and Power BI. One of Donabel's passions is teaching and sharing her love for data. She is a Tableau Certified Professional and a Tableau accredited trainer, delivering Tableau public and on-site client training. She is also the lead instructor for a number of courses at British Columbia Institute of Technology (BCIT), including Applied Database Administration and Design (ADAD) and Applied Data Analytics (ADA) programs. She teaches SQL server administration, development, integration (SSIS), data warehouse foundations, and visual analytics with Tableau. Donabel has also authored three other books with Packt Publishing: SQL Server 2012 with PowerShell V3 Cookbook, PowerShell for SQL Server Essentials, and SQL Server 2014 with PowerShell V5 Cookbook. She also contributed a chapter to Manning Publications' PowerShell Deep Dives.
Read more about Donabel Santos

Paul Banoub
Paul Banoub
author image
Paul Banoub

Contacted on: 18/12/2015 for Mastering Tableau 10 [video]
Read more about Paul Banoub

View More author details
Right arrow

Chapter 7. Data Preparation

In this chapter, we will cover the following topics:

  • Using the Data Interpreter and pivot

  • Using the legacy Jet driver

  • Using schema.ini to resolve data type issues

  • Pivoting columns

  • Using union

  • Using join

  • Using blend

Introduction


In a perfect world, we wouldn't even need to have this chapter. In a perfect world, we would have perfect, clean data that we could easily analyze in Tableau. But, alas, in reality, the data that we need to use will most likely need to be cleaned, transformed, and managed before we can effectively use it in Tableau.

There are tools that exclusively help clean and re-shape data. Many refer to these as ETL (Extract, Transform, and Load) tools. While Tableau is not an ETL tool, it has the ability to help clean or prepare data if it is not possible to clean or prepare it at the data source. Calculated fields, discussed in Appendix A, Calculated Fields Primer, are also invaluable in the data cleaning process.

Using the Data Interpreter and pivot


In this recipe, we will clean up the following spreadsheet on Canada international student permits and ready it for Tableau:

Getting ready

To follow this recipe, download the file from the Citizenship and Immigration Canada website using the following URL:

http://www.cic.gc.ca/opendata-donneesouvertes/data/IRCC_IS_0004_E.xls

How to do it...

Here are the steps to clean up the file:

  1. Connect to the Excel file in this recipe. Make sure you choose Excel from the To a File section:

  2. Check the checkbox beside Use Data Interpreter. Note that when this checkbox is checked, the label changes to Cleaned with Data Interpreter:

  3. Select all fields except for Destination.

  4. While the fields are selected, right-click and choose Pivot:

  5. Right-click the new fields to rename them:

    • Change Pivot Field Names to Period

    • Change Pivot Field Values to International Students

  6. Click on Add underneath Filters:

  7. In the Select a field: option, choose Period:

  8. In the filter window for Period, under the...

Using the legacy Jet driver


In this recipe, we will use the New York Restaurant Inspections Excel file and use the legacy Jet driver to shape the file so we can have both the inspection date and grade date in the same column. This will allow us to visualize how many restaurants were inspected and graded for a specific date:

Getting ready

To follow this recipe, download the file from the New York City Open Data website using the following URL:

https://nycopendata.socrata.com/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/xx67-kt59/data

Once you have downloaded the data, save the file as DOHMH_New_York_City_Restaurant_Inspection_Results.xls (Microsoft Excel 97-2003 Worksheet). Note that the records may have been updated between the time of writing and the time of your download.

How to do it...

Here are the steps to prepare the Excel file:

  1. Click on New Data Source icon, and choose Excel:

  2. Choose DOHMH_New_York_City_Restaurant_Inspection_Results.xls, and select Open with Legacy Connection...

Using schema.ini to resolve data type issues


In this recipe, we will use a schema.ini file to resolve the data types when we connect to a fixed width text file data source with four columns.

Getting ready

To follow this recipe, download this chapter's files from the Packt website and use the file called Fixed Width Sample.txt.

This is what the file looks like when opened in a text editor showing special characters:

Note that this file does not have any column headers. In addition, note the following:

  • The first column should be text

  • The second column should be integers

  • The third column should be text

  • The fourth column should be dates

How to do it...

Here are the steps to use the .ini file with the fixed width text file:

  1. Create a text file with the following contents:

  2. Save the file as schema.ini and save it in the same directory as the Fixed Width Sample.txt file.

  3. Connect to the text file in Tableau:

  4. Confirm that there are four fields in the Tableau preview window, with the same configuration as specified...

Pivoting columns


In this recipe, we will prepare the data set that we used in the Creating a population pyramid recipe of Chapter 2, Advanced Charts. The original file, which is a comma separated value (CSV) file, looks like the following:

Getting ready

To follow this recipe, download the file from BCStats using the following URL:

http://www.bcstats.gov.bc.ca/StatisticsBySubject/Demography/PopulationProjections.aspx.

When you download, make the following selections and click on Generate Output:

  • Select British Columbia for Region

  • Select all the years

  • Select totals

  • Select 5-Year Age Groups

Beside the result pane, click on the CSV icon at the top-right corner of the result pane to download the .csv file. Save the file as Population_Projections.csv.

How to do it...

Here are the steps to prepare the .csv file:

  1. Click on New Data Source icon and connect to the text file in this recipe:

  2. Select all the age groups that are presented as individual columns.

  3. While all the age group columns are selected, right-click...

Using union


In this recipe, we will combine a number of comma separated value (CSV) files into a single data set in Tableau.

Getting ready

To follow this recipe, download the business license files from the City of Vancouver's website from http://data.vancouver.ca/datacatalogue/businessLicence.htm:

Download the CSV version, and save all the files in a local directory in your computer:

How to do it...

Here are the steps to prepare the data files:

  1. Click on New Data Source icon and connect to business_licenses.csv which contains the most recent year's records:

  2. Drag New Union to just underneath the business_licenses.csv until you see the Drag table to union message:

  3. Select all other CSV files from the Files pane and drag them to the Union window:

  4. Click on OK after you confirm that all the files have been added to the Union window:

  5. Add a new sheet and create your visualization using this data set.

How it works...

A union operation allows multiple sets of data to be appended to each other, that is, new records...

Using join


In this recipe, we will combine the fields in two different Excel worksheets into one:

Getting ready

To follow this recipe, download this chapter's files from the Packt website and use the file called Worlds Busiest Airports—Join.xls.

How to do it...

Here are the steps to combine the fields from the two worksheets:

  1. Connect to the Excel file in this recipe. Make sure you choose Excel from the To a File section:

  2. Drag Busiest Airports 2015 from the sheets section to the data connection window:

  3. Drag Airport Geocode to the right of Busiest Airports 2015 in the data connection window.

  4. In the Join window that comes up, choose Airport from Busiest Airports 2015 to match up to the Airport Code field from the Airport Geocode sheet:

  5. Add a new sheet and create your visualization using this data set.

How it works...

A join is primarily a relational database concept that allows you to combine records from different tables using common fields. When data sets are joined, all fields are combined based on...

Using blend


In this recipe, we will combine the records from a text file and an Excel file using a blend.

Getting ready

To follow this recipe, download this chapter's files from the Packt website and use the following files:

  • The Airport Geocode—Blend.csv file

  • The Worlds Busiest Airports—Blend.xlsx file

This is the content of the Airport Geocode—Blend.csv file:

These are the records in the Worlds Busiest Airports - Blend.xlsx file:

How to do it...

Here are the steps to blend the two data sources:

  1. Connect to the Excel file in this recipe. Make sure you choose Excel from the To a File section:

  2. Go to new worksheet:

  3. Click on the New Data Source icon, and this time connect to a Text file. Connect to the text file in this recipe:

  4. If you are directed back to the initial connection screen, go back to sheet 1.

  5. Under the Data menu, click on Edit Relationships:

  6. While Airport Geocode—Blend is selected as the Primary data source, click on Custom and match up the Airport Code field to Airport:

  7. Click on OK when done.

  8. While...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Tableau 10 Business Intelligence Cookbook
Published in: Nov 2016Publisher: PacktISBN-13: 9781786465634
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Donabel Santos

Donabel Santos is a self-confessed data geek. She loves working with data, writing queries, and developing reports on her SQL Server databases, and exploring and visualizing data with Tableau. She is the principal and senior business intelligence architect at QueryWorks Solutions, a Tableau Learning and Alliance partner in Vancouver, BC, Canada, providing consulting and training services. She has spent years in consulting and has developed a variety of solutions for clients in different verticals—finance, manufacturing, healthcare, legal, higher education, and local government. Donabel is a multi-year Microsoft Data Platform MVP (previously known as SQL Server MVP) and has extensive experience in the SQL server in different areas, such as development, administration, data warehouse, reporting (SSRS), tuning, troubleshooting, XML, CLR, and integration with ERPs and CRMs using PowerShell, C#, SSIS, and Power BI. One of Donabel's passions is teaching and sharing her love for data. She is a Tableau Certified Professional and a Tableau accredited trainer, delivering Tableau public and on-site client training. She is also the lead instructor for a number of courses at British Columbia Institute of Technology (BCIT), including Applied Database Administration and Design (ADAD) and Applied Data Analytics (ADA) programs. She teaches SQL server administration, development, integration (SSIS), data warehouse foundations, and visual analytics with Tableau. Donabel has also authored three other books with Packt Publishing: SQL Server 2012 with PowerShell V3 Cookbook, PowerShell for SQL Server Essentials, and SQL Server 2014 with PowerShell V5 Cookbook. She also contributed a chapter to Manning Publications' PowerShell Deep Dives.
Read more about Donabel Santos

author image
Paul Banoub

Contacted on: 18/12/2015 for Mastering Tableau 10 [video]
Read more about Paul Banoub