Tableau Data Visualization Cookbook

5 (2 reviews total)
By Ashutosh Nandeshwar
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Connecting to Data Sources

About this book

You know the feeling when you are asked to change or add a certain data point in your graph at the last minute. Usually, you have to scramble to complete the project and risk accuracy; this is not so with Tableau, however. Tableau is a revolutionary toolkit that lets you simply and effectively create high quality data visualizations.

"Tableau Data Visualization Cookbook" will show you the exact steps required to generate simple to complex graphics. Whether they are pie charts or box plots, you can create such graphics with ease and confidence; no more searching for scripts or laborious Excel hacks. This book will help you make the most of Tableau and show you how to finish your projects quicker using this toolkit.
In this book you’ll start with getting your data into Tableau, move onto generating progressively complex graphics, and end with the finishing touches and packaging your work for distribution.

This book is filled with practical recipes to help you create filled maps, use custom markers, add slider selectors, and create dashboards. You will learn how to manipulate data in various ways by applying various filters, logic, and calculating various aggregate measures. Then, we will create animated graphs and provide search box and drop-down selectors to users. This book will help you to create stunning graphics in very short amount of time.

If you want to effortlessly create beautiful visualizations of data then "Tableau Data Visualization Cookbook" is for you!

Publication date:
August 2013
Publisher
Packt
Pages
172
ISBN
9781849689786

 

Chapter 1. Connecting to Data Sources

We will cover the following topics in this chapter:

  • Connecting to text files

  • Connecting to Excel files

  • Connecting to Access databases

  • Connecting to a SQL Server

  • Pasting from a clipboard

  • Connecting to other databases

  • Connecting to Windows Azure Marketplace

  • Understanding dimensions and measures

  • Changing data types

  • Applying filters

  • Merging multiple data sources

 

Introduction


This chapter will cover the basics to get Tableau connected with various data sources, such as text files, Excel/Access files, SQL Server, ODBC sources, and the clipboard. We will cover simplistic versions of data files, where data is clean and ready-to-use. This chapter also covers how to apply filters to reduce the available data for analysis as well as merging two different data sources.

 

Connecting to text files


When you open Tableau for the first time, you should see a screen similar to the one shown in the following screenshot. This image shows the various data sources available for analysis. Tableau provides you with two sample data sources, Sample - Coffee Chain (Access) and Sample - Superstore Sales (Excel), as shown in the following screenshot:

Getting ready

To prepare for the recipe, download and save titanic.txt from http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt on your local hard drive. Remember this location, as we will use this file for this recipe. This file lists all the passengers (and their details) that boarded Titanic on its disastrous voyage.

Tip

Downloading the example code

You can download the example code files for all the Packt books you have purchased from your account at http://www.packtpub.com. If you have purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How to do it...

Once you have downloaded the text file, perform the following steps to get the data in Tableau:

  1. Click on the Connect to data link to expand that area as shown in the following screenshot:

  2. Click on Text File under the In a file section to launch the following screen:

  3. Find and select titanic.txt in the given Open dialog box.

  4. As Tableau loads the data, it will prefill some of the options. For example, the field separator or delimiter and the header row. In this case, the field separator is a comma and the first row does have field names in it. So, hit OK on the dialog box as shown in the previous screenshot.

  5. Tableau provides three options to allow you to interact with the data, which is a text file in this recipe. These three options are shown in the following screenshot. By using the Connect live option, we can use the file connection as it is, and by using the Import all data or Import some data option, we can speed up the analysis by importing the data in Tableau's own format. In this case, let's just use the Connect Live option to load all 1,313 rows in the titanic.txt file.

  6. As you can see, Tableau determined the data types and put some fields from the text file in the Dimensions section and others in Measures. Tableau determines data types of various fields using the Microsoft Jet Database Engine driver. Due to the driver's limitations, however, some fields are at times misinterpreted as measures when they should be detected as dimensions and vice versa. Since the field survived shows up as a Measures section, but contains a binary value of zero and one (no and yes), it would make sense to convert that field to a Dimensions section. To do so, simply drag the field over to the Dimensions section or right-click on the field and click on Convert to Dimension.

How it works…

We used a text file as a data source and connected to it using Tableau's data source connection options. Although most of the time Tableau can determine data types accurately, sometimes you need to pay attention to changing the data types to reflect the actual data type. In this case, we converted a binary field (containing zero and one) from the Measures field to the Dimensions field.

There's more...

In its online Knowledge Base, Tableau discusses how to handle situations where data types are misinterpreted because of Microsoft's Jet Database Engine's limitations. You can find that article at http://kb.tableausoftware.com/articles/knowledgebase/jet-incorrect-data-type-issues.

 

Connecting to Excel files


Since Microsoft Excel is a very commonly used tool for analyzing data, Tableau makes it easy for the users to connect to Excel files.

Getting ready

To use an Excel file as a data source, let's use the sample file that comes with the Tableau installation. Unless you have customized your Tableau installation, you should find the Sample – Superstore Sales (Excel).xls file when you navigate to My Documents | My Tableau Repository | Datasources.

How to do it...

Once you have identified the presence of the sample Excel file, perform the following steps to connect to the Excel file:

  1. From Tableau's main screen, click on Connect to Data as shown in the following screenshot.

  2. Under the In a file option, select Microsoft Excel as the connection option.

  3. Browse and select the file Sample - Superstore Sales (Excel).xls.

  4. Tableau will determine the number of sheets in the file and provide an option to import a single worksheet (also called tabs or sheets) or multiple worksheets.

  5. Select the Orders sheet and hit OK.

  6. Use the Connect live option to get the data loaded as is from the Excel file.

  7. You'll see that Tableau determines the field types based on the available data.

How it works...

When Tableau connects to Excel files, it provides options for connecting to a single worksheet or multiple worksheets. Custom SQL commands can also be written to access data from multiple worksheets of an Excel file. In this recipe, we connected to a well-formatted worksheet from the sample Excel file. As was the case with plain text files, Tableau determines the data types of the fields using Microsoft's Jet Engine Driver.

There's more...

Although we used a well-formatted Excel file for this recipe, we know that analysts spend a lot time cleaning and manipulating data before any analysis. Before connecting to Tableau, we have to make sure the Excel files are formatted according to what Tableau is expecting as a data source. The Preparing Excel Files for Analysis article in the Knowledge Base section at the following link provides more information on how to prepare Excel files to be used in Tableau for analysis:

http://kb.tableausoftware.com/articles/knowledgebase/preparing-excel-files-analysis

 

Connecting to Access databases


Microsoft Access is a good tool to store smaller datasets in a relational database format without purchasing and installing complete data storage solutions, such as Microsoft SQL Server, Oracle, or MySQL. Tableau provides an option to connect to Access databases.

Getting ready

Let's use the Access database (Sample - Coffee Chain.mdb) that came with the standard installation. As with the Excel file used in the previous recipe, unless you made any customization during installation, the database file should be found by navigating to Documents | My Tableau Repository | Datasources.

How to do it...

Once you have located the Access database file, perform the following steps to connect to the sample Access database file:

  1. From Tableau's main screen, click on Connect to Data.

  2. Under the In a file option, select Microsoft Access as the connection option.

  3. Browse and select the file Sample - Coffee Chain.mdb.

  4. Tableau will determine and list tables found in the database and will ask whether to analyze a single table or a query. Select CoffeeChain Query and hit OK.

  5. Use the Connect live option to connect to the data from the Access database.

  6. You'll see that Tableau loaded the query from the Access database and also determined the data types using Microsoft's Jet Engine Driver.

How it works...

Using Microsoft's Jet Engine Driver, Tableau connects to Microsoft Access and determines the data types of the fields of a table. Just like the connection to Excel files, Tableau allows the user to select a single table, multiple tables, or write custom SQL commands.

 

Connecting to a SQL Server


Although it is pretty easy to connect to Access, Excel, and other flat files, data is frequently stored on some sort of relational database on a server, such as on the SQL Server or Oracle. Tableau offers connections to various data stores too. Here, we'll focus on Microsoft SQL Server.

Getting ready

Security roles, server connections, authentication issues, port and firewall details, and other factors can create problems while trying to access data from a server. The solutions to these problems are out of the scope of this book but you should make sure that you are able to access the server database from the same computer on which Tableau is installed.

How to do it...

Once you have made sure you have access to the database server and the database, perform the following steps to connect to a Microsoft SQL Server table:

  1. From Tableau's main screen, click on Connect to Data.

  2. Under the On a server option, click on Microsoft SQL Server.

  3. In the Microsoft SQL Server Connection dialog box, enter the server name as shown in the following screenshot.

  4. Click on the Connect button under the Establish the connection option.

  5. Select AdventureWorks2012 as a database on the server.

  6. Under Define the connection, select Single Table and then Person (Person.Person), as shown in the following screenshot:

  7. Hit OK.

  8. Select the Connect Live option to connect to the SQL Server database directly.

How it works...

Using the the Connect Live option in Tableau, we connected to a SQL Server database directly. This option allows users to create visualizations that will be refreshed as the underlying data changes. If connected live, Tableau will create results based on the SQL Server's settings, which are usually set to maximize performance.

 

Pasting from a clipboard


Sometimes it is easier to just paste data from the clipboard than pasting it to Excel or CSV files and then importing them again in Tableau. Tableau does provide a quick import method from the clipboard.

Getting ready

Let's use the Titanic dataset that we used in the Connecting to the text files recipe. The file is at http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt.

How to do it...

Once you have determined that the Titanic dataset file is on your local drive, perform the following steps to copy the data to Tableau using a clipboard:

  1. Open the file in Notepad.

  2. Select all the data and copy everything from the file to the clipboard (Ctrl + A and then Ctrl + C on Windows), and perform the following steps:

    1. In Tableau, navigate to File | New to open a new blank workbook.

    2. Click on any open area on the workbook and paste the copied data by going to Data | Paste Data (Ctrl + V on Windows). Note the top-left area of the workbook, where the data connections are shown. It should show text such as Clipboard_timestamp, where timestamp is the time and date when the paste occurred.

  3. As you can see from the workbook, the data was improperly imported. To fix this problem, right-click on the data connection named as Clipboard_timestamp under Data and click on Edit Connection.

  4. From the Field separator drop-down menu, select Comma; we do this because the original file was separated by commas and Tableau used tab as the separator.

  5. After making your selection, hit OK and you'll see that all the fields from the Titanic data text file are shown in Dimensions and Measures.

How it works...

While copying data from the clipboard, Tableau uses tab as the default separator of data. This causes improper import of data when the data is separated by other delimiters such as a comma. You can easily correct this problem by editing the connection to the clipboard file saved in Windows' temporary folder.

There's more...

If you created visualizations based on the data from the clipboard and you need to regularly update this visualization, you will find the Tableau online article, Editing Pasted Data, in the Knowledge Base section at the following link:

http://kb.tableausoftware.com/articles/knowledgebase/editing-pasted-data

This article explains how to save the data source of the clipboard and modify the data source.

 

Connecting to other databases


Connecting to most of the databases on a server is straightforward. Providing the server name and authentication details is usually sufficient. There are at times, however, when either Tableau does not provide a direct connection to that database server or you want to use an ODBC connection.

Getting ready

The easiest way to make a connection to a database server is using an ODBC connection. To use this type of connection, we must first set up a Data Source Name (DSN). For this recipe, we will set up a DSN to connect to a database server.

In Control Panel, click on System and Security and then on Administrative Tools. Double-click on Data Source (ODBC). You'll see a Data Source Administrator window. Under User DSN, click on Add and follow the steps to create a DSN for your database. If you have a SQL Server instance installed and the Adventure Works database populated, select SQL Server Native Client and hit Finish. In the Name field, enter a name that you'll remember easily; remember that it cannot contain spaces. I chose adventureworkscnxn. Under Server, either enter the database server name or select Local. Continue with the default selections until you see a Finish button.

How to do it...

Once the DSN is set up, open a new worksheet in Tableau and perform the following steps to connect to a database using ODBC:

  1. Click on Connect to data and select Other Databases (ODBC).

  2. In the DSN dropdown, select the DSN that you created earlier.

  3. Click on Connect to test the data connection.

  4. Under Owner, select Person.

  5. Among the Table selection radio buttons, select Single Table; search for a table name by clicking on the magnifying glass icon.

  6. Under the Give the connection a name … textbox, enter a name and hit OK.

  7. If you can see the Connect live, Import all data, and Import some data options in the Data Connection page, you were able to successfully connect to the SQL Server using ODBC and DSN.

How it works...

With ODBC, Tableau provides an option to connect to the data sources that otherwise do not have native support in Tableau. This option provides flexibility to connect any data source that has an ODBC drive. Although SQL Server is supported directly by Tableau, in this recipe, we saw how easy it is to create a data connection using an ODBC driver and DSN.

There's more...

Microsoft's online support provides an excellent overview of ODBC connectivity at http://msdn.microsoft.com/en-us/library/windows/desktop/ms710252%28v=vs.85%29.aspx.

 

Connecting to Windows Azure Marketplace


Microsoft created an online platform called Windows Azure Marketplace for trading Software as a Service applications and data. Users can choose to buy and sell various datasets, and that makes it a great place to use datasets hosted on the cloud with Tableau.

Getting ready

To access the Azure Marketplace datasets, you'll need to create or already have an account with Microsoft (which was earlier called Windows Live ID). Once you are logged in with your Live ID, you'll also need to complete the registration process.

How to do it...

Once you have completed the registration process for the Azure Marketplace, perform the following steps to get the data from the Marketplace to Tableau:

  1. Log in to Windows Azure Marketplace (datamarket.azure.com) with your account credentials.

  2. Click on Data on the top navigation menu.

  3. Click on the US Air Carrier Flight Delays dataset.

  4. Click on Sign-up to subscribe to the dataset, and perform all the required steps.

  5. Once the subscription process is complete, click on the Explore this Dataset link.

  6. In the Build Query area, select On_Time_Performance from the Query field and hit Run Query.

  7. You should see a URL for the currently expressed query, which would look something like https://api.datamarket.azure.com/Data.ashx/oakleaf/US_Air_Carrier_Flight_Delays_Incr/v1/On_Time_Performance?$top=100.

  8. Copy this URL.

  9. In Tableau, go to the Connect to Data options screen.

  10. Click on Windows Azure Marketplace DataMarket.

  11. In the OData Connection pop-up box, shown in the following image, under the Step 1 input box, enter the copied URL:

  12. In Step 2 of this process, either select the Account key or Username option and enter the credentials (you'll find the Account key value below the OData URL information; you need to click on Show to make the key visible).

  13. In Step 3, click on the Connect button. If you do not see any message, you will be able to connect to the data file.

  14. In Step 4, enter a name for the connection and hit OK.

How it works....

Microsoft Azure Marketplace offers a data market for users to explore datasets, which are usually scattered everywhere on the Web. This marketplace creates a central repository of datasets, and with Tableau's integration of this marketplace, it is very easy to analyze various datasets.

There's more...

Microsoft's online support provides detailed information for users, who could be consumers or publishers of the data, as well as developers of Azure applications and services. You can read it at http://msdn.microsoft.com/en-us/library/windowsazure/gg315539.aspx.

 

Understanding dimensions and measures


Tableau divides the data in two main types: dimensions and measures. Dimensions are usually those fields that cannot be aggregated; measures, as its name suggests, are those fields that can be measured, aggregated, or used for mathematical operations. Dimension fields are usually used for row or column headings; measures are usually used for plotting or giving values to the sizes of markers.

When you import the data for the first time, Tableau determines whether to consider a field as a dimension or a measure. This determination involves considering fields with all text (nominal or other text) values and fields with numeric values. Depending on the data source, Tableau also uses Microsoft's Jet Engine Driver to classify fields into dimensions and measures.

Tableau visualizations are heavily dependent on the structure of dimensions and measures. Thus, organizing data properly into dimensions and measures is important, and if Tableau's determinations are wrong about the field data types, it is easy to convert these fields to the other category. Simply dragging the field to the pane works just like right-clicking on the field and clicking on Convert to Dimension or Convert to Measure.

 

Changing data types


Depending on the data source and connection, Tableau tries to determine the field data type. Most often, the field data types are identified correctly; sometimes, however, changing data types becomes necessary.

Getting ready

We will use the sample superstore sales saved data source for this exercise. Open a new worksheet and connect to the Sample – Superstore Sales (Excel) data source.

How to do it....

Once the sample file is loaded on the worksheet, perform the following steps to convert data types:

  1. In the Dimensions pane, right-click on Order Date.

  2. You'll notice some data types in Tableau: Number, String, and Date. Date & time is also a type, which is suited for data with a timestamp.

  3. Select String as the data type for this field. Next to this field name, you'll notice a symbol with letters (Abc); this symbol indicates that this field contains data of type String.

  4. Drag the Order Date field from the Dimensions to the Measures pane.

  5. You'll notice in the Measures pane that the field Order Date has an aggregation of Count.

  6. Right-click on Order Date (Count) in the Measures pane and select Change Data Type. Select Date as the new data type.

  7. You'll notice that the Order Date field is back in the Dimensions pane.

How it works...

Since the data type and role of a field (dimension or measure) determines how the data will be used in the visualizations, it is critical to have the right data type for fields in the data. You will notice that, if you convert a field in the Measures pane to a Date type, that field will be moved to Dimensions. If a field from the Dimensions pane is converted to Number, it will stay in the Dimensions pane. If a field from the Measures pane is converted to String, the default aggregation changes to Count.

 

Applying filters


If you want to reduce the amount of data available for visualizations or restrict the data for a particular field value, applying filters is a very good solution. This recipe will provide a basic overview of filters, and later in the book you'll see some other uses of filters.

Getting ready

We will use the sample superstore sales saved data source for this exercise. Open a new worksheet and connect to the Sample – Superstore Sales (Excel) data source.

How to do it...

Once the sample file is loaded on the worksheet, perform the following steps to explore the Filters feature in Tableau:

  1. Drag-and-drop Sub-category from Dimensions into the Rows shelf.

  2. Then drag-and-drop Profit from Measures into the Text Marks box. You can also right-click on Profit and click on Add to Sheet. Your worksheet should look like the following screenshot:

  3. If we want to see the subcategories that generated profit of more than $50,000, right-click on the Text box from the Marks pane box where it says SUM(Profit), and click on Filter

  4. In the Filter dialog box, click on the At Least option and either drag the slide to 50,000 or type 50,000 in the input box and hit OK.

  5. Once the filter is applied, you'll see eight subcategories that generated profit of more than $50,000.

  6. We can continue to filter this information further. For example, to remove Office Furnishings as a subcategory, right-click on Office Furnishings and click on Exclude. Now only seven rows of subcategories are visible.

  7. Both the filters will now show up in the Filters pane, as shown in the following screenshot:

  8. To remove the filters, right-click on the Filters pane and select Clear Shelf.

  9. To change a filter, right-click on the Sub-Category filter from the Filters pane and select Filter.

  10. In the Filter window, you'll see tabs such as General, Wildcard, Condition, and Top.

  11. In the General tab, you can type or select a value from the field. In the Wildcard tab, you can enter approximate string values to match certain patterns. In the Condition tab, you can enter conditions by the Fields or Formula values. In the Top tab, you can select the top n or bottom n items by a field or using a formula.

How it works...

Filters are a great way to manipulate the data on a worksheet. Depending on the field data type, various types of filters can be applied to a field. These filters can be numeric conditions to limit a numeric field or text patterns to limit a string field.

 

Merging multiple data sources


Often, our data is stored in different formats or different files. In relational databases, if two different tables have a common field, we can join these two tables with this field and pull the data in one single query. Tableau supports joins within a single data source connection; however, to merge multiple data source connections, Tableau uses a concept called data blending. In this recipe, we will look at how to blend two different data sources.

Getting ready

Download the following Google Spreadsheet, which contains the U.S. population by states, after signing in:

http://bit.ly/12rUIh3

Download it as a CSV on your local hard drive and name it USStatesPopulation.csv.

How to do it...

Once you have downloaded the CSV file, create a new worksheet in Tableau and perform the following steps to merge the CSV file and an Excel file:

  1. In a new workbook, connect to the Sample – Superstore Sales (Excel) data source.

  2. Once the data is loaded and you can see Dimensions and Measures populated, click on Connect to Data in Data and select the text file USStatesPopulation.csv.

  3. Accept all the defaults in the Text File Connection dialog box and hit OK.

  4. Choose the Connect Live option in the next dialog box.

  5. Tableau will match field names, and if it finds the same field names in both the data sources, it will create relationships between those common fields. To manually create relationships, click on Data and select Edit Relationships.

  6. In the Relationships dialog box, select Sample – Superstore Sales (Excel) as the Primary data source. Tableau will make USStatesPopulation.csv a secondary data source file.

  7. Click on the Custom radio button and select State from the left-hand side column and State from the right-hand side column and hit OK.

  8. To see profit by state, drag-and-drop the State value from the Sample – Superstore Sales (Excel) data source into the Rows shelf and the Profit measure into the Text Marks box.

  9. Click on the USStatesPopulation#csv data source in the Data pane, and right-click on Census population_April 1, 2010 from the Measures pane and select Add to Sheet.

  10. As shown in the following screenshot, you should see three measure values in the Measure Values pane, Measure Names in the Columns shelf and Census population_April 1, 2010 and Profit in the datasheet:

How it works...

Tableau can merge two or more different data sources in the same worksheet by creating relationships among common fields of these data sources. You can customize the blending operation by specifying the common fields in the data sources in the relationships. You should also note that this blending is different from joining two tables, because when we join tables, we create row-level joins and we can add fields from both the tables. Whereas, in blending, we merely show different fields from different data sources in a single visualization.

There's more...

Since the blending or merging of multiple data sources can prove challenging, it might be easier to understand this concept better by watching somebody actually do it. A YouTube user named James Wright uploaded a video of blending data at http://youtu.be/-G0lIz7y6y0.

About the Author

  • Ashutosh Nandeshwar

    Ashutosh R. Nandeshwar has extensive experience in data mining, machine learning, and information visualization. He is one of the few analytics professionals in the higher education industry who have developed analytical solutions for all stages of the student lifecycle (from recruitment to giving). He enjoys speaking to technical and non-technical audiences about the power of data as well as ranting about data professionals' chasing of "interesting" things. He received his PhD/MS from West Virginia University and BE from Nagpur University, all in Industrial Engineering. You can follow him on Twitter, @n_ashutosh, and on his website, http://www.nandeshwar.info.

    Browse publications by this author

Latest Reviews

(2 reviews total)
Very simple process, no problems!
Excellent
Book Title
Access this book, plus 7,500 other titles for FREE
Access now