We will cover the following topics in this chapter:
Connecting to text files
Connecting to Excel files
Connecting to Access databases
Connecting to a SQL Server
Pasting from a clipboard
Connecting to other databases
Connecting to Windows Azure Marketplace
Understanding dimensions and measures
Changing data types
Merging multiple data sources
This chapter will cover the basics to get Tableau connected with various data sources, such as text files, Excel/Access files, SQL Server, ODBC sources, and the clipboard. We will cover simplistic versions of data files, where data is clean and ready-to-use. This chapter also covers how to apply filters to reduce the available data for analysis as well as merging two different data sources.
When you open Tableau for the first time, you should see a screen similar to the one shown in the following screenshot. This image shows the various data sources available for analysis. Tableau provides you with two sample data sources, Sample - Coffee Chain (Access) and Sample - Superstore Sales (Excel), as shown in the following screenshot:
To prepare for the recipe, download and save
titanic.txt from http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt on your local hard drive. Remember this location, as we will use this file for this recipe. This file lists all the passengers (and their details) that boarded Titanic on its disastrous voyage.
Downloading the example code
You can download the example code files for all the Packt books you have purchased from your account at http://www.packtpub.com. If you have purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Once you have downloaded the text file, perform the following steps to get the data in Tableau:
Find and select
titanic.txtin the given Open dialog box.
As Tableau loads the data, it will prefill some of the options. For example, the field separator or delimiter and the header row. In this case, the field separator is a comma and the first row does have field names in it. So, hit OK on the dialog box as shown in the previous screenshot.
Tableau provides three options to allow you to interact with the data, which is a text file in this recipe. These three options are shown in the following screenshot. By using the Connect live option, we can use the file connection as it is, and by using the Import all data or Import some data option, we can speed up the analysis by importing the data in Tableau's own format. In this case, let's just use the Connect Live option to load all 1,313 rows in the
As you can see, Tableau determined the data types and put some fields from the text file in the Dimensions section and others in Measures. Tableau determines data types of various fields using the Microsoft Jet Database Engine driver. Due to the driver's limitations, however, some fields are at times misinterpreted as measures when they should be detected as dimensions and vice versa. Since the field survived shows up as a Measures section, but contains a binary value of zero and one (no and yes), it would make sense to convert that field to a Dimensions section. To do so, simply drag the field over to the Dimensions section or right-click on the field and click on Convert to Dimension.
We used a text file as a data source and connected to it using Tableau's data source connection options. Although most of the time Tableau can determine data types accurately, sometimes you need to pay attention to changing the data types to reflect the actual data type. In this case, we converted a binary field (containing zero and one) from the Measures field to the Dimensions field.
In its online Knowledge Base, Tableau discusses how to handle situations where data types are misinterpreted because of Microsoft's Jet Database Engine's limitations. You can find that article at http://kb.tableausoftware.com/articles/knowledgebase/jet-incorrect-data-type-issues.
To use an Excel file as a data source, let's use the sample file that comes with the Tableau installation. Unless you have customized your Tableau installation, you should find the
Sample â Superstore Sales (Excel).xls file when you navigate to My Documents | My Tableau Repository | Datasources.
Once you have identified the presence of the sample Excel file, perform the following steps to connect to the Excel file:
From Tableau's main screen, click on Connect to Data as shown in the following screenshot.
Under the In a file option, select Microsoft Excel as the connection option.
Browse and select the file
Sample - Superstore Sales (Excel).xls.
Tableau will determine the number of sheets in the file and provide an option to import a single worksheet (also called tabs or sheets) or multiple worksheets.
Use the Connect live option to get the data loaded as is from the Excel file.
You'll see that Tableau determines the field types based on the available data.
When Tableau connects to Excel files, it provides options for connecting to a single worksheet or multiple worksheets. Custom SQL commands can also be written to access data from multiple worksheets of an Excel file. In this recipe, we connected to a well-formatted worksheet from the sample Excel file. As was the case with plain text files, Tableau determines the data types of the fields using Microsoft's Jet Engine Driver.
Although we used a well-formatted Excel file for this recipe, we know that analysts spend a lot time cleaning and manipulating data before any analysis. Before connecting to Tableau, we have to make sure the Excel files are formatted according to what Tableau is expecting as a data source. The Preparing Excel Files for Analysis article in the Knowledge Base section at the following link provides more information on how to prepare Excel files to be used in Tableau for analysis:
Microsoft Access is a good tool to store smaller datasets in a relational database format without purchasing and installing complete data storage solutions, such as Microsoft SQL Server, Oracle, or MySQL. Tableau provides an option to connect to Access databases.
Let's use the Access database (
Sample - Coffee Chain.mdb) that came with the standard installation. As with the Excel file used in the previous recipe, unless you made any customization during installation, the database file should be found by navigating to Documents | My Tableau Repository | Datasources.
Once you have located the Access database file, perform the following steps to connect to the sample Access database file:
From Tableau's main screen, click on Connect to Data.
Under the In a file option, select Microsoft Access as the connection option.
Browse and select the file
Sample - Coffee Chain.mdb.
Tableau will determine and list tables found in the database and will ask whether to analyze a single table or a query. Select CoffeeChain Query and hit OK.
Use the Connect live option to connect to the data from the Access database.
You'll see that Tableau loaded the query from the Access database and also determined the data types using Microsoft's Jet Engine Driver.
Although it is pretty easy to connect to Access, Excel, and other flat files, data is frequently stored on some sort of relational database on a server, such as on the SQL Server or Oracle. Tableau offers connections to various data stores too. Here, we'll focus on Microsoft SQL Server.
Security roles, server connections, authentication issues, port and firewall details, and other factors can create problems while trying to access data from a server. The solutions to these problems are out of the scope of this book but you should make sure that you are able to access the server database from the same computer on which Tableau is installed.
Once you have made sure you have access to the database server and the database, perform the following steps to connect to a Microsoft SQL Server table:
From Tableau's main screen, click on Connect to Data.
Under the On a server option, click on Microsoft SQL Server.
In the Microsoft SQL Server Connection dialog box, enter the server name as shown in the following screenshot.
Click on the Connect button under the Establish the connection option.
Under Define the connection, select Single Table and then Person (Person.Person), as shown in the following screenshot:
Using the the Connect Live option in Tableau, we connected to a SQL Server database directly. This option allows users to create visualizations that will be refreshed as the underlying data changes. If connected live, Tableau will create results based on the SQL Server's settings, which are usually set to maximize performance.
Sometimes it is easier to just paste data from the clipboard than pasting it to Excel or CSV files and then importing them again in Tableau. Tableau does provide a quick import method from the clipboard.
Let's use the Titanic dataset that we used in the Connecting to the text files recipe. The file is at http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt.
Open the file in Notepad.
Select all the data and copy everything from the file to the clipboard (Ctrl + A and then Ctrl + C on Windows), and perform the following steps:
In Tableau, navigate to File | New to open a new blank workbook.
Click on any open area on the workbook and paste the copied data by going to Data | Paste Data (Ctrl + V on Windows). Note the top-left area of the workbook, where the data connections are shown. It should show text such as Clipboard_timestamp, where timestamp is the time and date when the paste occurred.
From the Field separator drop-down menu, select Comma; we do this because the original file was separated by commas and Tableau used tab as the separator.
While copying data from the clipboard, Tableau uses tab as the default separator of data. This causes improper import of data when the data is separated by other delimiters such as a comma. You can easily correct this problem by editing the connection to the clipboard file saved in Windows' temporary folder.
If you created visualizations based on the data from the clipboard and you need to regularly update this visualization, you will find the Tableau online article, Editing Pasted Data, in the Knowledge Base section at the following link:
This article explains how to save the data source of the clipboard and modify the data source.
Connecting to most of the databases on a server is straightforward. Providing the server name and authentication details is usually sufficient. There are at times, however, when either Tableau does not provide a direct connection to that database server or you want to use an ODBC connection.
The easiest way to make a connection to a database server is using an ODBC connection. To use this type of connection, we must first set up a Data Source Name (DSN). For this recipe, we will set up a DSN to connect to a database server.
In Control Panel, click on System and Security and then on Administrative Tools. Double-click on Data Source (ODBC). You'll see a Data Source Administrator window. Under User DSN, click on Add and follow the steps to create a DSN for your database. If you have a SQL Server instance installed and the
Adventure Works database populated, select SQL Server Native Client and hit Finish. In the Name field, enter a name that you'll remember easily; remember that it cannot contain spaces. I chose
adventureworkscnxn. Under Server, either enter the database server name or select Local. Continue with the default selections until you see a Finish button.
Once the DSN is set up, open a new worksheet in Tableau and perform the following steps to connect to a database using ODBC:
Click on Connect to data and select Other Databases (ODBC).
In the DSN dropdown, select the DSN that you created earlier.
Click on Connect to test the data connection.
Under Owner, select Person.
Among the Table selection radio buttons, select Single Table; search for a table name by clicking on the magnifying glass icon.
Under the Give the connection a name â¦ textbox, enter a name and hit OK.
With ODBC, Tableau provides an option to connect to the data sources that otherwise do not have native support in Tableau. This option provides flexibility to connect any data source that has an ODBC drive. Although SQL Server is supported directly by Tableau, in this recipe, we saw how easy it is to create a data connection using an ODBC driver and DSN.
Microsoft's online support provides an excellent overview of ODBC connectivity at http://msdn.microsoft.com/en-us/library/windows/desktop/ms710252%28v=vs.85%29.aspx.
Microsoft created an online platform called Windows Azure Marketplace for trading Software as a Service applications and data. Users can choose to buy and sell various datasets, and that makes it a great place to use datasets hosted on the cloud with Tableau.
To access the Azure Marketplace datasets, you'll need to create or already have an account with Microsoft (which was earlier called Windows Live ID). Once you are logged in with your Live ID, you'll also need to complete the registration process.
Once you have completed the registration process for the Azure Marketplace, perform the following steps to get the data from the Marketplace to Tableau:
Log in to Windows Azure Marketplace (datamarket.azure.com) with your account credentials.
Click on Data on the top navigation menu.
Click on the US Air Carrier Flight Delays dataset.
Click on Sign-up to subscribe to the dataset, and perform all the required steps.
Once the subscription process is complete, click on the Explore this Dataset link.
In the Build Query area, select On_Time_Performance from the Query field and hit Run Query.
You should see a URL for the currently expressed query, which would look something like https://api.datamarket.azure.com/Data.ashx/oakleaf/US_Air_Carrier_Flight_Delays_Incr/v1/On_Time_Performance?$top=100.
Copy this URL.
In Tableau, go to the Connect to Data options screen.
Click on Windows Azure Marketplace DataMarket.
In Step 2 of this process, either select the Account key or Username option and enter the credentials (you'll find the Account key value below the OData URL information; you need to click on Show to make the key visible).
In Step 4, enter a name for the connection and hit OK.
Microsoft Azure Marketplace offers a data market for users to explore datasets, which are usually scattered everywhere on the Web. This marketplace creates a central repository of datasets, and with Tableau's integration of this marketplace, it is very easy to analyze various datasets.
Microsoft's online support provides detailed information for users, who could be consumers or publishers of the data, as well as developers of Azure applications and services. You can read it at http://msdn.microsoft.com/en-us/library/windowsazure/gg315539.aspx.
Tableau divides the data in two main types: dimensions and measures. Dimensions are usually those fields that cannot be aggregated; measures, as its name suggests, are those fields that can be measured, aggregated, or used for mathematical operations. Dimension fields are usually used for row or column headings; measures are usually used for plotting or giving values to the sizes of markers.
When you import the data for the first time, Tableau determines whether to consider a field as a dimension or a measure. This determination involves considering fields with all text (nominal or other text) values and fields with numeric values. Depending on the data source, Tableau also uses Microsoft's Jet Engine Driver to classify fields into dimensions and measures.
Tableau visualizations are heavily dependent on the structure of dimensions and measures. Thus, organizing data properly into dimensions and measures is important, and if Tableau's determinations are wrong about the field data types, it is easy to convert these fields to the other category. Simply dragging the field to the pane works just like right-clicking on the field and clicking on Convert to Dimension or Convert to Measure.
Depending on the data source and connection, Tableau tries to determine the field data type. Most often, the field data types are identified correctly; sometimes, however, changing data types becomes necessary.
Once the sample file is loaded on the worksheet, perform the following steps to convert data types:
In the Dimensions pane, right-click on Order Date.
You'll notice some data types in Tableau: Number, String, and Date. Date & time is also a type, which is suited for data with a timestamp.
Drag the Order Date field from the Dimensions to the Measures pane.
You'll notice in the Measures pane that the field Order Date has an aggregation of Count.
Right-click on Order Date (Count) in the Measures pane and select Change Data Type. Select Date as the new data type.
You'll notice that the Order Date field is back in the Dimensions pane.
Since the data type and role of a field (dimension or measure) determines how the data will be used in the visualizations, it is critical to have the right data type for fields in the data. You will notice that, if you convert a field in the Measures pane to a Date type, that field will be moved to Dimensions. If a field from the Dimensions pane is converted to Number, it will stay in the Dimensions pane. If a field from the Measures pane is converted to String, the default aggregation changes to Count.
If you want to reduce the amount of data available for visualizations or restrict the data for a particular field value, applying filters is a very good solution. This recipe will provide a basic overview of filters, and later in the book you'll see some other uses of filters.
We will use the sample superstore sales saved data source for this exercise. Open a new worksheet and connect to the Sample â Superstore Sales (Excel) data source.
Once the sample file is loaded on the worksheet, perform the following steps to explore the Filters feature in Tableau:
Drag-and-drop Sub-category from Dimensions into the Rows shelf.
If we want to see the subcategories that generated profit of more than $50,000, right-click on the Text box from the Marks pane box where it says SUM(Profit), and click on Filter
In the Filter dialog box, click on the At Least option and either drag the slide to 50,000 or type
50,000in the input box and hit OK.
We can continue to filter this information further. For example, to remove Office Furnishings as a subcategory, right-click on Office Furnishings and click on Exclude. Now only seven rows of subcategories are visible.
To change a filter, right-click on the Sub-Category filter from the Filters pane and select Filter.
In the Filter window, you'll see tabs such as General, Wildcard, Condition, and Top.
In the General tab, you can type or select a value from the field. In the Wildcard tab, you can enter approximate string values to match certain patterns. In the Condition tab, you can enter conditions by the Fields or Formula values. In the Top tab, you can select the top n or bottom n items by a field or using a formula.
Often, our data is stored in different formats or different files. In relational databases, if two different tables have a common field, we can join these two tables with this field and pull the data in one single query. Tableau supports joins within a single data source connection; however, to merge multiple data source connections, Tableau uses a concept called data blending. In this recipe, we will look at how to blend two different data sources.
Download the following Google Spreadsheet, which contains the U.S. population by states, after signing in:
Download it as a CSV on your local hard drive and name it
Once you have downloaded the CSV file, create a new worksheet in Tableau and perform the following steps to merge the CSV file and an Excel file:
In a new workbook, connect to the Sample â Superstore Sales (Excel) data source.
Once the data is loaded and you can see Dimensions and Measures populated, click on Connect to Data in Data and select the text file
Accept all the defaults in the Text File Connection dialog box and hit OK.
Choose the Connect Live option in the next dialog box.
Tableau will match field names, and if it finds the same field names in both the data sources, it will create relationships between those common fields. To manually create relationships, click on Data and select Edit Relationships.
In the Relationships dialog box, select Sample â Superstore Sales (Excel) as the Primary data source. Tableau will make
USStatesPopulation.csva secondary data source file.
Click on the USStatesPopulation#csv data source in the Data pane, and right-click on Census population_April 1, 2010 from the Measures pane and select Add to Sheet.
As shown in the following screenshot, you should see three measure values in the Measure Values pane, Measure Names in the Columns shelf and Census population_April 1, 2010 and Profit in the datasheet:
Tableau can merge two or more different data sources in the same worksheet by creating relationships among common fields of these data sources. You can customize the blending operation by specifying the common fields in the data sources in the relationships. You should also note that this blending is different from joining two tables, because when we join tables, we create row-level joins and we can add fields from both the tables. Whereas, in blending, we merely show different fields from different data sources in a single visualization.
Since the blending or merging of multiple data sources can prove challenging, it might be easier to understand this concept better by watching somebody actually do it. A YouTube user named James Wright uploaded a video of blending data at http://youtu.be/-G0lIz7y6y0.