Tableau Prep Builder allows you to connect to a large variety of industry-leading data platforms.
When Tableau Prep was initially launched, the only outputs supported were flat files and Tableau native formats TDE and Hyper. However, since the 2020.2.3 release in August 2020, you can output to database platforms such as SQL Server as well, making Tableau Prep a much more appealing tool. It allows you to do the following:
In this chapter, you will find recipes to ingest and output the data you require from and to a wide variety of data types:
If you are connecting a BI tool such as Tableau Desktop to your data, selecting the appropriate output type may impact your Tableau workbook performance.
To follow along with the recipes in this chapter, you will require Tableau Prep Builder version 2020.2.3 or later. If you'd like to follow along and connect or write to a database, you must have the appropriate database set up and have an account with the appropriate access privileges.
The recipes in this chapter use sample data files, which you can download from the book's GitHub repository: https://github.com/PacktPublishing/Tableau-Prep-Cookbook.
In this recipe, we'll connect to a Comma-Separated Values (CSV) file containing sales transactions and create a second connection to multiple Excel files. These connection types are very similar and so we'll cover them in one recipe. However, there are key features to both, which we'll highlight.
To follow along with the recipe, download the Sample Files 2.1 folder from this book's GitHub repository.
To get started, ensure you have the sample CSV and/or Excel file(s) ready on your computer and open up Tableau Prep Builder:
Once selected, Tableau Prep will automatically create a new flow with the data connection in it:
Since the data connection is automatically selected, all options onscreen now relate to that particular connection. On the left-hand side, you can see the Tables pane. Note that there is only one table, equal to the filename. Since text files do not contain tables, this is by design. Tableau Prep Builder will always provide a generic user interface for data connections whenever possible. Once you're comfortable with one connection type, others should be easier to master:
In the bottom pane, you can find a summary of all data fields identified in your text file, along with the automatically determined data type. In Chapter 3, Cleaning Transformations, we'll dive into the cleaning options you can perform here:
Finally, Character Set and Locale are typically identified appropriately but you can alter them here as needed:
Once we've selected the file, Tableau Prep Builder does not automatically show another data connection in the flow, as it did for our CSV file. This is the default behavior for any data connection that has multiple tables. In the case of Microsoft Excel, each Excel sheet is considered a table:
The colors do not denote any kind of functionality:
As a result, this step will now ingest all files in our subfolder named Archive and combine the data:
By completing these steps, you have learned how to connect Tableau Prep to text and Excel files.
Tableau Prep text files and Microsoft Excel connections automatically detect most settings very well, so, in most cases, a couple of clicks will get you up and running. The most powerful feature is undoubtedly the ability to ingest multiple files at the same time. You can ingest hundreds of files at the same time using this method, using a single data connection.
In this recipe, we'll connect to a PDF file containing text and a table with data. Tableau Prep has an exciting feature that can automatically detect the presence of tables in PDF files and extract the data for you.
To follow along with the recipe, download the Sample Files 2.2 folder from the book's GitHub repository.
To get started, ensure you have the sample PDF file ready on your computer, and open Tableau Prep Builder:
In this recipe, you have learned how to connect to PDF files and extract data for processing in Tableau Prep.
Tableau Prep converts each table in a PDF document into a data table when ingesting the file into a new flow. As such, Tableau Prep removes the complexity of parsing PDF documents and allows you to treat this like any other data connection.
In this recipe, we'll connect to a statistical file. Tableau Prep offers fantastic integration with popular statistical files from SAS (.sas7bdat), SPSS (.sav), and R (.rdata, .rda).
I advocate the use of open file formats such as CSV or commonly used standards such as Excel. However, if you are unable to obtain your data in such a format from your data science partner, this connector may offer a solution.
In this recipe, we'll connect to an R file using the statistical file connector. In order to follow along, download the Sample Files 2.3 folder from the book's GitHub repository.
To get started, ensure you have the sample RData file available on your computer. From the Tableau Prep home screen follow these steps:
And with just these few steps, Tableau Prep Builder has added the statistical file source to a new flow:
Most options in the bottom pane are identical to those when processing Excel files. However, there is a small but important feature absent. You cannot alter the data type of the fields in the statistical file connection step. In order to do this, you have to use a cleaning step, which we'll discuss in Chapter 3, Cleaning Transformations:
In this recipe, you have learned how to add Tableau Prep to a data science workflow by connecting to data produced by popular statistics applications.
Tableau Prep unpacks statistical files when you connect to them and, from that moment on, allows you to leverage them like any other connection.
There are some limitations when it comes to connecting to statistical files. If you run into any connection issues, I recommend you refer to the following section of the Tableau documentation online: https://help.tableau.com/current/pro/desktop/en-us/examples_statfile.htm.
In this recipe, we'll connect to a Microsoft SQL Server database. The many connectors provided out of the box by Tableau Prep allow you to connect to databases almost as easily as to file connections, allowing you to quickly start an advanced flow.
In order to follow along with this recipe, download the Sample Files 2.4 folder from the book's GitHub repository. This folder contains the Wide World Importers sample database backup (.BAK) file, which you can restore to your SQL Server instance.
Note that the provided BAK file is suitable for SQL Server 2016 SP1 or later. If you're running an earlier version or need instructions on installation, please consult Microsoft's support page at https://docs.microsoft.com/en-us/sql/samples/wide-world-importers-oltp-install-configure.
To get started, ensure you have Tableau Prep Builder open, then follow these steps:
Important note
If you are using a macOS computer, the Sign In button may remain disabled even though you have populated the appropriate connection details. This could be the result of a driver missing on your device. To find the drivers needed, go to the Tableau Driver Download web page at https://www.tableau.com/en-us/support/drivers?edition=pro&lang=en-us&platform=mac.
With the table on the canvas, we can now continue building out this flow as with any other data connection type.
Tableau Prep has a number of built-in database connections that remove the complexity of connecting by configuring connections such as ODBC manually. The number of supported data connection types is continually expanding, too. If you do not see your database listed, you can always opt to use an ODBC connection instead. See the recipe titled Connecting to JDBC or ODBC data sources in this chapter for more information.
Database tables can relate to each other and be joined to each other to create insightful datasets. We'll cover joins in detail in Chapter 5, Combining Data. For now, I want to highlight a great feature in Tableau Prep that indicates the table primary key and foreign keys in the bottom pane. You can find this information in the Linked Keys section as shown in the following screenshot:
There are three types of keys: primary, foreign, and keys representing both primary and foreign. Tableau Prep shows an icon indicating the key type accordingly:
By completing the steps in this recipe, you have connected Tableau Prep to an on-premises database.
In this recipe, we'll connect to a local Amazon AWS Athena database. Just like on-premises data connections, Tableau has made it as easy as possible to connect securely to cloud data sources. You'll find many connections for popular cloud providers including Microsoft, Google, and Amazon. Each data connection dialog has been customized to the technology you're attempting to connect to. This means you won't see irrelevant fields for the selected connection type, reducing the complexity of cloud connections.
In order to follow along with this recipe, you must have data stored and have access to that data in Amazon AWS Athena.
Tip
Getting set up on AWS Athena is beyond the scope of this book. However, if you wish to explore this option, the simplest way to get started is to create an account at https://aws.amazon.com/, then upload data to S3, and make it available to Athena by using AWS Glue. To use the same sample data as this recipe, download the Sample files 2.5 folder from the book's GitHub repository.
To get started, ensure you have Tableau Prep Builder open, then follow these steps:
a) The Server field for Athena needs to be populated with the region information. The format for this is athena.[region].amazonaws.com. For example, athena.us-east-1.amazonaws.com or athena.eu-west-1.amazonaws.com.
b) The staging directory is where your Athena results are stored in AWS S3 and follows the format s3://[s3 bucket]/[s3 folder]. For example, s3://company/orders.
c) Finally, you'll need your AWS access key information. For information on how to obtain this, see the AWS documentation at https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.
d) You'll also need to install the Amazon Athena JDBC driver, which Tableau provides on its download page at https://www.tableau.com/support/drivers.
By following the steps in this recipe, you are now able to connect Tableau Prep to cloud databases.
Similar to on-premises data connections, Tableau Prep provides a simplified user interface on top of the database driver, so you can easily configure the connection. In this recipe, we've used the Athena JDBC driver in the background and configuring it is as easy as any other connections.
The following screenshot shows the clear mapping between the Athena web interface and the Tableau Prep UI:
Let's move on to the next recipe!
Tableau has two popular proprietary data types, Tableau Data Extract (.tde) and Tableau Hyper Extract (.hyper). Neither format can easily be read, if at all, by most data pipeline and ETL tools. With Tableau Prep Builder, however, you can easily use a Tableau data extract as an input into your flow. In this recipe, we'll connect to a hyper extract. The steps are identical when connecting to a TDE extract.
To follow along with this recipe, download Sample Folder 2.6 from the book's GitHub repository.
To get started, ensure you have Tableau Prep Builder open, then follow these steps:
By following the steps in this recipe, you have learned how to connect to a Tableau extract.
As you've seen in this recipe, connecting to Tableau extracts is very straightforward, as you might expect from the company's own data source type. If you're fully into the Tableau ecosystem and using products such as Prep, Desktop, and Server, extracts are a great way to manage data and performance.
Although Tableau Prep provides out-of-the-box connections to many popular data sources, there's always a chance you might be using another data source type. If this is the case, you can use JDBC and ODBC connections instead.
In order to use such connections, you must first install the requisite driver on the machine running Tableau Prep Builder and configure the connection outside of Tableau Prep. If you intend to publish your flow to Tableau Server, you must also create the same connection on the server itself, using the same DSN.
Important note
Tableau considers JDBC and ODBC unsupported connections and the success results may vary.
In this recipe, we'll walk through using an ODBC connection. The high-level steps for JDBC are largely the same.
In order to follow along with this recipe, you must have installed the appropriate driver for the connection type of your choice and configured the ODBC connection in your operating system.
To get started, ensure you have Tableau Prep Builder open, then follow these steps:
Important note
For ease of demonstration purposes, this recipe shows how to connect to Microsoft SQL Server using ODBC. However, when available, you should always aim to use a supported connection type in Tableau Prep rather than JDBC or ODBC. This ensures optimal performance and reduces the chance of unforeseen errors.
Using JDBC and ODBC tells Tableau Prep to go to your system's respective data source connections, as referred to by the DSN name, and leverage the connection as configured. However, Tableau Prep cannot control these connections, its settings, or the drivers used. Therefore, they are not supported and should be used with caution.
In this recipe, we'll create an output to a file. There are two file outputs supported by Tableau Prep, Comma-Separated Values (.csv) files and Tableau extracts (.hyper). When you're planning to perform downstream analysis with Tableau Desktop, I recommend using Tableau extracts as they have great performance benefits. If, however, you're utilizing it for any other purposes, CSV is a great open format to utilize.
Follow along with the steps in this recipe by downloading the Sample Files 2.8 folder from the book's GitHub repository.
To follow along with the recipe, open up a new flow in Tableau Prep Builder and configure a data input connection, using the Superstore Sales.hyper sample file.
Then, follow along with these steps:
Important note
Saving to Tableau Server: If you'd like to save a hyper extract output to Tableau Server, you can change the owutput from File to Published Data Source. This will then write the hyper file to Tableau Server. This is only possible for hyper files and not for CSV files.
Using the steps in this recipe, you have learned how to write data from Tableau Prep to CSV and hyper files.
Using the Output tool, you can easily write data to CSV and hyper files for use in other analytics applications. If you save your output to Tableau Server, this is a great way to maintain those outputs.
When Tableau Prep was launched, it was only able to output data to files, including hyper extracts. Thankfully, Tableau introduced functionality to write to external databases in release 2020.3. With it, you can write the output of your flow directly to a database.
At the time of writing, the supported output types are SQL Server, Oracle, PostgreSQL, MySQL, Teradata, Snowflake, and Amazon Redshift.
In this recipe, we'll write data to SQL Server. You can write to any of the supported types listed previously. Ensure that you have the appropriate database details and write privileges before you continue.
To get started, open up Tableau Prep Builder and open the Superstore sample flow from the home screen, then follow these steps:
Doing so will raise an error. This is expected, as the new output location has no default configuration and therefore the flow wouldn't work if we were to run it now. As we configure the connection, the error will disappear:
Using the steps in this recipe, you have learned how to write data from Tableau Prep to a database.
Using the Output tool, you can write data to a variety of database platforms. Doing so can add significant value to your use of Tableau Prep and allow you to prepare data for use with tools outside the Tableau ecosystem.
Your flow may process a significant amount of data whenever it runs, taking up system resources, impacting database performance, and taking time to run. Much of your input data may be processed repeatedly as you run your flow. For example, your flow may process data from an order system. Running the flow daily might process all data just to capture the most recently placed orders.
In order to make your flow more efficient, reduce the burden on input databases, and minimize flow runtime, Tableau Prep allows you to set up an incremental refresh. In the example described, an incremental refresh would only process orders that have not previously been processed by Tableau Prep. To achieve this, Tableau Prep compares the data in the flow output to the flow input.
In this recipe, we'll configure a flow to achieve this.
To follow along, open up Tableau Prep Builder and, from the home screen, select the Superstore sample flow.
To get started, select the orders (USCA) input step, and then follow these steps:
Tableau Prep needs to know three bits of information in the input step to configure a incremental refresh.
Next, we need to tell Tableau Prep in which output it can find a field to compare the selected input field with, to determine whether a row is new or not. In this case, the fields are named identically, and so Tableau Prep has automatically selected Order Date as the output field in the Superstore Sales output, which is exactly what we want. No further changes are needed; your incremental refresh for this input is now configured. If you have multiple inputs, an incremental refresh must be configured for each input separately.
Important note
Replacing Output with Incremental Data Only: When you select the Create 'Superstore Sales.hyper', output step notice the Incremental Refresh dropdown in the settings area. There are two options here. By default, Tableau Prep will append data, meaning only the newly processed rows are added. However, you can change this to Create Table to replace any existing output with new output containing only those newly processed rows.
Tableau has achieved a marvelously easy method to process data incrementally by comparing the existing output to the input for a particular field only. This method can save you hours of unnecessarily processing data that's already been processed previously.
Once you have created a flow, you may wish to publish it to Tableau Server or Tableau Online. When you publish a flow, you and others can execute it on-demand with a single click from the Tableau Server interface.
Furthermore, you can increase the transparency of your data flow for your report users, as they will be able to see a diagram of the data flow.
In this recipe, we'll modify a sample flow to output data as a Hyper Extract to Tableau Server, publish the flow to the server, and execute it from the web interface.
To follow along with this recipe, you need access to a Tableau Server instance, either on-premises or with Tableau Online, with the appropriate privileges to publish data sources and flows.
Important note
If you're not sure whether you have the appropriate privileges, you will get an error message when attempting to publish your flow. Contact your Tableau administrator to request that your permissions be modified to allow you to publish data sources and Tableau Prep flows.
Start by opening Tableau, then follow these steps:
Note that Tableau Prep will not show any confirmation dialog to indicate a successful sign in. However, you can go back to the Server menu and, if you are signed in successfully, you will see the URL of your server in the menu instead of the Sign In option, as shown in the following screenshot:
With these steps completed, you've successfully completed this recipe.
In this recipe, we learned how to connect Tableau Prep Builder to your Tableau Server or Tableau Online instance. We published a flow and ran it on the server, with two outputs being written to the server. Publishing flows and data sources to your server is an excellent step toward offering transparency into the data lineage and transformation process to the users of your outputs and all subsequent reports created in Tableau.
Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.
If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.
Please Note: Packt eBooks are non-returnable and non-refundable.
Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:
If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:
Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.
You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.
Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.
When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.
For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.