Home Data Tableau Prep Cookbook

Tableau Prep Cookbook

By Hendrik Kleine
books-svg-icon Book
eBook $29.99 $20.98
Print $43.99
Subscription $15.99 $10 p/m for three months
$10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
BUY NOW $10 p/m for first 3 months. $15.99 p/m after that. Cancel Anytime!
eBook $29.99 $20.98
Print $43.99
Subscription $15.99 $10 p/m for three months
What do you get with a Packt Subscription?
This book & 7000+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook + Subscription?
Download this book in EPUB and PDF formats, plus a monthly download credit
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with a Packt Subscription?
This book & 6500+ ebooks & video courses on 1000+ technologies
60+ curated reading lists for various learning paths
50+ new titles added every month on new and emerging tech
Early Access to eBooks as they are being written
Personalised content suggestions
Customised display settings for better reading experience
50+ new titles added every month on new and emerging tech
Playlists, Notes and Bookmarks to easily manage your learning
Mobile App with offline access
What do you get with eBook?
Download this book in EPUB and PDF formats
Access this title in our online reader
DRM FREE - Read whenever, wherever and however you want
Online reader with customised display settings for better reading experience
What do you get with video?
Download this video in MP4 format
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with video?
Stream this video
Access this title in our online reader
DRM FREE - Watch whenever, wherever and however you want
Online reader with customised display settings for better learning experience
What do you get with Audiobook?
Download a zip folder consisting of audio files (in MP3 Format) along with supplementary PDF
What do you get with Exam Trainer?
Flashcards, Mock exams, Exam Tips, Practice Questions
Access these resources with our interactive certification platform
Mobile compatible-Practice whenever, wherever, however you want
  1. Free Chapter
    Chapter 2: Extract and Load Processes
About this book
Tableau Prep is a tool in the Tableau software suite, created specifically to develop data pipelines. This book will describe, in detail, a variety of scenarios that you can apply in your environment for developing, publishing, and maintaining complex Extract, Transform and Load (ETL) data pipelines. The book starts by showing you how to set up Tableau Prep Builder. You’ll learn how to obtain data from various data sources, including files, databases, and Tableau Extracts. Next, the book demonstrates how to perform data cleaning and data aggregation in Tableau Prep Builder. You’ll also gain an understanding of Tableau Prep Builder and how you can leverage it to create data pipelines that prepare your data for downstream analytics processes, including reporting and dashboard creation in Tableau. As part of a Tableau Prep flow, you’ll also explore how to use R and Python to implement data science components inside a data pipeline. In the final chapter, you’ll apply the knowledge you’ve gained to build two use cases from scratch, including a data flow for a retail store to prepare a robust dataset using multiple disparate sources and a data flow for a call center to perform ad hoc data analysis. By the end of this book, you’ll be able to create, run, and publish Tableau Prep flows and implement solutions to common problems in data pipelines.
Publication date:
March 2021
Publisher
Packt
Pages
288
ISBN
9781800563766

 

Chapter 2: Extract and Load Processes

Tableau Prep Builder allows you to connect to a large variety of industry-leading data platforms.

When Tableau Prep was initially launched, the only outputs supported were flat files and Tableau native formats TDE and Hyper. However, since the 2020.2.3 release in August 2020, you can output to database platforms such as SQL Server as well, making Tableau Prep a much more appealing tool. It allows you to do the following:

  • Ingest and output to flat files and a variety of database platforms
  • Learn how to work with Tableau extracts

In this chapter, you will find recipes to ingest and output the data you require from and to a wide variety of data types:

  • Connecting to text and Excel files
  • Connecting to PDF files
  • Connecting to SAS, SPSS, and R files
  • Connecting to on-premises databases
  • Connecting to cloud databases
  • Connecting to Tableau extracts
  • Connecting to JDBC or ODBC data sources
  • Writing data to CSV and Hyper files
  • Writing data to databases
  • Setting up an incremental refresh

If you are connecting a BI tool such as Tableau Desktop to your data, selecting the appropriate output type may impact your Tableau workbook performance.

 

Technical requirements

To follow along with the recipes in this chapter, you will require Tableau Prep Builder version 2020.2.3 or later. If you'd like to follow along and connect or write to a database, you must have the appropriate database set up and have an account with the appropriate access privileges.

The recipes in this chapter use sample data files, which you can download from the book's GitHub repository: https://github.com/PacktPublishing/Tableau-Prep-Cookbook.

 

Connecting to text and Excel files

In this recipe, we'll connect to a Comma-Separated Values (CSV) file containing sales transactions and create a second connection to multiple Excel files. These connection types are very similar and so we'll cover them in one recipe. However, there are key features to both, which we'll highlight.

Getting ready

To follow along with the recipe, download the Sample Files 2.1 folder from this book's GitHub repository.

How to do it…

To get started, ensure you have the sample CSV and/or Excel file(s) ready on your computer and open up Tableau Prep Builder:

  1. From the Tableau Prep Builder home screen, click the Connect to Data button and subsequently select Text file from the Connect pane:
    Figure 2.1 – Connect to a text file

    Figure 2.1 – Connect to a text file

  2. Tableau Prep will bring up the file selection window next. From here, navigate to our sample file, December 2016 Sales.csv, and open it:

    Figure 2.2 – Select December 2016 Sales.csv

    Figure 2.2 – Select December 2016 Sales.csv

    Once selected, Tableau Prep will automatically create a new flow with the data connection in it:

    Figure 2.3 – A new flow is created when selecting any text file

    Figure 2.3 – A new flow is created when selecting any text file

    Since the data connection is automatically selected, all options onscreen now relate to that particular connection. On the left-hand side, you can see the Tables pane. Note that there is only one table, equal to the filename. Since text files do not contain tables, this is by design. Tableau Prep Builder will always provide a generic user interface for data connections whenever possible. Once you're comfortable with one connection type, others should be easier to master:

    Figure 2.4 – Text files such as CSV always have a single table

    Figure 2.4 – Text files such as CSV always have a single table

    In the bottom pane, you can find a summary of all data fields identified in your text file, along with the automatically determined data type. In Chapter 3, Cleaning Transformations, we'll dive into the cleaning options you can perform here:

    Figure 2.5 – Field summary

    Figure 2.5 – Field summary

  3. In the same bottom pane, you can configure the data connection settings. For text files, you'll always want to verify the Text Options section. Tableau Prep will automatically set these values as best as possible, but I recommend you verify them before you continue. The word header refers to the first row in your dataset. If you do not have headers in your dataset, you can select Generate field names automatically, which will create headers named F1, F2, F3, and so on. You can rename those fields later on. Field Separator tells Tableau how columns are defined in a CSV file, which is usually a comma or pipe symbol. Text Qualifier tells Tableau Prep which characters indicate the start and end of a value or string.

    Finally, Character Set and Locale are typically identified appropriately but you can alter them here as needed:

    Figure 2.6 – Connection settings

    Figure 2.6 – Connection settings

  4. Now that we have connected a CSV file, let's create a second connection to an Excel file, in the same flow. To start, click the + icon in the Connections pane and select Microsoft Excel:
    Figure 2.7 – Adding a second data connection

    Figure 2.7 – Adding a second data connection

  5. Identical to the selection of a text file, browse to and select our sample file named December 2016 Sales.xlsx.

    Once we've selected the file, Tableau Prep Builder does not automatically show another data connection in the flow, as it did for our CSV file. This is the default behavior for any data connection that has multiple tables. In the case of Microsoft Excel, each Excel sheet is considered a table:

    Figure 2.8 – Connections with tables require table selection before you can continue

    Figure 2.8 – Connections with tables require table selection before you can continue

  6. In order to continue, we must drag a table, or sheet, onto the flow canvas to finalize the data connection. Proceed by dragging in the Sales_Data table. Once added, you'll notice the color of this connection is different from the text file connection we made earlier. Tableau Prep Builder randomly assigns a color to the various data flows for easy recognition.

    The colors do not denote any kind of functionality:

    Figure 2.9 – A flow with multiple data connections

    Figure 2.9 – A flow with multiple data connections

  7. Once connected, you'll notice the options specific to text file connections no longer appear. However, the layout remains the same. A function common to both text and Excel files is the ability to ingest multiple files simultaneously. Select the Multiple Files tab for this function:
    Figure 2.10 – Multiple Files tab

    Figure 2.10 – Multiple Files tab

  8. From here, select Wildcard union to reveal the options:
    Figure 2.11 – Multiple Files options

    Figure 2.11 – Multiple Files options

  9. Here, we can opt to include files in subfolders from the selected folder, which defaults to the folder where our Excel file is located. Select the Include subfolders option to enable this. Let's assume we want to include all sheets named Sales_Data, in all files ending in 2016 Sales.xlsx. To do so, we can use the asterisk symbol as a wildcard and set the file Matching Pattern property to *2016 Sales.xlx and the sheet Matching Pattern property to Sales_Data:
Figure 2.11 – Multiple Files options

Figure 2.12 – Using wildcards to ingest multiple files at the same time

As a result, this step will now ingest all files in our subfolder named Archive and combine the data:

Figure 2.13 – All sample files here will be ingested at the same time using a single connection

Figure 2.13 – All sample files here will be ingested at the same time using a single connection

By completing these steps, you have learned how to connect Tableau Prep to text and Excel files.

How it works…

Tableau Prep text files and Microsoft Excel connections automatically detect most settings very well, so, in most cases, a couple of clicks will get you up and running. The most powerful feature is undoubtedly the ability to ingest multiple files at the same time. You can ingest hundreds of files at the same time using this method, using a single data connection.

 

Connecting to PDF files

In this recipe, we'll connect to a PDF file containing text and a table with data. Tableau Prep has an exciting feature that can automatically detect the presence of tables in PDF files and extract the data for you.

Getting ready

To follow along with the recipe, download the Sample Files 2.2 folder from the book's GitHub repository.

How to do it…

To get started, ensure you have the sample PDF file ready on your computer, and open Tableau Prep Builder:

  1. Tableau Prep Builder will not show us the entire PDF document, so it's best to open it in a PDF viewer and review what data we want to extract from our PDF. In our example document here, we have a single table and so we expect a table in Tableau Prep with the headers Department and Amount:
    Figure 2.14 – Sample PDF file with a table embedded in it

    Figure 2.14 – Sample PDF file with a table embedded in it

  2. In Tableau Prep Builder, select the Connect to Data button, followed by PDF file to open the file browse dialog and select our sample PDF file, Sales Summary.pdf:
    Figure 2.15 – Select PDF file from the Connect pane

    Figure 2.15 – Select PDF file from the Connect pane

  3. Once connected, Tableau Prep Builder will automatically detect the tables within the PDF file. In our sample, we can see the Department and Amount fields coming through as expected:

    Figure 2.16 – PDF tables are automatically extracted

    Figure 2.16 – PDF tables are automatically extracted

  4. Each table is listed separately in the Tables part of the Connections pane to the left. This allows you to digest PDF files with multiple tables within them just as easily. The name of the table is automatically generated and refers to the page number in the PDF file and its position on the page:
Figure 2.17 – Tableau Prep can detect multiple tables in a single PDF file

Figure 2.17 – Tableau Prep can detect multiple tables in a single PDF file

In this recipe, you have learned how to connect to PDF files and extract data for processing in Tableau Prep.

How it works…

Tableau Prep converts each table in a PDF document into a data table when ingesting the file into a new flow. As such, Tableau Prep removes the complexity of parsing PDF documents and allows you to treat this like any other data connection.

 

Connecting to SAS, SPSS, and R files

In this recipe, we'll connect to a statistical file. Tableau Prep offers fantastic integration with popular statistical files from SAS (.sas7bdat), SPSS (.sav), and R (.rdata, .rda).

I advocate the use of open file formats such as CSV or commonly used standards such as Excel. However, if you are unable to obtain your data in such a format from your data science partner, this connector may offer a solution.

Getting ready

In this recipe, we'll connect to an R file using the statistical file connector. In order to follow along, download the Sample Files 2.3 folder from the book's GitHub repository.

How to do it…

To get started, ensure you have the sample RData file available on your computer. From the Tableau Prep home screen follow these steps:

  1. Click the Connect to Data button and select Statistical file.
  2. From the browse file window, locate and open our statistical file named December 2016 Sales.Rdata.

    And with just these few steps, Tableau Prep Builder has added the statistical file source to a new flow:

Figure 2.18 – Flow with a statistical file connection

Figure 2.18 – Flow with a statistical file connection

Most options in the bottom pane are identical to those when processing Excel files. However, there is a small but important feature absent. You cannot alter the data type of the fields in the statistical file connection step. In order to do this, you have to use a cleaning step, which we'll discuss in Chapter 3, Cleaning Transformations:

Figure 2.19 – Data types cannot be altered directly in the statistical file connection step

Figure 2.19 – Data types cannot be altered directly in the statistical file connection step

In this recipe, you have learned how to add Tableau Prep to a data science workflow by connecting to data produced by popular statistics applications.

How it works…

Tableau Prep unpacks statistical files when you connect to them and, from that moment on, allows you to leverage them like any other connection.

There's more…

There are some limitations when it comes to connecting to statistical files. If you run into any connection issues, I recommend you refer to the following section of the Tableau documentation online: https://help.tableau.com/current/pro/desktop/en-us/examples_statfile.htm.

 

Connecting to on-premises databases

In this recipe, we'll connect to a Microsoft SQL Server database. The many connectors provided out of the box by Tableau Prep allow you to connect to databases almost as easily as to file connections, allowing you to quickly start an advanced flow.

Getting ready

In order to follow along with this recipe, download the Sample Files 2.4 folder from the book's GitHub repository. This folder contains the Wide World Importers sample database backup (.BAK) file, which you can restore to your SQL Server instance.

Note that the provided BAK file is suitable for SQL Server 2016 SP1 or later. If you're running an earlier version or need instructions on installation, please consult Microsoft's support page at https://docs.microsoft.com/en-us/sql/samples/wide-world-importers-oltp-install-configure.

How to do it…

To get started, ensure you have Tableau Prep Builder open, then follow these steps:

  1. From the home screen, click the Connect to Data button to bring up the Connect pane. From here, select the search field and type in SQL to instantly filter the available connections.
  2. From the filtered selection of connections, select Microsoft SQL Server. This will bring up the Connection dialog.
  3. In this dialog, enter your connection details. Depending on your server, these details will vary. If you're not sure about these details, please contact your database administrator. Click Sign In to continue.

    Important note

    If you are using a macOS computer, the Sign In button may remain disabled even though you have populated the appropriate connection details. This could be the result of a driver missing on your device. To find the drivers needed, go to the Tableau Driver Download web page at https://www.tableau.com/en-us/support/drivers?edition=pro&lang=en-us&platform=mac.

  4. Once the connection has been established, Tableau Prep will show a dropdown in the Connections pane of all databases on the server you've selected. Select your database to reveal the available tables:
    Figure 2.20 – Database selection

    Figure 2.20 – Database selection

  5. From the list of available tables, we need to select the table we'd like to ingest into our flow and drag it onto the canvas. Let's drag in the table named Order:
Figure 2.21 – Drag a table onto the canvas

Figure 2.21 – Drag a table onto the canvas

With the table on the canvas, we can now continue building out this flow as with any other data connection type.

How it works…

Tableau Prep has a number of built-in database connections that remove the complexity of connecting by configuring connections such as ODBC manually. The number of supported data connection types is continually expanding, too. If you do not see your database listed, you can always opt to use an ODBC connection instead. See the recipe titled Connecting to JDBC or ODBC data sources in this chapter for more information.

There's more…

Database tables can relate to each other and be joined to each other to create insightful datasets. We'll cover joins in detail in Chapter 5, Combining Data. For now, I want to highlight a great feature in Tableau Prep that indicates the table primary key and foreign keys in the bottom pane. You can find this information in the Linked Keys section as shown in the following screenshot:

Figure 2.22 – Linked Keys for database connections

Figure 2.22 – Linked Keys for database connections

There are three types of keys: primary, foreign, and keys representing both primary and foreign. Tableau Prep shows an icon indicating the key type accordingly:

Figure 2.23 – Database key types

Figure 2.23 – Database key types

By completing the steps in this recipe, you have connected Tableau Prep to an on-premises database.

 

Connecting to cloud databases

In this recipe, we'll connect to a local Amazon AWS Athena database. Just like on-premises data connections, Tableau has made it as easy as possible to connect securely to cloud data sources. You'll find many connections for popular cloud providers including Microsoft, Google, and Amazon. Each data connection dialog has been customized to the technology you're attempting to connect to. This means you won't see irrelevant fields for the selected connection type, reducing the complexity of cloud connections.

Getting ready

In order to follow along with this recipe, you must have data stored and have access to that data in Amazon AWS Athena.

Tip

Getting set up on AWS Athena is beyond the scope of this book. However, if you wish to explore this option, the simplest way to get started is to create an account at https://aws.amazon.com/, then upload data to S3, and make it available to Athena by using AWS Glue. To use the same sample data as this recipe, download the Sample files 2.5 folder from the book's GitHub repository.

How to do it…

To get started, ensure you have Tableau Prep Builder open, then follow these steps:

  1. From the home screen, click the Connect to Data button, then search for Athena in the Connect pane. Select Amazon Athena to continue.
  2. In the Connection dialog, enter the details for your AWS Athena instance and click Sign In to continue:

    a) The Server field for Athena needs to be populated with the region information. The format for this is athena.[region].amazonaws.com. For example, athena.us-east-1.amazonaws.com or athena.eu-west-1.amazonaws.com.

    b) The staging directory is where your Athena results are stored in AWS S3 and follows the format s3://[s3 bucket]/[s3 folder]. For example, s3://company/orders.

    c) Finally, you'll need your AWS access key information. For information on how to obtain this, see the AWS documentation at https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.

    d) You'll also need to install the Amazon Athena JDBC driver, which Tableau provides on its download page at https://www.tableau.com/support/drivers.

  3. Next, select the appropriate Catalogue from the dropdown. In Athena terminology, this is the data source:
    Figure 2.24 – Select the Athena data source from the Catalogue dropdown

    Figure 2.24 – Select the Athena data source from the Catalogue dropdown

  4. In the last step, select the database of your choice and drag the table you need onto the flow canvas. In our example, I've selected a database named opssalesdb and dragged a table named results onto the flow canvas:
Figure 2.25 – Selecting an Athena table

Figure 2.25 – Selecting an Athena table

By following the steps in this recipe, you are now able to connect Tableau Prep to cloud databases.

How it works…

Similar to on-premises data connections, Tableau Prep provides a simplified user interface on top of the database driver, so you can easily configure the connection. In this recipe, we've used the Athena JDBC driver in the background and configuring it is as easy as any other connections.

There's more…

The following screenshot shows the clear mapping between the Athena web interface and the Tableau Prep UI:

Figure 2.26 – Mapping Athena terminology to Tableau Prep

Figure 2.26 – Mapping Athena terminology to Tableau Prep

Let's move on to the next recipe!

 

Connecting to Tableau extracts

Tableau has two popular proprietary data types, Tableau Data Extract (.tde) and Tableau Hyper Extract (.hyper). Neither format can easily be read, if at all, by most data pipeline and ETL tools. With Tableau Prep Builder, however, you can easily use a Tableau data extract as an input into your flow. In this recipe, we'll connect to a hyper extract. The steps are identical when connecting to a TDE extract.

Getting ready

To follow along with this recipe, download Sample Folder 2.6 from the book's GitHub repository.

How to do it…

To get started, ensure you have Tableau Prep Builder open, then follow these steps:

  1. From the home screen, click the Connect to Data button. From the Connect pane, select Tableau extract. This connection type is suited to both TDE and hyper extracts.
  2. From the file browse dialog, select and open our Hyper file named Superstore Sales.hyper.
  3. When the hyper extract has a single table, Tableau Prep will automatically add that table to our flow. If the extract has multiple tables, all we need to do is drag the desired table onto our flow canvas to complete the connection. In this example, the extract contains a single table and is added to the flow by Tableau, completing our input configuration:
Figure 2.27 – Completed connection to Tableau extract

Figure 2.27 – Completed connection to Tableau extract

By following the steps in this recipe, you have learned how to connect to a Tableau extract.

How it works…

As you've seen in this recipe, connecting to Tableau extracts is very straightforward, as you might expect from the company's own data source type. If you're fully into the Tableau ecosystem and using products such as Prep, Desktop, and Server, extracts are a great way to manage data and performance.

 

Connecting to JDBC or ODBC data sources

Although Tableau Prep provides out-of-the-box connections to many popular data sources, there's always a chance you might be using another data source type. If this is the case, you can use JDBC and ODBC connections instead.

In order to use such connections, you must first install the requisite driver on the machine running Tableau Prep Builder and configure the connection outside of Tableau Prep. If you intend to publish your flow to Tableau Server, you must also create the same connection on the server itself, using the same DSN.

Important note

Tableau considers JDBC and ODBC unsupported connections and the success results may vary.

In this recipe, we'll walk through using an ODBC connection. The high-level steps for JDBC are largely the same.

Getting ready

In order to follow along with this recipe, you must have installed the appropriate driver for the connection type of your choice and configured the ODBC connection in your operating system.

How to do it…

To get started, ensure you have Tableau Prep Builder open, then follow these steps:

  1. From the home screen, click the Connect to Data button and scroll to the very bottom of the Connect pane to select Other Databases (ODBC):
    Figure 2.28 – Selecting the ODBC connection type

    Figure 2.28 – Selecting the ODBC connection type

  2. The connection dialog for ODBC connections is very simple as the connection details are already captured in the DSN you have pre-configured on your system. All you need to do is select the DSN of your choice, and enter your credentials prior to selecting Sign In.
  3. Once connected, the options on the flow screen will depend on your data source type. In this example, our ODBC connects to Microsoft SQL Server and shows the expected selection options, including Database, Schema, and Tables. Once you've added your desired table to the flow, you can treat this ODBC as any other connection type.

    Important note

    For ease of demonstration purposes, this recipe shows how to connect to Microsoft SQL Server using ODBC. However, when available, you should always aim to use a supported connection type in Tableau Prep rather than JDBC or ODBC. This ensures optimal performance and reduces the chance of unforeseen errors.

How it works…

Using JDBC and ODBC tells Tableau Prep to go to your system's respective data source connections, as referred to by the DSN name, and leverage the connection as configured. However, Tableau Prep cannot control these connections, its settings, or the drivers used. Therefore, they are not supported and should be used with caution.

 

Writing data to CSV and Hyper files

In this recipe, we'll create an output to a file. There are two file outputs supported by Tableau Prep, Comma-Separated Values (.csv) files and Tableau extracts (.hyper). When you're planning to perform downstream analysis with Tableau Desktop, I recommend using Tableau extracts as they have great performance benefits. If, however, you're utilizing it for any other purposes, CSV is a great open format to utilize.

Getting ready

Follow along with the steps in this recipe by downloading the Sample Files 2.8 folder from the book's GitHub repository.

How to do it…

To follow along with the recipe, open up a new flow in Tableau Prep Builder and configure a data input connection, using the Superstore Sales.hyper sample file.

Then, follow along with these steps:

  1. Hover your mouse over the data input step in your flow and click the + icon. The context menu allows you to select a step to be added to your flow in order to build your pipeline. In this recipe, we're focusing solely on the output, so select the Output option:
    Figure 2.29 – Adding an Output step to your flow

    Figure 2.29 – Adding an Output step to your flow

  2. Tableau Prep will instantly add the Output step and select it, which brings up the bottom pane where the output configuration is visible, as well as a data preview.
  3. In the output settings, the default configuration is always File and the type is Tableau Data Extract (.hyper). We can change the output Name, which is the filename, Location, and Output type properties here. The only other available type is Comma-Separated Values (.csv). Let's change the location to the same folder as our input file:

    Figure 2.30 – File output settings

    Figure 2.30 – File output settings

    Important note

    Saving to Tableau Server: If you'd like to save a hyper extract output to Tableau Server, you can change the owutput from File to Published Data Source. This will then write the hyper file to Tableau Server. This is only possible for hyper files and not for CSV files.

  4. The only difference in settings between these two output types is the ability to append an existing file. This option is only available for Tableau data extracts and will write the data as new rows to an existing hyper file:
    Figure 2.31 – Append to table is only available for Tableau hyper extracts

    Figure 2.31 – Append to table is only available for Tableau hyper extracts

  5. We can adjust our output settings as desired. In this example, let's leave all the default settings as they are. When ready, click Run Flow to execute your flow and generate the output. When done, Tableau Prep will show a success message.
  6. Let's browse our filesystem and verify that the output has been produced:

    Figure 2.32 – The output generated by Tableau Prep

    Figure 2.32 – The output generated by Tableau Prep

  7. Anytime you've created a hyper extract, you can easily validate the extract by connecting to it in Tableau Desktop, using it as a data source:
Figure 2.33 – Using Tableau Prep hyper output as a data source in Tableau Desktop

Figure 2.33 – Using Tableau Prep hyper output as a data source in Tableau Desktop

Using the steps in this recipe, you have learned how to write data from Tableau Prep to CSV and hyper files.

How it works…

Using the Output tool, you can easily write data to CSV and hyper files for use in other analytics applications. If you save your output to Tableau Server, this is a great way to maintain those outputs.

 

Writing data to databases

When Tableau Prep was launched, it was only able to output data to files, including hyper extracts. Thankfully, Tableau introduced functionality to write to external databases in release 2020.3. With it, you can write the output of your flow directly to a database.

At the time of writing, the supported output types are SQL Server, Oracle, PostgreSQL, MySQL, Teradata, Snowflake, and Amazon Redshift.

Getting ready

In this recipe, we'll write data to SQL Server. You can write to any of the supported types listed previously. Ensure that you have the appropriate database details and write privileges before you continue.

How to do it…

To get started, open up Tableau Prep Builder and open the Superstore sample flow from the home screen, then follow these steps:

  1. The Superstore flow has two output steps. Delete the second output by right-clicking the step and selecting Remove:
    Figure 2.34 – Removing a step from a flow

    Figure 2.34 – Removing a step from a flow

  2. Select the remaining output step, Create 'Superstore Sales.hyper', to bring up the bottom pane with the configuration options:
    Figure 2.35 – Output configuration options

    Figure 2.35 – Output configuration options

  3. Change the default output type from File to Database table:
    Figure 2.36 – Changing the output type

    Figure 2.36 – Changing the output type

    Doing so will raise an error. This is expected, as the new output location has no default configuration and therefore the flow wouldn't work if we were to run it now. As we configure the connection, the error will disappear:

    Figure 2.37 – Incomplete output configurations cause an error

    Figure 2.37 – Incomplete output configurations cause an error

  4. From the Select a Server dropdown, select your server type. For this recipe, let's select Microsoft SQL Server.
  5. When you've selected a database type, you'll be presented with the same Connection dialog as this type would show for an input step. Populate the dialog with your server details and click Sign In to continue.
  6. Once signed in, the Database dropdown becomes visible. From here, select the database to which you have write privileges. In my example, I will select Test Database, which I created for testing purposes.
  7. Next, you can select an existing table to write to from the Table dropdown or create a brand-new table. When creating a new table, you can use the format [schema].[table] to ensure you create the table in the appropriate schema. In this example, I'll create a new table, superstore.test:
    Figure 2.38 – Use [schema].[table] format to create a new table

    Figure 2.38 – Use [schema].[table] format to create a new table

  8. Finally, select the desired Refresh option. You can choose from Create table, Append to table, and Replace data. Make sure you carefully select the option appropriate to you, to prevent accidental deletion of database data. In this example, I'll select Append to table, which will create my superstore.test table in the process as it does not yet exist:
    Figure 2.39 – Table refresh options

    Figure 2.39 – Table refresh options

  9. When you're ready, click Run Flow to execute the flow and write the output to the database:
    Figure 2.40 – Output successfully written to database

    Figure 2.40 – Output successfully written to database

  10. Using your favorite IDE, verify that the database table now exists, and that data has been written to it. I'm using Azure Data Studio with the query SELECT TOP(100) * FROM [superstore.test]. If all went well, your output will be successful:
Figure 2.41 – Verifying the output in the database

Figure 2.41 – Verifying the output in the database

Using the steps in this recipe, you have learned how to write data from Tableau Prep to a database.

How it works…

Using the Output tool, you can write data to a variety of database platforms. Doing so can add significant value to your use of Tableau Prep and allow you to prepare data for use with tools outside the Tableau ecosystem.

 

Setting up an incremental refresh

Your flow may process a significant amount of data whenever it runs, taking up system resources, impacting database performance, and taking time to run. Much of your input data may be processed repeatedly as you run your flow. For example, your flow may process data from an order system. Running the flow daily might process all data just to capture the most recently placed orders.

In order to make your flow more efficient, reduce the burden on input databases, and minimize flow runtime, Tableau Prep allows you to set up an incremental refresh. In the example described, an incremental refresh would only process orders that have not previously been processed by Tableau Prep. To achieve this, Tableau Prep compares the data in the flow output to the flow input.

In this recipe, we'll configure a flow to achieve this.

Getting ready

To follow along, open up Tableau Prep Builder and, from the home screen, select the Superstore sample flow.

How to do it…

To get started, select the orders (USCA) input step, and then follow these steps:

  1. From the bottom pane, select the Settings tab, then scroll to the bottom to reveal the Incremental Refresh setting and check the Enable incremental refresh box. This will result in an error message, which will disappear as we configure the incremental refresh in the next steps:
    Figure 2.42 – Incremental Refresh settings

    Figure 2.42 – Incremental Refresh settings

    Tableau Prep needs to know three bits of information in the input step to configure a incremental refresh.

  2. Firstly, which field indicates whether or not a row in the data is new. In this example, we want to identify new Superstore rows by Order Date. Select this from the Input field dropdown to reveal the additional settings:
Figure 2.43 – Incremental Refresh field settings

Figure 2.43 – Incremental Refresh field settings

Next, we need to tell Tableau Prep in which output it can find a field to compare the selected input field with, to determine whether a row is new or not. In this case, the fields are named identically, and so Tableau Prep has automatically selected Order Date as the output field in the Superstore Sales output, which is exactly what we want. No further changes are needed; your incremental refresh for this input is now configured. If you have multiple inputs, an incremental refresh must be configured for each input separately.

Important note

Replacing Output with Incremental Data Only: When you select the Create 'Superstore Sales.hyper', output step notice the Incremental Refresh dropdown in the settings area. There are two options here. By default, Tableau Prep will append data, meaning only the newly processed rows are added. However, you can change this to Create Table to replace any existing output with new output containing only those newly processed rows.

How it works…

Tableau has achieved a marvelously easy method to process data incrementally by comparing the existing output to the input for a particular field only. This method can save you hours of unnecessarily processing data that's already been processed previously.

 

Publishing a flow to Tableau Server

Once you have created a flow, you may wish to publish it to Tableau Server or Tableau Online. When you publish a flow, you and others can execute it on-demand with a single click from the Tableau Server interface.

Furthermore, you can increase the transparency of your data flow for your report users, as they will be able to see a diagram of the data flow.

In this recipe, we'll modify a sample flow to output data as a Hyper Extract to Tableau Server, publish the flow to the server, and execute it from the web interface.

Getting ready

To follow along with this recipe, you need access to a Tableau Server instance, either on-premises or with Tableau Online, with the appropriate privileges to publish data sources and flows.

Important note

If you're not sure whether you have the appropriate privileges, you will get an error message when attempting to publish your flow. Contact your Tableau administrator to request that your permissions be modified to allow you to publish data sources and Tableau Prep flows.

How to do it…

Start by opening Tableau, then follow these steps:

  1. Before we get to work, we have to connect to Tableau Server. To do this, select Sign In... from the Server menu and then enter your server URL or select Tableau Online. In both cases, you'll be prompted to enter your server credentials. Enter your credentials and click Sign In:
    Figure 2.44 – Sign in to Tableau Server or Tableau Online

    Figure 2.44 – Sign in to Tableau Server or Tableau Online

    Note that Tableau Prep will not show any confirmation dialog to indicate a successful sign in. However, you can go back to the Server menu and, if you are signed in successfully, you will see the URL of your server in the menu instead of the Sign In option, as shown in the following screenshot:

    Figure 2.45 – Sign in to Tableau Server or Tableau Online

    Figure 2.45 – Sign in to Tableau Server or Tableau Online

  2. From the Tableau Prep welcome screen, open the Superstore flow from the Sample Flows section in the bottom left. This flow has two outputs, Create 'Superstore Sales.hyper' and Create 'Annual Regional Performance.hyper', as shown in the following screenshot:
    Figure 2.46 – Open up the Superstore Sample flow

    Figure 2.46 – Open up the Superstore Sample flow

  3. By default, these two outputs write a Hyper output to a location on your computer. In this recipe, we'll change these outputs to write data to Tableau Server instead. To do this, select each output step and change the Save output to setting to Published data source, as shown in the following screenshot. Because we are already signed in to Tableau Server, the Server address is populated instantly. We can select a project from the Project dropdown. In this example, we'll use the default project as our destination:
    Figure 2.47 – Change both outputs to Published data source

    Figure 2.47 – Change both outputs to Published data source

  4. Next, let's save our flow to Tableau Server. From the Server menu, select Publish Flow. Set the project set to default and the flow name to Superstore. Click Publish when ready, to save your flow to the server:
    Figure 2.48 – Publish your flow to Tableau Server

    Figure 2.48 – Publish your flow to Tableau Server

  5. When the publishing process completes, Tableau Prep will automatically launch your default browser and open up the flow. As shown in the following screenshot, a visual of the flow is present to help users understand the data flow at a glance. Click the ellipsis next to the flow name to open up the flow context menu. In the menu, select Run Now. When prompted with a confirmation dialog, select Run Now again to execute your flow:
    Figure 2.49 – Run your flow on Tableau Server

    Figure 2.49 – Run your flow on Tableau Server

  6. The Superstore sample flow does not consume many resources and depending on your server's configuration should take a couple of minutes to complete. Refresh the page to see the most recent status. When the flow has completed, each output will display a Succeeded status, as shown in the following screenshot:
    Figure 2.50 – View the status of your flow

    Figure 2.50 – View the status of your flow

  7. You can navigate to the default project folder by selecting default in the top-left corner. In the project folder, you'll now see the two outputs generated by our flow, Annual regional performance and Superstore Sales, as shown in the following screenshot:
Figure 2.51 – Confirm the outputs have been created

Figure 2.51 – Confirm the outputs have been created

With these steps completed, you've successfully completed this recipe.

How it works…

In this recipe, we learned how to connect Tableau Prep Builder to your Tableau Server or Tableau Online instance. We published a flow and ran it on the server, with two outputs being written to the server. Publishing flows and data sources to your server is an excellent step toward offering transparency into the data lineage and transformation process to the users of your outputs and all subsequent reports created in Tableau.

About the Author
  • Hendrik Kleine

    Hendrik Kleine is an advanced analytics leader with 15 years of experience in the analytics space, including in data architecture, engineering, and visualization. He specializes in translating vast amounts of data into easy-to-understand visual communications that provide actionable intelligence. He is an avid innovator and a listed author of multiple data-related inventions. Before COVID-19, he was a speaker at the most recent Tableau conference in San Francisco.

    Browse publications by this author
Latest Reviews (1 reviews total)
well written with access to mages important
Tableau Prep Cookbook
Unlock this book and the full library FREE for 7 days
Start now