Building a Data Mart with Pentaho Data Integration [Video]

Diethard Steiner

Downloadable video: $39.99

Packt Video. Stream online or download for unrestricted offline use.

  • New! Packt Video courses: practical screencast-based tutorials that show you how to get the job done. Bite-sized chunks, hands-on instructions, and powerful results.
  • Learn how to create ETL transformations to populate a star schema in a short span of time
  • Create a fully-functional ETL process using a practical approach
  • Follow the step-by-step instructions for creating an ETL based on a fictional company – get your hands dirty and learn fast

Video Details

Language : English
Release Date : Tuesday, December 31, 2013
Course Length : 1 hour and 50 minutes
ISBN : 178216863X
ISBN 13 : 9781782168638
Author(s) : Diethard Steiner
Topics and Technologies : Video, e-Learning, Open Source

Table of Contents

  1. Getting Started  [18:25 minutes]
    • The Second-hand Lens Store Sample Data
    • The Derived Star Schema
    • Setting up Our Development Environment

  2. Agile BI – Creating ETLs to Prepare Joined Data Set [12:27 minutes]
    • Importing Raw Data
    • Exporting Data Using the Standard Table Output Step
    • Exporting Data Using the Dedicated Bulk Loading Step

  3. Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL Improvements [11:29 minutes]
    • Creating a Pentaho Analysis Model
    • Analyzing the Data Using the Pentaho Analyzer
    • Improving Your ETL for Better Data Quality

  4. Slowly Changing Dimensions [17:03 minutes]
    • Creating a Slowly Changing Dimension of Type 1 Using the Insert/Update Step
    • Creating a Slowly Changing Dimension of Type 1 Using Dimension Lookup Update Step
    • Creating a Slowly Changing Dimension Type 2

  5. Populating Date Dimension [16:10 minutes]
    • Defining Start and End date Parameters
    • Auto-generating Daily rows for a Given Date Period
    • Auto-generating Year, Month, Day and so on.

  6. Creating the Fact Transformation [14:28 minutes]
    • Sourcing Raw Data for Fact Table
    • Look up Slowly Changing Dimension of the Type 1 Key
    • Look up Slowly Changing Dimension of the Type 2 key

  7. Orchestration [10:29 minutes]
    • Loading Dimensions in Parallel
    • Creating Master Jobs

  8. ID-based Change Data Capture [9:46 minutes]
    • Implementing Change Data Capture (CDC)
    • Creating a CDC Job Flow

  9. Final Touches: Logging and Scheduling [11:14 minutes]
    • Setting up a Dedicated DB Schema
    • Setting up Built-in Logging
    • Scheduling on the Command Line

Diethard Steiner

Diethard Steiner, currently working as an independent Senior Consultant in London, U.K., has specialized in the field of open source business intelligence solutions for many years. Diethard is passionate about his work and regularly publishes tutorials on his blog, which has gained a loyal following over the years.

He has implemented end-to-end solutions (from data integration to reporting and dashboards) for several clients and projects, and has gained a deep understanding of the requirements and challenges of such solutions.

What you will learn from this video course

  • Create a star schema
  • Populate and maintain slowly changing dimensions type 1 and type 2
  • Load fact and dimension tables in an efficient manner
  • Use a columnar database to store the data for the star schema
  • Analyze the quality of the data in an agile manner
  • Implement logging and scheduling for the ETL process
  • Get an overview of the whole process: from source data to the end user analyzing the data
  • Learn how to auto-generate data for a date dimension
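
As a rough illustration of the last point, the idea behind auto-generating a date dimension can be sketched outside of Pentaho Data Integration. The snippet below is plain Python, not a Kettle transformation and not taken from the course; the function name and the column names (date_tk, the_date, and so on) are hypothetical. It emits one row per calendar day between a start and an end date, then derives year, month, day, quarter, and day of week for each row.

```python
from datetime import date, timedelta


def generate_date_dimension(start, end):
    """Return one row (a dict) per calendar day from start to end, inclusive.

    Column names are illustrative assumptions, not the course's schema.
    """
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer key in YYYYMMDD form, a common choice for date dimensions
            "date_tk": int(current.strftime("%Y%m%d")),
            "the_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "quarter": (current.month - 1) // 3 + 1,
            # ISO numbering: Monday is 1, Sunday is 7
            "day_of_week": current.isoweekday(),
        })
        current += timedelta(days=1)
    return rows


# Example: four days spanning a year boundary
rows = generate_date_dimension(date(2013, 12, 30), date(2014, 1, 2))
for row in rows:
    print(row)
```

In a real ETL, the start and end dates would typically be passed in as parameters rather than hard-coded, which matches the "Defining Start and End date Parameters" topic in the table of contents.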

Who this video course is for

If you are eager to learn how to create an ETL process to populate a star schema, and at the end of the course you want to be in a position to apply your new knowledge to your specific business requirements, then "Building a Data Mart with Pentaho Data Integration" is for you. You need to have a basic understanding of star schemas and Pentaho Data Integration to take the next step: putting everything into practice.

In Detail

Companies store a lot of data, but in most cases, it is not available in a format that makes it easily accessible for analysis and reporting tools. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.

Building a Data Mart with Pentaho Data Integration walks you through the creation of an ETL process to create a data mart based on a fictional company. This course will show you, step by step, how to source the raw data and prepare it for the star schema. The practical approach of this course will get you up and running quickly, and will explain the key concepts in an easy-to-understand manner.

Building a Data Mart with Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it so that the output can be a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality check the data using an agile approach. Next, you will learn how to load slowly changing dimensions and the fact table. The star schema will reside in a column-oriented database, so you will learn about bulk-loading the data whenever possible. You will also learn how to create an OLAP schema and analyze the output of your ETL process easily.
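
To give a feel for what loading a slowly changing dimension of type 2 involves, here is a minimal in-memory sketch in plain Python. It is not the Pentaho step used in the course, and the function name, field names, and the 9999-12-31 "open end" marker are assumptions made for illustration. The idea is that a changed record is never overwritten: the current version is closed off and a new version with a new surrogate key is appended.

```python
from datetime import date

# Assumed marker meaning "this version is still current"
OPEN_END = date(9999, 12, 31)


def apply_scd2(dimension, natural_key, attributes, effective):
    """Apply one source record to a type 2 dimension held as a list of dicts.

    Returns the surrogate key of the version that is current afterwards.
    """
    current = next(
        (row for row in dimension
         if row["natural_key"] == natural_key and row["valid_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return current["surrogate_key"]  # nothing changed, keep version
        current["valid_to"] = effective  # close the old version
    new_row = {
        "surrogate_key": len(dimension) + 1,  # simple sequential key
        "natural_key": natural_key,
        "attributes": dict(attributes),
        "valid_from": effective,
        "valid_to": OPEN_END,
    }
    dimension.append(new_row)
    return new_row["surrogate_key"]


# Example: a customer is loaded, reloaded unchanged, then moves city
dim = []
k1 = apply_scd2(dim, "C1", {"city": "London"}, date(2013, 1, 1))
k2 = apply_scd2(dim, "C1", {"city": "London"}, date(2013, 6, 1))
k3 = apply_scd2(dim, "C1", {"city": "Leeds"}, date(2013, 9, 1))
print(k1, k2, k3, len(dim))
```

A type 1 dimension would instead overwrite the attributes in place, so no history is kept. The fact load then looks up the surrogate key of the version that was valid on the date of each fact row.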

By covering all the essential topics in a hands-on approach, this course will put you in a position to create your own ETL processes within a short span of time.


Packt video courses are designed to cover the breadth of the topic in short, hands-on, task-based videos. Each course is divided into short manageable sections, so you can watch the whole thing or jump to the bit you need. The focus is on practical instructions and screencasts showing you how to get the job done.

Follow carefully organized sequences of instructions that outline how to leverage the power of Pentaho Data Integration in a simple and practical approach.
