Building a Data Mart with Pentaho Data Integration [Video]

Diethard Steiner

A step-by-step tutorial that takes you through the creation of an ETL process to populate a Kimball-style star schema

Video Details

ISBN 13: 9781782168638
Course Length: 1 hour and 50 minutes

About This Video

  • Learn how to create ETL transformations to populate a star schema in a short span of time
  • Create a fully-functional ETL process using a practical approach
  • Follow the step-by-step instructions for creating an ETL based on a fictional company – get your hands dirty and learn fast

Who This Video Is For

If you are eager to learn how to create an ETL process to populate a star schema, and you want to be able to apply your new knowledge to your specific business requirements by the end of the course, then "Building a Data Mart with Pentaho Data Integration" is for you. You need a basic understanding of star schemas and Pentaho Data Integration to take the next step: putting everything into practice.

Table of Contents

Getting Started
  • The Second-hand Lens Store
  • The Derived Star Schema
  • Setting up Our Development Environment

Agile BI – Creating ETLs to Prepare Joined Data Set
  • Importing Raw Data
  • Exporting Data Using the Standard Table Output
  • Exporting Data Using the Dedicated Bulk Loading

Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL Improvements
  • Creating a Pentaho Analysis Model
  • Analyzing Data Using Pentaho Analyzer
  • Improving Your ETL for Better Data Quality

Slowly Changing Dimensions
  • Creating a Slowly Changing Dimension of Type 1 Using Insert/Update
  • Creating a Slowly Changing Dimension of Type 1 Using Dimension Lookup Update
  • Creating a Slowly Changing Dimension Type 2

Populating Date Dimension
  • Defining Start and End Date Parameters
  • Auto-generating Daily Rows for a Given Period
  • Auto-generating Year, Month, and Day

Creating the Fact Transformation
  • Sourcing Raw Data for Fact Table
  • Lookup Slowly Changing Dimension of the Type 1 Key
  • Lookup Slowly Changing Dimension of the Type 2 Key

Orchestration
  • Loading Dimensions in Parallel
  • Creating Master Jobs

ID-based Change Data Capture
  • Implementing Change Data Capture (CDC)
  • Creating a CDC Job Flow

Final Touches: Logging and Scheduling
  • Setting up a Dedicated DB Schema
  • Setting up Built-in Logging
  • Scheduling on the Command Line

What You Will Learn

  • Create a star schema
  • Populate and maintain Type 1 and Type 2 slowly changing dimensions
  • Load fact and dimension tables in an efficient manner
  • Use a columnar database to store the data for the star schema
  • Analyze the quality of the data in an agile manner
  • Implement logging and scheduling for the ETL process
  • Get an overview of the whole process: from source data to the end user analyzing the data
  • Learn how to auto-generate data for a date dimension
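
To give a feel for the last point, here is a minimal Python sketch of auto-generating date dimension rows for a given period. It is only an illustration of the idea, not the Pentaho Data Integration transformation built in the course; the function name `generate_date_dimension` and the column names (`date_key`, `full_date`, and so on) are assumptions made for this example.

```python
from datetime import date, timedelta


def generate_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive.

    Column names are hypothetical and chosen for illustration only.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer key in YYYYMMDD form, a common date dimension convention
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
        })
        current += timedelta(days=1)
    return rows


if __name__ == "__main__":
    for row in generate_date_dimension(date(2013, 1, 1), date(2013, 1, 3)):
        print(row)
```

The start and end dates play the role of the parameters mentioned in the table of contents; in a real ETL tool they would typically be passed in from outside rather than hard-coded.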

In Detail

Companies store a lot of data, but in most cases, it is not available in a format that makes it easily accessible for analysis and reporting tools. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.

Building a Data Mart with Pentaho Data Integration walks you through the creation of an ETL process to create a data mart based on a fictional company. This course will show you, step by step, how to source the raw data and prepare it for the star schema. The practical approach of this course will get you up and running quickly, and will explain the key concepts in an easy-to-understand manner.

Building a Data Mart with Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it so that the output can be a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality-check the data using an agile approach. Next, you will learn how to load slowly changing dimensions and the fact table. The star schema will reside in a column-oriented database, so you will learn about bulk-loading the data whenever possible. You will also learn how to create an OLAP schema and analyze the output of your ETL process easily.
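
For readers unfamiliar with the two kinds of slowly changing dimension mentioned above, the following Python sketch contrasts them on an in-memory list of rows. It is an illustration under stated assumptions, not the Pentaho steps used in the course: the function names, the column names (`natural_key`, `valid_from`, `valid_to`) and the `OPEN_END` marker date are all hypothetical.

```python
from datetime import date

# Assumed marker meaning "this version is still current"
OPEN_END = date(9999, 12, 31)


def apply_scd1(dimension, natural_key, attributes):
    """Type 1: overwrite attributes in place, so no history is kept."""
    for row in dimension:
        if row["natural_key"] == natural_key:
            row.update(attributes)
            return
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "natural_key": natural_key,
        **attributes,
    })


def apply_scd2(dimension, natural_key, attributes, change_date):
    """Type 2: close the current version and append a new one."""
    current = next(
        (r for r in dimension
         if r["natural_key"] == natural_key and r["valid_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if all(current.get(k) == v for k, v in attributes.items()):
            return  # nothing changed, keep the current version
        current["valid_to"] = change_date  # end-date the old version
    dimension.append({
        "surrogate_key": len(dimension) + 1,
        "natural_key": natural_key,
        "valid_from": change_date,
        "valid_to": OPEN_END,
        **attributes,
    })


if __name__ == "__main__":
    customers = []
    apply_scd2(customers, "C1", {"city": "London"}, date(2013, 1, 1))
    apply_scd2(customers, "C1", {"city": "Paris"}, date(2013, 6, 1))
    for row in customers:
        print(row)
```

With Type 1 the dimension always holds one row per natural key; with Type 2 each change adds a row, and fact rows can be matched to the version that was valid on the transaction date.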

By covering all the essential topics in a hands-on approach, this course will put you in a position to create your own ETL processes within a short span of time.
