Building a Data Mart with Pentaho Data Integration [Video]
Packt Video. Stream online or download for unrestricted offline use.
- New! Packt Video courses: practical screencast-based tutorials that show you how to get the job done. Bite-sized chunks, hands-on instructions, and powerful results.
- Learn how to create ETL transformations to populate a star schema in a short span of time
- Create a fully-functional ETL process using a practical approach
- Follow the step-by-step instructions for creating an ETL based on a fictional company – get your hands dirty and learn fast
Video Details
Language : English
Release Date : Tuesday, December 31, 2013
Course Length : 1 hour and 50 minutes
ISBN : 178216863X
ISBN 13 : 9781782168638
Author(s) : Diethard Steiner
Topics and Technologies : Video, e-Learning, Open Source
Table of Contents
- Getting Started [18:25 minutes]
- The Second-hand Lens Store Sample Data
- The Derived Star Schema
- Setting up Our Development Environment
- Agile BI – Creating ETLs to Prepare Joined Data Set [12:27 minutes]
- Importing Raw Data
- Exporting Data Using the Standard Table Output Step
- Exporting Data Using the Dedicated Bulk Loading Step
- Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL Improvements [11:29 minutes]
- Creating a Pentaho Analysis Model
- Analyzing the Data Using the Pentaho Analyzer
- Improving Your ETL for Better Data Quality
- Slowly Changing Dimensions [17:03 minutes]
- Creating a Slowly Changing Dimension of Type 1 Using the Insert/Update Step
- Creating a Slowly Changing Dimension of Type 1 Using the Dimension Lookup/Update Step
- Creating a Slowly Changing Dimension of Type 2
- Populating the Date Dimension [16:10 minutes]
- Defining Start and End Date Parameters
- Auto-generating Daily Rows for a Given Date Period
- Auto-generating Year, Month, Day, and Other Date Attributes
- Creating the Fact Transformation [14:28 minutes]
- Sourcing Raw Data for the Fact Table
- Looking up the Slowly Changing Dimension Type 1 Key
- Looking up the Slowly Changing Dimension Type 2 Key
- Orchestration [10:29 minutes]
- Loading Dimensions in Parallel
- Creating Master Jobs
- ID-based Change Data Capture [9:46 minutes]
- Implementing Change Data Capture (CDC)
- Creating a CDC Job Flow
- Final Touches: Logging and Scheduling [11:14 minutes]
- Setting up a Dedicated DB Schema
- Setting up Built-in Logging
- Scheduling on the Command Line
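The "Populating the Date Dimension" chapter above auto-generates one row per day between start and end date parameters. Outside of PDI, the same idea can be sketched in plain Python (the function and column names here are illustrative, not taken from the course):

```python
from datetime import date, timedelta

def generate_date_dimension(start: date, end: date):
    """Generate one row per day between start and end (inclusive),
    deriving year, month, day, and other attributes from each date."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # surrogate key, e.g. 20130101
            "full_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "weekday_name": current.strftime("%A"),
        })
        current += timedelta(days=1)
    return rows

dim = generate_date_dimension(date(2013, 1, 1), date(2013, 1, 31))
print(len(dim))  # 31 rows, one per day of January 2013
```

In the course itself this is done with PDI steps rather than code, but the generated attributes are the same kind of derived columns.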
Support, complaints and feedback.
Packt is committed to making Packt Video courses a valuable, useful way for IT professionals to learn new skills. We have made every effort to ensure that this course reaches the required standard and will work on our customers' devices. Please go to our support page.
What you will learn from this video course
- Create a star schema
- Populate and maintain slowly changing dimensions type 1 and type 2
- Load fact and dimension tables in an efficient manner
- Use a columnar database to store the data for the star schema
- Analyze the quality of the data in an agile manner
- Implement logging and scheduling for the ETL process
- Get an overview of the whole process: from source data to the end user analyzing the data
- Learn how to auto-generate data for a date dimension
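The type 1 and type 2 slowly changing dimensions listed above differ in how they treat history: type 1 overwrites an attribute, while type 2 closes the current row and inserts a new version. A minimal in-memory sketch of that distinction (the row layout and function names are illustrative assumptions, not the course's actual PDI steps):

```python
from datetime import date

def scd_type1_update(dim_row, new_attrs):
    """Type 1: overwrite attributes in place -- history is lost."""
    dim_row.update(new_attrs)
    return dim_row

def scd_type2_update(dim_table, natural_key, new_attrs, valid_from):
    """Type 2: close the currently valid version and insert a new one,
    preserving history via valid_from/valid_to columns."""
    for row in dim_table:
        if row["natural_key"] == natural_key and row["valid_to"] is None:
            row["valid_to"] = valid_from  # close the current version
    new_key = max(r["surrogate_key"] for r in dim_table) + 1
    dim_table.append({
        "surrogate_key": new_key,
        "natural_key": natural_key,
        **new_attrs,
        "valid_from": valid_from,
        "valid_to": None,  # open-ended: this is now the current version
    })
    return dim_table

customers = [{"surrogate_key": 1, "natural_key": "C001", "city": "Vienna",
              "valid_from": date(2013, 1, 1), "valid_to": None}]
scd_type2_update(customers, "C001", {"city": "Graz"}, date(2013, 6, 1))
print(len(customers))  # 2 versions of customer C001 now exist
```

In PDI, both behaviors are handled by dedicated steps (Insert/Update and Dimension Lookup/Update) rather than hand-written logic, as the course demonstrates.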
Who this video course is for
If you are eager to learn how to create an ETL process to populate a star schema, and at the end of the course you want to be in a position to apply your new knowledge to your specific business requirements, then "Building a Data Mart with Pentaho Data Integration" is for you. You need a basic understanding of star schemas and Pentaho Data Integration to take the next step: putting everything into practice.
Companies store a lot of data, but in most cases, it is not available in a format that makes it easily accessible for analysis and reporting tools. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.
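A Kimball-style star schema places one fact table at the center, joined to surrounding dimension tables via surrogate keys. A minimal sketch using SQLite (table and column names are illustrative, not taken from the course material):

```python
import sqlite3

# A minimal star schema: one fact table surrounded by dimension
# tables, each referenced through a surrogate key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key     INTEGER PRIMARY KEY,   -- e.g. 20130101
    full_date    TEXT, year INTEGER, month INTEGER, day INTEGER
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_id  TEXT,                  -- natural/business key
    name TEXT, city TEXT
);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    amount       REAL
);
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
# ['dim_customer', 'dim_date', 'fact_sales']
```

Analysis queries then aggregate the fact table's measures (quantity, amount) grouped by attributes of the dimensions, which is exactly the workload the course's star schema is built for.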
Building a Data Mart with Pentaho Data Integration walks you through the creation of an ETL process to create a data mart based on a fictional company. This course will show you how to source the raw data and prepare it for the star schema step-by-step. The practical approach of this course will get you up and running quickly, and will explain the key concepts in an easy to understand manner.
Building a Data Mart with Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it so that the output can be a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality check the data using an agile approach. Next, you will learn how to load slowly changing dimensions and the fact table. The star schema will reside in a column-oriented database, so you will learn about bulk-loading the data whenever possible. You will also learn how to create an OLAP schema and analyze the output of your ETL process easily.
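Loading the fact table means swapping each raw row's natural keys for dimension surrogate keys; for a type 2 dimension, the lookup must also pick the version that was valid on the transaction date. A sketch of that lookup logic (an illustrative assumption of how such a lookup works, not the course's PDI step):

```python
from datetime import date

def lookup_type2_key(dim_rows, natural_key, txn_date):
    """Return the surrogate key of the dimension version that was
    valid on the transaction date (a type 2 dimension lookup)."""
    for row in dim_rows:
        if (row["natural_key"] == natural_key
                and row["valid_from"] <= txn_date
                and (row["valid_to"] is None or txn_date < row["valid_to"])):
            return row["surrogate_key"]
    return None  # no matching version; real ETLs often map this to a default key

customers = [
    {"surrogate_key": 1, "natural_key": "C001",
     "valid_from": date(2013, 1, 1), "valid_to": date(2013, 6, 1)},
    {"surrogate_key": 2, "natural_key": "C001",
     "valid_from": date(2013, 6, 1), "valid_to": None},
]
print(lookup_type2_key(customers, "C001", date(2013, 3, 15)))  # 1
print(lookup_type2_key(customers, "C001", date(2013, 9, 1)))   # 2
```

A type 1 lookup is simpler, since there is only ever one row per natural key; the course covers both cases in the "Creating the Fact Transformation" chapter.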
By covering all the essential topics in a hands-on approach, you will be in a position to create your own ETL processes in a short span of time.
Packt video courses are designed to cover the breadth of the topic in short, hands-on, task-based videos. Each course is divided into short manageable sections, so you can watch the whole thing or jump to the bit you need. The focus is on practical instructions and screencasts showing you how to get the job done.
Follow carefully organized sequences of instructions that show you how to leverage the power of Pentaho Data Integration in a simple and practical way.