Learning Pentaho Data Integration 8 CE - Third Edition

Get up and running with the Pentaho Data Integration tool using this hands-on, easy-to-read guide

María Carina Roldán

Book Details

ISBN 13: 9781788292436
Paperback: 500 pages

Book Description

Pentaho Data Integration (PDI) is an intuitive, graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. This book shows and explains the new interactive features of Spoon, the revamped look and feel, and the newest features of the tool, including the Transformation and Job Executors and the invaluable Metadata Injection capability.

We begin with the installation of the PDI software and then move on to cover all the key PDI concepts. Each chapter introduces new features, so you can gradually build up practice with the tool. First, you will learn to do all kinds of data manipulation and work with simple plain files. Then, the book teaches you how to work with relational databases inside PDI. You will also be given a primer on data warehouse concepts and learn how to load data into a data warehouse. Over the course of the book, you will become familiar with PDI's intuitive, graphical, drag-and-drop design environment.

By the end of this book, you will have learned everything you need to know to meet your data manipulation requirements. You will also be given best practices and advice for designing and deploying your projects.

Table of Contents

Chapter 1: Getting Started with Pentaho Data Integration
Pentaho Data Integration and Pentaho BI Suite
Installing PDI
Launching the PDI Graphical Designer - Spoon
Introducing transformations
Installing useful related software
Summary
Chapter 2: Getting Started with Transformations
Chapter 3: Creating Basic Task Flows
Introducing jobs
Designing and running jobs
Running transformations from a Job
Understanding and changing the flow of execution
Managing files
Knowing the basics about Kettle variables
Summary
Chapter 4: Reading and Writing Files
Reading data from files
Outputting data to files
Working with Big Data and cloud sources
Summary
Chapter 5: Manipulating PDI Data and Metadata
Manipulating simple fields
Working with complex structures
Summary
Chapter 6: Controlling the Flow of Data
Filtering data
Splitting streams unconditionally
Splitting the stream based on conditions
Merging streams in several ways
Looking up data
Summary
Chapter 7: Cleansing, Validating, and Fixing Data
Cleansing data
Validating data
Treating invalid data by splitting and merging streams
Summary
Chapter 8: Manipulating Data by Coding
Doing simple tasks with the JavaScript step
Parsing unstructured files with JavaScript
Doing simple tasks with the Java Class step
Getting the most out of the Java Class step
Avoiding coding using purpose-built steps
Summary
Chapter 9: Transforming the Dataset
Sorting data
Working on groups of rows
Converting rows to columns
Normalizing data
Going forward and backward across rows
Summary
Chapter 10: Performing Basic Operations with Databases
Connecting to a database and exploring its content
Previewing and getting data from a database
Inserting, updating, and deleting data
Verifying a connection, running DDL scripts, and doing other useful tasks
Looking up data in different ways
Summary
Chapter 11: Loading Data Marts with PDI
Preparing the environment
Introducing dimensional modeling
Loading dimensions with data
Loading fact tables
Summary
Chapter 12: Creating Portable and Reusable Transformations
Chapter 13: Implementing Metadata Injection
Introducing metadata injection
Discovering metadata and injecting it
Identifying use cases to implement metadata injection
Summary
Chapter 14: Creating Advanced Jobs
Chapter 15: Launching Transformations and Jobs from the Command Line
Using the Pan and Kitchen utilities
Supplying named parameters and variables
Using command-line arguments
Sending the output of executions to log files
Automating the execution
Summary
Chapter 16: Best Practices for Designing and Deploying a PDI Project
Setting up a new project
Best practices to design jobs and transformations
Maximizing the performance
Deploying the project in different environments
Summary

What You Will Learn

  • Explore the features and capabilities of Pentaho Data Integration 8 Community Edition
  • Install and get started with PDI
  • Learn the ins and outs of Spoon, the graphical designer tool
  • Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, and XML files
  • Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on relational databases
  • Populate a data mart with Pentaho Data Integration
  • Use Pentaho Data Integration to organize files and folders, run daily processes, deal with errors, and more
