Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Pentaho Data Integration Cookbook - Second Edition - Second Edition

You're reading from  Pentaho Data Integration Cookbook - Second Edition - Second Edition

Product type Book
Published in Dec 2013
Publisher Packt
ISBN-13 9781783280674
Pages 462 pages
Edition 2nd Edition
Languages

Table of Contents (21) Chapters

Pentaho Data Integration Cookbook Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Working with Databases Reading and Writing Files Working with Big Data and Cloud Sources Manipulating XML Structures File Management Looking for Data Understanding and Optimizing Data Flows Executing and Re-using Jobs and Transformations Integrating Kettle and the Pentaho Suite Getting the Most Out of Kettle Utilizing Visualization Tools in Kettle Data Analytics Data Structures References Index

Introduction


Databases are broadly used by organizations to store and administer transactional data such as customer service history, bank transactions, purchases, sales, and so on. They are also used to store data warehouse data used for Business Intelligence solutions.

In this chapter, you will learn to deal with databases in Kettle. The first recipe tells you how to connect to a database, which is a prerequisite for all the other recipes. The rest of the chapter teaches you how to perform different operations and can be read in any order according to your needs.

Note

The focus of this chapter is on relational databases (RDBMS). Thus, the term database is used as a synonym for relational database throughout the recipes.

Sample databases

Through the chapter you will use a couple of sample databases. Those databases can be created and loaded by running the scripts available at the book's website. The scripts are ready to run under MySQL.

Note

If you work with a different DBMS, you may have to modify the scripts slightly.

For more information about the structure of the sample databases and the meaning of the tables and fields, please refer to Appendix A, Data Structures. Feel free to adapt the recipes to different databases. You could try some well-known databases; for example, Foodmart (available as part of the Mondrian distribution at http://sourceforge.net/projects/mondrian/) or the MySQL sample databases (available at http://dev.mysql.com/doc/index-other.html).

Pentaho BI platform databases

As part of the sample databases used in this chapter you will use the Pentaho BI platform Demo databases. The Pentaho BI Platform Demo is a preconfigured installation that lets you explore the capabilities of the Pentaho platform. It relies on the following databases:

Database name

Description

hibernate

Administrative information including user authentication and authorization data.

Quartz

Repository for Quartz; the scheduler used by Pentaho.

Sampledata

Data for Steel Wheels, a fictional company that sells all kind of scale replicas of vehicles.

By default, all those databases are stored in Hypersonic (HSQLDB). The script for creating the databases in HSQLDB can be found at http://sourceforge.net/projects/pentaho/files. Under Business Intelligence Server | 1.7.1-stable look for pentaho_sample_data-1.7.1.zip. While there are newer versions of the actual Business Intelligence Server, they all use the same sample dataset.

These databases can be stored in other DBMSs as well. Scripts for creating and loading these databases in other popular DBMSs for example, MySQL or Oracle can be found in Prashant Raju's blog, at http://www.prashantraju.com/projects/pentaho.

Beside the scripts you will find instructions for creating and loading the databases.

Note

Prashant Raju, an expert Pentaho developer, provides several excellent tutorials related to the Pentaho platform. If you are interested in knowing more about Pentaho, it's worth taking a look at his blog.

You have been reading a chapter from
Pentaho Data Integration Cookbook - Second Edition - Second Edition
Published in: Dec 2013 Publisher: Packt ISBN-13: 9781783280674
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}