Reader small image

You're reading from  Pentaho 8 Reporting for Java Developers

Product typeBook
Published inSep 2017
Reading LevelIntermediate
PublisherPackt
ISBN-139781788298995
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
Francesco Corti
Francesco Corti
author image
Francesco Corti

Francesco Corti is an enthusiastic consultant in software solutions and loves working in developer, sales, and customers teams. Proud of the role of a software engineer, he is often involved in pre-sales presentations, public speaking, and IT courses. Developing software, designing architectures, and defining solutions in ECM/BPM and BI are his favorite areas of interest. He has completed dozens of projects, from very small ones to more complex ones, in almost 20 years of experience. A product evangelist at Alfresco, Francesco represents the famous open source ECM in the developer community. In addition to helping developers adopt Alfresco technologies, he often helps Alfresco to improve the developer experience through talks, articles, blogging, user demonstrations, recorded demonstrations, or the creation of sample projects. He is the inventor and principal developer of Alflytics (previously named Alfresco Audit Analytics and Reporting), the main business intelligence solution over Alfresco ECM, entirely based on the Pentaho suite. He authored the Pentaho Reporting video course with more than 40 videos and courses on the Pentaho Reporting Designer and SDK. Francesco has specialty and principal experiences in enterprise content management solutions with Alfresco ECM and Hyland OnBase (he is an OnBase certified installer); business process management solutions with Activiti, JBPM, and Hyland OnBase; data capture solutions with Ephesoft, Hyland OnBase, and custom software; record management solutions with O'Neil software and custom software (using Alfresco ECM and Hyland OnBase); and portal and collaboration with Liferay and MS SharePoint.
Read more about Francesco Corti

View More author details
Right arrow

Chapter 6. Configuring JDBC and Other Data Sources

In Chapter 4Creating a Report with Report Designer, you saw how to create a report from scratch using Pentaho Report Designer. As an exercise, you created a data source defining the list of fields requested for viewing results, making charts, calculating formulas, and so on. In this chapter, you will go through the details of the founding concept of the data source, discovering all the available types and their specific features.

In this chapter, you will start with an introduction about data source management using the Pentaho report, diving deep into all the different data sources one by one, and learning how to configure them using Pentaho Report Designer. In particular, you will see:

  • The data source for a DBMS using JDBC
  • The data source for metadata in XML metadata interchange format
  • The data source for MongoDB
  • The data source for Pentaho Data Integration transformation
  • The data source for an OLAP schema (Mondrian)
  • The data source for XML...

Introducing data sources


We have already introduced the importance of data sources in report development. Indeed, data sources are the very first step in report development, using both the Report Designer and the Pentaho Reporting SDK. In short, data sources are used by Pentaho Reporting to define the dataset available in a report, using a table representation made of fields and rows. Each field is defined by a name and a type (that is, string, integer, and so on), and a row is a collection of values related to the fields. Fields and rows are the real content of a Pentaho report and they can be used for viewing results, making charts, calculating formulas, the definition of parameters and user inputs, and a lot of other things.

Even if the result of a data source is always a table, the types of sources can vary significantly. We can have relational DBMSs, NoSQL databases, different engines (like OLAP, Pentaho Data Integration, and so on), or even different formats of static data structures...

The JDBC data source


The first type of data source in the list, and the most used one, is the JDBC data source. JDBC is an API for the Java language, which defines how a client may access a database. It provides methods to query (and update) data in a database and is oriented towards relational databases. See https://en.wikipedia.org/wiki/Java_Database_Connectivity for further details.

Once selected, a modal window, similar to the following one, appears:

The window contains two main sections: the connections (on the left) and the queries (on the right). The next two sections will introduce all the features for both the panels.

Adding or editing a JDBC connection

Even if the connection panel contains some SampleData connections, you can see here how to add a new one or how to edit an existing one. In the connection panel of the window, you can see:

  • A pencil icon (
    ) used to edit an existing connection
  • A green plus icon (
    ) used to add a new connection to the list of the available ones
  • A red icon ...

The metadata data source


The metadata data source lets you inquire about a single file in XMI format. The XML Metadata Interchange (XMI) is an Object Management Group (OMG) standard for exchanging metadata information via XML. It can be used for any metadata whose metamodel can be expressed in a Meta-Object Facility (MOF). The most common use of XMI is as an interchange format for UML models, although it can also be used for serialization of models in other languages (metamodels). Some tools of the Pentaho suite (for example, Pentaho Data Integration) can generate XMI files to represent the metadata, and with this data source, they can be used to generate reports.

Once this data source type is selected, a modal window is shown, as follows:

As you can see at the top of the window, in this data source you can also define the dataset using a standard configuration or script. The script is similar to what you saw for the JDBC data source, and for this reason, we will concentrate the discussion...

The MongoDB data source


The MongoDB data source lets you natively connect to the famous NoSQL database. To do this, you must already have MongoDB database connection information, such as host name(s), port number(s), and authentication credentials.

Note

MongoDB (from humongous) is a free and open source cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with schema, stored as content.

Once selected, this data source type shows a modal window with two panels: the query list panel on the left and the query details panel on the right. The query list panel is similar to the one introduced for the JDBC data source, where you can see the icons to add queries (

) and delete queries (

). Clicking on the green plus (

) icon, a new Query 1 is added to the available ones, as shown in the following screenshot:

Once a query is selected (or a new one is created), the right panel shows the Name field, where you can change the meaningful...

The Pentaho Data Integration data source


Use the Pentaho Data Integration data source option if you want to create a report containing data from any step in a PDI transformation. As you probably know, Pentaho Data Integration is one of the core tools of the Pentaho suite and it is designed and developed to implement a fully featured ETL and data integration solution (for a complete introduction, checkout Chapter 16Using Reports in Pentaho Data Integration). This type of data source is particularly useful if you want to create a report that includes data from transformation steps such as Excel input or other PDI steps.

Note

If you are using this type of data source, remember to copy all the JAR files from <data-integration>/lib to <report-designer>/lib in order to access it through Pentaho Report Designer.

Once this data source type is selected, a modal window is shown, as follows:

On the left side of the window the list of queries is shown with the well-known green plus icon (

...

The OLAP data source


The Online Analytical Processing (OLAP) data source is used in case a multidimensional engine is involved.

Note

Quoting from Wikipedia about OLAP (https://en.wikipedia.org/wiki/Online_analytical_processing): a multidimensional structure is defined as a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. Even when data is manipulated it remains easy to access and continues to constitute a compact database format. A multidimensional structure is quite popular for analytical databases that use OLAP applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly.

The Pentaho suite provides an open source multidimensional engine called Mondrian (http://community.pentaho.com/projects/mondrian). Mondrian is entirely written in Java and it executes queries written in the Multidimensional Expressions (MDX) language, reading data...

The XML data source


The XML data source is quite easy to understand and can be useful for exported data from external systems, if you don't want to use a database but want to use the data in a report. Once this data source type is selected, a modal window is shown, as follows:

Starting from the top, the first parameter is about pointing to an XML file. In this example, in particular, we are going to use the steelwheels.mondrian.xml file stored in the report-designer/samples folder.

Note

If you feel that the steelwheels.mondrian.xml file has not got a nice data structure and it's probably difficult to inquire, you are right! The steelwheels.mondrian.xml file contains the declaration of a Mondrian schema and does not contain data similar to a database export. Unfortunately, this is what we have available bundled in the Pentaho Reporting distribution and it's enough for our purpose to demonstrate an example of use of the XML data source. 

Below the selection of the XML file, you can see the list...

The table data source


The table data source is used if you have to manage static and predefined data, not stored in a mass storage or configuration file. Compared to the other data sources, this one defines its structure directly inside the Pentaho Report Designer. For this reason, the first thing to do is to declare before the query and define its structure after, starting with a two column table. The definition of the table is straightforward: you can specify the value for the columns, change the column name, and add (or remove) rows or columns.

Once this data source type is selected, a modal window is shown, as follows:

As you can see from the window, you can define the table structure starting from a Microsoft Excel sheet. To complete the features, the OK/Cancel buttons are the standard ones to confirm/cancel the current operation.

The advanced data source


Advanced data sources require advanced knowledge and skills, so only a high-level overview of each advanced data source is given in this section.

Scriptable

The scriptable data source lets the developer develop a source code generating a dataset in various languages. As in the previous data sources, you can create multiple queries and manage the addition/deletion of queries.

External

The external data source is used when a report is used in .xaction. You can refer to the official documentation for the (few) documents about this data source. Pentaho action sequence XML documents (.xaction) define activities such as database queries, report generation, and email actions, and the order in which they occur.

Sequence generator

The sequence generator develops a sequence to be used in your report. As in previous data sources, you can create multiple queries and manage the addition/deletion of queries. For each query, you can define the group (numeric or system), the sequence...

The Pentaho community data access data source


The Community Data Access (CDA) data source is an interesting type related to the so called Pentaho CDA. Pentaho CDA was designed to allow great flexibility for data sources. Most of the available tools can perform queries over a single data source and there's no easy way to combine data coming from different databases or in different languages (combining data from an SQL query with data from an MDX query). These tasks usually require an independent ETL job to synchronize different data sources and build a new database. Pentaho CDA was developed as an abstraction tool between database connections and Community Dashboard Framework (CDF), but in this context are used also in reporting.

Once this data source type is selected, a modal window is shown, as follows:

As you can see at the top of the window, a URL pointing to the Pentaho Business Analytics Platform is required (you will learnt more about it in Chapter 15, Using Reports in Pentaho Business...

Summary


In this chapter, you learnt everything Pentaho Report Designer can do with data sources, discovering all the available types and their specific features. You started with an introduction to how a data source looks in the user interface and how the fields are used in the layout of the report. Then you moved on to discover how to manage all the possible types of data sources using Report Designer, finding out all about:

  • The JDBC data source
  • The metadata data source
  • The MongoDB data source
  • The Pentaho Data Integration transformation data source
  • The OLAP data source in various versions (legacy and generic, normalized and denormalized, standard, and custom)
  • The XML data source
  • The table data source
  • The advanced data sources (scriptable, external, sequence generator, and OpenERP)
  • The Pentaho community data access data source

Now that you have read this chapter, you should feel comfortable with all the ways Pentaho Report Designer offers to connect to a source. This is the very first step to go through...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Pentaho 8 Reporting for Java Developers
Published in: Sep 2017Publisher: PacktISBN-13: 9781788298995
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Francesco Corti

Francesco Corti is an enthusiastic consultant in software solutions and loves working in developer, sales, and customers teams. Proud of the role of a software engineer, he is often involved in pre-sales presentations, public speaking, and IT courses. Developing software, designing architectures, and defining solutions in ECM/BPM and BI are his favorite areas of interest. He has completed dozens of projects, from very small ones to more complex ones, in almost 20 years of experience. A product evangelist at Alfresco, Francesco represents the famous open source ECM in the developer community. In addition to helping developers adopt Alfresco technologies, he often helps Alfresco to improve the developer experience through talks, articles, blogging, user demonstrations, recorded demonstrations, or the creation of sample projects. He is the inventor and principal developer of Alflytics (previously named Alfresco Audit Analytics and Reporting), the main business intelligence solution over Alfresco ECM, entirely based on the Pentaho suite. He authored the Pentaho Reporting video course with more than 40 videos and courses on the Pentaho Reporting Designer and SDK. Francesco has specialty and principal experiences in enterprise content management solutions with Alfresco ECM and Hyland OnBase (he is an OnBase certified installer); business process management solutions with Activiti, JBPM, and Hyland OnBase; data capture solutions with Ephesoft, Hyland OnBase, and custom software; record management solutions with O'Neil software and custom software (using Alfresco ECM and Hyland OnBase); and portal and collaboration with Liferay and MS SharePoint.
Read more about Francesco Corti