Pentaho Reporting is an easy-to-use, open source, lightweight suite of Java projects built for one purpose: report generation. In this book, you'll discover how easy it is to embed Pentaho Reporting into your Java projects, or to use it as a standalone reporting platform. Pentaho Reporting's open source license, the GNU Lesser General Public License (LGPL), gives developers the freedom to embed Pentaho Reporting into their open source and proprietary applications at no cost. An active community participates in the development and use of Pentaho Reporting, answering forum questions, fixing bugs, and implementing new features. While many proprietary reporting options are available, none can offer the openness and flexibility that Pentaho Reporting provides its users.
As with most successful open source projects, Pentaho Reporting has a proven track record, along with a long list of features. Most of this history has been documented in open forums and in email threads, which are still available for folks to browse through and glean ideas from. Starting as a side hobby and turning into an enterprise reporting suite over the course of seven years, the Pentaho Reporting Engine and its suite of tools such as the Report Designer, Report Design Wizard, and Pentaho's web-based Ad Hoc Reporting user interface, are used as critical components in countless corporate, educational, governmental, and community-based information technology solutions.
In most business software applications, a reporting component is necessary, be it for summarizing data, generating large numbers of customized documents, or simply printing information in a variety of useful output formats. With a complete set of features, including PDF, Excel, HTML, and RTF report generation, along with advanced reporting capabilities such as sub-reports and cross tabs, Pentaho Reporting can solve simple problems quickly, and it can also handle the more advanced challenges of designing, generating, and deploying reports.
Read on in this chapter to learn more about the typical uses, history and origins of Pentaho Reporting, along with a more detailed overview of the reporting functionality that Pentaho Reporting provides.
Business users need access to information in many different forms for many different reasons. Pentaho Reporting addresses the following typical uses of reporting, along with many other types that will be covered in this book.
One of the most commonly used forms of reporting is operational reporting: generating reports directly from operational data sources, usually for detailed, transaction-level reporting. In this scenario, the database is designed to solve an operational problem and usually contains live data supporting critical business functions. Users of Pentaho Reporting can point directly to this data source and start generating reports.
Examples of operational reporting include custom reports built directly on a third-party software vendor's database schema, such as Bugzilla's bug tracking system or SugarCRM's Customer Relationship Management system. These reports might include summaries of daily activity, or detailed views into particular projects or users in the system. Reports might also be generated from data originating in an in-house custom application. These reports are typically based on a SQL backend, but could also be generated from flat log files or directly from in-memory Java objects.
Pentaho Reporting's parameterization capabilities provide a powerful mechanism to render up-to-the-minute customized operational reports. With features such as cross tabs and interactive reporting, business users can quickly view their operational data and drill back into operational systems that might require attention.
Developing reports based on live operational data has limitations. Developers need to make sure that reporting queries do not impact the performance of regular operations in the operational system; an extremely CPU-intensive query could delay a transaction from taking place. Also, the information needed to answer certain historical questions, such as state transitions or changes to particular fields like an address, isn't traditionally captured in an operational schema design.
When you've reached the limits of operational reporting, the next logical step is to move your data into a data warehouse. Reporting against a data warehouse is often referred to as business intelligence reporting. Reporting alone does not provide the necessary tools to make this transition. To enable business intelligence reporting, you will need an Extract, Transform, and Load (ETL) tool such as Pentaho Data Integration, along with a sensible warehouse design such as a snowflake schema.
This approach allows business users to monitor changes over time. It also provides performance benefits by pre-calculating aggregations and by defining schemas that are built with summarized reporting in mind. Until recently, data warehousing and business intelligence have been limited to large enterprises due to the cost of software and limited expertise. With open source tools becoming more widely available, a large number of small and medium size businesses are deploying data warehouses to answer the critical questions in their business domain. Common examples of data warehouse reporting include combining sales and inventory data into a single location for reporting, or combining internal proprietary sales data with publicly available market trends and analysis.
Pentaho Reporting's flexible data source support makes it easy to incorporate reports into your business intelligence solutions. Also, with Pentaho Reporting's speed and scalability, you can deploy Pentaho Reporting with confidence that reports will be executed efficiently.
This approach has limitations as well. In traditional warehousing, data is usually loaded in nightly, weekly, or monthly batches, so business users rarely get to see up-to-the-minute reports on business operations. Also, when designing a warehouse, it is important to ask the correct business questions. If business users' needs are not investigated ahead of time, it is possible to build a data warehouse that still fails to address them.
Financial reporting is a very specific but very common form of reporting, geared towards generating financial summaries for accountants, managers, and business investors. Standard reports that fall into this category include balance sheets, income statements, retained earnings statements, and cash flow statements. Unlike business intelligence or operational reporting, many of these reports are required by law, with regulations around their content and presentation. Financial reports often include computations for assets, liabilities, revenues, and expenses.
Following is the screenshot showing one such report:
With features such as group summary aggregations, Pentaho Reporting makes it very easy for developers to implement custom financial reports that business managers and owners require.
Typically, this type of data exists in a controlled form, be it in a proprietary system such as QuickBooks or SAP, or in a secure database system such as Oracle or MySQL. Due to the sensitivity of this data, developers will need to be conscious of who has access to reports and may want to implement features such as audit logging.
Another typical use of Pentaho Reporting is production reporting. This type of reporting includes reports such as customized form letters, invoices, or postcards for a large audience, as well as automated mail merging. Batch processing is normally involved in this form of reporting; however, custom reports generated for individuals from a standard template also fall under this category.
Specific Pentaho Reporting features, such as dynamically incorporating images from a data source and pixel-accurate formatting, can be of real help when implementing production reporting.
Pentaho Reporting began as JFreeReport, a Java-based reporting engine and Swing widget, back in 2002. David Gilbert, author of JFreeChart, implemented the initial version of JFreeReport to address report rendering needs. Soon after launching the project, Thomas Morgner, standing to the right of Will Gorman in the following picture, became the primary developer. He added critical functionality such as report functions and XML report definitions, launching JFreeReport into a successful open source Java project.
Since the beginning, Pentaho Reporting has been an international project. David is located in Hertfordshire, United Kingdom, and Thomas is located in Frankfurt, Germany. Many others from all over the world have contributed translations and code to Pentaho Reporting.
From 2002 to 2006, Thomas continued to develop JFreeReport into an enterprise-worthy reporting engine. While working as a consultant, Thomas added support for a variety of outputs, including Excel and RTF. At the beginning of 2006, Thomas and JFreeReport joined Pentaho, an open source business intelligence company, and JFreeReport officially became Pentaho Reporting. At this time, Thomas transitioned from a full-time consultant to a full-time developer on the Pentaho Reporting Engine and suite of tools.
In January 2006, along with the acquisition of JFreeReport, Pentaho announced the general availability of the Pentaho Report Design Wizard, which walks business users through a set of simple instructions for building sophisticated template-based reports. Mike D'Amour, a Senior Engineer at Pentaho, was the initial author of this wizard, which is now used in many Pentaho applications.
Another important milestone in Pentaho Reporting's history was the introduction of Pentaho Report Designer. In 2006, Martin Schmid contributed the first version of the Pentaho Report Designer to the community. Since its introduction, the Report Designer has evolved with the reporting engine.
In 2007, Pentaho teamed up with Sun's OpenOffice.org to deliver a reporting solution for OpenOffice.org's database tool set. This project was headed by Thomas Morgner, and is now known as the Pentaho Reporting Flow Engine. While this engine shares many concepts with the classic engine discussed in this book, it is a separate project with dramatically different features and functionality from Pentaho's classic reporting project.
Beginning in Pentaho Business Intelligence (BI) Platform release 1.6, Pentaho Reporting also integrates tightly with Pentaho's Metadata Engine, enabling easy-to-use, web-based ad hoc reporting by business users who may not have SQL expertise, data-driven formatting in reports, and column- and row-level data security. The same functionality is available inside Pentaho Report Designer for query and report building, allowing business users to go from a quick template-based report to a full-fledged custom report.
The following is a timeline of the major events in Pentaho Reporting over the past several years:
April 2002: David Gilbert and Thomas Morgner start the JFreeReport project.
September 2003: Version 0.8.3 of JFreeReport is released, refining PDF, HTML, and Excel rendering, along with many additional enhancements.
March 2005: Version 0.8.5 of JFreeReport is released, with enhancements to function and expression building, along with new features such as Barcode support.
January 2006: Pentaho acquires JFreeReport and hires Thomas Morgner as Pentaho's Chief Reporting Engineer. In the same month, the Pentaho Report Design Wizard is released.
June 2006: Martin Schmid releases the first version of Pentaho Report Designer.
November 2006: Web-based Ad Hoc Reporting support is added to Pentaho's BI Platform.
April 2007: Pentaho teams up with OpenOffice.org to deliver Pentaho Reporting's Flow Engine, embedded in OpenOffice.org.
August 2009: Pentaho releases version 3.5 of Pentaho Reporting.
This quick introduction to the features available in Pentaho Reporting 3.5 gives you an executive summary of how Pentaho Reporting works and what it can accomplish for your reporting needs. The topics that follow are covered in more depth in later chapters of the book.
The reporting algorithm is at the heart of Pentaho Reporting. This algorithm manages the layout and rendering of the entire report, no matter which output format is being rendered. It combines a report template with a dataset on the fly to generate the final report; there is no separate compilation step. All other Pentaho Reporting features can be described in the context of the overall reporting algorithm.
This algorithm allows reports to render with a page header and footer, a report header and footer, group headers and footers, as well as a details band. The reporting algorithm traverses the dataset multiple times to render the report. In the first pass, the algorithm performs calculations and determines how to separate the data into groups, along with calculating the height and width of text and images. After the initial pass, the algorithm traverses the dataset a second time, in order to render the output.
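To make the two-pass idea concrete, here is a minimal, self-contained Java sketch. It is not Pentaho's implementation: the `TwoPassReport` class and its `Row` type are hypothetical, and the first pass only computes group totals (text and image measurement are left out). It shows why a second pass is useful: because totals are already known when rendering starts, a group header can display its group's total ahead of the detail rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-pass renderer illustrating the idea; not Pentaho's reporting algorithm. */
class TwoPassReport {

    /** One dataset row: a group key (for example a region) and an amount. */
    static final class Row {
        final String group;
        final double amount;

        Row(String group, double amount) {
            this.group = group;
            this.amount = amount;
        }
    }

    /** Renders rows that are assumed to be sorted by group, as a grouped query would return them. */
    static List<String> render(List<Row> rows) {
        // Pass 1: walk the data once to compute each group's total (insertion order preserved).
        Map<String, Double> totals = new LinkedHashMap<>();
        for (Row r : rows) {
            totals.merge(r.group, r.amount, Double::sum);
        }

        // Pass 2: walk the data again and emit bands; group headers can already show totals.
        List<String> out = new ArrayList<>();
        out.add("REPORT HEADER");
        String current = null;
        for (Row r : rows) {
            if (!r.group.equals(current)) {
                if (current != null) {
                    out.add("  footer " + current + " total=" + totals.get(current));
                }
                current = r.group;
                out.add("group " + current + " (total " + totals.get(current) + ")");
            }
            out.add("  detail " + r.amount);
        }
        if (current != null) {
            out.add("  footer " + current + " total=" + totals.get(current));
        }
        out.add("REPORT FOOTER");
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("East", 10), new Row("East", 5), new Row("West", 7));
        render(rows).forEach(System.out::println);
    }
}
```

A single pass could not print "total 15.0" in the East header, because the second East row has not been read at that point.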
Pentaho Reporting defines a standard Java API for accessing data, and many data source implementations are available with Pentaho Reporting. The most commonly used implementations include JDBC (Java Database Connectivity), XML access using XPath, multidimensional OLAP data access using MDX, and simple Plain Old Java Object (POJO) support.
Additional data sources that are available include a Pentaho Data Integration data source, a Hibernate Query Language (HQL) data source, and a Pentaho Metadata data source. With Pentaho's Data Integration data source, it is easy to use Excel files, log files, or other file formats as inputs to a report without writing any code.
All of these data sources interact with the reporting engine through a standard API, which is easy to extend.
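As a rough illustration of what a uniform data source contract enables, here is a hypothetical Java sketch. `ReportDataSource`, `PojoDataSource`, `Customer`, and the query name `customersByCountry` are illustrative names rather than Pentaho's actual API; the sketch assumes results are exposed as a `javax.swing.table.TableModel`, so a consumer can read rows and columns the same way regardless of where the data came from.

```java
import java.util.List;
import java.util.Map;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableModel;

/** Hypothetical data source contract: a named query plus parameters in, a table of rows out. */
interface ReportDataSource {
    TableModel query(String queryName, Map<String, ?> parameters);
}

/** Plain Java bean used as in-memory report data. */
class Customer {
    final String name;
    final String country;

    Customer(String name, String country) {
        this.name = name;
        this.country = country;
    }
}

/** POJO-backed data source that filters an in-memory list by an optional "country" parameter. */
class PojoDataSource implements ReportDataSource {
    private final List<Customer> customers;

    PojoDataSource(List<Customer> customers) {
        this.customers = customers;
    }

    @Override
    public TableModel query(String queryName, Map<String, ?> parameters) {
        if (!"customersByCountry".equals(queryName)) {
            throw new IllegalArgumentException("Unknown query: " + queryName);
        }
        Object country = parameters.get("country"); // null means "no filter"
        DefaultTableModel model = new DefaultTableModel(new Object[] {"name", "country"}, 0);
        for (Customer c : customers) {
            if (country == null || country.equals(c.country)) {
                model.addRow(new Object[] {c.name, c.country});
            }
        }
        return model;
    }
}

class DataSourceDemo {
    public static void main(String[] args) {
        ReportDataSource ds = new PojoDataSource(List.of(
                new Customer("Acme", "US"), new Customer("Globex", "DE"), new Customer("Initech", "US")));
        TableModel result = ds.query("customersByCountry", Map.of("country", "US"));
        System.out.println(result.getRowCount() + " matching rows");
    }
}
```

A JDBC- or XML-backed class could implement the same interface, and code consuming the `TableModel` would not need to change.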
The following data sources are available with Pentaho Reporting:
Combining Pentaho Reporting's data source functionality with Pentaho's Data Integration engine makes most known data formats and systems available for input, and even allows several data sources to be combined in a single report. For example, a report might combine a Microsoft Excel file on a remote shared drive with a plain text log file from an HTTP server.
Pentaho Reporting has the ability to render to the most widely used output formats, including Adobe's PDF standard using the iText Library, Microsoft's Excel standard using the POI Library, and HTML, all highlighted in the following image. Other formats available include XML, plain text, Rich Text Format (RTF), and Comma-separated values (CSV). In addition to these output formats, a Pentaho Report can be rendered in Swing and directly printed using PostScript formatting, allowing print previewing capabilities.
Following is the screenshot showing one such report obtained using Report Designer.
Reports defined in Pentaho Reporting can specify, at the pixel level, where objects such as text or images should render. Using Pentaho Report Designer, it is easy to align fields and group items that need to stay aligned. While identical output is not always possible, since formats such as XML, CSV, and plain text cannot carry the same layout, the three main graphical outputs (HTML, PDF, and Excel) strive to look as similar as possible.
Rich formatting includes TrueType system font selection, the ability to render geometrical shapes and lines, along with the ability to include images and other objects in a report. This rich formatting is specified under the covers through styles similar to Cascading Style Sheets (CSS), separating out the format from the report detail. This makes it easier to modify and maintain reports, and also to apply corporate styles through the report wizard.
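The CSS-like cascading of styles can be pictured with a small, hypothetical Java sketch. `StyleNode` is not a Pentaho class; it only shows the general idea that an element looks up a style property locally and, if it is unset there, inherits the value from its enclosing band and ultimately from the report. A corporate style defined once at the top therefore applies everywhere unless an element overrides it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical CSS-like style node: resolves a key locally first, then in its parents. */
class StyleNode {
    private final StyleNode parent;
    private final Map<String, String> local = new HashMap<>();

    StyleNode(StyleNode parent) {
        this.parent = parent;
    }

    /** Sets a local style value and returns this node for chaining. */
    StyleNode set(String key, String value) {
        local.put(key, value);
        return this;
    }

    /** Walks from this node up to the root; returns null if no level defines the key. */
    String resolve(String key) {
        for (StyleNode n = this; n != null; n = n.parent) {
            String value = n.local.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Report-wide defaults, a group header override, and a field-specific color.
        StyleNode report = new StyleNode(null).set("font-family", "SansSerif").set("font-size", "10");
        StyleNode groupHeader = new StyleNode(report).set("font-size", "14");
        StyleNode field = new StyleNode(groupHeader).set("color", "#003366");
        System.out.println(field.resolve("font-family")); // inherited from the report
        System.out.println(field.resolve("font-size"));   // inherited from the group header
    }
}
```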
The Pentaho Reporting Engine and suite of tools make it easy to embed charts in reports, using the JFreeChart engine. Many chart types are available, including Bar, Histogram, Pie, and Line charts.
Pentaho Reporting provides easy-to-use tools to parameterize a report, allowing users to specify ranges and other values that customize the output of a report. Parameter values can be selected from a list of hardcoded values or driven from a query. With parameterization, end users may control the amount of information that is displayed on a report. The following screenshot is an example of parameter input from within Pentaho's Business Intelligence Server:
Report builders may define custom formulas and style expressions, using the OpenFormula standard, allowing for calculated values and dynamic formatting in their reports, such as aggregations, number formatting, as well as traffic lighting.
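The kinds of calculations mentioned above can be sketched in plain Java. The `FormulaSketch` helpers are hypothetical and only mirror what a report formula or style expression computes; inside a report, a traffic-lighting rule would instead be written as an OpenFormula expression, something along the lines of `IF([Sales] < 1000; "red"; "green")` (the field name is illustrative).

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers mirroring what report formulas and style expressions compute. */
class FormulaSketch {

    /** Aggregation: the sum of a numeric column. */
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Number formatting with a fixed US locale, so output does not vary by machine. */
    static String formatAmount(double value) {
        return new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(Locale.US)).format(value);
    }

    /** Traffic lighting: pick a color by threshold (assumes low < high). */
    static String trafficLight(double value, double low, double high) {
        if (value < low) {
            return "red";
        }
        return value < high ? "yellow" : "green";
    }

    public static void main(String[] args) {
        double total = sum(List.of(1200.0, 34.5));
        System.out.println(formatAmount(total) + " -> " + trafficLight(total, 1000, 5000));
    }
}
```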
Pentaho Reporting allows report developers to include sub-reports within a master report. This powerful capability allows a report to contain other, smaller reports, both side by side and within the various bands of the master report. These sub-reports may be based on different data sources.
This capability makes it possible to reuse detailed reports within multiple primary reports, as well as enabling a single report template to render multiple times in a single PDF document, allowing painless printing of a large number of reports. The following screenshot is an example of a report that includes a separate chart sub-report:
Cross Tab Reports present data in a spreadsheet-like format, making it easier to view summaries of data. Cross Tab Reports present both row and column headers, as well as cells of data, all of which can be customized through report elements.
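A cross tab is essentially a pivot: row and column header values come from the data, and each cell aggregates the rows that share those headers. This hypothetical Java sketch (not Pentaho's cross tab implementation) sums a measure for each row/column pair and prints the resulting grid, filling empty cells with zero.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Minimal cross-tab (pivot) sketch; hypothetical, not Pentaho's cross tab implementation. */
class CrossTab {

    /** One input row: a row-header value, a column-header value, and a measure. */
    static final class Fact {
        final String row;
        final String column;
        final double value;

        Fact(String row, String column, double value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /** Sums measures into a row -> (column -> total) grid; TreeMaps keep headers sorted. */
    static Map<String, Map<String, Double>> pivot(List<Fact> facts) {
        Map<String, Map<String, Double>> grid = new TreeMap<>();
        for (Fact f : facts) {
            grid.computeIfAbsent(f.row, k -> new TreeMap<>()).merge(f.column, f.value, Double::sum);
        }
        return grid;
    }

    /** Renders the grid as tab-separated text; missing cells are shown as 0.0. */
    static String format(Map<String, Map<String, Double>> grid) {
        TreeSet<String> columns = new TreeSet<>();
        grid.values().forEach(cells -> columns.addAll(cells.keySet()));
        StringBuilder sb = new StringBuilder(); // empty top-left corner cell
        for (String c : columns) {
            sb.append('\t').append(c);
        }
        sb.append('\n');
        for (Map.Entry<String, Map<String, Double>> row : grid.entrySet()) {
            sb.append(row.getKey());
            for (String c : columns) {
                sb.append('\t').append(row.getValue().getOrDefault(c, 0.0));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Fact> sales = List.of(
                new Fact("East", "Q1", 10), new Fact("East", "Q1", 5),
                new Fact("East", "Q2", 3), new Fact("West", "Q2", 7));
        System.out.print(format(pivot(sales)));
    }
}
```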
While most reports are static after being rendered, a subset of reporting includes functionality such as drill-through, pivoting, and other interactivity. Pentaho Reporting provides a straightforward Java and JavaScript API for manipulating a report after it has been rendered, allowing report builders to create very interactive reports. Pentaho Reporting's interactive functionality is available when rendering a report in HTML, Excel, or Swing. Links to external documents can also be added to PDF documents.
The following screenshot shows a report with links that, when clicked, launch a more detailed report:
While it is possible to build Pentaho reports using either XML or a Java API, most reports begin as templates built by the Pentaho Report Designer. Pentaho Report Designer is a What-You-See-Is-What-You-Get (WYSIWYG) report editor that exposes the rich set of features provided by the Pentaho Reporting Engine. In addition to building a report from scratch, the Report Design Wizard, included as a part of the Pentaho Report Designer, walks a report author through building a report, which will then be displayed in the Report Designer for further customization.
As a part of the Pentaho suite, reports created by Pentaho Reporting may be published, executed, and scheduled on Pentaho's Business Intelligence Server. The BI Server offers authentication and authorization, as well as a central repository, to manage your business reports. The BI Server also hosts the web-based Ad Hoc Reporting user interface for creating Pentaho Metadata-based reports. By combining the use of Pentaho Report Designer and Pentaho's BI Server, there is no need to write any code to get your business up and running with Pentaho Reporting.
Pentaho Reporting comes with a well-documented Java API for building reports from the ground up, so developers can stick with the Java programming language when customizing existing report templates or building reports from scratch. This Java API allows developers to create and modify the various sections of a report, including the various header, footer, group and detail bands, along with creating and modifying objects within each section of a report.
Pentaho Reporting is designed from the ground up in pure Java, exposing many interfaces for extension. From implementing basic formulas and functions that can be embedded in reports, to writing a custom data source or output format, Pentaho Reporting's source code and API interfaces are well documented and easy to work with.
One very attractive feature of Pentaho Reporting is its license. Pentaho Reporting is available for free under the GNU Lesser General Public License. This license allows other open source and proprietary projects to embed Pentaho Reporting without fear of large license fees or viral open source limitations. As an open source project, Pentaho Reporting also gives developers unprecedented access to the engine and to a large group of software developers within the Pentaho Reporting community, through open discussion forums and Internet Relay Chat (IRC). Commercial support and licensing are also available, if required.
In addition to these features, Pentaho Reporting is in active development. Please visit http://reporting.pentaho.org to learn more about what additional features and functionality are being considered for development, or to access early release versions of the product.
The Pentaho Reporting Engine is broken up into eleven main Java projects, which are then combined to author and render reports. The Pentaho Reporting Engine is backward compatible to Java 1.2.2, making certain that it stays as lightweight and as useful as possible. Most of the eleven libraries are independently useful for Java developers, outside of using them strictly for reporting purposes. The following diagram describes the various dependencies between each of the reporting projects:
LibBase is the root library for all other Pentaho Reporting libraries. This library contains common capabilities, such as debug and error logging utilities, library configuration, along with library initialization APIs, for consistent startup and shutdown management of the reporting engine.
LibDocBundle abstracts the management of Pentaho Reporting file bundles, which are by default stored as ZIP files, and implements the OpenDocument format (ODF). This makes it simpler for other parts of the reporting engine to work with and manipulate Pentaho Reporting's file formats.
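To illustrate what a ZIP-based file bundle involves, the following sketch uses only `java.util.zip`. It is not LibDocBundle's API, and the MIME type and entry names (`application/x-example-report`, `content.xml`, `styles.xml`) are placeholders. It follows the OpenDocument packaging convention of writing a `mimetype` entry first and uncompressed, so tools can identify the file type by reading the start of the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical ZIP bundle helpers; not LibDocBundle's API. */
class BundleSketch {

    /** Writes a "mimetype" entry first (stored, uncompressed), then each file entry (deflated). */
    static byte[] writeBundle(String mimeType, Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            byte[] mime = mimeType.getBytes(StandardCharsets.US_ASCII);
            ZipEntry mimeEntry = new ZipEntry("mimetype");
            mimeEntry.setMethod(ZipEntry.STORED); // STORED entries need size and CRC up front
            mimeEntry.setSize(mime.length);
            CRC32 crc = new CRC32();
            crc.update(mime);
            mimeEntry.setCrc(crc.getValue());
            zip.putNextEntry(mimeEntry);
            zip.write(mime);
            zip.closeEntry();
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete archive, written when the stream was closed
    }

    /** Returns entry names in archive order. */
    static List<String> listEntries(byte[] bundle) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bundle))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Reads one entry as UTF-8 text, or returns null if it is absent. */
    static String readEntry(byte[] bundle, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bundle))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                if (entry.getName().equals(name)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("content.xml", "<report/>");
        files.put("styles.xml", "<styles/>");
        byte[] bundle = writeBundle("application/x-example-report", files);
        System.out.println(listEntries(bundle));
    }
}
```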
LibFonts allows Pentaho Reporting to work with TrueType system fonts. It extracts the necessary metadata from font files and populates an abstract interface that allows appropriate rendering in various contexts, including PDF and Excel views.
LibFormat is a string formatting library, which can render dates and numbers appropriately based on format strings. This library is focused on memory and CPU efficiency for high performance report rendering.
LibFormula is a formula parsing and execution library based on the OpenFormula standard. You can learn more about OpenFormula by visiting http://wiki.oasis-open.org/office/About_OpenFormula. This library is similar in function to Excel-based formula definitions. LibFormula is a very general library, and is used outside Pentaho Reporting in other projects that require OpenFormula style parsing and execution.
LibLoader manages the loading and caching of all necessary resources required for generating reports in a generic way, providing a simple API for other parts of the reporting engine that control static and dynamic content, including data sources and images.
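The caching half of this behavior can be approximated with a memoizing cache. This sketch is hypothetical (`ResourceCache` is not LibLoader's API, and real resource keys, dependency tracking, and expiry are omitted): each resource is loaded on first request, served from memory afterwards, and can be invalidated when its source changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical memoizing resource cache; not LibLoader's API. */
class ResourceCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ResourceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Loads the resource on first request; later requests are served from memory. */
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops a cached entry, for example after the underlying file has changed. */
    void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // The loader stands in for reading an image or data file from disk or the network.
        ResourceCache<String, String> images = new ResourceCache<>(key -> {
            loads.incrementAndGet();
            return "bytes of " + key;
        });
        images.get("logo.png");
        images.get("logo.png");
        System.out.println("loads = " + loads.get());
    }
}
```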
LibRepository abstracts the input and output of hierarchical storage systems, such as file systems, that Pentaho Reporting interacts with. This makes it possible to implement a custom storage system, such as FTP, against the API, giving Pentaho Reporting access to that system.
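The benefit of such an abstraction is that report-writing code depends only on an interface. In this hypothetical sketch, `ContentRepository`, `InMemoryRepository`, and `ReportPublisher` are illustrative names rather than LibRepository's actual types; an FTP- or file-system-backed class could implement the same interface without any change to `ReportPublisher`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hierarchical storage contract; an FTP- or file-backed class could implement it. */
interface ContentRepository {
    void write(String path, byte[] data);

    byte[] read(String path);

    /** Names of the direct children (files or subdirectories) of a directory. */
    List<String> list(String directory);
}

/** In-memory implementation keyed by slash-separated paths; handy for tests. */
class InMemoryRepository implements ContentRepository {
    private final Map<String, byte[]> files = new TreeMap<>(); // sorted, so listings are ordered

    @Override
    public void write(String path, byte[] data) {
        files.put(path, data.clone());
    }

    @Override
    public byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("No such item: " + path);
        }
        return data.clone();
    }

    @Override
    public List<String> list(String directory) {
        String prefix = directory.endsWith("/") ? directory : directory + "/";
        List<String> children = new ArrayList<>();
        for (String path : files.keySet()) {
            if (path.startsWith(prefix)) {
                String rest = path.substring(prefix.length());
                int slash = rest.indexOf('/');
                String child = slash < 0 ? rest : rest.substring(0, slash);
                if (!children.contains(child)) {
                    children.add(child);
                }
            }
        }
        return children;
    }
}

/** Report-output code that depends only on the interface, not on a concrete storage system. */
class ReportPublisher {
    static void publish(ContentRepository repo, String path, String html) {
        repo.write(path, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ContentRepository repo = new InMemoryRepository();
        publish(repo, "reports/2009/q1.html", "<p>Q1</p>");
        publish(repo, "reports/2009/q2.html", "<p>Q2</p>");
        System.out.println(repo.list("reports/2009"));
    }
}
```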
LibSerializer provides helper methods for serializing non-serializable objects. This is necessary so that the reporting engine can serialize standard Java classes that don't implement Java's Serializable interface.
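The following sketch shows the general pattern behind serializing objects whose classes do not implement `Serializable`: a registry of per-class handlers writes a type tag and the object's state explicitly, then rebuilds the object on read. `SerializeHandler`, `SerializerRegistry`, `SerializerDemo`, and the stand-in class `Dimension2` are hypothetical names, not LibSerializer's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handler that writes and rebuilds one non-serializable type. */
interface SerializeHandler<T> {
    void write(T value, ObjectOutputStream out) throws IOException;

    T read(ObjectInputStream in) throws IOException;
}

/** Stand-in for a class we cannot modify; note that it does NOT implement Serializable. */
final class Dimension2 {
    final int width;
    final int height;

    Dimension2(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

/** Hypothetical registry: writes a class-name tag, then delegates to the registered handler. */
final class SerializerRegistry {
    private final Map<String, SerializeHandler<?>> handlers = new HashMap<>();

    <T> void register(Class<T> type, SerializeHandler<T> handler) {
        handlers.put(type.getName(), handler);
    }

    @SuppressWarnings("unchecked")
    void writeObject(Object value, ObjectOutputStream out) throws IOException {
        String name = value.getClass().getName();
        SerializeHandler<Object> handler = (SerializeHandler<Object>) handlers.get(name);
        if (handler == null) {
            throw new NotSerializableException(name);
        }
        out.writeUTF(name);
        handler.write(value, out);
    }

    Object readObject(ObjectInputStream in) throws IOException {
        String name = in.readUTF();
        SerializeHandler<?> handler = handlers.get(name);
        if (handler == null) {
            throw new InvalidClassException(name, "no handler registered");
        }
        return handler.read(in);
    }
}

class SerializerDemo {
    static SerializerRegistry defaultRegistry() {
        SerializerRegistry registry = new SerializerRegistry();
        registry.register(Dimension2.class, new SerializeHandler<Dimension2>() {
            @Override
            public void write(Dimension2 d, ObjectOutputStream out) throws IOException {
                out.writeInt(d.width);
                out.writeInt(d.height);
            }

            @Override
            public Dimension2 read(ObjectInputStream in) throws IOException {
                return new Dimension2(in.readInt(), in.readInt()); // width, then height
            }
        });
        return registry;
    }

    /** Writes the value to bytes and reads it back, wrapping I/O errors as unchecked. */
    static Object roundTrip(SerializerRegistry registry, Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                registry.writeObject(value, out);
            } // close() flushes the stream's internal buffer into 'bytes'
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return registry.readObject(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Dimension2 copy = (Dimension2) roundTrip(defaultRegistry(), new Dimension2(640, 480));
        System.out.println(copy.width + "x" + copy.height);
    }
}
```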
LibXml provides utility classes for SAX (Simple API for XML) parsing and XML writing, based on Java's JAXP (Java API for XML Processing) API. This library ensures speedy loading and validation of Pentaho Reporting XML template files.
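For a feel of JAXP-based SAX parsing, here is a small sketch using only the JDK's `javax.xml.parsers` and `org.xml.sax` packages. The element and attribute names (`text-field`, `field`) are hypothetical and do not reflect Pentaho's actual report definition schema; the handler simply collects field references as the parser streams through the document, without building a DOM tree in memory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** SAX handler collecting the "field" attribute of each <text-field> element (hypothetical schema). */
class FieldCollector extends DefaultHandler {
    private final List<String> fields = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Called once per start tag as the parser streams through the document.
        if ("text-field".equals(qName)) {
            fields.add(attributes.getValue("field"));
        }
    }

    /** Parses an XML string with the JDK's default JAXP SAX parser and returns the collected fields. */
    static List<String> collect(String xml) {
        try {
            FieldCollector handler = new FieldCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse report definition", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><band>"
                + "<text-field field=\"customer\"/><label value=\"Total:\"/><text-field field=\"amount\"/>"
                + "</band></report>";
        System.out.println(collect(xml));
    }
}
```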
The Report Engine Core project contains the main reporting algorithm for rendering reports, along with the necessary functionality to support styling. This project also contains the algorithms for rendering specific outputs, including PDF, Excel, CSV, XML, and more. The engine relies on the already mentioned Lib libraries for managing the loading, parsing, formatting, rendering, and archiving of generated reports.
The Report Engine Extensions project contains third-party extensions to the reporting engine, which are very useful, but increase dependencies. Extensions in this project include JavaScript Expression support using the Rhino project, a Hibernate data source factory, Barcode support using Barbecue, Sparkline support, along with additional JDK 1.4 support for configuration and printing. Additional extension projects exist that include charting and many of the data sources discussed in this book.
When combined, these libraries form the Pentaho Reporting Engine. In addition to these libraries, there are also other related open source tools and projects in the Pentaho Reporting landscape, including the Report Engine Demo, Report Design Wizard, Report Designer, and the web-based Ad Hoc Reporting user interface.
In this chapter, we've highlighted some typical uses of Pentaho Reporting, providing you with baseline ideas for implementing your own solutions. Typical uses for embedded reporting include operational, business intelligence, financial, and production reporting.
We've covered the unique history of Pentaho Reporting, from its JFreeReport roots to its current status as Pentaho Reporting. We've learned about the individuals who have built Pentaho Reporting from a spare time open source project into an enterprise level reporting engine, competing with proprietary reporting engines.
We've also learned a great deal about the rich features of Pentaho Reporting. Core features include integration with a wide variety of data sources, along with PDF, HTML, and Excel rendering. More advanced features include sub-reports and cross tab reports. Additionally, developer-oriented features such as open Java APIs, the available source code, and a business-friendly LGPL open source license give Pentaho Reporting a leg up on other Java reporting toolkits.
The architecture of Pentaho Reporting is also covered in this chapter, giving developers a twenty-thousand-foot view of where they might modify or contribute to the Pentaho Reporting Engine, along with the ultimate flexibility that access to the source code provides.
You'll soon be able to apply the rich feature set of Pentaho Reporting to your use case. In the following chapters, we'll introduce you to Pentaho Reporting's easy-to-use Report Designer and Java API, making it fun and easy to embed reporting into your Java application.