Pentaho for Big Data Analytics

eBook: $23.99
Formats: PDF, PacktLib, ePub, and Mobi
Print + free eBook + free PacktLib access to the book: $63.98    Print cover: $39.99
  • A guide to using Pentaho Business Analytics for big data analysis
  • Learn Pentaho’s visualization and reporting tools with practical examples and tips
  • Precise insights into churning big data into meaningful knowledge with Pentaho

Book Details

Language : English
Paperback : 118 pages [ 235mm x 191mm ]
Release Date : November 2013
ISBN : 1783282150
ISBN 13 : 9781783282159
Author(s) : Manoj R Patil, Feris Thia
Topics and Technologies : All Books, Big Data and Business Intelligence, Virtualization and Cloud, Open Source

Table of Contents

Chapter 1: The Rise of Pentaho Analytics along with Big Data
Chapter 2: Setting Up the Ground
Chapter 3: Churning Big Data with Pentaho
Chapter 4: Pentaho Business Analytics Tools
Chapter 5: Visualization of Big Data
Appendix A: Big Data Sets
Appendix B: Hadoop Setup
    • Chapter 2: Setting Up the Ground
      • Pentaho BI Server and the development platform
      • Prerequisites/system requirements
      • Obtaining Pentaho BI Server (Community Edition)
      • The JAVA_HOME and JRE_HOME environment variables
      • Running Pentaho BI Server
      • Pentaho User Console (PUC)
      • Pentaho Action Sequence and solution
      • The JPivot component example
      • The message template component example
      • The embedded HSQLDB database server
      • Pentaho Marketplace
      • Saiku installation
      • Pentaho Administration Console (PAC)
      • Creating data connections
      • Summary
    • Chapter 3: Churning Big Data with Pentaho
      • An overview of Big Data and Hadoop
        • Big Data
        • Hadoop
      • The Hadoop architecture
        • The Hadoop ecosystem
        • Hortonworks Sandbox
      • Pentaho Data Integration (PDI)
        • The Pentaho Big Data plugin configuration
      • Importing data to Hive
      • Putting a data file into HDFS
      • Loading data from HDFS into Hive (job orchestration)
      • Summary
    • Chapter 4: Pentaho Business Analytics Tools
      • The business analytics life cycle
      • Preparing data
        • Preparing BI Server to work with Hive
        • Executing and monitoring a Hive MapReduce job
      • Pentaho Reporting
      • Data visualization and dashboard building
        • Creating a layout using a predefined template
        • Creating a data source
        • Creating a component
      • Summary
    • Chapter 5: Visualization of Big Data
      • Data visualization
      • Data source preparation
        • Repopulating the nyse_stocks Hive table
        • Pentaho's data source integration
        • Consuming PDI as a CDA data source
      • Visualizing data using CTools
        • Visualizing trends using a line chart
        • Interactivity using a parameter
        • Multiple pie charts
        • Waterfall charts
      • CSS styling
      • Summary
    • Appendix B: Hadoop Setup
      • Hortonworks Sandbox
        • Setting up the Hortonworks Sandbox
        • Hortonworks Sandbox web administration
      • Transferring a file using secure FTP
      • Preparing Hive data
      • The nyse_stocks sample data

                Manoj R Patil

                Manoj R Patil is the Chief Architect in Big Data at Compassites Software Solutions Pvt. Ltd., where he oversees the overall platform architecture for Big Data solutions and also contributes hands-on to some assignments. He has been working in the IT industry for the last 15 years. He started as a programmer and, along the way, acquired skills in architecting and designing solutions, managing projects with each stakeholder's interests in mind, and deploying and maintaining solutions on cloud infrastructure. He has been working on the Pentaho-related stack for the last 5 years, providing solutions both for employers and as a freelancer. Manoj has extensive experience in JavaEE, MySQL, various frameworks, and Business Intelligence, and is keen to pursue his interest in predictive analysis. He was also associated with TalentBeat, Inc. and Persistent Systems, where he implemented solutions in logistics, data masking, and data-intensive life sciences.

                Feris Thia

                Feris Thia is a founder of PHI-Integration, a Jakarta-based IT consulting company that focuses on data management, data warehousing, and Business Intelligence solutions. As a technical consultant, he has spent the last seven years delivering solutions with Pentaho and the Microsoft Business Intelligence platform across various industries, including retail, trading, finance/banking, and telecommunications. He is also a member and maintainer of two very active local Indonesian discussion groups related to Pentaho and Microsoft Excel (the Facebook group). His current activities include research and building software on Big Data and data mining platforms, namely Apache Hadoop, R, and Mahout. He would like to work on a book about analyzing customer behavior using the Apache Mahout platform.

                What you will learn from this book

                • Get to grips with the Pentaho suite
                • Explore the basics of Big Data and its business context
                • Set up a Pentaho business analytics server
                • Consume Big Data on the HDFS platform using Pentaho Data Integration
                • Create visualizations with Pentaho's tools
                • Distinguish signal from noise with Pentaho's Data Analytics capabilities
                • Design and set up your own Pentaho dashboard
                • Move from data to analytics in just a few steps with Community Dashboard Framework (CDF)

                In Detail

                Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics and data integration. The real power of big data analytics is the abstraction between data and analytics. Data can be distributed across the cluster in various formats, and the analytics platform should have the capability to talk to different heterogeneous data stores and fetch the filtered data to enrich its value.

                Pentaho for Big Data Analytics is a practical, hands-on guide that provides clear, step-by-step exercises for using Pentaho to take advantage of big data systems, where data beats algorithms, and gives you a good grounding in the capabilities of Pentaho Business Analytics.

                This book looks at the key ingredients of the Pentaho Business Analytics platform. We will see how to prepare the Pentaho BI environment and get to grips with the big data ecosystem. The book provides a clear guide to the essential tools of Pentaho Business Analytics, building familiarity with both the design tools for setting up reports and the visualization tools necessary for complete data analysis.

                The book is a practical guide, full of step-by-step examples that are easy to follow and implement.

                Who this book is for

                This book is for developers, system administrators, and business intelligence professionals looking to learn how to get more out of their data through Pentaho. To get the most from the examples, some knowledge of Java is required.
