
You're reading from  Instant Pentaho Data Integration Kitchen

Product type: Book
Published in: Jul 2013
Reading Level: Beginner
Publisher: Packt
ISBN-13: 9781849696906
Edition: 1st Edition
Sergio Ramazzina

Sergio Ramazzina is an experienced software architect/trainer with more than 25 years of experience in the IT field. He has worked on a broad number of projects for banks and major Italian companies and has designed complex enterprise solutions in Java, JavaEE, and Ruby. He started using Pentaho products from the very beginning in late 2003. He gained thorough experience by deploying Pentaho as an open source BI solution, standalone or deeply integrated in other applications as the analytical engine of choice. In 2009, due to his experience in the Java/JavaEE world and appreciation for the open source world and its main ideas, he began participating actively as a contributor to some of the Pentaho projects such as JPivot, Saiku, CDF, and CDA and rose to the Pentaho Active Contributor level. At that time, he started participating as a BI architect and Pentaho expert on a wide number of projects where open source BI and Pentaho were the main players. In late 2010, he founded Serasoft, a young Italian consulting firm that specializes in delivering high value open source Business Intelligence solutions. With the team in Serasoft, he shared his passion and experience in designing and delivering highly innovative enterprise solutions to help users make their work more effective. In July 2013, he published his first book, Instant Pentaho Data Integration Kitchen, Packt Publishing. He is also passionate about skiing, tennis, and photography, and he loves his young daughter, Camilla, very much. You can follow him on Twitter at @sramazzina. You can also look at his profile on LinkedIn at http://it.linkedin.com/in/sramazzina/.

Discovering your PDI repository from the command line (Simple)


This recipe guides you through discovering the structure and content of your PDI repository using the PDI command-line tools. From the command line, we can list the available repositories, the directories in a repository, and the jobs or transformations in a specified directory. The recipe works the same way for both Kitchen and Pan, except for the listing of jobs and transformations in a repository directory: jobs are listed with Kitchen and transformations with Pan.

Getting ready

To get ready for this recipe, you need to check that the JAVA_HOME environment variable is set properly and then configure your environment variables so that the Kitchen script can start from anywhere without specifying the complete path to your PDI home directory. For details about these checks, refer to the recipe Executing PDI jobs from a filesystem (Simple).
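As a rough sketch of those checks (not taken from the book), a small POSIX shell function can confirm that the environment is ready before any of the commands in this recipe are run. The function name check_pdi_env is hypothetical, and looking for bin/java under JAVA_HOME is an assumption about a typical Java installation layout.

```shell
# Sketch only: verify the prerequisites described above.
# check_pdi_env is a hypothetical helper name, not part of PDI.
check_pdi_env() {
  # Assumption: a usable Java installation has an executable bin/java.
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is unset or does not contain bin/java" >&2
    return 1
  fi
  # The Kitchen script must be reachable without its full path.
  if ! command -v kitchen.sh >/dev/null 2>&1; then
    echo "kitchen.sh was not found on PATH" >&2
    return 1
  fi
  echo "environment looks ready"
}
```

Calling check_pdi_env at the top of a wrapper script lets it stop early with a readable message instead of failing later inside Kitchen.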

How to do it...

To get the list of the available repositories, perform the following steps:

  1. Sometimes we need to start a job or a transformation but we do not have the details of the repository we are going to interact with. The first thing we need to know is the name of the repository we are going to connect to in order to start our process. To get the names of the available repositories, we can use the listrep command-line argument.

  2. The usage is very simple because the argument takes no value; we just specify its name on the command line.

  3. Imagine that we need to find the list of the available repositories on Linux/Mac; the command to give is as follows:

    $ kitchen.sh -listrep
    
  4. To do the same thing on Windows, the command is written as follows:

    C:\temp\samples>Kitchen.bat /listrep
    
  5. The result we get is the repositories listed in a clear and concise form, one per line: a sequence number, the repository name, its description in square brackets, and the repository ID:

    INFO  19-03 23:18:51,675 - Kitchen - Start of run.
    INFO  19-03 23:18:51,695 - RepositoriesMeta - Reading repositories XML file: /home/sramazzina/.kettle/repositories.xml
    List of repositories:
    #1 : sample3 [PDI Book Samples]  id=KettleFileRepository
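
When the repository name is needed inside a script, the listing can be filtered down to the names alone. The following is a minimal sketch that assumes the line format shown in the output above (number, name, bracketed description, id); repo_names is a hypothetical helper, not a PDI command.

```shell
# Sketch only: print just the repository names from the listrep output.
# Assumes entry lines look like:
#   #1 : sample3 [PDI Book Samples]  id=KettleFileRepository
repo_names() {
  # Keep only the text between "#N : " and the bracketed description;
  # log lines and the "List of repositories:" header do not match.
  sed -n 's/^#[0-9][0-9]* : \(.*\) \[.*\]  *id=.*$/\1/p'
}

# Possible usage (requires a working PDI installation):
#   kitchen.sh -listrep | repo_names
```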
    

To get the list of directories in a selected repository, perform the following steps:

  1. Once we have found the repository we were looking for, the next step is usually to look for a job or transformation located somewhere in it.

  2. To do this, we need to get the list of available directories in the repository using the listdir argument used together with the following arguments:

    • The rep argument, to specify the name of the repository whose internal directory structure we want to display.

    • The dir argument, to give the directory to start from. The command will show you the directories contained in that directory. If this argument is not specified, PDI assumes that you want to show all the directories contained in the root directory. Navigating through the structure of a complex repository this way is quite a tedious and iterative process, but something is better than nothing!

    • The user and pass arguments, in case your repository is an authenticated repository, to specify the username and password needed to connect to it.

  3. To find the list of the available directories in the root of the repository sample3, the command to fire on Linux/Mac is as follows:

    $ kitchen.sh -rep:sample3 -listdir
    
  4. To find the list of the available directories in the root of the repository sample3, the command to fire on Windows is as follows:

    C:\temp\samples>Kitchen.bat /rep:sample3 /listdir
    
  5. The command returns the list of available directories in the following form:

    INFO  20-03 07:07:17,236 - Kitchen - Start of run.
    INFO  20-03 07:07:17,252 - RepositoriesMeta - Reading repositories XML file: /home/sramazzina/.kettle/repositories.xml
    dir2
    dir1
    
  6. The directory dir1 has a subdirectory, subdir11; to show this directory, we need to specify another command that for Linux/Mac is as follows:

    $ kitchen.sh -rep:sample3 -dir:dir1 -listdir
    

    And for Windows, the command is as follows:

    C:\temp\samples>Kitchen.bat /rep:sample3 /dir:dir1 /listdir
    
  7. PDI will show us the children of the directory dir1 as follows:

    INFO  20-03 07:07:34,324 - Kitchen - Start of run.
    INFO  20-03 07:07:34,339 - RepositoriesMeta - Reading repositories XML file: /home/sramazzina/.kettle/repositories.xml
    subdir11
    
  8. If you're checking the directories of an authenticated repository, the command will change as follows for Linux/Mac:

    $ kitchen.sh -user:pdiuser -pass:password -rep:sample3 -listdir
    

    And the command will change as follows for Windows:

    C:\temp\samples>Kitchen.bat /user:pdiuser /pass:password /rep:sample3 /listdir
    
  9. The output of the command will remain the same.
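
Because listdir shows only one level at a time, the iterative navigation described above can be automated. The following is a minimal sketch, not the book's method: walk_dirs and strip_log_lines are hypothetical helpers, the KITCHEN variable is an assumed way to point at the script, log lines are assumed to start with a level such as INFO, and nested directories are assumed to be addressed as parent/child in the dir argument.

```shell
# Sketch only: print every directory of a repository, one path per line.
KITCHEN="${KITCHEN:-kitchen.sh}"   # assumed override hook for the script name

# Drop log lines and blank lines so that only directory names remain.
# Assumption: log lines begin with a level name followed by whitespace.
strip_log_lines() {
  grep -v -E '^(INFO|WARN|ERROR|DEBUG)[[:space:]]' | sed '/^[[:space:]]*$/d'
}

# walk_dirs REPO PARENT: list PARENT's children, then recurse into each.
# The body runs in a subshell, so each recursion level keeps its own variables.
walk_dirs() (
  repo=$1
  parent=$2   # an empty string means the repository root
  if [ -n "$parent" ]; then
    listing=$("$KITCHEN" -rep:"$repo" -dir:"$parent" -listdir </dev/null)
  else
    listing=$("$KITCHEN" -rep:"$repo" -listdir </dev/null)
  fi
  printf '%s\n' "$listing" | strip_log_lines | while IFS= read -r child; do
    path="${parent:+$parent/}$child"   # assumed path syntax: parent/child
    printf '%s\n' "$path"
    walk_dirs "$repo" "$path"
  done
)

# Possible usage (requires a working PDI installation):
#   walk_dirs sample3 ""
```

Each level costs one Kitchen start-up, so on a large repository this can be slow; for an authenticated repository, the user and pass arguments would have to be added to both calls.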

To get the list of jobs in a specified directory, perform the following steps:

  1. Now that we know about the internals of our repository, we're ready to look for our jobs.

  2. The argument used to show the list of the available jobs in a specified directory is listjobs. This argument must be used together with the following:

    • The rep argument, to specify the name of the repository whose jobs we want to list.

    • The dir argument, to give the name of the directory. The command will show you the jobs contained in that directory. If this argument is not specified, PDI assumes that you want to show all the jobs contained in the root directory.

    • The user and pass arguments, in case your repository is an authenticated repository, to specify the username and password needed to connect to it.

  3. To find the list of the available jobs in the root directory of the repository sample3, the command to fire on Linux/Mac is as follows:

    $ kitchen.sh -rep:sample3 -listjobs
    
  4. To find the list of the available jobs in the root directory of the repository sample3, the command to fire on Windows is as follows:

    C:\temp\samples>Kitchen.bat /rep:sample3 /listjobs
    
  5. The command returns the list of available jobs in the following form:

    INFO  20-03 07:30:46,642 - Kitchen - Start of run.
    INFO  20-03 07:30:46,657 - RepositoriesMeta - Reading repositories XML file: /home/sramazzina/.kettle/repositories.xml
    export-job
    
  6. If you're checking the jobs in an authenticated repository, the command on Linux/Mac will change in the following way:

    $ kitchen.sh -user:pdiuser -pass:password -rep:sample3 -listjobs
    
  7. If you're checking the jobs in an authenticated repository, the command on a Windows platform will change in the following way:

    C:\temp\samples>Kitchen.bat /user:pdiuser /pass:password /rep:sample3 /listjobs
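
A script that is about to launch a job may want to confirm first that the job is really in the listing. The following is a minimal sketch under the assumption that log lines start with a level such as INFO and that each job name sits alone on its line, as in the output above; has_entry is a hypothetical helper.

```shell
# Sketch only: succeed if NAME appears as a whole line in a listing on stdin.
has_entry() {
  # Remove log lines first (assumed to start with a level name),
  # then look for an exact, whole-line, fixed-string match.
  grep -v -E '^(INFO|WARN|ERROR|DEBUG)[[:space:]]' | grep -q -x -F -e "$1"
}

# Possible usage (requires a working PDI installation):
#   if kitchen.sh -rep:sample3 -listjobs | has_entry export-job; then
#     echo "export-job is available"
#   fi
```

The whole-line match matters: a plain substring search for export would also succeed on export-job.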
    

To get the list of transformations in a specified directory, perform the following steps:

  1. The Pan script lets us see the list of transformations contained in a directory of our repository.

  2. To do this, we need to specify the listtrans argument together with the same arguments described for the listjobs argument; refer to the previous steps for a detailed explanation of the meaning and syntax of those arguments.

  3. To find the list of the available transformations in the root directory of the repository sample3, the command to fire on Linux/Mac is as follows:

    $ pan.sh -rep:sample3 -listtrans
    
  4. To find the list of the available transformations in the root directory of the repository sample3, the command to fire on Windows is as follows:

    C:\temp\samples>Pan.bat /rep:sample3 /listtrans
    
  5. The command returns the list of available transformations in the following form:

    INFO  20-03 07:35:10,073 - Pan - Start of run.
    INFO  20-03 07:35:10,103 - RepositoriesMeta - Reading repositories XML file: /home/sramazzina/.kettle/repositories.xml
    read-customers
    
  6. Everything said about listing the jobs in a specific directory, including the use of the user and pass arguments with an authenticated repository, applies to transformations as well; just remember to use the Pan script instead of the Kitchen script.
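
Since the only difference between the two listings is the script and the argument name, a wrapper can hide that choice. The following is a minimal sketch: list_entries is a hypothetical helper, the KITCHEN and PAN variables are assumed override hooks, and log lines are assumed to start with a level such as INFO.

```shell
# Sketch only: list jobs (via Kitchen) or transformations (via Pan).
KITCHEN="${KITCHEN:-kitchen.sh}"   # assumed override hook
PAN="${PAN:-pan.sh}"               # assumed override hook

# list_entries KIND REPO [DIR]: KIND is "jobs" or "trans".
list_entries() {
  kind=$1
  repo=$2
  dir=${3:-}   # empty means the repository root
  case "$kind" in
    jobs)  tool=$KITCHEN; flag=-listjobs ;;
    trans) tool=$PAN;     flag=-listtrans ;;
    *) echo "usage: list_entries jobs|trans REPO [DIR]" >&2; return 2 ;;
  esac
  if [ -n "$dir" ]; then
    "$tool" -rep:"$repo" -dir:"$dir" "$flag"
  else
    "$tool" -rep:"$repo" "$flag"
  fi | grep -v -E '^(INFO|WARN|ERROR|DEBUG)[[:space:]]' || true
}

# Possible usage (requires a working PDI installation):
#   list_entries jobs sample3
#   list_entries trans sample3 dir1
```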
