Cloudera Hadoop and HP Vertica

Exclusive offer: get 50% off this eBook here
Business Intelligence with MicroStrategy Cookbook

Business Intelligence with MicroStrategy Cookbook — Save 50%

Over 90 practical, hands-on recipes to help you build your MicroStrategy business intelligence project, including more than a 100 screencasts with this book and ebook

$32.99    $16.50
by Davide Moraschi | October 2013 | Enterprise Articles

In this article by Davide Moraschi, author of Business Intelligence with MicroStrategy Cookbook, explains how to connect from MicroStrategy to one of the most commonly used platforms for Big Data. The distribution used is CH4 with Impala and also demonstrates the connection to a common platform for columnar databases, the Vertica engine by HP. The distribution used is the Community Edition, freely available from the vendor site.

(For more resources related to this topic, see here.)

Cloudera Hadoop

Hadoop is one of the names we think about when it comes to Big Data. I'm not going into details about it since there is plenty of information out there; moreover, like somebody once said, "If you decided to use Hadoop for your data warehouse, then you probably have a good reason for it". Let's not forget: it is primarily a distributed filesystem, not a relational database.

That said, there are many cases when we may need to use this technology for number crunching, for example, together with MicroStrategy for analysis and reporting.

There are mainly two ways to leverage Hadoop data from MicroStrategy: the first is Hive and the second is Impala. They both work as SQL bridges to the underlying Hadoop structures, converting standard SELECT statements into jobs. The connection is handled by a proprietary 32-bit ODBC driver available for free from the Cloudera website.

In my tests, Impala resulted largely faster than Hive, so I will show you how to use it from our MicroStrategy virtual machine.

Please note that I am using Version 9.3.0 for consistency with the rest of the book. If you're serious about Big Data and Hadoop, I strongly recommend upgrading to 9.3.1 for enhanced performance and easier setup. See MicroStrategy knowledge base document TN43588 : Post-Certification of Cloudera Impala 1.0 with MicroStrategy 9.3.1 .

The ODBC driver is the same for both Hive and Impala, only the driver settings change.

Connecting to a Hadoop database

To show how we can connect to a Hadoop database, I will use two virtual machines: one with MicroStrategy Suite and the second with Cloudera Hadoop distribution, specifically, a virtual appliance that is available for download from their website.

The configuration of the Hadoop cluster is out of scope; moreover, I am not a Hadoop expert. I'll simply give some hints, feel free to use any other configuration/vendor, the procedure and ODBC parameters should be similar.

Getting ready

Start by going to http://at5.us/AppAU1

The Cloudera VM download is almost 3 GB (cloudera-quickstart-vm-4.3.0-vmware.tar.gz) and features the CH4 version. After unpacking the archive, you'll find a cloudera-quickstart-vm-4.3.0-vmware.ovf file that can be opened with VMware, see screen capture:

Accept the defaults and click on Import to generate the cloudera-quickstart-vm-4.3.0-vmware virtual machine.

Before starting the Cloudera appliance, change the network card settings from NAT to Bridged since we need to access the database from another VM:

Leave the rest of the parameters, as per the default, and start the machine.

After a while, you'll be presented with a graphical interface of Centos Linux. If the network has started correctly, the machine should have received an IP address from your network DHCP. We need a fixed rather than dynamic address in the Hadoop VM, so:

  1. Open the System | Preferences | Network Connections menu.
  2. Select the name of your card (should be something like Auto eth1 ) and click on Edit… .
  3. Move to the IPv4 Settings tab and change the Method from Automatic (DHCP) to Manual .
  4. Click on the Add button to create a new address. Ask your network administrator for details here and fill Address , Netmask , and Gateway .
  5. Click on Apply… and when prompted type the root password cloudera and click on Authenticate . Then click on Close .
  6. Check if the change was successful by opening a Terminal window (Applications | System Tools | Terminal ) and issue the ifconfig command, the answer should include the address that you typed in step 4.
  7. From the MicroStrategy Suite virtual machine, test if you can ping the Cloudera VM.

    When we first start Hadoop, there are no tables in the database, so we create the samples:

  8. In the Cloudera virtual machine, from the main page in Firefox open Cloudera Manager , click on I Agree in the Information Assurance Policy dialog.
  9. Log in with username admin and password admin.
  10. Look for a line with a service named oozie1 , notice that it is stopped. Click on the Actions button and select Start… .
  11. Confirm with the Start button in the dialog. A Status window will pop up, wait until the Progress is reported as Finished and close it.
  12. Now click on the Hue button in the bookmarks toolbar.
  13. Sign up with username admin and password admin, you are now in the Hue home page.
  14. Click on the first button in the blue Hue toolbar (tool tip: About Hue ) to go to the quick Start Wizard .
  15. Click on the Next button to go to Step 2: Examples tab.
  16. Click on Beeswax (Hive UI) and wait until a message over the toolbar says Examples refreshed .
  17. Now in the Hue toolbar, click on the seventh button from the left (tool tip: Metastore Manager ), you will see the default database with two tables: sample_07 and sample_08 .
  18. Enable the checkbox of sample_08 and click on the Browse Data button. After a while the Results tab shows a grid with data. So far so good.
  19. We now go back to the Cloudera Manager to start the Impala service. Click on the Cloudera Manager bookmark button.
  20. In the Impala1 row, open the Actions menu and choose Start… , then confirm Start .
  21. Wait until the Progress says Finished , then click on Close in the command details window.
  22. Go back to Hue and click on the fourth button on the toolbar (tool tip: Cloudera Impala (TM) Query UI ).
  23. In the Query Editor text area, type select * from sample_08 and click on Execute to see the table content.

Next, we open the MicroStrategy virtual machine and download the 32-bit Cloudera ODBC Driver for Apache Hive, Version 2.0 from http://at5.us/AppAU2.

Download the ClouderaHiveODBCSetup_v2_00.exe file and save it in C:\install.

How to do it...

We install the ODBC driver:

  1. Run C:\install\ClouderaHiveODBCSetup_v2_00.exe and click on the Next button until you reach Finish at the end of the setup, accepting every default.
  2. Go to Start | All Programs | Administrative Tools | Data Sources (ODBC) to open the 32-bit ODBC Data Source Administrator (if you're on 64-bit Windows, it's in the SysWOW64 folder).
  3. Click on System DSN and hit the Add… button.
  4. Select Cloudera ODBC Driver for Apache Hive and click on Finish .

  5. Fill the Hive ODBC DSN Configuration with these case-sensitive parameters (change the Host IP according to the address used in step 4 of the Getting ready section):
    • Data Source Name : Cloudera VM
    • Host : 192.168.1.40
    • Port : 21050
    • Database : default
    • Type : HS2NoSasl
  6. Click on OK and then on OK again to close the ODBC Data Source Administrator.
  7. Now open the MicroStrategy Desktop application and log in with administrator and the corresponding password.
  8. Right-click on MicroStrategy Analytics Modules and select Create New Project… .
  9. Click on the Create project button and name it HADOOP, uncheck Enable Change Journal for this project and click on OK .
  10. When the wizard finishes creating the project click on Select tables from the Warehouse Catalog and hit the button labeled New… .
  11. Click on Next and type Cloudera VM in the Name textbox of the Database Instance Definition window.
  12. In this same window, open the Database type combobox and scroll down until you find Generic DBMS . Click on Next .
  13. In Local system ODBC data sources , pick Cloudera VM and type admin in both Database login and Password textboxes.
  14. Click on Next , then on Finish , and then on OK .
  15. When a Warehouse Catalog Browser error appears, click on Yes .
  16. In the Warehouse Catalog Options window, click on Edit… on the right below Cloudera VM .
  17. Select the Advanced tab and enable the radio button labeled Use 2.0 ODBC calls in the ODBC Version group.
  18. Click on OK . Now select the category Catalog | Read Settings in the left tree and enable the first radio button labeled Use standard ODBC calls to obtain the database catalog .
  19. Click on OK to close this window. When the Warehouse Catalog window appears, click on the lightning button (tool tip: Read the Warehouse Catalog ) to refresh the list of available tables.
  20. Pick sample_08 and move it to the right of the shopping cart. Then right-click on it and choose Import Prefix .
  21. Click on Save and Close and then on OK twice to close the Project Creation Assistant .
  22. You can now open the project and update the schema.

From here, the procedure to create objects is the same as in any other project:

  1. Go to the Schema Objects | Attributes folder, and create a new Job attribute with these columns:
    • ID :

      Table: sample_08

      Column: code

    • DESC :

      Table: sample_08

      Column: description

  2. Go to the Fact folder and create a new Salary fact with salary column. Update the schema.
  3. Go to the Public Objects | Metrics folder and create a new Salary metric based on the Salary fact with Sum as aggregation function.
  4. Go to My Personal Objects | My Reports and create a new report with the Job attribute and the Salary metric:

There you go; you just created your first Hadoop report.

How it works...

Executing Hadoop reports is no different from running any other standard DBMS reports. The ODBC driver handles the communication with Cloudera machine and Impala manages the creation of jobs to retrieve data. From MicroStrategy perspective, it is just another SELECT query that returns a dataset.

There's more...

Impala and Hive do not support the whole set of ANSI SQL syntax, so in some cases you may receive an error if a specific feature is not implemented:

See the Cloudera documentation for details.

HP Vertica

Vertica Analytic Database is grid-based, column-oriented, and designed to manage large, fast-growing volumes of data while providing rapid query performance. It features a storage organization that favors SELECT statements over UPDATE and DELETE plus a high compression that stores columns of homogeneous datatype together.

The Community (free) Edition allows up to three hosts and 1 TB of data, which is fairly sufficient for small to medium BI projects with MicroStrategy. There are several clients available for different operating systems, including 32-bit and 64-bit ODBC drivers for Windows.

Business Intelligence with MicroStrategy Cookbook Over 90 practical, hands-on recipes to help you build your MicroStrategy business intelligence project, including more than a 100 screencasts with this book and ebook
Published: October 2013
eBook Price: $32.99
Book Price: $54.99
See more
Select your format and quantity:

Connecting to a Vertica database

As in the previous appendix, we will use a virtual appliance with the Vertica software and the MicroStrategy machine. The procedure is similar, just the ODBC driver changes.

Getting ready

You need to create a username and log in to the https://my.vertica.com website before downloading; then go to address http://at5.us/AppDU1, and click on the VMWare Server 2.0 and Workstation 7.0 (vmdk) link.

When you unzip the archive, you'll find a Vertica 6.1.2 x64 for VMWare folder, open the Vertica 6.1.2 x64 for VMWare.vmx file. The network card should already be bridged, and there is no need to change it.

Power on the machine; it will boot (see the following screenshot) and stop at a row that says Starting sendmail :

At this point, one may be tempted to shut it down. Don't lose your patience; it is not stuck. I did the same error many times before finding the solution.

Looks like a DNS problem, after about 10 to 20 minutes, when the sendmail daemon times out, the boot will continue.

When the graphical interface appears, we can change the IP address and get rid of the failing services:

  1. Open the System | Administration | Server Settings | Services menu.
  2. When prompted, in Password for root , type password.
  3. In the Service Configuration window, Background Services tab, scroll down to find ntpd , and deselect its checkbox. Scroll down to sendmail and uncheck this as well.
  4. Click on the Save button in the toolbar and close this window.
  5. Now open the System | Administration | Network menu; here, you'll find a couple of lines with two devices named eth0.bak and eth0 , select eth0 and click on the Edit button.
  6. In the Ethernet Device window, enable the radio button labeled Statically set IP address .
  7. Ask your network administrator for details here, and fill in the Address , Subnet mask , and Default gateway address fields.
  8. You may need to move up the window to see the bottom; click on OK (the rightmost).
  9. Now go to the Hosts tab, select the line with Vertica and click on Edit .
  10. Change the Address field and write the same IP that you used in step 7, then click on OK .
  11. From the File menu, select Save and click on OK , and then close this window.
  12. Open System | Shutdown and hit Restart .
  13. When the machine reboots, go to System | Administration | Network .
  14. Check that the DNS tab reflects the correct Primary DNS for your network, change it if needed, as shown in the following screen capture:

  15. Reboot the machine. When it restarts open the Applications | Accessories | Terminal menu.
  16. In the Terminal window, type admintools.
  17. In the blue screen that appears, use the arrow keys to select the row Accept and press the Space bar to check it; then press the Enter key.
  18. In the Main Menu tab of this window, select Exit and press Enter .
  19. When you're back to the command prompt, type:
      /opt/vertica/sbin/install_example VMart
    
    
  20. The example data installation can take a little while, when it's finished, type the following command:
      /opt/vertica/bin/vsql
    
    
  21. This is the Vertica database command-line SQL utility. Now type:
      Select count(*) from store.store_sales_fact;
    
    
  22. The result shows a row count of 5 million. Type \q to quit vsql and close the terminal with exit.

Open the MicroStrategy virtual machine and test if you can ping the Vertica appliance, then download the HP Vertica Client Package for the Community Edition from:

Select the Windows 32-bit vertica-client-6.1.2-0.32.exe file and save it in C:\install.

How to do it...

We install the ODBC driver:

  1. Run C:\install\vertica-client-6.1.2-0.32.exe and click your way to the end of the installation accepting the defaults and click on Finish .
  2. Go to Start | All Programs | Administrative Tools | Data Sources (ODBC) to open the 32-bit ODBC Data Source Administrator (if you're on 64-bit Windows, it's in the SysWOW64 folder).
  3. Click on System DSN and hit the Add… button.
  4. Select the Vertica driver and click on Finish , you'll see the following dialog box:

  5. In the Vertica ODBC Data Source Configuration dialog, type these case-sensitive parameters (change the IP according to your network, as per step 7 in the Getting ready section):
    • Data Source : HP Vertica
    • Server : 192.168.1.30
    • Port : 5433
    • Database : VMart
  6. Click on OK and then on OK again to close the ODBC Data Source Administrator.
  7. Now open the MicroStrategy Desktop application and log in with Administrator and the corresponding password.
  8. Right-click on MicroStrategy Analytics Modules and select Create New Project… .
  9. Click on the Create project button and name it VERTICA, uncheck Enable Change Journal for this project and click on OK .
  10. When the wizard finishes creating the project, click on Select tables from the Warehouse Catalog and hit the button labeled as New… .
  11. Click on Next and type HP Vertica in the Name textbox of the Database Instance Definition window.
  12. In this same window, open the Database type combobox and scroll down until you find Vertica 6.0 . Click on Next .
  13. In Local system ODBC data sources , pick HP Vertica and type dbadmin in Database login and password in the Password textbox.
  14. Click on Next , on Finish , and then on OK .
  15. When the Warehouse Catalog window appears, pick customer_dimension, product_dimension and store_sales_fact , and move them to the right of the shopping cart. Then select all the three tables in the right side, right-click on them and choose Import Prefix .
  16. Click on Save and Close and then on OK twice to close the Project Creation Assistant .
  17. You can now open the project and update the schema.

From here on the procedure to create objects is the same as in any other project:

  1. Go to the Schema Objects | Attributes folder, and create a new Product attribute with these columns:
    • ID :

      Tables: product_dimension (lookup) and store_sales_fact

      Column: product_key

    • DESC :

      Table: product_dimension (lookup)

      Column: product_description

  2. Create a new Category attribute with this column:
    • ID :

      Table: product_dimension (lookup)

      Column: category_description

  3. Set the Product attribute as a child of Category .
  4. Go to the Fact folder and create a new Sales Dollar Amount fact with the sales_dollar_amount column from the stores_sales_fact table.
  5. Go to the Public Objects | Metrics folder and create a new Sales Dollar Amount metric based on the Sales Dollar Amount fact with Sum as aggregation function.
  6. Update the schema.
  7. Go to My Personal Objects | My Reports and create a new report with the Category attribute and the Sales Dollar Amount metric, see image:

Congratulations, you just created your first Vertica report, and it runs pretty fast for a 5 million fact table.

How it works...

MicroStrategy generates the correct SQL, provided that we import the prefix when selecting the tables in the Warehouse Catalog window; other than that, there is no difference from other systems.

There's more...

In the download section of the Vertica Community Edition website, in addition to ODBC and JDBC drivers for several platforms, you'll find connectors to work with Hadoop (including Cloudera distribution) and Informatica plugins.

Summary

In this article we learned about Cloudera Hadoop and how to connect from MicroStrategy to one of the most commonly used platforms for Big Data. Also, we got learned how to connect to a common platform for columnar databases, the Vertica engine by HP.

Resources for Article :


Further resources on this subject:


Business Intelligence with MicroStrategy Cookbook Over 90 practical, hands-on recipes to help you build your MicroStrategy business intelligence project, including more than a 100 screencasts with this book and ebook
Published: October 2013
eBook Price: $32.99
Book Price: $54.99
See more
Select your format and quantity:

About the Author :


Davide Moraschi

Davide Moraschi is a Business Intelligence contractor and trainer. He is a MicroStrategy Certified Engineering Principal, specialized in healthcare data. Born in Italy and married to a Japanese woman, he now lives in what is probably the hottest city in Europe, Seville. He speaks English, Spanish, and fluent SQL.

Since the early nineties he’s been working as a Database Developer and BI specialist for multinational companies (Microsoft, Novartis) or public institutions like the European Commission.

He’s the author of the blog http://blog.eurostrategy.net, and an active member of the MicroStrategy group on LinkedIn.

He can be reached at davidem@eurostrategy.net.

Books From Packt


Business Intelligence Cookbook: A Project Lifecycle Approach Using Oracle Technology
Business Intelligence Cookbook: A Project Lifecycle Approach Using Oracle Technology

Oracle Business Intelligence : The Condensed Guide to Analysis and Reporting
Oracle Business Intelligence : The Condensed Guide to Analysis and Reporting

 IBM Cognos Business Intelligence
IBM Cognos Business Intelligence

Oracle Business Intelligence 11g R1 Cookbook
Oracle Business Intelligence 11g R1 Cookbook

Oracle Business Intelligence Enterprise Edition 11g: A Hands-On Tutorial
Oracle Business Intelligence Enterprise Edition 11g: A Hands-On Tutorial

 IBM Cognos Business Intelligence 10.1 Dashboarding Cookbook
IBM Cognos Business Intelligence 10.1 Dashboarding Cookbook

 Oracle E-Business Suite Financials R12: A Functionality Guide
Oracle E-Business Suite Financials R12: A Functionality Guide

Oracle Primavera Contract Management, Business Intelligence Publisher Edition v14
Oracle Primavera Contract Management, Business Intelligence Publisher Edition v14


No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
H
X
7
N
i
a
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software