Preparing data

Pentaho Data Integration (PDI) is a great tool for preparing data, thanks to its rich set of data connectors. We will not discuss PDI further here, as it was covered in the latter part of Chapter 3, Churning Big Data with Pentaho.

Preparing BI Server to work with Hive

Before you proceed with the following examples, complete the steps listed in Appendix B, Hadoop Setup. Note that all the remaining examples assume the Hortonworks Sandbox VM is configured with the IP address 192.168.1.122.

The following steps will help you prepare BI Server to work with Hive:

  1. Copy the pentaho-hadoop-hive-jdbc-shim-1.3-SNAPSHOT.jar and pentaho-hadoop-shims-api-1.3-SNAPSHOT.jar files into the [BISERVER]/administration-console/jdbc and [BISERVER]/biserver-ce/tomcat/lib folders, respectively. See Chapter 3, Churning Big Data with Pentaho, for information on how to obtain these JAR files.
  2. Launch Pentaho User Console (PUC).
  3. Copy the Chapter 4 folder from the book's code bundle folder into [BISERVER]/pentaho-solutions.
  4. From the Tools menu, choose Refresh and then Repository Cache.
  5. In the Browse pane, you should see a newly added Chapter 4 folder. Click on the folder.
  6. In the Files pane, double-click on the Show Tables in HIVE menu. If all goes well, the engine will execute an action sequence file, hive_show_tables.xaction, and you will see the four tables contained in the Hive database, as shown in the following screenshot:
    Preparing BI Server to work with Hive

The .xaction file obtains its result by executing the hive_show_tables.ktr PDI transformation file.
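Under the hood, the transformation simply issues a HiveQL statement over JDBC. The following is a minimal Java sketch of the equivalent query; the sandbox IP, the default HiveServer port 10000, and the hue username are assumptions based on this chapter's setup, not part of the shipped transformation:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveShowTables {

    // Build a HiveServer2 JDBC URL; 10000 is HiveServer2's default port.
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Requires the Hive JDBC driver (shipped with the Pentaho Hadoop
        // shim JARs) on the classpath and a running HiveServer2 on the VM.
        String url = hiveUrl("192.168.1.122", 10000, "default");
        try (Connection conn = DriverManager.getConnection(url, "hue", "");
             Statement stmt = conn.createStatement();
             // SHOW TABLES is a metadata query: it is answered from the
             // Hive metastore and does not launch a MapReduce job.
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Because SHOW TABLES never leaves the metastore, this query returns quickly; the next example behaves quite differently.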

Note

If you want more information about Action Sequences and the client tool used to design .xaction files, see http://goo.gl/6NyxYZ and http://goo.gl/WgHbhE.

Executing and monitoring a Hive MapReduce job

The following steps will guide you to execute and monitor a Hive MapReduce job:

  1. While still in the Files pane, double-click on the PDI-Hive Java Query menu. This executes hive_java_query.xaction, which in turn executes the hive_java_query.ktr PDI transformation. It will take longer to display its result than the previous example.
  2. While this is executing, launch a web browser and navigate to the Job Browser address, http://192.168.1.122:8000/jobbrowser.
  3. Remove hue from the Username textbox. In the Job status listbox, choose Running. You will find one job running as an anonymous user. The page will look like the following screenshot:
    Executing and monitoring a Hive MapReduce job
  4. Click on the Job Id link; the Recent Tasks page appears, listing the stages of the MapReduce process. Refresh the page until all the steps are complete. The page will look like the following screenshot:
    Executing and monitoring a Hive MapReduce job
  5. Back in PUC, you will find the Hive query result, which is actually the result of a MapReduce process.
    Executing and monitoring a Hive MapReduce job
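The behavior you just observed can be reproduced outside of PUC: any aggregate HiveQL query is compiled into one or more MapReduce jobs, which is why it appears in Hue's Job Browser while running. The following is a minimal Java sketch under that assumption; the sandbox coordinates are taken from this chapter's setup, and sample_table is a hypothetical table name standing in for one of the four tables in the Hive database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveMapReduceQuery {

    // Default HiveServer2 JDBC URL for the sandbox VM used in this chapter.
    static String hiveUrl(String host) {
        return "jdbc:hive2://" + host + ":10000/default";
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection(hiveUrl("192.168.1.122"),
                                             "hue", "");
             Statement stmt = conn.createStatement();
             // Unlike SHOW TABLES, an aggregate such as COUNT(*) cannot be
             // answered from the metastore alone; Hive compiles it into a
             // MapReduce job, which is what the Job Browser page displays
             // while executeQuery blocks waiting for the job to finish.
             ResultSet rs = stmt.executeQuery(
                 "SELECT COUNT(*) FROM sample_table")) {
            if (rs.next()) {
                System.out.println("Row count: " + rs.getLong(1));
            }
        }
    }
}
```

The long wait you saw in step 1 is this same MapReduce compile-and-execute cycle; the JDBC call simply blocks until the cluster reports the job as complete.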