Pentaho Data Integration (PDI) is a great tool for preparing data, thanks to its rich set of data connectors. We will not discuss PDI further here, as it was already covered in the latter part of Chapter 3, Churning Big Data with Pentaho.
Before you proceed to the following examples, complete the steps listed in Appendix B, Hadoop Setup. Note that all the remaining examples assume that the Hortonworks Sandbox VM is configured with the IP address 192.168.1.122.
The following steps will help you prepare BI Server to work with Hive:
1. Copy the pentaho-hadoop-hive-jdbc-shim-1.3-SNAPSHOT.jar and pentaho-hadoop-shims-api-1.3-SNAPSHOT.jar files into the [BISERVER]/administration-console/jdbc and [BISERVER]/biserver-ce/tomcat/lib folders, respectively. See Chapter 3, Churning Big Data with Pentaho, for information on how to obtain the JAR files.
2. Copy the Chapter 4 folder from the book's code bundle folder into [BISERVER]/pentaho-solutions.
3. Navigate to the Chapter 4 folder. Click on the folder.
4. Click on hive_show_tables.xaction and you will see the following screenshot, which shows the four tables contained in the HIVE database:

The .xaction file gets its result by executing hive_show_tables.ktr, a PDI transformation file.
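If you are curious about what the transformation is effectively doing, the following is a minimal Java sketch that issues the same SHOW TABLES query directly through the Hive JDBC driver, outside of the BI Server. It is an illustration only: the port 10000, the HiveServer2 driver class, and the hive user with an empty password are assumptions about a default Hortonworks Sandbox, not values taken from the book's code bundle.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveShowTables {
    public static void main(String[] args) throws Exception {
        // Assumption: HiveServer2 on the sandbox listens on its default port 10000
        // and accepts the "hive" user with an empty password; adjust as needed.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://192.168.1.122:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // one table name per row
            }
        }
    }
}

Running it should print the same table names that the action sequence displays, which is a quick way to confirm that the shim and the Hive service are reachable from your machine.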
If you want more information about Action Sequences and the client tool used to design .xaction files, see http://goo.gl/6NyxYZ and http://goo.gl/WgHbhE.
The following steps will guide you to execute and monitor a Hive MapReduce job:
1. Click on hive_java_query.xaction, which in turn will execute the hive_java_query.ktr PDI transformation. This will take longer to display the result than the previous one.
2. To monitor the running MapReduce job, open http://192.168.1.122:8000/jobbrowser in your web browser.
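To see why this action takes noticeably longer, consider the following hedged Java sketch. It is not the contents of hive_java_query.ktr; it simply shows that, unlike SHOW TABLES, an aggregate query makes Hive compile and submit a MapReduce job, which is what you can then watch in the Job Browser. The connection settings are the same assumptions as in the earlier sketch, and your_table is a placeholder for one of the tables listed by hive_show_tables.xaction.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveCountQuery {
    public static void main(String[] args) throws Exception {
        // Assumption: same default Hortonworks Sandbox connection settings as before.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://192.168.1.122:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // An aggregation forces Hive to launch a MapReduce job, so the query
             // appears in the Job Browser while it runs and takes longer to return.
             // "your_table" is a placeholder; substitute one of the HIVE database tables.
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM your_table")) {
            if (rs.next()) {
                System.out.println("Row count: " + rs.getLong(1));
            }
        }
    }
}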