Preparing Hive data

The following steps walk you through preparing Hive data using the Sandbox's web-based administration tools:

  1. Launch your web browser and, in the address bar, type http://192.168.1.122:8888 to open the Hortonworks Sandbox home page.
  2. In the menu bar, click on the HCatalog menu.
  3. In the Actions menu, click on the Create a new table from a file link.
  4. In the Table Name textbox, type price_history.
  5. Leave the Description textbox blank.
  6. Click on the Choose a file button next to the Input File textbox.
  7. When the Choose a file dialog appears, click on the Upload a file button. Navigate to the product-price-history.tsv.gz file (there is no need to extract it) and click on Open. Once the upload finishes, the file appears in the listbox. Click on the filename to close the dialog.
  8. You may need to wait a few moments while HCatalog detects the file structure from its content. A data preview appears in the lower part of the page; note that the column names are detected automatically from the first line of the file. The following screenshot shows the HCatalog Data Preview page:
    (Screenshot: the HCatalog Data Preview page)
  9. Click on the Create Table button; the Hive data import begins immediately (a HiveQL equivalent of this workflow, run from the Hive shell, is sketched after these steps).
  10. The HCatalog Table List page appears; note that the price_history table now appears in the list. Click on the Browse button next to the table name to explore the data.
  11. In the menu bar, click on the Beeswax (Hive UI) menu.
  12. A Query Editor page appears; type the following query and click on the Execute button.
    SELECT * FROM price_history;

    Shortly, you will see the query result in a tabular view.

    While the query is executing, the left panel displays a box with the MR JOB (MapReduce Job) identifier. This indicates that every SQL-like query in Hive is transparently executed as a Hadoop MapReduce job.

    The identifier follows the format job_yyyyMMddhhmm_sequence. When you click on the link, the Job Browser page appears and should look similar to the following screenshot:

    (Screenshot: the Job Browser page)
  13. Now, we will drop this table from Hive. In the menu bar, click on the HCatalog menu. When the HCatalog Table List page appears, make sure the checkbox next to price_history is checked.
  14. Click on the Drop button. In the confirmation dialog, click on Yes. The table is dropped immediately. The following screenshot shows how to drop a table using HCatalog:
    (Screenshot: dropping the price_history table in HCatalog)
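
    If you prefer to work outside the web UI, the same table can be created, loaded, queried, and dropped from the Hive shell. The following is a minimal sketch of that workflow; the column names and the local file path are placeholders for illustration, so adjust them to the actual structure of product-price-history.tsv.gz:

    -- Create a tab-delimited text table (the column names below are placeholders).
    CREATE TABLE price_history (
      product_id  STRING,
      price       DOUBLE,
      price_date  STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Hive reads gzipped text files transparently, so the archive does not need
    -- to be extracted. Note: this plain LOAD also ingests the header line with
    -- the column names, which you may want to strip from the file first.
    LOAD DATA LOCAL INPATH '/home/hue/product-price-history.tsv.gz'
    INTO TABLE price_history;

    -- The same query executed in the Beeswax editor.
    SELECT * FROM price_history;

    -- Equivalent of steps 13 and 14.
    DROP TABLE price_history;

    The FIELDS TERMINATED BY '\t' clause mirrors the tab-separated structure that HCatalog detected automatically in step 8.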

    Note

    The price_history table consists of 45,019 rows of data. In Chapter 3, Churning Big Data with Pentaho, we will show you how to use Pentaho Data Integration to generate and populate the same data.
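
    Before dropping the table in steps 13 and 14, you could verify this row count with a quick aggregate query; a minimal example follows (a COUNT(*) also runs as a MapReduce job, so the MR JOB box appears again in the left panel):

    -- This should return 45,019 for the fully loaded price_history table.
    SELECT COUNT(*) FROM price_history;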
