Preparing Hive data

The following steps walk you through preparing Hive data using the Sandbox's web-based administration tools:

  1. Launch your web browser and, in the address bar, type http://192.168.1.122:8888 to open the Hortonworks Sandbox home page.
  2. In the menu bar, click on the HCatalog menu.
  3. In the Actions menu, click on the Create a new table from a file link.
  4. In the Table Name textbox, type price_history.
  5. Leave the Description textbox blank.
  6. Click on the Choose a file button next to the Input File textbox.
  7. When the Choose a file dialog appears, click on the Upload a file button. Navigate to the product-price-history.tsv.gz file (there is no need to extract it) and click on Open. Once the upload finishes, the file appears in the listbox. Click on the filename to close the dialog.
  8. You may need to wait a few moments while HCatalog detects the file structure from its content. A data preview appears in the lower part of the page; note that the column names are detected automatically from the first line of the file. The following screenshot shows the HCatalog Data Preview page:
    (Screenshot: the HCatalog Data Preview page)
  9. Click on the Create Table button; the Hive data import begins immediately (a HiveQL equivalent of this workflow, run from the Hive shell, is sketched after these steps).
  10. The HCatalog Table List page appears; note that the price_history table now appears in the list. Click on the Browse button next to the table name to explore the data.
  11. In the menu bar, click on the Beeswax (Hive UI) menu.
  12. A Query Editor page appears; type the following query and click on the Execute button.
    SELECT * FROM price_history;

    Shortly, you will see the query result in a tabular view.

    While the query is executing, the left panel displays a box with the MR JOB (MapReduce Job) identifier. This indicates that every SQL-like query in Hive is transparently executed as a Hadoop MapReduce job.

    The identifier follows the format job_yyyyMMddhhmm_sequence. When you click on the link, the Job Browser page appears and should look similar to the following screenshot:

    (Screenshot: the Job Browser page)
  13. Now, we will drop this table from Hive. In the menu bar, click on the HCatalog menu. When the HCatalog Table List page appears, make sure the checkbox next to price_history is checked.
  14. Click on the Drop button. In the confirmation dialog, click on Yes. The table is dropped immediately. The following screenshot shows how to drop a table using HCatalog:
    (Screenshot: dropping the price_history table in HCatalog)
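
    If you prefer to work outside the web UI, the same table can be created, loaded, queried, and dropped from the Hive shell. The following is a minimal sketch of that workflow; the column names and the local file path are placeholders for illustration, so adjust them to the actual structure of product-price-history.tsv.gz:

    -- Create a tab-delimited text table (the column names below are placeholders).
    CREATE TABLE price_history (
      product_id  STRING,
      price       DOUBLE,
      price_date  STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Hive reads gzipped text files transparently, so the archive does not need
    -- to be extracted. Note: this plain LOAD also ingests the header line with
    -- the column names, which you may want to strip from the file first.
    LOAD DATA LOCAL INPATH '/home/hue/product-price-history.tsv.gz'
    INTO TABLE price_history;

    -- The same query executed in the Beeswax editor.
    SELECT * FROM price_history;

    -- Equivalent of steps 13 and 14.
    DROP TABLE price_history;

    The FIELDS TERMINATED BY '\t' clause mirrors the tab-separated structure that HCatalog detected automatically in step 8.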

    Note

    The price_history table consists of 45,019 rows of data. In Chapter 3, Churning Big Data with Pentaho, we will show you how to use Pentaho Data Integration to generate and populate the same data.
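
    Before dropping the table in steps 13 and 14, you could verify this row count with a quick aggregate query; a minimal example follows (a COUNT(*) also runs as a MapReduce job, so the MR JOB box appears again in the left panel):

    -- This should return 45,019 for the fully loaded price_history table.
    SELECT COUNT(*) FROM price_history;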
