Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Importing data to Hive

Before we begin the walkthrough, see Appendix A, Big Data Sets, to complete the Hive nyse_stocks data preparation and follow these steps:

Launch Spoon if you have closed it.
On the File menu, click on New and select Transformation.
On the left-hand side panel, click on the View tab.
Right-click on the Database connections node to show up a contextual menu and choose New.

The following screenshot shows you how to create a new database connection:

When the Database Connection dialog appears, fill in the following configuration:

Connection Name: HIVE2
Connection Type: Hadoop Hive 2
Host Name: [your working IP address]
Database Name: default

Now follow these steps:

Click on the Test button to verify the connection. If successful, click on the OK button to close it. The display window will look like the following screenshot:
On the left-hand side panel, click on the Design tab.
In the Input group, click on the Table input step and drag it into the working space. The following screenshot shows how the Table input step can be put into the working space:
Double-click on the Table input step and the editor appears. On the connection listbox, select HIVE2. Type the following query into the SQL editor pane. Set the Limit Size parameter to 65535. We plan to export the data to an Excel file; the 65535 threshold is the limit up to which an Excel file is able to store data.
```
SELECT * FROM nyse_stocks
```
Click on the Preview button; the preview dialog appears, and then click on the OK button. Shortly, it will show a data preview of nyse_stocks, a Hive table.
This process is actually a Hadoop MapReduce job; see the Preparing Hive data section in Appendix B, Hadoop Setup, which shows the job logs.
The following screenshot shows a data preview of the Hive query:
Click on the Close and OK buttons, respectively, to close all the open dialogs.
On the File menu, choose Save and name the file export-hive-to-excel.ktr.
In the Output group, choose and put the Microsoft Excel Output step into the workspace. The following screenshot shows the newly added Microsoft Excel Output step:
Press Ctrl + click on the Table input step followed by pressing Ctrl + click on the Microsoft Excel Output step. Right-click on one of the steps—a contextual dialog will pop up—and choose New Hop. A hop represents data or control flow among steps. Make sure the configuration looks similar to the following screenshot; click on the OK button.
Your transformation will look similar to the following screenshot:
Double-click on Microsoft Excel Output; the editor dialog appears. On the File tab, specify your output filename in the Filename textbox. Click on the OK button to close the dialog. The following screenshot shows a Microsoft Excel Output step dialog:
On the menu bar, right below the transformation tab, click on the Run this transformation or job button. On the Execute a transformation dialog, click on the Launch button. The following screenshot shows the running of a transformation or job menu in PDI:
If the transformation runs successfully, explore the data transfer activity metrics. The following screenshot shows that the transformation runs in 58 seconds:
Open the Excel file result. The displayed data will look similar to the following screenshot: