Putting a data file into HDFS

The previous example shows how PDI interacts with Hive using a SQL-like expression.

Now let's work with the framework filesystem, HDFS. We will copy a CSV text file into an HDFS folder. Follow these steps:

1. Download a compressed CSV sample file from http://goo.gl/EdJwk5.
2. Create a new job from Spoon.
3. Place the following steps in the workspace and create a flow between them:
   - From the General grouping, choose START
   - From the Big Data grouping, choose Hadoop Copy Files
4. Double-click on Hadoop Copy Files. The step's editor dialog will appear.
5. Click on the Browse button next to the File/Folder textbox. The Open File dialog appears; choose the file you downloaded in step 1 and click on OK to close the dialog.
6. Remove the gz: prefix and the exclamation mark (!) suffix from the filename.
7. Click on the Browse button next to the File/Folder destination textbox.
8. Type in your HDFS server IP address and click on the Connect button. It may take a while before a connection is established. Once connected, select the /user/sample folder as the output folder. Do not click on OK at this stage; instead, copy the URL to the clipboard and click on Cancel. Then paste the URL into the File/Folder destination field.
9. Click on the Add button to put the filename path into the grid.
10. Save the job as hdfs_copy.kjb.
11. Run the job.
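The filename cleanup and the destination URL in the steps above are both simple string edits, and it can help to see them spelled out. The following is a minimal sketch only: the function names, the local path, the IP address, and port 8020 are all assumptions made for illustration, so substitute the values that your own dialog shows.

```shell
# Strip the "gz:" prefix and the trailing "!" that the file browser
# wraps around a compressed file's name, leaving the plain file path.
strip_gz_wrapper() {
    name=$1
    name=${name#gz:}   # drop a leading "gz:" if present
    name=${name%\!}    # drop a trailing "!" if present
    printf '%s\n' "$name"
}

# Build an HDFS folder URL from a host, a port, and an absolute folder path.
hdfs_url() {
    printf 'hdfs://%s:%s%s\n' "$1" "$2" "$3"
}

# Hypothetical values, for illustration only.
strip_gz_wrapper 'gz:file:///tmp/sample.csv.gz!'
hdfs_url 192.168.1.10 8020 /user/sample
```

With these inputs the two calls print file:///tmp/sample.csv.gz and hdfs://192.168.1.10:8020/user/sample. Removing the wrapper means the job copies the compressed file as it is instead of reading inside the archive.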
The following screenshot shows the HDFS file browser dialog:
The following screenshot shows the local and remote HDFS paths:
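Outside Spoon, a comparable copy can be made with the hadoop command-line client, which is handy for checking that the cluster accepts writes before you debug the job itself. This is a rough sketch, not what the Hadoop Copy Files step runs internally; it assumes the hadoop client is installed and configured for your cluster, and the helper name copy_to_hdfs and the local filename sample.csv.gz are hypothetical.

```shell
# Copy one local file into an HDFS folder with the hadoop client.
# Fails early if the local file does not exist.
copy_to_hdfs() {
    [ -f "$1" ] || { echo "no such local file: $1" >&2; return 1; }
    hadoop fs -put "$1" "$2"
}

# Run the copy only when a hadoop client and the (hypothetical)
# local file are both available.
if command -v hadoop >/dev/null 2>&1 && [ -f sample.csv.gz ]; then
    copy_to_hdfs sample.csv.gz /user/sample/
fi
```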
Once the job is finished, you can verify that the file was copied into HDFS by running the following command:
```
hadoop fs -ls /user/sample/
```
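If you want to script this check instead of reading the listing by eye, the exit status of hadoop fs -test -e can be used. The following is a minimal sketch, assuming the hadoop client is on your PATH; the helper name hdfs_file_copied and the filename sample.csv.gz are hypothetical, so use the name of the file you actually copied.

```shell
# Report whether a path exists in HDFS, based on the exit status of
# "hadoop fs -test -e" (0 when the path exists).
hdfs_file_copied() {
    if hadoop fs -test -e "$1"; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Run the check only when a hadoop client is installed.
# The filename below is hypothetical.
if command -v hadoop >/dev/null 2>&1; then
    hdfs_file_copied /user/sample/sample.csv.gz
fi
```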
The following screenshot shows the HDFS content after the copy process:
