Drag the following steps onto the workspace and create a flow between them:
From the General grouping, choose START
From the Big Data grouping, choose Hadoop Copy Files
Double-click on Hadoop Copy Files. The step's editor dialog will appear.
Click on the Browse button next to the File/Folder textbox. The Open File dialog appears; choose the file you downloaded in step 1 and click on OK to close the dialog.
Remove the gz: prefix and the exclamation mark (!) suffix from the filename. These are Virtual File System (VFS) markers that point inside the archive; removing them makes the job copy the compressed file itself.
Click on the Browse button next to the File/Folder destination textbox.
Type in your HDFS server's IP address and click on the Connect button. Establishing the connection may take a while. Once connected, select the /user/sample folder as the output folder. Do not click on OK at this stage; instead, copy the URL to the clipboard, click on Cancel, and paste the clipboard contents into the File/Folder destination field.
Click on the Add button to put the filename path into the grid.
Save the job as hdfs_copy.kjb.
Run the job.
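The value pasted into the File/Folder destination field in step 8 is a full HDFS URL. The following minimal sketch illustrates its general shape; the host address, the default NameNode port 8020, and the helper function name are illustrative assumptions, not values taken from this walkthrough:

```python
# Hypothetical helper: assemble an HDFS destination URL of the kind the
# browser dialog copies to the clipboard. Host and port are assumptions;
# 8020 is a commonly used default NameNode RPC port.
def hdfs_url(host, path, port=8020):
    """Build an hdfs:// URL like the one pasted into File/Folder destination."""
    return "hdfs://{0}:{1}{2}".format(host, port, path)

print(hdfs_url("192.168.1.10", "/user/sample"))
# → hdfs://192.168.1.10:8020/user/sample
```

If your cluster uses a non-default port, the URL shown by the HDFS browser dialog is authoritative; copying it, as described above, avoids guessing.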
The following screenshot shows the HDFS file browser dialog:
The following screenshot shows the local and remote HDFS paths:
Once the job is finished, you can verify that the file was successfully copied into HDFS by issuing the following command:
hadoop fs -ls /user/sample/
The following screenshot shows the HDFS content after the copy process: