Putting a data file into HDFS

The previous example shows how PDI interacts with Hive using a SQL-like expression.

Now let's work with Hadoop's filesystem, HDFS. We will copy a CSV text file into an HDFS folder. Follow these steps:

  1. Download a compressed CSV sample file from http://goo.gl/EdJwk5.
  2. Create a new job from Spoon.
  3. Add the following entries to the workspace and create a hop between them:
    • From the General grouping, choose START
    • From the Big Data grouping, choose Hadoop Copy Files
  4. Double-click on Hadoop Copy Files. The step's editor dialog will appear.
  5. Click on the Browse button next to the File/Folder textbox. The Open File dialog appears; choose the file you downloaded in step 1 and click on OK to close the dialog.
  6. Remove the gz: prefix and exclamation mark symbol (!) suffix from the filename.
  7. Click on the Browse button next to the File/Folder destination textbox.
  8. Type in your HDFS server IP address and click on the Connect button. It may take a while before a connection is established. Once connected, select the /user/sample folder as the output folder. Do not click on OK at this stage; instead, copy the URL to the clipboard and click on Cancel. Then paste the copied URL into the File/Folder destination field.
  9. Click on the Add button to put the filename path into the grid.
  10. Save the job as hdfs_copy.kjb.
  11. Run the job.

    The following screenshot shows the HDFS file browser dialog:


    The following screenshot shows the local and remote HDFS paths:

  12. Once the job has finished, verify that the file was copied into HDFS by issuing the following command:
    hadoop fs -ls /user/sample/

    The following screenshot shows the HDFS content after the copy process:

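For readers who want to see what steps 6 and 12 amount to outside Spoon, the following is a minimal shell sketch, not part of PDI itself. The function names strip_gz_wrapper and copy_to_hdfs are hypothetical, as is the sample path in the usage note. The second function assumes the hadoop client is installed and on the PATH. The first function removes the gz: prefix and the trailing ! described in step 6. The second copies a local file into an HDFS folder with hadoop fs -put and then lists the folder, which roughly mirrors what the job and the check in step 12 do.

```shell
#!/bin/sh

# Remove a leading "gz:" and a single trailing "!" from a file URL,
# as done by hand in step 6. Other input is printed unchanged.
strip_gz_wrapper() {
    vfs_url=$1
    vfs_url=${vfs_url#gz:}   # drop the "gz:" prefix if present
    vfs_url=${vfs_url%\!}    # drop one trailing "!" if present
    printf '%s\n' "$vfs_url"
}

# Copy a local file into an HDFS folder, then list that folder.
# Assumes the hadoop client is on the PATH; returns 127 if it is not.
copy_to_hdfs() {
    local_file=$1
    hdfs_dir=$2
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop client not found on PATH" >&2
        return 127
    fi
    hadoop fs -put "$local_file" "$hdfs_dir" && hadoop fs -ls "$hdfs_dir"
}
```

For example, strip_gz_wrapper 'gz:file:///tmp/sample.csv.gz!' prints file:///tmp/sample.csv.gz, and copy_to_hdfs /tmp/sample.csv.gz /user/sample/ would upload the file and show the folder listing, assuming the cluster is reachable from the machine running the command.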