Getting files from a remote server

When you need to copy files from or to remote machines, you can use File Transfer Protocol (FTP), a standard network protocol built on a client-server architecture.

Kettle provides the Get a file with FTP job entry to get files from an FTP server. In this recipe, you will connect to a remote directory named remoteDir on an FTP server and copy some text files from it to a local folder named destinationDir.

Getting ready

You need access to an FTP server.

How to do it...

Carry out the following steps:

  1. Create a new job and drop a Start entry into the canvas.
  2. Add a Get a file with FTP job entry from the File transfer category.
  3. Under the General tab, type the server name or its IP address in the FTP server name / IP address textbox.
  4. Type the port number in the Server port textbox. The default FTP port is 21.
  5. In the Username and Password textboxes, type the credentials to log into the FTP server.

    Tip

    You can verify the connection information by clicking on the Test connection button.

  6. In the Remote directory textbox under the Files tab, type the name of the directory on the FTP server from which the source files will be retrieved.

    Tip

    You can check whether the folder exists by clicking on the Check folder button.

  7. Type .*.txt as the Wildcard.
  8. In the Target directory textbox inside the Local frame, type the destination directory on the local machine. Under the Files tab, you have various fields as shown in the following screenshot:
    [Screenshot: the Files tab of the Get a file with FTP job entry]
  9. Run the job. The files with .txt extension will be copied from remoteDir on the FTP server to destinationDir on the local machine.
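For comparison, the following Python sketch does roughly what the entry does in steps 3 to 9, using the standard ftplib module. It is a minimal illustration, not Kettle's implementation: the host, credentials, and the helper names matching_names, get_files, and run_example are hypothetical. It assumes that, as in the Wildcard field, the regular expression must match the whole file name, and it uses .*\.txt, with the dot escaped, so only a literal dot is accepted.

```python
import os
import re
from ftplib import FTP


def matching_names(names, wildcard):
    """Keep only the names that the regular expression matches in full."""
    pattern = re.compile(wildcard)
    return [name for name in names if pattern.fullmatch(name)]


def get_files(ftp, remote_dir, wildcard, target_dir):
    """Copy every file in remote_dir whose name matches wildcard into target_dir.

    `ftp` is an already connected and logged-in ftplib.FTP object.
    Returns the list of local paths that were written.
    """
    ftp.cwd(remote_dir)                      # Remote directory (Files tab)
    os.makedirs(target_dir, exist_ok=True)   # Target directory (Local frame)
    copied = []
    for name in matching_names(ftp.nlst(), wildcard):
        local_path = os.path.join(target_dir, name)
        with open(local_path, "wb") as out:
            # RETR sends the file in binary mode; each chunk is written out.
            ftp.retrbinary(f"RETR {name}", out.write)
        copied.append(local_path)
    return copied


def run_example():
    # Hypothetical connection settings (General tab); replace with your own.
    with FTP() as ftp:
        ftp.connect("ftp.example.com", 21)   # server name / IP address and port
        ftp.login("myuser", "mypassword")    # Username and Password
        return get_files(ftp, "remoteDir", r".*\.txt", "destinationDir")
```

run_example() is not called automatically; fill in real settings before calling it. Passing an already logged-in connection to get_files keeps the transfer logic separate from the General-tab settings.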

How it works...

The Get a file with FTP job entry performs the copy task; it uses the configuration set under the General tab to connect to the remote FTP server.

Under the Files tab, you defined the source directory (in the example, the remote folder remoteDir) and target directory (in the example, the local folder destinationDir).

Tip

Avoid directory names that contain special characters, such as spaces; some FTP servers don't handle them.

You also provided a regular expression for the files to get. In this case, you typed .*.txt, a regular expression that matches all .txt files.
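Strictly speaking, the unescaped dot before txt matches any single character. The Python sketch below, with hypothetical file names, compares .*.txt with the stricter .*\.txt. It assumes the expression must match the entire file name, as Python's re.fullmatch (and Java's Matcher.matches()) require; that is an assumption about how the entry applies the expression.

```python
import re

# Hypothetical names that might appear in remoteDir.
names = ["sales.txt", "notes.TXT", "reporttxt", "archive.txt.gz"]

loose = re.compile(r".*.txt")    # wildcard as typed in the recipe
strict = re.compile(r".*\.txt")  # dot escaped: matches only a literal "."

# fullmatch() succeeds only when the whole name matches the expression.
loose_matches = [n for n in names if loose.fullmatch(n)]
strict_matches = [n for n in names if strict.fullmatch(n)]

print(loose_matches)   # ['sales.txt', 'reporttxt']
print(strict_matches)  # ['sales.txt']
```

Matching is also case-sensitive, so notes.TXT is rejected by both expressions; an inline flag, as in (?i).*\.txt, would accept it.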

There's more...

The following sections give you some additional information and useful tips for transferring files from a remote server.

Specifying files to transfer

In the recipe, you copied all files with a given extension by providing a regular expression that all those files matched. Alternatively, you may need to transfer a single file.

Note

Note that even if you have the exact name of the file, you still have to provide a regular expression.

For example, if the name of the file is my_file.txt, you have to type my_file.txt.
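Typing the plain name works because it matches itself, but the dot still means "any character". When you generate the expression in code, escaping the name makes the match exact. This Python sketch shows the idea with re.escape; my_fileXtxt is a made-up counterexample.

```python
import re

exact_name = "my_file.txt"

# re.escape() backslash-escapes regex metacharacters such as ".".
pattern = re.escape(exact_name)
print(pattern)  # my_file\.txt

rx = re.compile(pattern)
print(bool(rx.fullmatch("my_file.txt")))  # True
print(bool(rx.fullmatch("my_fileXtxt")))  # False: the dot is now literal

# The unescaped name still matches the real file, but it is looser.
print(bool(re.fullmatch(exact_name, "my_fileXtxt")))  # True
```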

As a last possibility, instead of typing a wildcard, you may provide a Kettle variable name. Using a variable is particularly useful if you don't know the name of the file beforehand. Suppose that you have to get a file named daily_update_yyyyMMdd.csv, where yyyyMMdd represents the year, month, and day. In that case, you can create a transformation that builds a regular expression representing that file name (for example, daily_update_20101215.csv) and sets a variable with that value. In the job, execute that transformation before the Get a file with FTP job entry.

Your job would look like the one shown in the following screenshot:

[Screenshot: job that runs the filename-building transformation before the Get a file with FTP entry]

Finally, in the Get a file with FTP entry, you should type that variable (for example, ${DAILY_FILENAME}) as the wildcard.
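The logic of that transformation can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name daily_filename_regex is hypothetical, the date is passed in explicitly (a real job would typically use the current date), and string.Template merely imitates Kettle's variable substitution because it happens to use the same ${NAME} syntax.

```python
import re
from datetime import date
from string import Template


def daily_filename_regex(day: date) -> str:
    """Regex for the daily_update_yyyyMMdd.csv file of the given day."""
    # Java's yyyyMMdd date pattern corresponds to %Y%m%d in Python.
    file_name = f"daily_update_{day:%Y%m%d}.csv"
    # Escape the dot so that it matches only a literal ".".
    return re.escape(file_name)


# Value the transformation would assign to the DAILY_FILENAME variable.
value = daily_filename_regex(date(2010, 12, 15))
print(value)  # daily_update_20101215\.csv

# Imitates how Kettle resolves ${DAILY_FILENAME} in the Wildcard field.
wildcard = Template("${DAILY_FILENAME}").substitute(DAILY_FILENAME=value)
print(bool(re.fullmatch(wildcard, "daily_update_20101215.csv")))  # True
```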

Some considerations about connecting to an FTP server

To connect to an FTP server, you must complete the connection settings under the General tab of the Get a file with FTP job entry. If you are working with an anonymous FTP server, you can use anonymous as the username and any free text (conventionally, an e-mail address) as the password. Anonymous access lets you use the machine without having an account on it.
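In Python's standard ftplib, an anonymous login can be sketched as follows. The function name connect_anonymous and the host name are hypothetical, and the ftp_factory parameter exists only so the function can be tried without a real server. "anonymous@" is the password ftplib itself sends when you log in as anonymous without one.

```python
from ftplib import FTP


def connect_anonymous(host, port=21, password="anonymous@", ftp_factory=FTP):
    """Open an anonymous FTP session and return the logged-in connection."""
    ftp = ftp_factory()
    ftp.connect(host, port)          # FTP server name / IP address and port
    ftp.login("anonymous", password) # username "anonymous", free-text password
    return ftp
```

With a real server, connect_anonymous("ftp.example.com") would return a connection ready for commands such as cwd and retrbinary.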

If you need to provide authentication credentials for access via a proxy, you must also complete the following textboxes: Proxy host, Proxy port, Proxy username, and Proxy password.

Access via SFTP

SFTP stands for SSH File Transfer Protocol, a network protocol that provides file transfer over a secure SSH connection. With Kettle, you can get files from an SFTP server by using the Get a file with SFTP job entry. To configure this entry, enter the name or IP address of the SFTP server in the SFTP server name / IP textbox. The rest of the configuration of the General and Files tabs is similar to that of the Get a file with FTP entry.

For more information on SFTP, you can visit the following URL:

http://en.wikipedia.org/wiki/SSH_file_transfer_protocol

Access via FTPS

FTPS extends the standard FTP protocol with cryptographic protocols, such as Transport Layer Security (TLS) and Secure Sockets Layer (SSL). You can use the Get a file with FTPS job entry to get files from an FTPS server. To configure this entry, enter the name or IP address of the FTPS server in the FTPS server name / IP address: textbox. The rest of the configuration of the General and Files tabs is similar to that of the Get a file with FTP entry.

More information about FTPS can be obtained from the following URL:

http://en.wikipedia.org/wiki/Ftps
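Python's standard ftplib offers FTP_TLS for explicit FTPS, where the client upgrades a normal FTP control connection with the AUTH TLS command. The sketch below downloads a single file that way. Whether your server expects explicit or implicit FTPS is something to check, since ftplib does not handle implicit FTPS directly. The function name, its parameters, and ftp_factory (a test hook) are hypothetical.

```python
from ftplib import FTP_TLS


def get_file_ftps(host, user, password, remote_dir, name, local_path,
                  port=21, ftp_factory=FTP_TLS):
    """Download one file over explicit FTPS into local_path."""
    ftps = ftp_factory()
    ftps.connect(host, port)
    try:
        # FTP_TLS.login() issues AUTH TLS before sending the credentials.
        ftps.login(user, password)
        ftps.prot_p()  # switch the data connection to TLS as well
        ftps.cwd(remote_dir)
        with open(local_path, "wb") as out:
            ftps.retrbinary(f"RETR {name}", out.write)
    finally:
        ftps.quit()
```

Without prot_p(), only the control connection (commands and credentials) would be encrypted, and the file contents would travel in clear text.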

Getting information about the files being transferred

A drawback of accessing an FTP server this way is that the job only tells you whether the entry succeeded or failed; it gives you no details about the individual files, for example, how many files were transferred. To overcome this, keep the log generated by the job, which is the only source of information about what happened. To see the details, you can simply look at the log or parse it in a subsequent Kettle transformation.

See also

The recipe named Putting files on a remote server in this chapter. See this recipe if, instead of getting files, you have to transfer files to a remote server.
