When you need to copy files to or from remote machines, you can use the File Transfer Protocol (FTP), a standard network protocol built on a client-server architecture.
Kettle provides the Get a file with FTP job entry to get files from an FTP server. In the example, you will connect to a remote directory named remoteDir on an FTP server and copy some text files from that server to a local folder named destinationDir.
Carry out the following steps:
Among other settings, type 21 as the port and .*\.txt as the Wildcard.
When you run the job, the files with the .txt extension will be copied from remoteDir on the FTP server to destinationDir on the local machine.
The Get a file with FTP job entry performs the copy task; it uses the configuration set under the General tab to connect to the remote FTP server.
Under the Files tab, you defined the source directory (in the example, the remote folder remoteDir) and the target directory (in the example, the local folder destinationDir).
Try to avoid the use of directories with special characters, such as spaces. Some FTP servers don't allow these special characters.
You also provided a regular expression for the files to get. In this case, you typed .*\.txt, which is a regular expression representing all .txt files.
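The behavior described above can be approximated outside Kettle with Python's standard ftplib module. This is only a sketch of the idea, not how the job entry is implemented: the function name get_files_with_ftp is made up for illustration, the host and credentials in the usage comment are placeholders, and it assumes the wildcard has to match the whole file name.

```python
import os
import re
from ftplib import FTP


def get_files_with_ftp(ftp, remote_dir, local_dir, wildcard):
    """Copy every file in remote_dir whose name matches wildcard.

    ftp is an already connected and logged-in ftplib.FTP object.
    Returns the list of file names that were copied.
    """
    pattern = re.compile(wildcard)
    ftp.cwd(remote_dir)
    copied = []
    for name in ftp.nlst():
        # Assumption: the expression must match the whole file name.
        if pattern.fullmatch(name) is None:
            continue
        target = os.path.join(local_dir, name)
        with open(target, "wb") as handle:
            # Download the file in binary mode, chunk by chunk.
            ftp.retrbinary("RETR " + name, handle.write)
        copied.append(name)
    return copied


# Usage sketch (placeholder host and credentials, not run here):
#
#   ftp = FTP()
#   ftp.connect("ftp.example.com", 21)
#   ftp.login("user", "password")
#   try:
#       print(get_files_with_ftp(ftp, "remoteDir", "destinationDir", r".*\.txt"))
#   finally:
#       ftp.quit()
```

Returning the list of copied names is a design choice of this sketch; it gives the caller a record of what was actually transferred.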
The following sections give you some additional information and useful tips when it's time to transfer files from a remote server.
In the recipe, you copied all files with a given extension; you did it by providing a regular expression that all those files matched. As another possibility, you may need to transfer a single file.
Note that even if you have the exact name of the file, you still have to provide a regular expression.
For example, if the name of the file is my_file.txt, you have to type my_file\.txt.
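Because the wildcard is a regular expression, an unescaped dot matches any character, so it is worth seeing what escaping changes. The following sketch uses Python's re module for illustration; Kettle evaluates the expression with Java's regular expression engine, but these simple patterns behave the same way in both. The file names in the list are made up.

```python
import re

# Hypothetical file names on the server.
names = ["my_file.txt", "my_fileXtxt", "old_my_file.txt"]

unescaped = re.compile(r"my_file.txt")   # the dot matches any character
escaped = re.compile(r"my_file\.txt")    # the dot matches only a literal dot

matches_unescaped = [n for n in names if unescaped.fullmatch(n)]
matches_escaped = [n for n in names if escaped.fullmatch(n)]

print(matches_unescaped)  # ['my_file.txt', 'my_fileXtxt']
print(matches_escaped)    # ['my_file.txt']

# re.escape builds the escaped form from a literal name.
from_literal = re.compile(re.escape("my_file.txt"))
```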
As a last possibility, instead of typing a wildcard, you may provide a Kettle variable name. Using a variable is particularly useful if you don't know the name of the file beforehand. Suppose that you have to get a file named daily_update_yyyyMMdd.csv, where yyyyMMdd represents year, month, and day. In that case, you can create a transformation that builds a regular expression representing that file name (for example, daily_update_20101215.csv) and sets a variable with that value. In the job, you should execute that transformation before the Get a file with FTP job entry.
Your job would look like the one shown in the following screenshot:
Finally, in the Get a file with FTP entry, you should type that variable (for example, ${DAILY_FILENAME}) as the wildcard.
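In Kettle, the value of the variable is built by a transformation. The logic that transformation has to implement can be sketched in a few lines of Python; the function name daily_filename_regex is hypothetical and the date is the one used in the example file name.

```python
import re
from datetime import date


def daily_filename_regex(day):
    """Return a regular expression that matches only that day's file."""
    # yyyyMMdd in the text corresponds to %Y%m%d in strftime notation.
    name = "daily_update_" + day.strftime("%Y%m%d") + ".csv"
    # Escape the name so that the dot is treated as a literal dot.
    return re.escape(name)


wildcard = daily_filename_regex(date(2010, 12, 15))
print(wildcard)  # daily_update_20101215\.csv
```

In a real job you would normally pass the current date instead of a fixed one, and store the result in the variable used as the wildcard.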
To connect to an FTP server, you must complete the connection settings for the server under the General tab of the Get a file with FTP job entry. If you are working with an anonymous FTP server, you can use anonymous as the username and free text as the password. This means that you can access the machine without needing an account on it.
If you need to provide authentication credentials for access via a proxy, you must also complete the following textboxes: Proxy host, Proxy port, Proxy username, and Proxy password.
SFTP means SSH File Transfer Protocol. It's a network protocol that provides secure file transfer. With Kettle, you can get files from an SFTP server by using the Get a file with SFTP job entry. To configure this entry, you have to enter the name or IP address of the SFTP server in the SFTP server name / IP textbox. The rest of the configuration of the General and Files tabs is very similar to that of the Get a file with FTP entry.
For more information on SFTP, you can visit the following URL:
An FTPS server extends standard FTP by adding support for cryptographic protocols, such as Transport Layer Security (TLS) and Secure Sockets Layer (SSL). You can use the Get a file with FTPS job entry to get files from an FTPS server. To configure this entry, you have to enter the name or IP address of the FTPS server in the FTPS server name / IP address: textbox. The rest of the configuration of the General and Files tabs is very similar to that of the Get a file with FTP entry.
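To make the difference from plain FTP concrete, here is a minimal sketch of an explicit FTPS connection using Python's standard ftplib.FTP_TLS class. It is not related to how the Kettle job entry is implemented; the function name open_ftps and the client_factory parameter are introduced only for this illustration, and the host and credentials in the usage comment are placeholders.

```python
from ftplib import FTP_TLS


def open_ftps(host, user, password, port=21, client_factory=FTP_TLS):
    """Open an explicit FTPS session and protect the data channel."""
    ftps = client_factory()
    ftps.connect(host, port)
    # FTP_TLS.login() secures the control connection with TLS
    # before sending the username and password.
    ftps.login(user, password)
    # Switch the data connection (listings, file contents) to TLS as well.
    ftps.prot_p()
    return ftps


# Usage sketch (placeholder host and credentials, not run here):
#
#   ftps = open_ftps("ftps.example.com", "user", "password")
#   try:
#       print(ftps.nlst())
#   finally:
#       ftps.quit()
```

Once the session is open, listing and downloading files works the same way as with a plain FTP connection.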
More information about FTPS can be obtained from the following URL:
A drawback when accessing an FTP server is that, from the job, you can only know whether the entry succeeded or failed; you get no details about the transfer itself, for example, how many files were transferred. To work around this, it is recommended that you keep the log generated by the job, which is the only source of information about what happened. To see the details, you can simply look at the log or parse it in a subsequent Kettle transformation.