The Copy Files job entry allows you to copy one or more files or folders. Let's see this entry in action. Assume that you have a folder with a set of files and you want to copy them into three folders according to their extensions: one folder for text files, another for Excel files, and a third for the rest of the files.
You will need a directory named sampleFiles containing a set of files with different extensions, including .txt and .xls. You will also need three destination directories, named txtFiles, xlsFiles, and OtherFiles.
Carry out the following steps:
1. Create a new job, add a START entry and a Copy Files job entry (from the File management category), and link them with a hop. Double-click the Copy Files entry to open its settings window.
2. In the File/Folder source textbox, type or browse for the sampleFiles folder. In the File/Folder destination, type or browse for the txtFiles folder. Also, type .*\.txt in the Wildcard (regExp) textbox. Click on the Add button.
3. In the File/Folder source textbox, type or browse for the sampleFiles folder. In the File/Folder destination, type or browse for the xlsFiles folder. Also, type .*\.xls in the Wildcard (regExp) textbox. Click on the Add button.
4. In the File/Folder source textbox, type or browse for the sampleFiles folder. In the File/Folder destination, type or browse for the OtherFiles folder. Also, type .+(?<!(txt|xls))$ in the Wildcard (regExp) textbox. Click on the Add button.
5. Click on OK and run the job. Each file in the sampleFiles folder will be copied into the folder associated with its extension in the settings window.

You use the Copy Files job entry to perform the task of copying files. As you can see in the recipe, you can execute several copy instructions with a single job entry by entering different lines in the Files/Folders section of the General tab.
In the sample grid, you have three lines. For each line, the objective is to copy all the files from the source folder (first column) to the destination folder (second column) that match the regular expression (third column).
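The behavior of the grid can be sketched in Java, the language Kettle itself is written in. This is a minimal sketch under stated assumptions, not Kettle's implementation: CopyFilesSketch and CopyRule are hypothetical names, and the sketch assumes that a file is selected when its whole name matches the regular expression, that a missing destination folder is created, and that files already present in the destination are overwritten.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.regex.Pattern;

class CopyFilesSketch {
    // Hypothetical model of one row of the Files/Folders grid:
    // source folder, destination folder and wildcard (regular expression).
    static final class CopyRule {
        final Path source;
        final Path destination;
        final Pattern wildcard;

        CopyRule(Path source, Path destination, String regex) {
            this.source = source;
            this.destination = destination;
            this.wildcard = Pattern.compile(regex);
        }
    }

    // Processes each row independently and returns the total number of files copied.
    static int apply(List<CopyRule> rules) throws IOException {
        int copied = 0;
        for (CopyRule rule : rules) {
            // Assumption: create the destination folder if it is missing.
            Files.createDirectories(rule.destination);
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(rule.source)) {
                for (Path file : entries) {
                    String name = file.getFileName().toString();
                    // Assumption: the whole file name must match the wildcard.
                    if (Files.isRegularFile(file) && rule.wildcard.matcher(name).matches()) {
                        // Assumption: files already in the destination are overwritten.
                        Files.copy(file, rule.destination.resolve(name),
                                StandardCopyOption.REPLACE_EXISTING);
                        copied++;
                    }
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Base directory holding sampleFiles, txtFiles, xlsFiles and OtherFiles.
        Path base = Paths.get(args.length > 0 ? args[0] : ".");
        Path src = base.resolve("sampleFiles");
        int n = apply(List.of(
                new CopyRule(src, base.resolve("txtFiles"), ".*\\.txt"),
                new CopyRule(src, base.resolve("xlsFiles"), ".*\\.xls"),
                new CopyRule(src, base.resolve("OtherFiles"), ".+(?<!(txt|xls))$")));
        System.out.println(n + " file(s) copied");
    }
}
```

Inside a Java string literal the backslash before the dot is doubled; in the Kettle dialog you type it only once.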
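The first and second lines copy the .txt and .xls files by using the regular expressions .*\.txt and .*\.xls, respectively. The backslash escapes the dot so that it matches a literal period rather than any character.

The third line copies the rest of the files. The regular expression that matches those files is a little more complex. The construct (?<!...) is a negative lookbehind: it asserts that the text immediately before the current position does not match the enclosed pattern. Because the lookbehind sits just before the end anchor $, the expression .+(?<!(txt|xls))$ matches any non-empty name that does not end in txt or xls; in practice, all files whose extension is neither .txt nor .xls.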
The recipe showed you the basics of copying files with Kettle. The following sections explain how to add more functionality, for example, validating the existence of files or folders before copying. You will also see the extra settings available for the Copy Files job entry.
You can move the files instead of copying them by checking the Remove source files checkbox in the Settings section under the General tab of the Copy Files job entry. When it is checked, Kettle deletes the files after a successful copy. This is analogous to using a Delete file job entry right after the Copy Files entry.
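The copy-then-delete ordering can be illustrated with java.nio.file. This is a hypothetical helper, not how Kettle implements Remove source files; it only shows that the source is deleted after Files.copy returns without an exception, so a failed copy leaves the source in place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CopyThenRemove {
    // Copies source into destDir, then deletes the source.
    // If Files.copy throws, Files.delete is never reached and the source survives.
    static Path copyAndRemove(Path source, Path destDir) throws IOException {
        Path target = destDir.resolve(source.getFileName());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(source);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("move-demo");
        Path src = Files.writeString(dir.resolve("a.txt"), "hello");
        Path dest = Files.createDirectories(dir.resolve("txtFiles"));
        Path moved = copyAndRemove(src, dest);
        // Source gone, copy present.
        System.out.println(Files.exists(src) + " " + Files.exists(moved));
    }
}
```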
In the recipe, you simply wanted to organize some files in folders, and you didn't care whether the files existed or not. However, in the most common scenario the files to copy or move are expected to exist already, and you need to verify that before acting on them. You cannot perform that verification with the Copy Files entry, but there are other means.
Suppose that you want the files to be copied only if there is a mixture of file extensions. If there are only Excel files, or only text files, they will not be copied and the situation will be recorded in a log.
In order to do that, you can create a transformation that succeeds if there is a mixture of files, or fails if you have only Excel files or only text files.
The transformation should start with a Get File Names step to get the list of files in the folder and then proceed differently according to the validations you want to perform.
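The decision the transformation has to make can be written as a small predicate. The sketch below is hypothetical: isMixture stands in for the logic you would build with steps after Get File Names, it takes the file names that step would return, and it treats an empty folder as a failure too, which is an assumption not stated in the text.

```java
import java.util.List;

class MixtureCheck {
    // Hypothetical validation: fail when the folder is empty, holds only .txt
    // files, or holds only .xls files; succeed otherwise.
    static boolean isMixture(List<String> fileNames) {
        if (fileNames.isEmpty()) {
            return false; // assumption: nothing to copy counts as a failure
        }
        boolean allTxt = fileNames.stream().allMatch(n -> n.endsWith(".txt"));
        boolean allXls = fileNames.stream().allMatch(n -> n.endsWith(".xls"));
        return !allTxt && !allXls;
    }

    public static void main(String[] args) {
        boolean ok = isMixture(List.of(args));
        System.out.println(ok ? "mixture found: copy the files"
                              : "no mixture: skip the copy and log it");
        // A non-zero exit status stands in for the transformation failing.
        System.exit(ok ? 0 : 1);
    }
}
```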
Then, in your job, you call the transformation before copying the files. The copy will be done only after the success of the transformation, as shown in the following diagram:
In the simplest case, where the files to copy are specified by their exact names (that is, not with regular expressions), you can verify their existence with a File Exists entry (for a single file) or a Checks if files exist entry (for multiple files).
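The two checks map onto simple java.nio.file calls. This sketch assumes that the multi-file check succeeds only when every listed file exists; fileExists and allFilesExist are illustrative names, not Kettle classes.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ExistenceChecks {
    // Single-file check, in the spirit of a File Exists entry.
    static boolean fileExists(Path file) {
        return Files.exists(file);
    }

    // Multi-file check; assumption: succeeds only if every listed file exists.
    static boolean allFilesExist(List<Path> files) {
        return files.stream().allMatch(p -> Files.exists(p));
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        if (files.size() == 1) {
            System.out.println(fileExists(files.get(0)));
        } else {
            System.out.println(allFilesExist(files));
        }
    }
}
```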
You can create the destination directory automatically by selecting the Create destination folder checkbox in the Settings section under the General tab of the Copy Files job entry. You could also create those directories by using a Create a folder job entry from the File management category. The difference is that, with the Create a folder entry, you can detect whether the directory already exists and, if that situation is unexpected, act accordingly, for example, by aborting the job.
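The difference between the two approaches resembles the difference between Files.createDirectories, which silently accepts an existing folder, and Files.createDirectory, which reports it. In this hypothetical sketch, ensureFolder plays the role of the Create destination folder checkbox, while createNewFolder returns false when the folder already exists, leaving the caller free to abort, as a Create a folder entry configured to fail on an existing folder would.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class DestinationFolders {
    // Like the Create destination folder checkbox: an existing folder is fine.
    static Path ensureFolder(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    // Like a Create a folder entry that should fail if the folder exists:
    // returns false instead of continuing silently, so the caller can abort.
    static boolean createNewFolder(Path dir) throws IOException {
        try {
            Files.createDirectory(dir);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("folders").resolve("txtFiles");
        System.out.println(createNewFolder(dir)); // true: the folder was created
        System.out.println(createNewFolder(dir)); // false: it already existed
        ensureFolder(dir); // no exception for an existing folder
    }
}
```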