Copying or moving one or more files

The Copy Files job entry allows you to copy one or more files or folders. Let's see this entry in action. Assume that you have a folder with a set of files, and you want to copy them to three folders depending on their extensions: one folder for text files, another for Excel files, and a third for the rest of the files.

Getting ready

You will need a directory named sampleFiles containing a set of files with different extensions, including .txt and .xls. You will also need three destination directories, named txtFiles, xlsFiles and OtherFiles.

How to do it...

Carry out the following steps:

  1. Create a new job and drop a Start job entry into the canvas.
  2. Add a Copy Files job entry. In this entry, you will add the directions for copying the files into the three available destination folders. Double-click on the entry to open it.
  3. In the File/Folder source textbox, type or browse for the sampleFiles folder. In the File/Folder destination textbox, type or browse for the txtFiles folder. Also, type .*\.txt in the Wildcard (regExp) textbox (the backslash makes the dot match a literal period). Click on the Add button.
  4. In the File/Folder source textbox, type or browse for the sampleFiles folder. In the File/Folder destination textbox, type or browse for the xlsFiles folder. Also, type .*\.xls in the Wildcard (regExp) textbox. Click on the Add button.
  5. In the File/Folder source textbox, type or browse for the sampleFiles folder. In the File/Folder destination textbox, type or browse for the OtherFiles folder. Also, type .+(?<!(txt|xls))$ in the Wildcard (regExp) textbox. Click on the Add button.
  6. Assuming that all folders are inside the directory where you have your job, the Files/Folders grid will look like the following screenshot:
    [Screenshot: the Files/Folders grid with one row per source folder, destination folder, and wildcard]

    Note

    Remember that Internal.Job.Filename.Directory is a predefined Kettle variable whose value is the full path of the directory where the job is saved. You reference it in a textbox as ${Internal.Job.Filename.Directory}.

  7. When you run the job, each file in the sampleFiles folder will be copied into the destination folder configured for its extension in the settings window.

How it works...

You use the Copy Files job entry to perform the task of copying files. As you can see in the recipe, you can execute several copy instructions with a single job entry by entering different lines in the Files/Folders section from the General tab.

In the sample grid, you have three lines. For each line, the objective is to copy all the files in the source folder (first column) that match the regular expression (third column) to the destination folder (second column).
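As a rough illustration of what the three grid lines do, the following Python sketch applies each (source, destination, wildcard) row in turn. It is not how Kettle implements the entry: the function names `recipe_rows` and `copy_matching` are hypothetical, the sketch assumes that the wildcard must match the whole file name and that the destination folders already exist, and it does not descend into subfolders.

```python
import re
import shutil
from pathlib import Path


def recipe_rows(base):
    """Build the three grid rows of the recipe under a base directory."""
    base = Path(base)
    source = base / "sampleFiles"
    return [
        (source, base / "txtFiles", r".*\.txt"),
        (source, base / "xlsFiles", r".*\.xls"),
        (source, base / "OtherFiles", r".+(?<!(txt|xls))$"),
    ]


def copy_matching(rows):
    """Apply each (source, destination, wildcard) row in order.

    Returns a list of (file name, destination folder name) pairs.
    """
    copied = []
    for source, destination, wildcard in rows:
        pattern = re.compile(wildcard)
        for path in sorted(Path(source).iterdir()):
            # Assumption: only plain files whose whole name matches are copied.
            if path.is_file() and pattern.fullmatch(path.name):
                shutil.copy2(path, Path(destination) / path.name)
                copied.append((path.name, Path(destination).name))
    return copied
```

Calling `copy_matching(recipe_rows(job_dir))`, where `job_dir` stands for the directory that holds the four folders, would copy every file once, because the three wildcards do not overlap.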

The first and second lines copy the .txt and .xls files by using the regular expressions .*\.txt and .*\.xls respectively. The backslash escapes the dot so that it matches a literal period rather than any character.

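The third line copies the rest of the files. The regular expression that matches those files is a little more complex: the characters ?<! introduce a negative lookbehind, which succeeds only if the text immediately before the current position does not match the enclosed expression. Because the lookbehind sits right before the end anchor $, the expression .+(?<!(txt|xls))$ matches every file name that does not end in txt or xls; in practice, all files whose extension is neither .txt nor .xls.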
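To check how the three wildcards partition a set of names, you can try them outside Kettle. The sketch below uses Python's `re` module, whereas Kettle relies on Java regular expressions; these particular patterns should behave the same in both, but treat that as an assumption. It also assumes that the wildcard has to match the whole file name, and it uses the escaped-dot forms of the first two patterns. The `classify` function and the file names are invented for illustration.

```python
import re

# Wildcards from the recipe (dots escaped so they match a literal period).
TXT = re.compile(r".*\.txt")
XLS = re.compile(r".*\.xls")
# Negative lookbehind: the name must not end in "txt" or "xls".
OTHER = re.compile(r".+(?<!(txt|xls))$")


def classify(name):
    """Return the destination folder whose wildcard matches the file name."""
    if TXT.fullmatch(name):
        return "txtFiles"
    if XLS.fullmatch(name):
        return "xlsFiles"
    if OTHER.fullmatch(name):
        return "OtherFiles"
    return None  # matched by none of the three lines


# Hypothetical file names, for illustration only.
for name in ["notes.txt", "sales.xls", "photo.png", "report.xlsx", "notestxt"]:
    print(name, "->", classify(name))
```

Note the last name: a file called notestxt (no dot) is rejected by the lookbehind and also fails .*\.txt, so under these assumptions none of the three lines would copy it.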

There's more...

The recipe showed you the basics of copying files with Kettle. The following sections explain how to add more functionality, for example, validating the existence of files or folders before copying. You will also see the extra settings available for the Copy Files job entry.

Moving files

You can move the files (instead of copying them) by checking the Remove source files checkbox in the Settings section under the General tab in the Copy Files job entry. If you check it, Kettle will delete each source file after it has been copied successfully. This is analogous to using a Delete file job entry right after the Copy Files entry.
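The copy-then-delete behavior can be pictured with a short Python sketch. It only illustrates the idea and is not Kettle's code; the function name `move_by_copy` is hypothetical.

```python
import shutil
from pathlib import Path


def move_by_copy(source_file, destination_dir):
    """Copy a file, then delete the original only if the copy succeeded."""
    source_file = Path(source_file)
    target = Path(destination_dir) / source_file.name
    # copy2 raises an exception on failure, so the unlink below is skipped
    # and the source file is left in place.
    shutil.copy2(source_file, target)
    source_file.unlink()
    return target
```

Deleting only after a successful copy means that a failure halfway through never leaves you without the file.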

Detecting the existence of the files before copying them

In the recipe, you simply wanted to organize some files in folders, and you didn't care if the files existed or not. However, the most common scenario is the one in which it's assumed that the files to copy or move already exist. You cannot perform that verification with the Copy Files entry, but there are other means.

Suppose that you want the files to be copied only if there is a mixture of file extensions. If there are only Excel files, or text files, they will not be copied and the situation will be recorded in a log.

In order to do that, you can create a transformation that succeeds if there is a mixture of files, or fails if you have only Excel files or only text files.

Tip

The transformation should start with a Get File Names step to get the list of files in the folder, and proceed differently according to the validations you want to do.

Then, in your job, you call the transformation before copying the files. The copy will be done only after the success of the transformation, as shown in the following diagram:

[Diagram: the job runs the validation transformation first and executes the Copy Files entry only if the transformation succeeds]
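The validation itself is simple. As a sketch of the logic only (in Kettle you would build it as a transformation, not write it in Python), the hypothetical function `has_mixture` below returns True when the folder holds files with more than one extension and logs the situation otherwise. Treating "mixture" as "more than one distinct extension" is an assumption.

```python
import logging
from pathlib import Path

log = logging.getLogger("copy_files_check")


def has_mixture(folder):
    """Return True if the folder holds files with more than one extension."""
    extensions = {p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()}
    if len(extensions) > 1:
        return True
    # Record the situation instead of copying, as the text describes.
    log.warning(
        "Only %s files found in %s; nothing will be copied",
        ", ".join(sorted(extensions)) or "no",
        folder,
    )
    return False
```

The calling code would copy the files only when `has_mixture` returns True, mirroring the success hop between the transformation and the Copy Files entry.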

In the simplest case, where you have to copy files specified by their exact names (that is, not expressed with regular expressions), you can verify their existence simply with a File Exists entry (for a single file) or a Checks if files exist entry (for multiple files).
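The equivalent check for a list of exact names is a one-liner in most languages. The following Python sketch, with the hypothetical helper `missing_files`, reports which of the expected files are absent so that the copy can be skipped or the job aborted.

```python
from pathlib import Path


def missing_files(folder, names):
    """Return the names from the list that are not regular files in the folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]
```

An empty result means that every file exists and the copy can proceed.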

Creating folders

You can create the destination directory automatically by selecting the Create destination folder checkbox in the Settings section under the General tab in the Copy Files job entry. You could also create those directories by using a Create a folder job entry from the File management category. The difference is that, with the Create a folder entry, you can detect if the directory already exists; if you didn't expect that situation, you can act accordingly by, for example, aborting the job.
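The difference between the two approaches can be sketched in Python. The function `create_folder` and its `fail_if_exists` parameter are hypothetical names chosen for this illustration: with the flag off, the function quietly reuses an existing directory, much like the checkbox; with the flag on, it lets the caller detect that the directory was already there and react.

```python
from pathlib import Path


def create_folder(path, fail_if_exists=False):
    """Create a directory; return True if it was created, False if it existed.

    With fail_if_exists=True, an existing directory raises FileExistsError
    so that the caller can abort or take another branch.
    """
    path = Path(path)
    if path.exists():
        if fail_if_exists:
            raise FileExistsError(f"{path} already exists")
        return False
    path.mkdir(parents=True)
    return True
```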

See also

The recipe named Copying or moving a custom list of files in this chapter. See this recipe if you have a set of files for copying, but the list cannot be specified with a regular expression.
