Providing the name of a file (for reading or writing) dynamically

Sometimes, you don't have the complete name of the file that you intend to read or write in your transformation. That can be because the name of the file depends on a field or on external information. Suppose you receive a text file with information about new books to process. This file is sent to you on a daily basis and the date is part of its name (for example, newBooks_20100927.txt).

Getting ready

In order to follow this recipe, you must have a text file named newBooks_20100927.txt with sample book information such as the following:

"Title","Author","Price","Genre"
"The Da Vinci Code","Dan Brown","25.00","Fiction"
"Breaking Dawn","Stephenie Meyer","21.00","Children"
"Foundation","Isaac Asimov","38.50","Fiction"
"I, Robot","Isaac Asimov","39.99","Fiction"

How to do it...

Carry out the following steps:

  1. Create a new transformation.
  2. Drop a Get System Info step from the Input category into the canvas. Add a new field named today, and in the Type listbox, select System date (variable).
  3. From the Transform category, add a Select values step, in order to give the date the desired format. Click on the Meta-data tab and fill in the first row as follows:
    • As Fieldname, type or select today
    • As Type select String
    • As Format type yyyyMMdd.

    Note

    In the recipe, the file is saved in the same directory as the transformation. In order to get this directory, you have to get it as a field in your dataset. That's the purpose of the next step.

  4. Add the Get Variables step from the Job category. In the grid, add a new field named path. In the Variable column, press Ctrl+Space in order to show the list of possible variables, and select Internal.Transformation.Filename.Directory.
  5. From the Scripting category, add a User Defined Java Expression step (UDJE from now on).
  6. In the step setting window, add a field named filename (type it in the New field column), and type path + "/newBooks_" + today + ".txt" in the Java Expression column. Previewing this step, you will obtain the complete path for the file, for example, file:///C:/myDocuments/newBooks_20100927.txt.

    Note

    The recipe uses the UDJE for its simplicity and performance. However, you can obtain this calculated field in other ways, for example, using the Calculator step from the Transform category or the Formula step from the Scripting category.

  7. Now that you have the filename, let's read the file. Add a Text file input step. Your transformation should look like the one shown in the following diagram (except possibly for the step names):
  8. Double-click on the step. Under the File tab, go to the bottom section and check on the Accept filenames from previous step checkbox.
  9. In the Step to read filenames from textbox, type or select the name of the UDJE step created earlier.
  10. In the Field in input to use as filename textbox, type filename.
  11. Select the Content tab. Type a comma (,) in the Separator textbox, and set the Header to 1 line.
  12. Under the Fields tab, add the following Names and Types: Title (String), Author (String), Price (Number), Genre (String).

    Tip

    You can't use the Get Fields button in this case because the name of the file will be set dynamically. In order to obtain the headers automatically, you can fill the File tab with the name of a sample file. Then, clicking on the Get Fields button, the grid will be populated. Finally, you must remove the sample file from the File tab and set the Accept filenames from previous step section again.

  13. Running the transformation, you will obtain a dataset with the contents of the text file whose name was resolved dynamically.
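Outside PDI, the same idea (resolve the name at run time, then read the file with a comma separator and one header line) can be sketched in plain Python. This is only an illustrative analog, not what the Text file input step does internally; the function name read_books and the directory argument are assumptions made for the example.

```python
import csv
from datetime import date
from pathlib import Path


def read_books(directory: Path, day: date) -> list[dict]:
    """Read newBooks_<yyyyMMdd>.txt from directory (hypothetical helper)."""
    # The file name is only known at run time because it contains the date.
    filename = directory / f"newBooks_{day:%Y%m%d}.txt"
    with filename.open(newline="", encoding="utf-8") as handle:
        # DictReader uses a comma separator and treats the first line as the header.
        reader = csv.DictReader(handle)
        rows = []
        for row in reader:
            # Price is declared as Number in the recipe; the other fields stay strings.
            row["Price"] = float(row["Price"])
            rows.append(row)
        return rows
```

Calling read_books(Path("C:/myDocuments"), date(2010, 9, 27)) would read C:/myDocuments/newBooks_20100927.txt, provided that file exists.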

How it works...

When you have to read a file and the filename is known only at the moment you run the transformation, you cannot set the filename explicitly in the grid located under the File tab of the Input step. However, there is a way to provide the name of the file.

First, you have to create a field with the name of the file including its complete path.

Once you have that field, the only thing to do is to configure the Accept filenames from previous step section of the Input step specifying the step from which that field comes and the name of the field.

In the recipe, you didn't know the complete name because part of the name was the system date, for example, C:/myDocuments/newBooks_20100927.txt. In order to build a field with that name, you did the following:

  • Getting the date of today (Get System Info step)
  • Formatting this date as yyyyMMdd (Select values step)
  • Getting the path where the file was located (Get Variables step)
  • Concatenating the path and the formatted date (UDJE step) generating the final field named filename
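
The four steps above can be mirrored in a few lines of Python. This is a minimal sketch for illustration, not PDI code: build_filename is a hypothetical helper, and the directory is passed in as an argument instead of being read from Internal.Transformation.Filename.Directory.

```python
from datetime import date


def build_filename(directory: str, today: date) -> str:
    """Build the full file name from a directory and a date (hypothetical helper)."""
    # %Y%m%d is the strftime counterpart of the yyyyMMdd mask used in the recipe.
    stamp = today.strftime("%Y%m%d")
    # Same concatenation as the Java expression: path + "/newBooks_" + today + ".txt"
    return directory + "/newBooks_" + stamp + ".txt"


# Fixed date used here so the result matches the recipe's example file.
print(build_filename("C:/myDocuments", date(2010, 9, 27)))
# prints C:/myDocuments/newBooks_20100927.txt
```

In a real run you would pass date.today() instead of a fixed date, which corresponds to the System date field obtained with the Get System Info step.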

These steps are among the most used for these situations. However, the steps and the way of building the field will depend on your particular case.

In the recipe, you used a Text File Input step, but the same applies for other Input steps: Excel Input, Property Input, and so on.

It may happen that you want to read a file with a CSV file input step, but notice that it doesn't have the option of accepting the name of the file from a previous step. Don't worry! If you create a hop from any step toward this step, the textbox named The filename field (data from previous steps) will magically show up, allowing the name to be provided dynamically.

This method for providing the name of the file also applies when you write a file by using a Text file output step.

There's more...

What follows is a little background about the Get System Info step used in the recipe. After that, you will see how the Accept file name from field? feature can be used in the generation of files.

Get System Info

You can use the Get System Info step to retrieve information from the PDI environment. In the recipe, it was used to get the system date, but you can also use it to add other environment information to the dataset, for example, the arguments from the command line, the transformation's name, and so on.

You can get further information about this step at the following URL:

http://wiki.pentaho.com/display/EAI/Get+System+Info

Generating several files simultaneously with the same structure, but different names

Let's assume that you want to write files with book information, but a different file for each genre. For example, a file named fiction.txt with all the fiction books, another file named children.txt with the children's books, and so on. To do this, you must create the name of the file dynamically as shown in the recipe. In this case, supposing that your dataset has a field with the genre of the book, you could create a Java Expression that concatenates the path, the field that has the genre, and the string .txt. Then, in the Text file output step, you should check the checkbox named Accept file name from field? and in the File name field listbox, select the field just created.

Running this transformation will generate different text files with book information, one file for each genre.
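
As a rough illustration of what "one file per genre" means, here is a plain Python sketch of the same routing logic. It is not how the Text file output step is implemented; write_by_genre is a hypothetical helper, and lowercasing the genre (so that Fiction becomes fiction.txt) is an assumption made to match the file names mentioned above.

```python
import csv
from pathlib import Path


def write_by_genre(rows, directory: Path) -> list[Path]:
    """Write each row to <genre>.txt under directory (hypothetical helper)."""
    handles = {}
    writers = {}
    try:
        for row in rows:
            # The target name is computed per row: path + genre + ".txt".
            # Lowercasing the genre is an assumption, not part of the recipe.
            target = directory / (row["Genre"].lower() + ".txt")
            if target not in writers:
                # First row for this genre: open the file and write the header.
                handle = target.open("w", newline="", encoding="utf-8")
                handles[target] = handle
                writer = csv.DictWriter(
                    handle, fieldnames=list(row), quoting=csv.QUOTE_ALL
                )
                writer.writeheader()
                writers[target] = writer
            writers[target].writerow(row)
    finally:
        # Close every file that was opened, even if a row caused an error.
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

With the sample books from the recipe, this would produce fiction.txt with three books and children.txt with one.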
