Executing steps even when your stream is empty

As you must know, a Kettle transformation is a group of linked steps through which data flows. Each step is meant to receive rows of data, process the data somehow, and deliver those rows to the next step or steps. If you agree with this definition, then you must realize that if there are no rows coming to the step, the step will not be executed.

This seems reasonable, but on occasion, it can be a problem. To get an idea of that kind of situation, look at the following scenarios:

  • You have a very simple transformation that reads a file, does some calculations, and finally updates a table with the system date and the number of processed rows. If the file doesn't exist or if it is empty, then no rows will go out from the file input step. Consequently and contrary to what you need to do, the step that updates the table will never be executed.
  • You need to set some variables with values that are supposed to be in a file. If the file exists and has the values, you are able to do it. If not, the step that sets the variables will not be executed. It would be good if it sets the variables with at least some default values.
  • You have a database with products and want to generate a list of products whose descriptions match a given text. For example, if the text is lamp, your file will have all products that contain lamp in their descriptions. If there are no lamps, you want to generate a file with a single row recording the situation. The problem is that, if there are no lamps, no row will come out of the input step. Consequently the output step, as in the first example, will never be executed.

For situations like these, there is a way to overcome the problem: the use of the Detect empty stream step. This recipe shows you how to use it. It implements the last of the examples: The generation of the file with a list of products.

Getting ready

For this recipe, you will need a database with outdoor products with the structure defined in Appendix, Data Structures.

How to do it...

Carry out the following steps:

  1. Create a transformation and drag a Table Input step.
  2. Double-click on the step and select the connection to the outdoors database or create it if it doesn't exist. Then, enter the following statement:
    SELECT
    category
    , id_product
    , desc_product
    , price
    FROM products p
    ,categories c
    WHERE p.id_category = c.id_category
    AND desc_product like '%${PROD_FILTER}%'
    ORDER by category, desc_product
    
  3. Check the Replace variables in script? option.
  4. Add an Excel output step. Configure the step to generate a file with all fields coming from the Table Input step.
  5. From the Flow category, add a Detect empty stream step. Also, add a User Defined Java Expression or UDJE step, and link all steps as follows:
    How to do it...
  6. Use the UDJE step and fill it in, as shown in the following screenshot:
How to do it...

That's all! Let's test the transformation:

  1. Press F9 to run it; give the PROD_FILTER variable the value lamp (or any value that you know is part of the description of some of your products). You do this by typing the value into the grid named Variables. Click on Launch.
  2. Open the generated file. It should look like the one shown in the following screenshot:
    How to do it...
  3. Run the transformation again, but this time, type a value that you know isn't part of the descriptions of your products, for example motorcycle.
  4. Open the file. This time it should have the content as shown in the following screenshot:
How to do it...

How it works...

When a step doesn't return data, the flow ends. None of the steps that follow that step are executed because they don't receive data for processing. The Detect empty stream step, as the name suggests, detects that situation. As a consequence, it generates a stream with the same metadata as the expected stream, and a single row with null values. This way, you avoid the stream to "die".

In order to understand what the step does in a better way, try the following:

  1. In the transformation that you just created, select the Detect empty stream step.
  2. Press F9 to do a preview, give to the variable PROD_FILTER the value lamp, and click on Launch.
  3. You will see a message informing you that there are no rows to preview. That's because the main stream had rows and they went toward the Excel step.
  4. Try the same procedure again, but this time, enter an invalid value, for example, motorcycle. You will see a single row with the columns category, id_product, desc_product, and price, all with null values.

In the recipe, in the step that follows the Detect empty stream step, you replaced the null value in the category column with the message you wanted to write in the file, and sent the data toward the Excel file.

The Excel output step doesn't care if the data came from the main stream or the alternative one that you created for the empty stream. It simply sends the columns to the Excel file.

Finally, it's worth mentioning why we used the UDJE step. The selection of this step is smart because it replaces the value of the category field. Most steps add new fields, but are not able to manipulate existing ones.

There's more...

You can use the Detect empty stream step in the same way you would implement error handling. The difference is that here there are no errors; you simply have an exceptional situation.

As you would do when handling errors, you can fix or manipulate the stream and send it back to the main stream, as you did in the recipe, or you could completely ignore the metadata generated by the Detect empty stream step and simply use that step as the beginning of a new independent stream. For example, instead of generating the Excel file when there are no rows, you could write a message to the log, such as criteria doesn't match any product.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset