As you must know, a Kettle transformation is a group of linked steps through which data flows. Each step is meant to receive rows of data, process the data somehow, and deliver those rows to the next step or steps. If you agree with this definition, then you must realize that if there are no rows coming to the step, the step will not be executed.
This seems reasonable, but on occasion, it can be a problem. To get an idea of that kind of situation, look at the following scenarios:
lamp
, your file will have all products that contain lamp
in their descriptions. If there are no lamps, you want to generate a file with a single row recording the situation. The problem is that, if there are no lamps, no row will come out of the input step. Consequently the output step, as in the first example, will never be executed.For situations like these, there is a way to overcome the problem: the use of the Detect empty stream step. This recipe shows you how to use it. It implements the last of the examples: The generation of the file with a list of products.
For this recipe, you will need a database with outdoor products with the structure defined in Appendix, Data Structures.
Carry out the following steps:
outdoors
database or create it if it doesn't exist. Then, enter the following statement:SELECT category , id_product , desc_product , price FROM products p ,categories c WHERE p.id_category = c.id_category AND desc_product like '%${PROD_FILTER}%' ORDER by category, desc_product
That's all! Let's test the transformation:
lamp
(or any value that you know is part of the description of some of your products). You do this by typing the value into the grid named Variables. Click on Launch. motorcycle
.When a step doesn't return data, the flow ends. None of the steps that follow that step are executed because they don't receive data for processing. The Detect empty stream step, as the name suggests, detects that situation. As a consequence, it generates a stream with the same metadata as the expected stream, and a single row with null values. This way, you avoid the stream to "die".
In order to understand what the step does in a better way, try the following:
lamp
, and click on Launch. motorcycle
. You will see a single row with the columns category, id_product, desc_product
, and price
, all with null values.In the recipe, in the step that follows the Detect empty stream step, you replaced the null value in the category
column with the message you wanted to write in the file, and sent the data toward the Excel file.
The Excel output step doesn't care if the data came from the main stream or the alternative one that you created for the empty stream. It simply sends the columns to the Excel file.
Finally, it's worth mentioning why we used the UDJE step. The selection of this step is smart because it replaces the value of the category
field. Most steps add new fields, but are not able to manipulate existing ones.
You can use the Detect empty stream step in the same way you would implement error handling. The difference is that here there are no errors; you simply have an exceptional situation.
As you would do when handling errors, you can fix or manipulate the stream and send it back to the main stream, as you did in the recipe, or you could completely ignore the metadata generated by the Detect empty stream step and simply use that step as the beginning of a new independent stream. For example, instead of generating the Excel file when there are no rows, you could write a message to the log, such as criteria doesn't match any product
.