In this chapter, we will cover:
The main purpose of Kettle transformations is to manipulate data in the form of a dataset; this task is done by the steps of the transformation.
When a transformation is launched, all its steps are started. During the execution, the steps work simultaneously reading rows from the incoming hops, processing them, and delivering them to the outgoing hops. When there are no more rows left, the execution of the transformation ends.
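This concurrent, row-by-row execution model can be sketched in plain Python. The following is an illustrative analogy only, not Kettle code: each step is modeled as a thread (hypothetical `step` function), and each hop as a queue connecting two steps, so rows flow downstream while upstream steps are still producing.

```python
import threading
import queue

SENTINEL = object()  # marks the end of the row stream, like a step running out of rows

def step(transform, inbox, outbox):
    # Each Kettle step runs concurrently; here, one thread per step.
    # It reads rows from the incoming hop, processes them, and
    # delivers them to the outgoing hop until no rows are left.
    while True:
        row = inbox.get()
        if row is SENTINEL:
            outbox.put(SENTINEL)
            break
        outbox.put(transform(row))

# Hops between steps are modeled as thread-safe queues.
hop1, hop2, hop3 = queue.Queue(), queue.Queue(), queue.Queue()

t1 = threading.Thread(
    target=step,
    args=(lambda r: {**r, "name": r["name"].upper()}, hop1, hop2))
t2 = threading.Thread(
    target=step,
    args=(lambda r: {**r, "length": len(r["name"])}, hop2, hop3))
t1.start()
t2.start()

# Feed rows into the first hop, then signal the end of the stream.
for row in [{"name": "alice"}, {"name": "bob"}]:
    hop1.put(row)
hop1.put(SENTINEL)

results = []
while True:
    row = hop3.get()
    if row is SENTINEL:
        break
    results.append(row)

t1.join()
t2.join()
print(results)
# [{'name': 'ALICE', 'length': 5}, {'name': 'BOB', 'length': 3}]
```

The point of the sketch is that neither step waits for the other to finish: the second step starts working on the first row as soon as it arrives, which is exactly why, in Kettle, you cannot assume one step has completed before another begins.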
The dataset that flows from step to step is nothing more than a set of rows, all having the same structure or metadata. This means that all rows have the same number of columns, and the columns in all rows have the same type and name.
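To make the idea of shared row metadata concrete, here is a small illustrative sketch, again plain Python rather than anything Kettle provides: the stream's metadata (a hypothetical `METADATA` list) records each column's name and type, and a hypothetical `check_row` helper verifies that every row conforms to it.

```python
# Stream metadata: ordered (column name, column type) pairs that
# every row flowing through the stream must share.
METADATA = [("id", int), ("name", str), ("amount", float)]

def check_row(row, metadata):
    """Verify a row matches the stream metadata:
    same columns, same order, same types."""
    names = [col for col, _ in metadata]
    if list(row.keys()) != names:
        raise ValueError(f"column mismatch: {list(row.keys())} != {names}")
    for col, col_type in metadata:
        if not isinstance(row[col], col_type):
            raise TypeError(f"column {col!r} expects {col_type.__name__}")

rows = [
    {"id": 1, "name": "alice", "amount": 10.5},
    {"id": 2, "name": "bob", "amount": 3.0},
]
for row in rows:
    check_row(row, METADATA)  # every row shares the structure, so no error
```

Kettle enforces an analogous consistency itself; the sketch simply shows why operations such as merging two streams require both streams to carry identical metadata.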
Suppose that you have a single stream of data and that you apply the same transformations to all rows; that is, all the steps are connected in a row, one after the other. In other words, you have the simplest of transformations from a structural point of view. In this case, you don't have to worry much about the structure of your data stream, nor about the origin or destination of the rows. The interesting part comes when you face other situations, for example:
With Kettle, you can actually do all of this, but you have to be careful: it's easy to end up doing the wrong thing and getting unexpected results or, even worse, undesirable errors.
With regard to the first example, it is not the default behavior, due to the parallel nature of transformations explained earlier. There are, however, two steps that might help, which are as follows:
Both these steps are in the Flow category.
You will find examples of the use of the last of these steps in the following recipes:
This chapter focuses on the other two examples and some similar use cases, explaining the different ways of combining, splitting, and manipulating streams of data.