When you work with databases, most of the times you start by writing an SQL statement that gets the data you need. However, there are situations in which you don't know that statement exactly. Maybe the name of the columns to query are in a file, or the name of the columns by which you will sort will come as a parameter from outside the transformation, or the name of the main table to query changes depending on the data stored in it (for example sales2010)
. PDI allows you to have any part of the SQL statement as a variable so you don't need to know the literal SQL statement text at design time.
Assume the following situation: You have a database with data about books and their authors, and you want to generate a file with a list of titles. Whether to retrieve the data ordered by title or by genre is a choice that you want to postpone until the moment you execute the transformation.
You will need a book database with the structure explained in Appendix, Data Structures.
ORDER_COLUMN
, and put title
as its default value.SELECT * FROM books ORDER BY ${ORDER_COLUMN}
ORDER_COLUMN
parameter typing genre
as the new value.You can use Kettle variables in any part of the SELECT
statement inside a Table Input step. When the transformation is initialized, PDI replaces the variables by their values provided that the Replace variables in script? option is checked.
In the recipe, the first time you ran the transformation, Kettle replaced the variable ORDER_COLUMN
with the word title
and the statement executed was:
SELECT * FROM books ORDER BY title
The second time, the variable was replaced by genre
and the executed statement was:
SELECT * FROM books ORDER BY genre
You may use variables not only for the ORDER BY
clause, but in any part of the statement: table names, columns, and so on. You could even hold the full statement in a variable. Note however that you need to be cautious when implementing this.
A wrong assumption about the metadata generated by those predefined statements can make your transformation crash.
You can also use the same variable more than once in the same statement. This is an advantage of using variables as an alternative to question marks when you need to execute parameterized SELECT
statements.