If you're used to working with databases, one of your main objectives while working with PDI must be getting data from your databases for transforming, loading in other databases, generating reports, and so on. Whatever operation you intend to achieve, the first thing you have to do after connecting to the database, is to get that data and create a PDI dataset. In this recipe you will learn the simplest way to do that.
The Table Input step you used in the recipe is the main PDI step to get data from a database. When you run or preview the transformation, Kettle executes the SQL and pushes the rows of data coming from the database into the output stream of the step. Each column of the SQL statement leads to a PDI field and each row generated by the execution of the statement becomes a row in the PDI dataset.
Once you get the data from the database, it will be available for any kind of manipulation inside the transformation.
In order to save time, or in case you are not sure of the name of the tables or columns in the database, instead of typing the SQL statement press the Get SQL select statement... button. This will bring the Database Explorer window. This window allows you to explore the selected database. By expanding the database tree and selecting the table that interests you, you will be able to explore that table through the different options available under the Actions menu as shown below:
Double-clicking the name of the table will generate a SELECT
statement to query that table. You will have the chance to include all the field names in the statement, or simply generate a SELECT *
statement. After bringing the SQL to the Table Input configuration window, you will be able to modify it according to your needs.