In this chapter, we will cover:
Kettle, also known as PDI, is mostly used as a stand-alone application. However, it is not an isolated tool, but part of the Pentaho Business Intelligence Suite. As such, it can also interact with other components of the suite; for example, as the datasource for a report, or as part of a bigger process. This chapter shows you how to run Kettle jobs and transformations in that context.
The chapter assumes a basic knowledge of the Pentaho BI platform and the tools that make up the Pentaho Suite. If you are not familiar with these tools, it is recommended that you visit the wiki page (wiki.pentaho.com) or the Pentaho BI Suite Community Edition (CE) site: http://community.pentaho.com/.
As another option, you can get the Pentaho Solutions book (Wiley) by Roland Bouman and Jos van Dongen, which gives you a good introduction to the whole suite.
The different recipes in this chapter show you how to run Kettle transformations and jobs integrated with several components of the Pentaho BI suite. In order to focus on the integration itself rather than on Kettle development, we have created a sample transformation named weather.ktr that will be used throughout the different recipes.
The transformation receives the name of a city as the first parameter from the command line, for example Madrid, Spain. Then, it consumes a web service to get the current weather conditions and the forecast for the next five days for that city. The transformation has a couple of named parameters:
Name | Purpose | Default |
---|---|---|
| Scale for the temperature to be returned. It can be | |
| Scale for the wind speed to be returned. It can be | |
The following diagram shows what the transformation looks like:
It receives the command-line argument and the named parameters, calls the service, and retrieves the information in the desired scales for temperature and wind speed.
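To make the calling convention concrete, the following is a minimal sketch of how the command line for this transformation could be assembled from a script. It assumes that Pan is available as pan.sh, that your Kettle version still accepts positional command-line arguments, and that named parameters are passed with the -param:NAME=value option. The helper build_pan_command and the parameter name TEMP_SCALE are hypothetical and used only for illustration; replace them with the names defined in the transformation.

```python
import shlex


def build_pan_command(ktr_path, city, named_params=None, pan="pan.sh"):
    """Return the argument list for running a transformation with Pan.

    The city travels as a positional command-line argument, while
    named parameters use the -param:NAME=value option.
    """
    command = [pan, f"-file={ktr_path}"]
    for name, value in (named_params or {}).items():
        command.append(f"-param:{name}={value}")
    # The city goes last, as a single argument even if it contains spaces.
    command.append(city)
    return command


if __name__ == "__main__":
    # TEMP_SCALE is a placeholder, not necessarily the real parameter name.
    argv = build_pan_command("weather.ktr", "Madrid, Spain", {"TEMP_SCALE": "C"})
    # Print a shell-quoted version that could be pasted into a terminal.
    print(shlex.join(argv))
```

Building the command as a list, rather than as a single string, keeps a city such as Madrid, Spain together as one argument. The list can be handed to subprocess.run if you want to launch Pan from the script.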
You can download the transformation from the book's site and test it. Do a preview on the next_days, current_conditions, and current_conditions_normalized steps to see what the results look like.
The following is a sample preview of the next_days step:
The following is a sample preview of the current_conditions step:
Finally, the following screenshot shows you a sample preview of the current_conditions_normalized step:
For details about the web service and understanding the results, you can take a look at the recipe named Specifying fields by using XPath notation (Chapter 2, Reading and Writing Files).
There is also another transformation named weather_np.ktr. This transformation does exactly the same, but it reads the city as a named parameter instead of reading it from the command line. The Getting ready sections of each recipe will tell you which of these transformations will be used.
Avoiding consuming the web service
It may happen that you do not want to consume the web service (for example, because of the delay it introduces), or that you cannot do it (for example, if you do not have Internet access). Besides, if you call a free web service like this too often, your IP might be banned from the service. Don't worry. Along with the sample transformations on the book's site, you will find another version of each transformation that, instead of using the web service, reads sample fictional data from a file containing the forecast for over 250 cities. The transformations are weather (file version).ktr and weather_np (file version).ktr. Feel free to use these transformations instead. You should not have any trouble, as the parameters and the metadata of the data retrieved are exactly the same as in the transformations explained earlier.
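If you script your tests, a small helper can pick the right file among the four sample transformations. The following is only a sketch: the function transformation_file and its arguments are hypothetical, while the file names are the ones mentioned in this chapter.

```python
def transformation_file(city_as_named_param=False, use_web_service=True):
    """Return the file name of the sample transformation to run.

    city_as_named_param: choose the weather_np variant, which reads the
        city as a named parameter instead of a command-line argument.
    use_web_service: when False, choose the version that reads the
        sample fictional data from a file.
    """
    base = "weather_np" if city_as_named_param else "weather"
    suffix = "" if use_web_service else " (file version)"
    return f"{base}{suffix}.ktr"


if __name__ == "__main__":
    # Work offline with the variant that takes the city as a named parameter.
    print(transformation_file(city_as_named_param=True, use_web_service=False))
```

Because the file versions keep the same parameters and metadata, switching between them should not require any other change in the way you call the transformation.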