This recipe guides you through managing the PDI execution log in terms of the following aspects:
This recipe works the same for both Kitchen and Pan; the only difference is in the name of the script file used to start the process.
To get ready for this recipe, you need to check that the JAVA_HOME
environment variable is properly set and then configure your environment variables so that the Kitchen script can start from anywhere without specifying the complete path to your PDI home directory. For details about these checks, refer to the recipe Executing PDI jobs from a filesystem (Simple).
For changing the log's verbosity level, perform the following steps:
1. Open a terminal window and go to the <book_samples>/sample1 directory.
2. Start the job with the -level argument. For Linux/Mac, the argument is specified as follows:

   -level:<logging_level>

   And for Windows, the argument is specified as follows:

   /level:<logging_level>

The -level argument lets you specify the desired logging level by choosing its value from a set of seven possible values:

- Error: This level shows errors only
- Nothing: No log output is shown at all
- Minimal: This level uses minimal logging, giving a low verbosity on your log output
- Basic: This is the default logging level
- Detailed: Use this level as soon as you require a detailed logging output
- Debug: This level is used for debugging purposes and produces a very detailed output
- Rowlevel: This is the maximum amount of verbosity; logging at row level can generate a lot of data

For example, to start the export-job.kjb job showing errors only, type the following command on Linux/Mac:

$ kitchen.sh -file:/home/sramazzina/tmp/samples/export-job.kjb -level:Error

And on Windows:

C:\temp\samples>Kitchen.bat /file:C:\temp\samples\export-job.kjb /level:Error
For saving an ETL process log to output files for future reference, use the following steps:
1. Start the job with the -logfile argument. This argument lets you specify the complete path to the logfile name. For Linux/Mac, the argument is specified as follows:

   -logfile:<complete_logfilename>

   And for Windows, the argument is specified as follows:

   /logfile:<complete_logfilename>

2. Let's suppose that we are starting the export-job.kjb Kettle job, and we want a Debug log level and to save the output to a specified logfile called pdilog_debug_output.log. To do this on Linux/Mac, type the following command:

   $ kitchen.sh -file:/home/sramazzina/tmp/samples/export-job.kjb -level:Debug -logfile:./pdilog_debug_output.log

   And on Windows:

   C:\temp\samples>Kitchen.bat /file:C:\temp\samples\export-job.kjb /level:Debug /logfile:.\pdilog_debug_output.log
Start the job with the -maxloglines argument; this argument lets you specify the maximum number of log lines kept in PDI's log buffer. For Linux/Mac, the argument is specified as follows:

-maxloglines:<number_of_log_lines_to_keep>

And for Windows, the argument is specified as follows:

/maxloglines:<number_of_log_lines_to_keep>

If you specify 0 as the value for this argument, PDI will keep all of the log lines produced.

The -maxlogtimeout argument lets you specify the maximum age of a log line in minutes before it is removed from the log buffer. For Linux/Mac, the argument is specified as follows:

-maxlogtimeout:<age_of_a_logline_in_minutes>

And for Windows, the argument is specified as follows:

/maxlogtimeout:<age_of_a_logline_in_minutes>

If you specify 0 as the value for this argument, PDI will keep all the log lines indefinitely.

Let's suppose that we are starting the export-job.kjb Kettle job and that we want to keep 10000 rows in our log buffer. In this case, the command we need to use on Linux/Mac is as follows:

$ kitchen.sh -file:/home/sramazzina/tmp/samples/export-job.kjb -level:Debug -logfile:./pdilog_debug_output.log -maxloglines:10000
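The effect of -maxloglines is easy to picture with a plain shell analogy: once the limit is reached, PDI's internal log buffer discards the oldest lines first, much like keeping only the tail of a file. The following sketch is not part of PDI (the file names are made up for illustration); it just mimics that behavior externally:

```shell
# Simulate a log of 20 lines (a stand-in for PDI's log buffer).
seq 1 20 | sed 's/^/log line /' > full.log

# Keep only the newest 10 lines, as -maxloglines:10 would do inside
# PDI's buffer; the oldest lines are the ones that get discarded.
tail -n 10 full.log > trimmed.log

wc -l < trimmed.log        # 10 lines remain
head -n 1 trimmed.log      # oldest surviving line is "log line 11"
```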
The log is an invaluable source of information for understanding what is not working and where. This is the topic covered in the first part of this section. Then, we will see a brief example that helps us produce logfiles with a parametric name.
The log that our ETL process produces contains a valuable set of information that helps us understand where our process fails. The first case of failure is a system exception. In this case, it is very easy to identify why the process fails, because the exception message is clearly identifiable in the logfile. As an example, let's suppose that we started our job from a wrong directory, or that the job file is not found in the path we gave; we will find a detailed exception message in the log, as follows:
INFO 17-03 22:15:40,312 - Kitchen - Start of run.
ERROR: Kitchen can't continue because the job couldn't be loaded.
However, it is a very different matter when our process does not explicitly fail because of an exception, but the results differ from what is expected. It could be that we expected 1000 rows to be written to our file, but only 900 were written. What can we do to understand what is going wrong? A simple but effective approach is to analyze the part of the log that summarizes what happened in each of our tasks. Let's consider the following section, taken from the log of one of our sample processes:
INFO 17-03 22:31:54,712 - Read customers from file - Finished processing (I=123, O=0, R=0, W=122, U=1, E=0)
INFO 17-03 22:31:54,720 - Get parameters - Finished processing (I=0, O=0, R=122, W=122, U=0, E=0)
INFO 17-03 22:31:54,730 - Filter rows with different countries - Finished processing (I=0, O=0, R=122, W=122, U=0, E=0)
INFO 17-03 22:31:54,914 - Write selected country customers - Finished processing (I=0, O=122, R=122, W=122, U=0, E=0)
As you can clearly see, this section, found near the end of the log of any transformation called by the job, summarizes what happens at the boundaries (input and output) of every step of our transformation: the counters stand for input (I), output (O), read (R), written (W), updated (U), and error (E) rows. Keeping an eye on this log fragment is key to understanding where our business rules are failing and where we are getting fewer records than expected. On the other hand, remember that because jobs are mainly orchestrators, they do not contain any data, so there is no need for such a log section for them.
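A quick way to isolate this summary from a long logfile is to grep for the Finished processing lines. The sketch below builds a tiny sample log (reusing two of the lines shown above, so the file name sample.log is just for illustration) and extracts the step name together with its written-rows counter; against a real run, you would grep your actual logfile instead:

```shell
# Build a minimal sample log with two per-step summary lines.
cat > sample.log <<'EOF'
INFO 17-03 22:31:54,712 - Read customers from file - Finished processing (I=123, O=0, R=0, W=122, U=1, E=0)
INFO 17-03 22:31:54,914 - Write selected country customers - Finished processing (I=0, O=122, R=122, W=122, U=0, E=0)
EOF

# Keep only the summary lines and print "step name: W=<written rows>".
grep 'Finished processing' sample.log |
  sed -E 's/.* - (.*) - Finished processing.*W=([0-9]+).*/\1: W=\2/'
```

Comparing the W counters step by step makes it easy to spot where rows start going missing.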
It could be interesting to separate our execution logs by appending the execution date and time to the logfile name. To do this, the simplest thing we can do is to wrap our Kitchen script in another script used to set the proper logfile name using some shell functions.
As an example, I wrote a sample script for Linux/Mac that starts PDI jobs by filename and writes a logfile whose name is made up of a base name, a text string containing the date and time when the job was submitted, and the extension (conventionally .log
). Have a look at the following code:
now=$(date +"%Y%m%d_%H%M%S")
kitchen.sh -file:$1.kjb -logfile:$2_$now.log

As you can see, it's a fairly trivial script; the first line builds a string composed of year, month, day, hours, minutes, and seconds, and the second concatenates that string with the logfile's base name and the extension. To start the script, use the following syntax:

startKitchen.sh <job_to_start_base_name> <logfile_base_name>

So, for example, the following command starts Kitchen by calling the job export-job.kjb and produces a logfile called logfile_20130319_132646.log:

$ ./startKitchen.sh ./export-job ./logfile
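Before wiring the timestamp into a wrapper of this kind, you can preview the format on its own (note the uppercase %S, which is the seconds field of the current time, as opposed to %s, which expands to the Unix epoch):

```shell
# Build the date/time suffix and show the logfile name it produces,
# e.g. logfile_20130319_132646.log (the actual value depends on when
# the script is run).
now=$(date +"%Y%m%d_%H%M%S")
echo "logfile_${now}.log"
```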
You can write a similar batch file for the Windows platform.
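As a rough sketch of what that Windows batch file could look like (untested here; the file name startKitchen.bat is an assumption, and the %date%/%time% substring offsets depend on your regional settings and will likely need adjusting):

```batch
@echo off
rem Build a timestamp string from %date% and %time%. The offsets below
rem assume a date format like 19/03/2013; adjust them for your locale.
rem Note that %time% has a leading space for hours before 10.
set now=%date:~6,4%%date:~3,2%%date:~0,2%_%time:~0,2%%time:~3,2%%time:~6,2%
Kitchen.bat /file:%1.kjb /logfile:%2_%now%.log
```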