This recipe guides you through configuring the command-line tool scripts so that you can manage execution performance properly when memory requirements increase. Many steps work in memory, so the more memory we reserve for PDI (consistent with the available memory and the overall system requirements), the better. A wrong memory configuration leads to poor performance and/or unexpected OutOfMemory errors.
You will learn how to modify the Kitchen or Pan script files to set new memory requirements. The recipe works the same for both Kitchen and Pan; the only difference is the name of the script file.
Remember that in PDI, we have two different sets of scripts to start PDI processes from the command line:
- kitchen.bat and pan.bat, for Windows
- kitchen.sh and pan.sh, for Linux and Mac
Both sets are in the PDI home directory, where you can edit the ones for your operating system.
So, let's go to the PDI home directory and start working on this recipe.
To change the memory settings by modifying the script in Windows, perform the following steps:
1. Open the kitchen.bat (or pan.bat) script.
2. Look for the following line:
    if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS=-Xmx512m
3. Change the value of PENTAHO_DI_JAVA_OPTIONS to the required memory value, for example, 1024 MB:
    set PENTAHO_DI_JAVA_OPTIONS=-Xmx1024m
To change the memory settings by modifying the script in Linux or Mac, perform the following steps:
1. Open the kitchen.sh (or pan.sh) script.
2. Look for the following lines:
    if [ -z "$JAVAMAXMEM" ]; then
      JAVAMAXMEM="512"
    fi
    if [ -z "$PENTAHO_DI_JAVA_OPTIONS" ]; then
      PENTAHO_DI_JAVA_OPTIONS="-Xmx${JAVAMAXMEM}m"
    fi
3. Change the value of JAVAMAXMEM to the required memory value in megabytes, for example, 1024:
    JAVAMAXMEM="1024"
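If you need to make this change on several machines, the edit can be scripted. The following is a minimal sketch of a hypothetical helper, set_pdi_maxmem (it is not part of PDI), that rewrites the JAVAMAXMEM="512" assignment with sed. It assumes your kitchen.sh or pan.sh contains that assignment as quoted above; the scripts differ between PDI versions, so check your copy first.

```shell
# Hypothetical helper (not part of PDI): changes the default JAVAMAXMEM value
# in kitchen.sh or pan.sh and keeps the original file as SCRIPT.bak.
set_pdi_maxmem() {
  local script="$1" mb="$2"
  # Accept only a plain number of megabytes, for example 1024
  case "$mb" in
    ''|*[!0-9]*)
      echo "usage: set_pdi_maxmem SCRIPT MEGABYTES" >&2
      return 1
      ;;
  esac
  # Only the JAVAMAXMEM="<digits>" assignment matches; the "$JAVAMAXMEM" and
  # ${JAVAMAXMEM} references are left alone. -i.bak works with both GNU sed
  # (Linux) and BSD sed (Mac).
  sed -i.bak "s/JAVAMAXMEM=\"[0-9]*\"/JAVAMAXMEM=\"${mb}\"/" "$script" || return 1
  # Fail if the script had no such assignment to change
  grep -q "JAVAMAXMEM=\"${mb}\"" "$script"
}
```

From the PDI home directory, set_pdi_maxmem kitchen.sh 1024 sets the default to 1024 MB and leaves the original in kitchen.sh.bak; repeat it for pan.sh if you need to.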
Setting the environment variables in Windows is another, cleaner way to change the memory settings. To do this, perform the following steps:
1. Define a new system variable called PENTAHO_DI_JAVA_OPTIONS.
2. Set its value to -Xmx1024m.
3. Open a new command prompt so that the new variable is picked up.
To change the memory settings by setting the environment variables in Linux/Mac, perform the following steps:
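1. Open the .bash_profile script file in your home directory. If it does not exist, create a new one.
2. Add the following line:
    export PENTAHO_DI_JAVA_OPTIONS="-Xmx1024m"
3. Run source ~/.bash_profile, or start a new login shell, so that the variable is defined in your shell.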
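The exported variable takes effect without touching kitchen.sh because of the guard in the kitchen.sh lines quoted earlier in this recipe: the script assigns its default only when PENTAHO_DI_JAVA_OPTIONS is empty. The following sketch copies that logic into a hypothetical function, effective_pdi_java_options (not part of PDI), so you can check which options a launch would end up with. It assumes your kitchen.sh contains the same lines, which may differ in other PDI versions.

```shell
# Hypothetical function reproducing the default-value logic of kitchen.sh/pan.sh.
# It runs in a subshell, so your current shell variables are not modified.
effective_pdi_java_options() {
  (
    # Script default: 512 MB unless JAVAMAXMEM is already set
    if [ -z "$JAVAMAXMEM" ]; then
      JAVAMAXMEM="512"
    fi
    # A non-empty PENTAHO_DI_JAVA_OPTIONS wins over the computed default
    if [ -z "$PENTAHO_DI_JAVA_OPTIONS" ]; then
      PENTAHO_DI_JAVA_OPTIONS="-Xmx${JAVAMAXMEM}m"
    fi
    printf '%s\n' "$PENTAHO_DI_JAVA_OPTIONS"
  )
}
```

After running source ~/.bash_profile, effective_pdi_java_options prints -Xmx1024m; with the variable unset, it prints the script default, -Xmx512m.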
Setting the environment variable is a good way to configure our scripts without modifying anything in the standard scripts. However, we can simplify our lives further by writing wrapper scripts that encapsulate all the details of preparing the execution environment. This lets us run our process without any hassle.
Kitchen and Pan are the two scripts that start our PDI processes from the command line. They provide many switches that let us configure our PDI job to run properly. However, starting a job or a transformation sometimes also means preparing an execution environment, which can require some technical knowledge as well as a considerable amount of time. We usually do not want our users to face this. To work around it, we encapsulate the call to the Kitchen or Pan script in a custom script that takes care of everything else without any pain.
Let's say we have a PDI job to start in Linux/Mac. We can write a bash script called startMyJob.sh that starts our job easily by configuring all the settings the job execution requires, as shown in the following code:
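#!/bin/bash
# Give the JVM up to 3 GB of heap and use /mnt/tmp as its temporary directory
export PENTAHO_DI_JAVA_OPTIONS="-Xmx3072m -Djava.io.tmpdir=/mnt/tmp"
# Make PDI read its settings (the .kettle folder) from this directory
export KETTLE_HOME=/home/ubuntu/pdi_settings
# Start the job with Basic logging, two named parameters, and the argument run_model
/home/ubuntu/pentaho/data-integration/kitchen.sh -file=/home/ubuntu/pentaho/etl/run_load.kjb \
  -level=Basic -param:SKIP_FTP=false -param:SKIP_DIMENSIONS=true "run_model"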
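As you can see, the script prepares the execution environment by setting the following:
- the maximum JVM heap size (-Xmx3072m) and the Java temporary directory (-Djava.io.tmpdir=/mnt/tmp), both through PENTAHO_DI_JAVA_OPTIONS
- the KETTLE_HOME variable
Finally, it starts the PDI job. You can see how simple it is to start our job using this script instead of spending a lot of time on manual settings every time!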
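The startMyJob.sh script assumes that everything is already in place. A slightly more defensive variant is sketched below as a hypothetical function, start_my_job. It checks that kitchen.sh and the job file exist, creates the temporary directory, and returns Kitchen's exit status to the caller. The default paths and job parameters are the recipe's examples. The PDI_HOME, JOB_FILE, and TMP_DIR overrides are assumptions added for illustration.

```shell
#!/bin/bash
# Hypothetical, more defensive variant of startMyJob.sh. Default paths and job
# parameters come from the recipe's example; PDI_HOME, JOB_FILE and TMP_DIR
# can be overridden from the environment.
PDI_HOME="${PDI_HOME:-/home/ubuntu/pentaho/data-integration}"
JOB_FILE="${JOB_FILE:-/home/ubuntu/pentaho/etl/run_load.kjb}"
TMP_DIR="${TMP_DIR:-/mnt/tmp}"

start_my_job() {
  # Fail early with a clear message if the installation or the job is missing
  if [ ! -x "$PDI_HOME/kitchen.sh" ]; then
    echo "kitchen.sh not found or not executable in $PDI_HOME" >&2
    return 1
  fi
  if [ ! -f "$JOB_FILE" ]; then
    echo "Job file not found: $JOB_FILE" >&2
    return 1
  fi
  # Make sure the temporary directory exists before the JVM is pointed at it
  mkdir -p "$TMP_DIR" || return 1

  export PENTAHO_DI_JAVA_OPTIONS="-Xmx3072m -Djava.io.tmpdir=$TMP_DIR"
  export KETTLE_HOME=/home/ubuntu/pdi_settings

  # Kitchen's exit status becomes the return status of this function
  "$PDI_HOME/kitchen.sh" -file="$JOB_FILE" -level=Basic \
    -param:SKIP_FTP=false -param:SKIP_DIMENSIONS=true "run_model"
}
```

To use it as startMyJob.sh, save the block in that file, add a final line that calls start_my_job, and make the file executable with chmod +x startMyJob.sh. The script's exit status is then Kitchen's, so a calling scheduler can detect a failed run.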