Chapter 1. Working with Databases

In this chapter, we will cover:

  • Connecting to a database
  • Getting data from a database
  • Getting data from a database by providing parameters
  • Getting data from a database by running a query built at runtime
  • Inserting or updating rows in a table
  • Inserting new rows when a simple primary key has to be generated
  • Inserting new rows when the primary key has to be generated based on stored values
  • Deleting data from a table
  • Creating or altering a table from PDI (design time)
  • Creating or altering a table from PDI (runtime)
  • Inserting, deleting, or updating a table depending on a field
  • Changing the database connection at runtime
  • Loading a parent-child table

Introduction

Databases are broadly used by organizations to store and administer transactional data such as customer service history, bank transactions, purchases and sales, and so on. They also constitute the storage method for data warehouses, the repositories used in Business Intelligence solutions.

In this chapter, you will learn to deal with databases in Kettle. The first recipe tells you how to connect to a database, which is a prerequisite for all the other recipes. The rest of the chapter teaches you how to perform different operations and can be read in any order according to your needs.

Note

The focus of this chapter is on relational databases (RDBMS). Thus the term database is used as a synonym for relational databases through the recipes.

Sample databases

Through the chapter you will use a couple of sample databases. Those databases can be created and loaded by running the scripts available at the book's website. The scripts are ready to run under MySQL.

Note

If you work with a different DBMS you may have to modify the scripts slightly.

For more information about the structure of the sample databases and the meaning of the tables and fields, please refer to Appendix, Data Structures. Feel free to adapt the recipes to different databases. You could try some well known databases; for example Foodmart (available as part of the Mondrian distribution at http://sourceforge.net/projects/mondrian/) or the MySQL sample databases (available at http://dev.mysql.com/doc/index-other.html).

Pentaho BI platform databases

As part of the sample databases used in this chapter you will use the Pentaho BI platform Demo databases. The Pentaho BI Platform Demo is a pre-configured installation that lets you explore the capabilities of the Pentaho platform. It relies on the following databases:

Database name

Description

hibernate

Administrative information including user authentication and authorization data.

quartz

Repository for Quartz, the scheduler used by Pentaho.

sampledata

Data for Steel Wheels, a fictional company that sells all kind of scale replicas of vehicles.

By default, all those databases are stored in Hypersonic (HSQLDB). The script for creating the databases in HSQLDB can be found at http://sourceforge.net/projects/pentaho/files. Under Business Intelligence Server | 1.7.1-stable look for pentaho_sample_data-1.7.1.zip.

It can also be found at svn://ci.pentaho.com/view/Platform/job/bi-platform-sample-data/.

These databases can be stored in other DBMS as well. Scripts for creating and loading these databases in other popular DBMS as for example MySQL or Oracle can be found in Prashant Raju's blog, at http://www.prashantraju.com/projects/pentaho/.

Beside the scripts, you will find instructions for creating and loading the databases.

Tip

Prashant Raju, an expert Pentaho developer, provides several excellent tutorials related to the Pentaho platform. If you are interested in knowing more about Pentaho, it's worth taking a look at his blog.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset