Preface

Pentaho Data Integration (PDI) is an ETL tool that was born 10 years ago. Its creator, Matt Casters, celebrated the 10th anniversary of the product, originally named Kettle, on March 8th, 2013 (you can read the celebratory post on Matt's blog at http://www.ibridge.be/?p=211). The name K.E.T.T.L.E. is an acronym that stands for Kettle Extraction Transformation Transport Load Environment. When Pentaho acquired Kettle, the name was changed to Pentaho Data Integration, but many developers still call it by its old name: Kettle.

How the story began…

The history of Kettle began in 2001, when Matt Casters, Pentaho Data Integration's chief architect and the creator of Kettle, was working as a BI consultant. He had the idea of writing his own ETL tool as a better and cheaper way to transfer data from one place to another. He wanted an alternative to building ugly data warehouse solutions in PL/SQL, VB, or shell scripts. He spent two years doing a thorough analysis of the problem. Because his consulting work kept him busy, he worked on the project at night or during weekends. This phase produced a set of analysis documents and a couple of test programs written in C. He was not fully satisfied with the result, so in early 2003 he turned to Java, a platform that was gaining traction in the market in those years, and continued his work on the product there. By mid-2003, the first version of the ETL design tool, named Stir (now called Spoon), came to life.

It is interesting to see a screenshot of how things were then:

Stir featured a big X on the graphical view, the log view was not working, and neither were most of the step dialogs; still, it helps you understand the starting point of this adventure. A number of other releases followed, each adding new features or fixing bugs.

In 2004, the work was reasonably stable, and Matt was able to deploy Kettle at a customer for the first time. This "real-world" situation showed that a lot of things needed to be fixed and that new features needed to be implemented. That is why, in those days, things advanced a lot faster than they had in the first three years. The code base grew so fast that several rounds of refactoring and code cleanup were needed. Version 2.0 was one of the last "unstructured" versions. It was thanks to the Java expertise of companies such as ixor (Wim De Clerq especially) that Kettle survived and changed radically. They helped Matt a lot with refactoring and reorganizing the code to give the application a better structure and to simplify it. At that time, Kettle had a fairly complete first release with support for slowly changing dimensions, junk dimensions, 28 steps, and 13 database connectors.

The application, which was initially closed source, was open sourced in late 2005. The first version under the new licensing model was published in December 2005, and the response from the community was massive.
