Reading an Excel file

Kettle provides the Excel input step, in order to read data from Excel files. In this recipe, you will use this step to read an Excel file regarding museums in Italy. The file has a sheet with one column for the name of the museum and other for the city where it is located. The data starts in the C3 cell (as shown in the screenshot in the next section).

Getting ready

For this example, you need an Excel file named museumsItaly.xls with a museums sheet, as shown in the following screenshot:

Getting ready

You can download a sample file from the book's site.

How to do it...

Carry out the following steps:

  1. Create a new transformation.
  2. Drop an Excel input step from the Input category.
  3. Under the Files tab, browse to the museumsItaly.xls file and click on the Add button. This will cause the name of the file to be moved to the grid below.
  4. Under the Sheet tab, fill in the first row as follows: type museums in the Sheet name column, 2 in the Start row, and 2 in the Start column.

    Note

    The rows and columns are numeric values (you cannot define the column with the identification letter you see in Excel). These values are zero-based (they start at the number 0).

  5. Under the Content tab, leave the Header checked.
  6. Under the Fields tab, click on the Get fields from header row button to obtain the name and city fields.
  7. Previewing the step, you will obtain a dataset with the museums data coming from the Excel sheet.

How it works...

The Excel input step allows you to read Excel files. Starting with Kettle 4.1.0, you can also use this step to read OpenOffice calc files.

This recipe showed you the way to read a simple Excel file, with a single sheet. However, the Excel input step allows you to read several Excel files at the same time. You do it just by adding more filename specifications to the grid located under the File tab. The step also allows you to read multiple sheets. You can click on the Get Sheetname(s) button to select from the list of sheets to read. If you don't specify any sheet in the grid, the step will read all of them.

Note

Take care when you leave the sheet name blank or when you select more than one sheet because if the sheets have different structures, you will get an error.

Except for the sheet information, configuring an Excel input step for reading an Excel file is quite the same as configuring a Text file input step. You should not have any troubles making it work.

See also

  • The section named Reading a simple file in this chapter. As said, the configuration of an Excel input and a Text file input step are similar—you can learn more about how to configure the Fields tab in this recipe.
  • The section named Using the name of a file (or part of it) as a field in this chapter. See this recipe for more information about the Additional output fields tab.
  • The section named Getting the value of specific cells in an Excel file in this chapter. Refer to this recipe if you don't know exactly where your data is in a sheet.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset