This example is similar to the previous one, with the difference being that you have to look up the museum opening hours on a website instead of a web service. In this case, you will use the HTTP Client step.
You must have a database with the museum structure shown in the Appendix, Data Structures, and a web page that provides the museum opening hours. The recipe uses an ASP page named hours.asp, but you can use the language of your preference. The page receives the museum's identification and returns a string with the schedule. You can download a sample web page from the book's website.
Carry out the following steps:
SELECT id_museum, name, city, country FROM museums JOIN cities ON museums.id_city = cities.id_city
Hours
.status
. GET
parameter. Type id_museum in both the Name and Parameter columns.
The HTTP Client step looks up the museums' opening hours over the intranet; the step makes a request to the web page for each museum in the dataset. One example of this request passing the parameter would be the following:
http://localhost/museum/hours.asp?id_museum=25
Then, the response of the page, which contains the museum opening hours, is stored in the Hours field.
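The per-row request described above can be sketched outside Kettle as follows. This is only an illustration of the idea, not the step's actual implementation: the function names build_url, http_get, and lookup_hours are hypothetical, and the sketch assumes that each row is a dictionary holding an id_museum value and that the page answers a plain GET request.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params):
    """Append the parameters to the base URL as a query string."""
    return f"{base_url}?{urlencode(params)}"


def http_get(url):
    """Perform a GET request and return (body, status_code)."""
    try:
        with urlopen(url) as response:
            body = response.read().decode("utf-8", errors="replace")
            return body, response.status
    except HTTPError as err:
        # Error responses (for example 400) still carry a status code.
        return err.read().decode("utf-8", errors="replace"), err.code


def lookup_hours(rows, base_url, fetch=http_get):
    """Return copies of the rows with Hours and status fields added.

    One request is made per row, passing id_museum as a GET parameter.
    The fetch argument is an assumption of this sketch; it lets you
    replace the real request with a stub.
    """
    enriched = []
    for row in rows:
        url = build_url(base_url, {"id_museum": row["id_museum"]})
        body, status = fetch(url)
        enriched.append({**row, "Hours": body, "status": status})
    return enriched
```

Calling lookup_hours with the base URL http://localhost/museum/hours.asp and a row whose id_museum is 25 would request the address shown in the example above.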
The status field will hold the status code of the operation. For example, a status code equal to 200 means a successful request, whereas a status code of 400 means a bad request. You can check the different status codes at the following URL:
Suppose that each museum has a different website (and a different URL) with a web page that provides its opening hours. In this case, you can store this specific URL as a new field in the museum dataset. Then, in the HTTP Client step, check the Accept URL from field? checkbox and select that field from the URL field name drop-down list.
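As a rough sketch of this variation, the request address can be read from each row instead of being fixed. The field name hours_url and the function lookup_hours_from_field are hypothetical, chosen only for illustration, and the fetch argument stands in for whatever performs the actual GET request and returns the body and the status code.

```python
def lookup_hours_from_field(rows, url_field, fetch):
    """Request the URL stored in each row rather than one fixed URL.

    rows      -- list of dictionaries, one per museum
    url_field -- name of the field that holds each museum's URL
    fetch     -- callable taking a URL and returning (body, status_code)
    """
    enriched = []
    for row in rows:
        body, status = fetch(row[url_field])
        enriched.append({**row, "Hours": body, "status": status})
    return enriched


def stub_fetch(url):
    """Stand-in for a real HTTP request, used only in this example."""
    return f"schedule from {url}", 200


museums = [
    {"id_museum": 1, "hours_url": "http://localhost/museum_a/hours.asp"},
    {"id_museum": 2, "hours_url": "http://localhost/museum_b/hours.asp"},
]
result = lookup_hours_from_field(museums, "hours_url", stub_fetch)
```

Each resulting row keeps its original fields and gains the Hours and status fields filled from its own URL.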
For an example that uses the HTTP Client step to get data from an Internet service, take a look at the sample transformation in the Introduction of Chapter 8, Integrating Kettle and the Pentaho Suite.