Web services are interfaces that are accessed through HTTP and executed on a remote hosting system. They use XML messages that follow the SOAP standard.
With Kettle, you can look for values in available web services by using the Web service lookup step. In this recipe, you will see an example that shows the use of this step.
Suppose that you have a dataset of museums and you want to know about their opening and closing hours. That information is available as an external web service.
The web service has a web method named GetMuseumHour
that receives the id_museum
as a parameter, and returns the museum schedule as a String
. The request and response elements for the GetMuseumHour
web method used in this recipe look like the following:
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> <soap:Body> <GetMuseumHour xmlns="http://tempuri.org/"> <idMuseum>int</idMuseum> </GetMuseumHour> </soap:Body> </soap:Envelope>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> <soap:Body> <GetMuseumHourResponse xmlns="http://tempuri.org/"> <GetMuseumHourResult>string</GetMuseumHourResult> </GetMuseumHourResponse> </soap:Body> </soap:Envelope>
You must have a database with the museum structure shown in the Appendix, Data Structures and access to a web service similar to the one detailed earlier. On the book's website, there is sample code for those services.
Carry out the following steps:
SELECT id_museum , name , city , country FROM museums JOIN cities ON museums.id_city=cities.id_city
http://localhost/museumHours/Service.asmx?wsdl
. GetMuseumHour
. This generates the necessary inputs and outputs tabs; you can also generate them by clicking on the Add Input and Add Output buttons. GetMuseumHour
. This generates the necessary inputs and outputs tabs; you can also generate them by clicking on the Add Input and Add Output buttons. id_museum
in the Name column. Hours
. http://localhost/museumsHours/Service.asmx?wsdl
.The objective in the example is to look for a value that is hosted on a web server. You do it by consuming a web service.
Note that the URL of the web service in the Web service lookup step points to a WSDL address. WSDL (Web Services Description Language) is an XML-based language used to describe web services.
When you click on the Load button in the Web service lookup step, the information retrieved from the WSDL contract is interpreted and used to fill the Operation combo with the web methods found in the web service. When you choose a method, it automatically sets the necessary values for the input and the output. You only need to write the local values that will be passed as parameters (In tab), and the value for the result (GetMuseumHourResult tab).
If the tabs for the input and output are not created automatically, you can write the specifications manually, by clicking on the Add Input and Add Output buttons.
For each museum row, there will be a request to the web service passing the id_museum
parameter declared in the In tab. The result containing the museum opening hours will be saved in the Hours
field declared under the GetMuseumHourResult tab.
There is also a step named Check if webservice is available from the Lookup category. You can use it to verify that the web service is available just before consuming it. In this step, you must have the URL address as a field in each row.
In some cases, the web server could be flooded due to the multiple simultaneous requests and it could return an error similar to: Too many users are connected
.
In these cases, you can check the configuration of the web server. Alternatively, you can rerun the transformation consuming the web service by groups of rows, forcing a delay to avoid the web server saturation.
For more information about web services, you can follow this link:
http://en.wikipedia.org/wiki/Web_service
More information about WSDL can be obtained from the following URL: