A Document Type Definition (DTD) defines the structure of an XML document as a list of allowed elements and attributes. Kettle provides the DTD Validator job entry to validate an XML document against a DTD definition.
For example, suppose you have an XML file with museum information, as follows:
<museums>
  <museum>
    <name>Fundacion Federico Klemm</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
  <museum id_museum='2'>
    <name>Fundacion Proa</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
  <museum id_museum='9'>
    <name>Museu Nacional de Belas Artes</name>
    <country>Brazil</country>
  </museum>
  <museum id_museum='19'>
    <name>Biblioteca Luis Angel Arango</name>
    <city>Bogota</city>
    <country>Colombia</country>
  </museum>
</museums>
You want to validate it against the following DTD definition file:
<!DOCTYPE museums [
  <!ELEMENT museums (museum+)>
  <!ELEMENT museum (name+, city, country)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ATTLIST museum id_museum CDATA #REQUIRED>
]>
With this definition, you are declaring the museum structure elements: name, city, and country, and defining the attribute id_museum as required.
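To make the two museum rules concrete, the following is a minimal Python sketch that hand-codes them against the sample document. It is not a DTD validator and is not how Kettle performs the check; it only mirrors the content model (name+, city, country) and the required id_museum attribute. The names SAMPLE_XML, MUSEUM_CONTENT, and check_museums are hypothetical and introduced here for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the text: the first museum has no id_museum
# attribute, and the third museum has no <city> element.
SAMPLE_XML = """<museums>
  <museum>
    <name>Fundacion Federico Klemm</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
  <museum id_museum='2'>
    <name>Fundacion Proa</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
  <museum id_museum='9'>
    <name>Museu Nacional de Belas Artes</name>
    <country>Brazil</country>
  </museum>
  <museum id_museum='19'>
    <name>Biblioteca Luis Angel Arango</name>
    <city>Bogota</city>
    <country>Colombia</country>
  </museum>
</museums>"""

# The content model (name+, city, country) written as a regular
# expression over the comma-terminated sequence of child tag names.
MUSEUM_CONTENT = re.compile(r"(name,)+city,country,")


def check_museums(xml_text):
    """Return (museum_position, message) pairs for each rule violation."""
    root = ET.fromstring(xml_text)
    problems = []
    for position, museum in enumerate(root.findall("museum"), start=1):
        # Mirrors: <!ATTLIST museum id_museum CDATA #REQUIRED>
        if "id_museum" not in museum.attrib:
            problems.append((position, 'attribute "id_museum" is required'))
        # Mirrors: <!ELEMENT museum (name+, city, country)>
        child_tags = "".join(child.tag + "," for child in museum)
        if not MUSEUM_CONTENT.fullmatch(child_tags):
            problems.append((position, 'content must match "(name+,city,country)"'))
    return problems


for position, message in check_museums(SAMPLE_XML):
    print(f"museum #{position}: {message}")
```

Run against the sample, the sketch flags the first museum for the missing attribute and the third museum for the missing city element.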
For this recipe, you need a museum.xml document with the DTD definition included. You can download it from the book's website.
Carry out the following steps:
Attribute "id_museum" is required and must be specified for element type "museum".
The content of element type "museum" must match "(name+,city,country)".
The DTD Validator job entry does the entire task of validating an XML file against a DTD definition. In the recipe, you checked the DTD Intern checkbox because the DTD definitions were inside the XML file. Otherwise, you must fill the DTD File name textbox with the name of the proper DTD file.
DTDs have several limitations. For example, you cannot define data types for XML elements or attributes. If you need more flexibility, use the XSD validation feature instead.
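The typing limitation can be illustrated with a small Python sketch. A DTD attribute declared as CDATA accepts any string, so an id_museum value of 'two' is as valid as '2'. The check below is a rough stand-in for the kind of integer constraint a typed schema could express; it is not an XSD validator, and the names INTEGER, non_integer_ids, and TYPED_SAMPLE are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Rough stand-in for an integer type: optional sign, then ASCII digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def non_integer_ids(xml_text):
    """Return id_museum values that are present but not integer-like."""
    root = ET.fromstring(xml_text)
    rejected = []
    for museum in root.findall("museum"):
        value = museum.get("id_museum")
        if value is not None and not INTEGER.fullmatch(value):
            rejected.append(value)
    return rejected


# Hypothetical input: both values satisfy "CDATA #REQUIRED" in a DTD,
# but only the first one would pass an integer type check.
TYPED_SAMPLE = "<museums><museum id_museum='2'/><museum id_museum='two'/></museums>"

print(non_integer_ids(TYPED_SAMPLE))
```

A DTD has no way to state this constraint, which is why a typed schema language is the better fit when attribute or element values must follow a specific format.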
You can learn more about DTD definitions here: http://www.w3schools.com/dtd/default.asp.