Validating an XML file against DTD definitions

A Document Type Definition (DTD) defines the structure of an XML document as a list of permitted elements and attributes. Kettle provides the DTD Validator job entry to validate an XML file against a DTD definition.

For example, suppose you have an XML file with museum information, as follows:

<museums>
  <museum>
    <name>Fundacion Federico Klemm</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
  <museum id_museum='2'>
    <name>Fundacion Proa</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
  <museum id_museum='9'>
    <name>Museu Nacional de Belas Artes</name>
    <country>Brazil</country>
  </museum>
  <museum id_museum='19'>
    <name>Biblioteca Luis Angel Arango</name>
    <city>Bogota</city>
    <country>Colombia</country>
  </museum>
</museums>

You want to validate it against the following DTD definition:

<!DOCTYPE museums [
  <!ELEMENT museums (museum+)>
  <!ELEMENT museum (name+, city, country)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ATTLIST museum id_museum CDATA #REQUIRED>
]>

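This definition declares that a museums element contains one or more museum elements, and that each museum contains one or more name elements followed by exactly one city and one country, in that order. It also declares the id_museum attribute of museum as required.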
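
To illustrate how these declarations control what is accepted, here is a hypothetical variant that is not the definition used in this recipe. It makes city optional with the ? occurrence indicator and makes id_museum optional with #IMPLIED. Under this relaxed definition, all four museum elements of the sample file would be valid.

```xml
<!DOCTYPE museums [
  <!ELEMENT museums (museum+)>
  <!-- city? means zero or one city element -->
  <!ELEMENT museum (name+, city?, country)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!-- #IMPLIED means the attribute may be omitted -->
  <!ATTLIST museum id_museum CDATA #IMPLIED>
]>
```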

Getting ready

For this recipe, you need a museum.xml document that includes the DTD definition. You can download it from the book's website.

Note

You can keep the DTD definition in an independent file or inside the XML document. If the DTD is declared inside the XML file, it must be wrapped in a DOCTYPE declaration with the following syntax: <!DOCTYPE root-element [element-declarations]>
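
As a minimal sketch of the internal form, the following complete document places the DOCTYPE declaration before the root element. The XML declaration on the first line and its encoding are assumptions added for the example, and the data is cut down to a single museum.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE museums [
  <!ELEMENT museums (museum+)>
  <!ELEMENT museum (name+, city, country)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ATTLIST museum id_museum CDATA #REQUIRED>
]>
<!-- The root element must match the name given after DOCTYPE -->
<museums>
  <museum id_museum='2'>
    <name>Fundacion Proa</name>
    <city>Buenos Aires</city>
    <country>Argentina</country>
  </museum>
</museums>
```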

How to do it...

Carry out the following steps:

  1. Create a new job and add a Start entry.
  2. Drop a DTD Validator job entry from the XML category onto the canvas and create a hop from the Start entry to it.
  3. In the entry's settings window, point to your XML file in the XML File name textbox.
  4. Check the DTD Intern checkbox.
  5. Run the job so that the XML data is validated against the DTD definitions inside the XML file.
  6. You can see the validation result, including details of any errors, under the Logging tab in the Execution results window. In this case, the results are as follows:
    • For the first element, which has no id_museum attribute, the job reports this error: Attribute "id_museum" is required and must be specified for element type "museum".
    • The second and fourth museum elements are valid.
    • For the third element, which has no city element, you will receive the following message: The content of element type "museum" must match "(name+,city,country)".
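
One way to clear both errors is sketched below: add an id_museum attribute to the first element and a city element to the third. The id value '1' and the city name are illustrative assumptions, not data from the recipe's file.

```xml
<!-- First element: the required attribute is now present (value is hypothetical) -->
<museum id_museum='1'>
  <name>Fundacion Federico Klemm</name>
  <city>Buenos Aires</city>
  <country>Argentina</country>
</museum>

<!-- Third element: city added between name and country, as the content model requires -->
<museum id_museum='9'>
  <name>Museu Nacional de Belas Artes</name>
  <city>Rio de Janeiro</city>
  <country>Brazil</country>
</museum>
```

Note that the order matters: a city element placed after country would still violate the (name+,city,country) content model.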

How it works...

The DTD Validator job entry does the entire task of validating an XML file against a DTD definition. In the recipe, you checked the DTD Intern checkbox because the DTD definitions were inside the XML file. Otherwise, you must fill in the DTD File name textbox with the name of the proper DTD file.
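
For the independent-file case, a standalone DTD file holds only the declarations, without the DOCTYPE wrapper. The sketch below assumes a hypothetical file name, museums.dtd, that does not appear in the recipe.

```xml
<!-- museums.dtd (hypothetical file name): declarations only, no DOCTYPE wrapper -->
<!ELEMENT museums (museum+)>
<!ELEMENT museum (name+, city, country)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST museum id_museum CDATA #REQUIRED>
```

In standard XML, a document refers to such a file with a declaration like <!DOCTYPE museums SYSTEM "museums.dtd">. The recipe does not say whether the job entry needs that line when the DTD File name textbox is filled in, so treat it as optional until you have tested it.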

There's more...

DTDs have significant limitations. For example, you cannot define data types for the XML elements or attributes. If you need more flexibility, use the XSD validation feature instead.
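
To make the typing limitation concrete, here is a rough XSD sketch of the same museum structure. It is an illustration written for this comparison, not the schema used in the XSD recipe. It declares id_museum as a positive integer, which the DTD cannot express because CDATA accepts any text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="museums">
    <xs:complexType>
      <xs:sequence>
        <!-- One or more museum elements, like (museum+) in the DTD -->
        <xs:element name="museum" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" maxOccurs="unbounded"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
            <!-- Typed attribute: a value such as 'abc' would be rejected -->
            <xs:attribute name="id_museum" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```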

You can learn more about DTD definitions here: http://www.w3schools.com/dtd/default.asp.

See also

  • The recipe named Validating an XML file against an XSD schema in this chapter. In this recipe, you can see an XSD validation example.
  • The recipe named Validating well-formed XML files in this chapter. This recipe is for you if you only want to validate whether your XML is well-formed or not.