Validating an XML file against an XSD schema

In this recipe, you will learn how to use the XSD Validator step to verify an XML structure against an XSD (XML Schema Definition). The example uses a database of books (with the structure shown in the Appendix, Data Structures) and an XSD schema file that describes the book structure. The goal is to validate each book element against that schema file.

The XSD file is named books.xsd and looks like the following:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="idTitle">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{3}-\d{3}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="positiveDecimal">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0.0" />
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="genre" type="xs:string"/>
        <xs:element name="price" type="positiveDecimal"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id_title" type="idTitle" />
    </xs:complexType>
  </xs:element>
</xs:schema>

This schema file verifies the following features:

  • Inside a sequence, there are three elements of string type: title, genre, and author.
  • There is an element named price of a simpleType named positiveDecimal, declared earlier as a decimal type with 0.0 as its minimum value.
  • There is a simpleType named idTitle for the id_title attribute. This type is declared as a string with a pattern expression. In this case, the pattern is \d{3}-\d{3}, which means three decimal digits followed by a hyphen and then three more decimal digits, for example: 123-456.
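
To make the two restrictions concrete, the following is a minimal sketch in plain Java that mirrors them outside of any XSD engine. The class and method names (BookFacets, isValidIdTitle, isValidPrice) are hypothetical and exist only for this illustration. It is an approximation: java.util.regex and BigDecimal are not identical to the XSD pattern and xs:decimal rules (for example, BigDecimal accepts exponent notation such as 1e3, which xs:decimal does not).

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

/** Plain-Java approximation of the two facets declared in books.xsd. */
final class BookFacets {
    // Same expression as the pattern facet of the idTitle type.
    private static final Pattern ID_TITLE = Pattern.compile("\\d{3}-\\d{3}");

    /** True when the whole value is three digits, a hyphen, and three digits. */
    static boolean isValidIdTitle(String value) {
        // matches() tests the entire string, as an XSD pattern does.
        return ID_TITLE.matcher(value).matches();
    }

    /** True when the value parses as a decimal that is at least 0.0. */
    static boolean isValidPrice(String value) {
        try {
            return new BigDecimal(value).compareTo(BigDecimal.ZERO) >= 0;
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidIdTitle("123-456")); // true
        System.out.println(isValidIdTitle("123505"));  // false: no hyphen
        System.out.println(isValidPrice("32.00"));     // true
        System.out.println(isValidPrice("-5.00"));     // false: below 0.0
    }
}
```

Note that the minimum is inclusive, so a price of 0.0 is accepted even though the type is named positiveDecimal.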

Getting ready

You need a database with books' and authors' information. You will also need the XSD schema as a separate file. You can download the file from the book's site.

How to do it...

Carry out the following steps:

  1. Create a new transformation.
  2. Drop a Table Input step and make a selection from the Books database with the following statement:
    SELECT id_title
    , title
    , genre
    , price
    , concat(lastname, ', ', firstname) AS author
    FROM Books
    LEFT JOIN Authors
    ON Authors.id_author=Books.id_author
    
  3. Use the Add XML step from the Transform category, in order to create a new column with the data for each book in XML format.
  4. Under the Content tab, type xmlBook in Output Value and book as the Root XML element.
  5. Under the Fields tab, use the Get Fields button to populate the grid automatically. Then, modify the Format and Decimal for the price column, as shown in the following screenshot:
  6. If you do a preview on this step, then you will see a new column with an XML structure for each book. The following is a sample XML structure created with this step:
    <book id_title="423-006">
    <title>Harry Potter and the Order of the Phoenix</title>
    <genre>Childrens</genre>
    <price>32.00</price>
    <author>Rowling, Joanne</author>
    </book>
    

    Note that the structure is shown in several lines for clarity. In the preview, you will see the structure in a single line.

  7. Add an XSD Validator step from the Validation category.
  8. In the XML field located under the Settings tab, select the column xmlBook that you created in the previous step.
  9. Under the same tab, complete the Output Fields frame, as shown in the following screenshot:
  10. In the XSD Source listbox inside the XML Schema Definition frame, select the option is a file, let me specify filename.
  11. Then, in the XSD Filename textbox, type or select the books.xsd file.
  12. When you run this transformation, you will obtain the dataset with books along with a field indicating the result of the validation and the validation message in case of failure. Assuming that you have some errors in the source data, your final dataset will look similar to the one shown in the following screenshot:
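
For readers who want to see the shape of the xmlBook column outside the tool, the following is a rough Java sketch of what steps 3 to 6 produce for one row. It is not how the Add XML step is implemented; the class BookXmlBuilder and its method toXml are hypothetical names, and the two-decimal price format is an assumption taken from the sample value 32.00 shown in step 6.

```java
import java.io.StringWriter;
import java.math.BigDecimal;
import java.math.RoundingMode;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds one <book> element from the values of a single row (illustration only). */
final class BookXmlBuilder {

    static String toXml(String idTitle, String title, String genre,
                        BigDecimal price, String author) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element book = doc.createElement("book");
            book.setAttribute("id_title", idTitle); // id_title becomes an attribute
            doc.appendChild(book);

            // Child elements in the order required by the xs:sequence in books.xsd.
            addChild(doc, book, "title", title);
            addChild(doc, book, "genre", genre);
            // Assumed format: always two decimal places, as in the sample 32.00.
            addChild(doc, book, "price",
                    price.setScale(2, RoundingMode.HALF_UP).toPlainString());
            addChild(doc, book, "author", author);

            // Serialize without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the book XML", e);
        }
    }

    private static void addChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(toXml("423-006",
                "Harry Potter and the Order of the Phoenix",
                "Childrens", new BigDecimal("32"), "Rowling, Joanne"));
    }
}
```

Building the element through the DOM API rather than by string concatenation means that characters such as & or < in a title are escaped automatically.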

How it works...

An XML Schema Definition (XSD) file defines a set of rules for validating an XML document. An XSD file allows you to verify whether a document, written in XML format, is well-formed and also respects those rules.

In this example, you created a new column with each book in XML format, and then applied the XSD Validator step to verify this column against the books.xsd schema file.

In the result of your transformation, you can see that one book didn't follow the pattern expected for the id_title field because it didn't contain a hyphen. In that case, you obtained the following message: cvc-pattern-valid: Value '123505' is not facet-valid with respect to pattern '\d{3}-\d{3}' for type 'idTitle'.

Also, one book had an incorrect price (a negative one). In that case, you got the following error: cvc-minInclusive-valid: Value '-5.00' is not facet-valid with respect to minInclusive '0.0' for type 'positiveDecimal'.
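
A comparable check can be reproduced outside Kettle with the standard javax.xml.validation API of the JDK. The following is a minimal sketch under stated assumptions, not the step's actual implementation: the class name BookXsdCheck and the method validate are hypothetical, the schema is embedded as a string instead of being read from books.xsd, and the method reports only the first violation it meets. The exact message wording depends on the XML parser bundled with your JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Validates a single <book> string against the books.xsd rules (illustration only). */
final class BookXsdCheck {
    // Inline copy of books.xsd so that the example is self-contained.
    static final String BOOKS_XSD = String.join("\n",
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">",
        "  <xs:simpleType name=\"idTitle\">",
        "    <xs:restriction base=\"xs:string\">",
        "      <xs:pattern value=\"\\d{3}-\\d{3}\"/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:simpleType name=\"positiveDecimal\">",
        "    <xs:restriction base=\"xs:decimal\">",
        "      <xs:minInclusive value=\"0.0\"/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:element name=\"book\">",
        "    <xs:complexType>",
        "      <xs:sequence>",
        "        <xs:element name=\"title\" type=\"xs:string\"/>",
        "        <xs:element name=\"genre\" type=\"xs:string\"/>",
        "        <xs:element name=\"price\" type=\"positiveDecimal\"/>",
        "        <xs:element name=\"author\" type=\"xs:string\"/>",
        "      </xs:sequence>",
        "      <xs:attribute name=\"id_title\" type=\"idTitle\"/>",
        "    </xs:complexType>",
        "  </xs:element>",
        "</xs:schema>");

    /** Returns null when the XML is valid, otherwise the first validation message. */
    static String validate(String xml) {
        Validator validator;
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            validator = factory
                    .newSchema(new StreamSource(new StringReader(BOOKS_XSD)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself could not be compiled", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null; // no violation found
        } catch (SAXException e) {
            return e.getMessage(); // with no ErrorHandler set, the first error is thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String book(String id, String price) {
        return "<book id_title=\"" + id + "\"><title>T</title><genre>G</genre>"
                + "<price>" + price + "</price><author>A</author></book>";
    }

    public static void main(String[] args) {
        System.out.println(validate(book("423-006", "32.00"))); // valid book
        System.out.println(validate(book("123505", "32.00")));  // id without a hyphen
        System.out.println(validate(book("423-007", "-5.00"))); // negative price
    }
}
```

Returning the message instead of throwing keeps the sketch close to the recipe, where each row carries a result field and a message field rather than stopping the transformation.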

There's more...

In the recipe, you used the XSD Validator step to validate an XML structure that was built from fields in a database. In general, you can use this step to validate any XML structure, whether it is supplied as a field or saved in a file.

In cases where you want to validate a file, you can also take advantage of the same functionality from a job entry named XSD Validation inside the XML category. The configuration of that entry is simple: you only have to set the paths to the XML file and the XSD schema file.

You can learn more about XSD from the following URL:

http://www.w3.org/TR/xmlschema-0/

See also

  • The recipe named Validating well-formed XML files. This recipe shows you the simplest method of XML validation.
  • The recipe named Validating an XML file against DTD definitions. Yet another validation method.