In this recipe, you will learn how to use the XSD Validator step, in order to verify a particular XML structure using an XSD (XML Schema Definition). For the example, you will use a database of books (with the structure shown in the Appendix, Data Structures) and an XSD schema file with the books structure. You want to validate each book element against the XSD schema file.
The XSD file is named books.xsd and looks like the following:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="idTitle">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{3}-\d{3}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="positiveDecimal">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0.0" />
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="genre" type="xs:string"/>
        <xs:element name="price" type="positiveDecimal"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id_title" type="idTitle" />
    </xs:complexType>
  </xs:element>
</xs:schema>
This schema file verifies the following features:
The book element contains a sequence of elements, three of which (title, genre, and author) are of the string type.
The price element is of a simpleType named positiveDecimal, declared earlier as a decimal type with 0.0 as its minimum value.
The id_title attribute is of a simpleType named idTitle. This type is declared as a string with a pattern expression. In this case, the pattern is \d{3}-\d{3}, which means three digits followed by a hyphen and then three more digits, for example: 123-456.
You need a database with books' and authors' information. You will also need the XSD schema as a separate file. You can download the file from the book's site.
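The two restrictions declared in the schema, the idTitle pattern and the positiveDecimal minimum, can be sketched outside of Kettle to see which values they accept. The following is a minimal Python approximation, not the validator that the XSD Validator step uses; the function names are hypothetical, and Python's Decimal parser is slightly more permissive than xs:decimal (for example, it accepts exponent notation).

```python
import re
from decimal import Decimal, InvalidOperation

# Same expression as the idTitle type: three digits, a hyphen, three digits.
ID_TITLE_PATTERN = re.compile(r"\d{3}-\d{3}")


def is_valid_id_title(value: str) -> bool:
    """Return True if the whole value matches the idTitle pattern."""
    # XSD patterns must match the entire value, so fullmatch is the closest analogue.
    return ID_TITLE_PATTERN.fullmatch(value) is not None


def is_positive_decimal(value: str) -> bool:
    """Return True if value is a decimal number greater than or equal to 0.0."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        return False
    # is_finite() rejects "NaN" and "Infinity", which are not decimal values.
    return number.is_finite() and number >= Decimal("0.0")


print(is_valid_id_title("123-456"))   # True
print(is_valid_id_title("123505"))    # False: the hyphen is missing
print(is_positive_decimal("32.00"))   # True
print(is_positive_decimal("-5.00"))   # False: below the 0.0 minimum
```

Checking the facets by hand like this is only useful for experimenting with sample values; in the transformation itself the schema file remains the single place where the rules are defined.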
Carry out the following steps:
Make a selection from the Books database with the following statement:
SELECT id_title
     , title
     , genre
     , price
     , concat(lastname,", ",firstname) author
FROM Books
LEFT JOIN Authors
ON Authors.id_author=Books.id_author
price column, as shown in the following screenshot:
The following is a sample of the XML structure created for each book:
<book id_title="423-006">
  <title>Harry Potter and the Order of the Phoenix</title>
  <genre>Childrens</genre>
  <price>32.00</price>
  <author>Rowling, Joanne</author>
</book>
Note that the structure is shown in several lines for clarity. In the preview, you will see the structure in a single line.
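To make the shape of the generated column concrete, here is a rough Python sketch that builds the same book structure from one row of the query result. It is only an illustration of the mapping (id_title as an attribute, the remaining columns as child elements) and not how Kettle builds the XML; the function name book_to_xml and the dictionary-based row are assumptions, and the price is assumed to arrive already formatted with two decimals.

```python
import xml.etree.ElementTree as ET


def book_to_xml(row: dict) -> str:
    """Build a single-line <book> element from one row of the query result."""
    # id_title becomes an attribute; the other columns become child elements.
    book = ET.Element("book", {"id_title": row["id_title"]})
    for column in ("title", "genre", "price", "author"):
        child = ET.SubElement(book, column)
        child.text = row[column]
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(book, encoding="unicode")


row = {
    "id_title": "423-006",
    "title": "Harry Potter and the Order of the Phoenix",
    "genre": "Childrens",
    "price": "32.00",
    "author": "Rowling, Joanne",
}
print(book_to_xml(row))
```

The child elements are created in the order title, genre, price, author because the schema declares them inside an xs:sequence, so a different order would fail validation.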
An XML Schema Definition (XSD) file defines a set of rules for validating an XML document. An XSD file allows you to verify whether a document, written in XML format, is well-formed and also respects those rules.
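Being well-formed and respecting the schema rules are two separate checks: any XML parser can tell you whether a document is well-formed, while the rules only come into play once a schema is applied. The short Python sketch below illustrates the first check alone, under the assumption that a plain parse is an acceptable stand-in for a well-formedness test; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, without checking any schema."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Well-formed, although it would not pass books.xsd (negative price, missing elements).
well_formed_but_invalid = "<book><price>-5.00</price></book>"
# Not well-formed: the <price> element is never closed.
not_well_formed = "<book><price>-5.00</book>"

print(is_well_formed(well_formed_but_invalid))  # True
print(is_well_formed(not_well_formed))          # False
```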
In this example, you created a new column with each book in XML format, and then applied the XSD Validator step to verify this column against the books.xsd schema file.
In the result of your transformation, you can see that one book didn't follow the pattern expected for the id_title field, because it didn't contain a hyphen. In that case, you obtained the following message: cvc-pattern-valid: Value '123505' is not facet-valid with respect to pattern '\d{3}-\d{3}' for type 'idTitle'.
Also, one book had an incorrect price (a negative one). In that case, you got the following error: cvc-minInclusive-valid: Value '-5.00' is not facet-valid with respect to minInclusive '0.0' for type 'positiveDecimal'.
In the recipe, you used the XSD Validator step to validate an XML structure, which in turn was built from fields read from a database. In general, you can use this step to validate any XML structure, whether it is supplied as a field or saved in a file.
In cases where you want to validate a file, you can also take advantage of the same functionality from a job entry named XSD Validation inside the XML category. The configuration of that entry is simple: you only have to set the paths to the XML file and the XSD schema file.
You can learn more about XSD from the following URL: