Output Validation in CSVToXMLBasic.cpp

There were a few idiosyncrasies with validating XML as input, but validating output can be downright tricky. I expected output validation to be a piece of cake, but it took me a couple of hours to figure out what was going on and to make it work correctly. Going through the exercise may help you understand a bit more about how MSXML does things.

The first time I coded the routine, I just added a block of code to CSVToXMLBasic.cpp that was almost identical to the input validation snippet shown above in XMLToCSVBasic.cpp. It didn't work: I got a validation failure message indicating that “the root element had no associated DTD/schema.”

MSXML stores schemas internally and makes them available through the IXMLDOMSchemaCollection/XMLSchemaCache object. I was already familiar with the “Validating an XML Document against an XML Schema Using C++” example in the MSXML online documentation. That example goes through a somewhat convoluted process: it creates a schema collection, associates it with the instance document to be validated, creates and loads the schema document, and finally adds the schema to the collection before calling the instance document's validate method. I guessed that MSXML couldn't identify the schema merely from the root Element's noNamespaceSchemaLocation Attribute, so I added similar code to my main routine. The document still didn't validate, but I was getting closer to the source of the problem. This time the validation failure message said that the noNamespaceSchemaLocation Attribute “is not defined in the DTD/Schema.” So MSXML didn't find that Attribute among those declared by my schema (no surprise there), but it didn't recognize the Attribute as being from the xsi namespace either!
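The two failure messages are easier to follow with a toy model of cache-driven validation. The sketch below is plain C++ with no MSXML at all; `ToyAttr`, `ToySchema`, `ToySchemaCache`, and `validateRoot` are hypothetical names, a "schema" is reduced to the set of Attribute names it declares on the root Element, and the model deliberately ignores the noNamespaceSchemaLocation hint so that only the schema cache path is shown. The cache is keyed by target namespace URI, with the empty string standing in for a schema that has no target namespace.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only; this is not the MSXML API.
const std::string kXsiNamespace = "http://www.w3.org/2001/XMLSchema-instance";

struct ToyAttr {
    std::string qualifiedName;  // name as written, e.g. "xsi:noNamespaceSchemaLocation"
    std::string namespaceURI;   // "" means the Attribute is in no namespace
};

// A "schema" here is just the set of Attribute names it declares on the root.
struct ToySchema {
    std::set<std::string> declaredRootAttributes;
};

// Hypothetical schema cache keyed by target namespace URI; "" is the key
// for a schema with no target namespace.
using ToySchemaCache = std::map<std::string, ToySchema>;

// Returns "" when the root validates, otherwise a message loosely modeled
// on the two failures described in the text. The location hint is ignored.
std::string validateRoot(const std::vector<ToyAttr>& rootAttrs,
                         const ToySchemaCache* cache) {
    const ToySchema* schema = nullptr;
    if (cache != nullptr) {
        auto it = cache->find("");  // the root Element is in no namespace
        if (it != cache->end()) schema = &it->second;
    }
    if (schema == nullptr) return "root element had no associated DTD/schema";

    for (const auto& attr : rootAttrs) {
        // xsi Attributes need no declaration in the schema, so skip them.
        if (attr.namespaceURI == kXsiNamespace) continue;
        if (schema->declaredRootAttributes.count(attr.qualifiedName) == 0)
            return attr.qualifiedName + " is not defined in the DTD/Schema";
    }
    return "";
}
```

With the Attribute added through setAttribute (so it has no namespace URI), the call without a cache reproduces the first message and the call with a cache reproduces the second. Only an Attribute that actually carries the xsi namespace URI gets past the declaration check.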

That error message led me to review how I had added the Attribute. It occurred to me that it might be significant that MSXML does not offer the DOM Level 2 setAttributeNS method on the Element interface. This is one of the rare cases in which MSXML doesn't support the standards as well as Xerces does. Since setAttributeNS wasn't available, I had fallen back to adding the noNamespaceSchemaLocation Attribute with the setAttribute method. However, on further investigation, the MSXML documentation advised me that a namespace-qualified Attribute could not be added using that method. Instead, the Attribute must be created with the Document interface's createNode method and then attached to the Element. So, I modified the code to add the Attribute, as you see in the next snippet.

Adding the noNamespaceSchemaLocation Attribute
//  Next set the schema location.
//  MSXML requires that namespace-qualified Attributes
//  be created as Nodes, then attached to the Element.
_variant_t varType((short)NODE_ATTRIBUTE);
spSchemaLocationAttribute = spDocOutput->createNode(
  varType, "xsi:noNamespaceSchemaLocation",
  "http://www.w3.org/2001/XMLSchema-instance");
spEleRoot->setAttributeNode(spSchemaLocationAttribute);
//  Now set its value. setAttribute changes the value of the
//  existing Attribute with this name, so the xsi namespace
//  recorded by createNode is kept.
spEleRoot->setAttribute("xsi:noNamespaceSchemaLocation",
  cSchemaFileFullPath);
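The snippet works because a DOM records an Attribute's namespace URI separately from its name; typing a prefix into the name string does not, by itself, put the Attribute in a namespace. The toy C++ model below illustrates that distinction. `ToyElement` and `ToyAttribute` are hypothetical types, not MSXML interfaces: `setByName` is a stand-in for setAttribute, `setNamespaced` for createNode followed by setAttributeNode, and `findNS` for the namespace-aware lookup a schema validator performs. All of these are simplifications assumed for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; these types are not MSXML interfaces.
const std::string kXsiNamespace = "http://www.w3.org/2001/XMLSchema-instance";

struct ToyAttribute {
    std::string qualifiedName;  // e.g. "xsi:noNamespaceSchemaLocation"
    std::string namespaceURI;   // "" means no namespace
    std::string value;

    // The part after the prefix, or the whole name if there is no colon.
    std::string localName() const {
        auto colon = qualifiedName.find(':');
        return colon == std::string::npos ? qualifiedName
                                          : qualifiedName.substr(colon + 1);
    }
};

struct ToyElement {
    std::vector<ToyAttribute> attributes;

    // Stand-in for setAttribute: match on the qualified name; a new
    // Attribute gets no namespace URI even if its name contains a prefix.
    void setByName(const std::string& qname, const std::string& value) {
        for (auto& a : attributes) {
            if (a.qualifiedName == qname) { a.value = value; return; }
        }
        attributes.push_back({qname, "", value});
    }

    // Stand-in for createNode(NODE_ATTRIBUTE, qname, nsURI) plus
    // setAttributeNode: the namespace URI is recorded on the new Attribute.
    void setNamespaced(const std::string& nsURI, const std::string& qname,
                       const std::string& value) {
        attributes.push_back({qname, nsURI, value});
    }

    // Namespace-aware lookup by (namespace URI, local name).
    const ToyAttribute* findNS(const std::string& nsURI,
                               const std::string& local) const {
        for (const auto& a : attributes) {
            if (a.namespaceURI == nsURI && a.localName() == local) return &a;
        }
        return nullptr;
    }
};
```

In this model, calling `setNamespaced` with an empty value and then `setByName` with the same qualified name updates the value while keeping the namespace URI, which mirrors the two-step approach in the snippet above. An Attribute added only through `setByName` is invisible to `findNS`.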

Creating the Attribute this way finally worked: the instance document validated before I called the save method. However, it made me wonder whether or not all the schema cache stuff was really necessary. After all, we didn't need it for input validation. So I commented out the code that created the schema collection, read the schema document, and so on, and cut straight to the validate method. It worked. It seems that because MSXML didn't recognize the noNamespaceSchemaLocation Attribute as being in the xsi namespace, it didn't use the value of that Attribute to load the schema document. Once MSXML understood the Attribute properly, it used it. I hacked out the schema cache code, and what you see in CSVToXMLBasic.cpp is almost identical to what you see in XMLToCSVBasic.cpp.
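One way to picture why the cache became unnecessary is a toy model of where a validator can get a schema from. The C++ sketch below uses hypothetical names throughout and is not MSXML: `findSchema` looks in an optional schema cache first and otherwise follows the noNamespaceSchemaLocation hint, but only when that Attribute carries the xsi namespace URI. The lookup order is an assumption made for illustration, not a statement of MSXML's documented behavior, and a `std::map` of file names to text stands in for loading schema documents from disk.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy model only; not the MSXML API.
const std::string kXsiNamespace = "http://www.w3.org/2001/XMLSchema-instance";

struct ToyAttr {
    std::string qualifiedName;
    std::string namespaceURI;  // "" means no namespace
    std::string value;
};

using ToySchemaCache = std::map<std::string, std::string>;  // target namespace -> schema text
using ToyFiles = std::map<std::string, std::string>;        // file path -> schema text

// Local name: the part after the prefix, or the whole name if unprefixed.
std::string localName(const std::string& qname) {
    auto colon = qname.find(':');
    return colon == std::string::npos ? qname : qname.substr(colon + 1);
}

// Returns the schema text the toy validator would use, or nullopt when no
// schema can be associated with the (no-namespace) root Element.
std::optional<std::string> findSchema(const std::vector<ToyAttr>& rootAttrs,
                                      const ToySchemaCache* cache,
                                      const ToyFiles& files) {
    if (cache != nullptr) {
        auto hit = cache->find("");  // schema with no target namespace
        if (hit != cache->end()) return hit->second;
    }
    for (const auto& attr : rootAttrs) {
        // Only a hint that really is in the xsi namespace is followed.
        if (attr.namespaceURI == kXsiNamespace &&
            localName(attr.qualifiedName) == "noNamespaceSchemaLocation") {
            auto file = files.find(attr.value);
            if (file != files.end()) return file->second;
        }
    }
    return std::nullopt;
}
```

In this model the cache is what rescues the lookup when the Attribute has no namespace URI; once the hint is properly in the xsi namespace, the cache becomes optional, which matches the result described above.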

In some cases you may actually need to load and use an MSXML schema cache. Be aware that they exist, and if you have trouble with validation, try using one. However, the code works fine without them as long as you're doing pretty basic stuff such as I show in these utilities. I'm generally not a fan of writing code I don't have to write, so you won't see me using schema caches.
