File Description Document Schemas

In designing the schemas for the invoice and purchase order I was fairly safe in letting a tool do most of the work. However, for the file description documents I started from the ground up. Much of the rationale was discussed in Chapter 6, so I'll point out here only things that I didn't mention there. We'll also discuss other issues related to schema design in Chapter 12.

There are four schemas presented in this subsection.

  1. CSVSourceFileDescription.xsd: This is the schema for file description documents that describe conversions in which the source format is a CSV file.

  2. CSVTargetFileDescription.xsd: This schema is for file description documents that describe conversions in which the target format is a CSV file.

  3. CSVCommonFileDescription.xsd: This type library schema is used by the two previous schemas.

  4. BBCommonFileDescription.xsd: This schema specifies types used in all the Babel Blaster conversions. Its main purpose is to define the type for the supported Babel Blaster data types (which correspond to the classes derived from DataCell), including the enumeration of their codes.

So, here are the schemas, with a few more comments interspersed where appropriate. If you recall the lessons of Chapter 4 and review the comments in Chapter 6 about the union data type, you should be able to read these schemas fairly well.

CSV Source File Description Schema (CSVSourceFileDescription.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified">
  <xs:include schemaLocation="CSVCommonFileDescription.xsd"/>
  <xs:element name="CSVSourceFileDescription">
    <xs:annotation>
      <xs:documentation>
        This schema specifies the format of File Description
        Documents when converting from CSV files as source to XML
        documents as targets
      </xs:documentation>
    </xs:annotation>
    <xs:complexType mixed="false">
      <xs:sequence>
        <xs:element name="PhysicalCharacteristics"
            type="CSVPhysicalCharacteristicsType"/>
        <xs:element name="XMLOutputCharacteristics"
            type="CSVXMLOutputCharacteristicsType"/>
        <xs:element name="Grammar"
            type="CSVGrammarType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

CSV Target File Description Schema (CSVTargetFileDescription.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified">
  <xs:include schemaLocation="CSVCommonFileDescription.xsd"/>
  <xs:element name="CSVTargetFileDescription">
    <xs:annotation>
      <xs:documentation>
        This schema specifies the format of File Description
        Documents when converting from XML documents as source to
        CSV files as targets
      </xs:documentation>
    </xs:annotation>
    <xs:complexType mixed="false">
      <xs:sequence>
        <xs:element name="PhysicalCharacteristics"
            type="CSVPhysicalCharacteristicsType"/>
        <xs:element name="Grammar"
            type="CSVGrammarType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

You'll notice that these two schemas are nearly identical. Aside from the difference in the root document Element name, the source file schema has a required XMLOutputCharacteristics Element that does not appear in the target file schema.
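To make the difference concrete, here is a sketch of a source file description document that should validate against CSVSourceFileDescription.xsd together with the two common schemas it pulls in. The Element names, file name, and attribute values are hypothetical, chosen only for illustration. A CSVTargetFileDescription document would look the same apart from its root Element name and the absence of the XMLOutputCharacteristics Element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example; names and values are illustrative only -->
<CSVSourceFileDescription
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="CSVSourceFileDescription.xsd">
  <PhysicalCharacteristics>
    <RecordTerminator value="W"/>
    <ColumnDelimiter value=","/>
    <TextDelimiter value="&quot;"/>
  </PhysicalCharacteristics>
  <!-- Required in source descriptions; this Element does not
       appear in CSVTargetFileDescription documents -->
  <XMLOutputCharacteristics>
    <!-- Assumes break columns are numbered like FieldNumber
         (an assumption; BreakColumnType itself allows 0 to 99) -->
    <DocumentBreakColumn value="2"/>
    <PartnerBreakColumn value="1"/>
    <!-- Optional; an anyURI of at most 127 characters -->
    <SchemaLocationURL value="PurchaseOrder.xsd"/>
  </XMLOutputCharacteristics>
  <Grammar ElementName="PurchaseOrder">
    <RowDescription ElementName="Row">
      <ColumnDescription FieldNumber="1" ElementName="PartnerID"
          DataType="AN"/>
      <ColumnDescription FieldNumber="2" ElementName="PONumber"
          DataType="AN"/>
      <ColumnDescription FieldNumber="3" ElementName="Quantity"
          DataType="R"/>
    </RowDescription>
  </Grammar>
</CSVSourceFileDescription>
```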

CSV File Description Common Schema (CSVCommonFileDescription.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified">
  <xs:include schemaLocation="BBCommonFileDescription.xsd"/>
  <xs:complexType name="CSVPhysicalCharacteristicsType"
      mixed="false">
    <xs:annotation>
      <xs:documentation>
          Describes the CSV physical record organization
      </xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="RecordTerminator">
        <xs:complexType>
          <xs:complexContent>
            <xs:extension base="EmptyType">
              <xs:attribute name="value"
                  type="RecordTerminatorValueType"
                  use="required"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="ColumnDelimiter" type="DelimiterType"/>
      <xs:element name="TextDelimiter" type="DelimiterType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CSVXMLOutputCharacteristicsType"
      mixed="false">
    <xs:annotation>
      <xs:documentation>
        Describes characteristics of the output XML document
      </xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="DocumentBreakColumn">
        <xs:complexType mixed="false">
          <xs:complexContent mixed="false">
            <xs:extension base="EmptyType">
              <xs:attribute name="value" type="BreakColumnType"
                  use="required"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="PartnerBreakColumn">
        <xs:complexType mixed="false">
          <xs:complexContent mixed="false">
            <xs:extension base="EmptyType">
              <xs:attribute name="value" type="BreakColumnType"
                  use="required"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="SchemaLocationURL" minOccurs="0">
        <xs:complexType>
          <xs:complexContent>
            <xs:extension base="EmptyType">
              <xs:attribute name="value" type="anyURI127"
                  use="required"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CSVGrammarType">
    <xs:annotation>
      <xs:documentation>
        Describes the grammar of the CSV file
      </xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="RowDescription">
        <xs:annotation>
          <xs:documentation>
            Describes a row in the CSV file. Currently, all rows
            must have the same format so we only allow a single
            one of these Elements.
          </xs:documentation>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="ColumnDescription" maxOccurs="100">
              <xs:annotation>
                <xs:documentation>
                  Describes a column in the row. The current
                  design limits us to one hundred columns per
                  row.
                </xs:documentation>
              </xs:annotation>
              <xs:complexType>
                <xs:complexContent>
                  <xs:extension base="FieldGrammarType">
                    <xs:attribute name="DelimitText"
                        type="xs:boolean" use="optional"
                        default="false"/>
                  </xs:extension>
                </xs:complexContent>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
          <xs:attribute name="ElementName" type="NMToken127"
              use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="ElementName" type="NMToken127"
        use="required"/>
  </xs:complexType>
  <xs:simpleType name="BreakColumnType">
    <xs:annotation>
      <xs:documentation>
        Enforces restrictions on column number for partner and
        document break
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:nonNegativeInteger">
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Babel Blaster File Description Common Schema (BBCommonFileDescription.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified">
  <xs:annotation>
    <xs:documentation>
      This schema specifies types common to all Babel Blaster
      conversion utilities.
    </xs:documentation>
  </xs:annotation>
  <xs:annotation>
    <xs:documentation>
      These complex types define reused types with Attributes
      only and no Element children.
    </xs:documentation>
  </xs:annotation>
  <xs:complexType name="EmptyType">
    <xs:annotation>
      <xs:documentation>
        Base Type for empty types
      </xs:documentation>
    </xs:annotation>
  </xs:complexType>
  <xs:complexType name="DelimiterType">
    <xs:annotation>
      <xs:documentation>
        Base type for defining delimiters
      </xs:documentation>
    </xs:annotation>
    <xs:complexContent>
      <xs:extension base="EmptyType">
        <xs:attribute name="value" type="DelimiterValueType"
            use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="FieldGrammarType" mixed="false">
    <xs:annotation>
      <xs:documentation>
        Base type for defining field grammars
      </xs:documentation>
    </xs:annotation>
    <xs:complexContent mixed="false">
      <xs:extension base="EmptyType">
        <xs:attribute name="FieldNumber" type="FieldNumberType"
            use="required"/>
        <xs:attribute name="ElementName" type="NMToken127"
            use="required"/>
        <xs:attribute name="DataType" type="BBDataType"
            use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:annotation>
    <xs:documentation>
      These simple types define unions of other simple types.
    </xs:documentation>
  </xs:annotation>
  <xs:simpleType name="DelimiterValueType">
    <xs:annotation>
      <xs:documentation>
        Type for column and text delimiters. A union of a
        single character (as a token of length one) or a single
        byte written as two hex digits.
      </xs:documentation>
    </xs:annotation>
    <xs:union memberTypes="Token1 HexBinary1"/>
  </xs:simpleType>
  <xs:simpleType name="RecordTerminatorValueType">
    <xs:annotation>
      <xs:documentation>
        Type for value attribute of RecordTerminator: a union of
        the U and W codes and a single byte written as two hex digits.
      </xs:documentation>
    </xs:annotation>
    <xs:union memberTypes="OSTerminatorType HexBinary1"/>
  </xs:simpleType>
  <xs:annotation>
    <xs:documentation>
      These simple types specify restrictions on built-in schema
      data types.
    </xs:documentation>
  </xs:annotation>
  <xs:simpleType name="OSTerminatorType">
    <xs:annotation>
      <xs:documentation>
        Enumerations for OS terminator values
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <xs:enumeration value="U"/>
      <xs:enumeration value="W"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="HexBinary1">
    <xs:annotation>
      <xs:documentation>
        Type for a single-byte hex number
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:hexBinary">
      <xs:length value="1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Token1">
    <xs:annotation>
      <xs:documentation>
        Token with length of 1
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <xs:length value="1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="BBDataType">
    <xs:annotation>
      <xs:documentation>
        These are the supported native Babel Blaster data types
        for CSV files. Add an enumeration element when adding a
        new data type.
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <xs:enumeration value="AN"/>
      <xs:enumeration value="R"/>
      <xs:enumeration value="DMMsDDsYYYY"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="NMToken127">
    <xs:annotation>
      <xs:documentation>
        Data type for Element names. Restricted to 127 characters
        since the C++ char arrays that hold them are 128 bytes,
        leaving room for the terminating null.
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:NMTOKEN">
      <xs:maxLength value="127"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="FieldNumberType">
    <xs:annotation>
      <xs:documentation>
        Enforces restriction on maximum number of fields
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:positiveInteger">
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="anyURI127">
    <xs:annotation>
      <xs:documentation>
        Enforces restriction on maximum length of URI
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:anyURI">
      <xs:maxLength value="127"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
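The two union types are easiest to understand from a few attribute values. In XML Schema, a value is checked against a union's member types in the order they are listed in memberTypes, and the first member that accepts it wins. The PhysicalCharacteristics fragment below (it would sit inside either root Element) is a sketch with values chosen only for illustration.

```xml
<PhysicalCharacteristics>
  <!-- RecordTerminatorValueType: "U" matches the first member,
       OSTerminatorType. A one-byte hex value such as "0A" would
       fail OSTerminatorType and be accepted as HexBinary1. -->
  <RecordTerminator value="U"/>
  <!-- DelimiterValueType: "09" is two characters, so it fails
       Token1 and is validated as HexBinary1, the single byte 0x09
       (a tab). A literal tab cannot serve as a Token1 value, since
       xs:token collapses whitespace to an empty string. -->
  <ColumnDelimiter value="09"/>
  <!-- A single non-whitespace character matches Token1 -->
  <TextDelimiter value="'"/>
</PhysicalCharacteristics>
```

This is the practical payoff of the union: printable delimiters can be written as themselves, while whitespace or control characters still have a way in through the hex form.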

The schema representation of the grammar is actually very simple. The schema is something of an abstraction of the grammar, in that it lets us ignore physical characteristics such as column and text delimiters. In effect, it says only that the grammar, as expressed in the Grammar Element, consists of a single row description in the RowDescription Element, and that a row consists of one or more columns (up to one hundred), each described in a ColumnDescription Element. The schema thus reduces the description of the grammar to the following productions.

CSV File Grammar in Schema
CSVFileGrammar ::= rowGrammar
rowGrammar ::= columnGrammar+

Again, we'll see the more complex version used when we discuss the parsing algorithm.
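As a rough illustration of how the two productions map onto a document, here is a Grammar fragment of the kind that could appear in either root Element. The Element names and column layout are hypothetical. Grammar holds exactly one RowDescription, and RowDescription holds one or more ColumnDescription Elements.

```xml
<!-- Hypothetical Grammar fragment; names are illustrative only -->
<!-- CSVFileGrammar ::= rowGrammar -->
<Grammar ElementName="Invoice">
  <!-- rowGrammar ::= columnGrammar+ (1 to 100 ColumnDescriptions) -->
  <RowDescription ElementName="InvoiceLine">
    <ColumnDescription FieldNumber="1" ElementName="InvoiceNumber"
        DataType="AN"/>
    <ColumnDescription FieldNumber="2" ElementName="InvoiceDate"
        DataType="DMMsDDsYYYY"/>
    <!-- DelimitText is optional and defaults to false -->
    <ColumnDescription FieldNumber="3" ElementName="ItemDescription"
        DataType="AN" DelimitText="true"/>
    <ColumnDescription FieldNumber="4" ElementName="Amount"
        DataType="R"/>
  </RowDescription>
</Grammar>
```

Note that nothing in the schema requires the FieldNumber values to be unique or in sequence; that check, if wanted, would fall to the application or to an xs:unique identity constraint.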
