Schema Built-in Data Types

Schema language provides a very large set of built-in data types. If you work with enough different schemas for long enough, you may eventually see all of them used. However, the set I list below accounts for most of the types you will see in common business documents.

  • string: This data type is just what its name implies. The important thing to note is that an Element with a type of string can contain any character that XML allows, which is nearly all of Unicode. That includes characters that need double-byte encoding, if that encoding is specified in the instance document prolog.

  • boolean: Logically true or false, but the full set of allowed values for this data type is true, false, 0, and 1.

  • decimal: This is a number that may have a fractional part, that is, digits to both the left and the right of the decimal point, although it need not have both. A leading sign is allowed but may be omitted if the number is positive. Leading and trailing zeroes are optional. If the fractional part is zero, the decimal point and trailing zeroes may be omitted unless they're needed to indicate precision. Float and double are similar but aren't seen as often in business document schemas.

  • integer: Again, this data type is just what the name implies: a whole number with no fractional part. Specialized integer types derived from it include nonPositiveInteger, negativeInteger, long, int, and short, among others.

  • date, time, and dateTime: These data types represent dates and times as specified by ISO 8601 (the International Organization for Standardization's standard for representations of dates and times). The most commonly used forms are CCYY-MM-DD, hh:mm:ss.sss, and CCYY-MM-DDThh:mm:ss, respectively. Fractional seconds may be omitted, but seconds are required. In a dateTime, the T simply separates the date from the time. A time zone indicator may optionally follow the seconds: a Z indicates Coordinated Universal Time (UTC), and a signed offset from UTC indicates local time. For example, 2002-08-23T15:27:00-05:00 could indicate August 23, 2002, 3:27 PM U.S. Central Daylight Time, which is 20:27 UTC. If no indicator appears, the value carries no time zone information at all. Be aware that an offset from UTC is not necessarily the same thing as a time zone, although a time zone could be inferred from other knowledge. For example, -05:00 could be either Central Daylight Time or Eastern Standard Time depending on the day of the year.
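To make these lexical rules concrete, here is a minimal Python sketch, assuming a simplified reading of the Schema Recommendation. The helper names parse_xs_boolean, parse_xs_decimal, and parse_xs_datetime are hypothetical, and the dateTime pattern deliberately ignores details such as negative or five-digit years and the 24:00:00 form, so it is an illustration rather than a substitute for a validating parser.

```python
import re
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# xs:boolean has exactly four lexical forms mapping to two values.
_BOOLEAN_FORMS = {"true": True, "1": True, "false": False, "0": False}

# xs:decimal: optional sign, then digits with an optional fractional part.
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# xs:dateTime (simplified): CCYY-MM-DDThh:mm:ss, optional fraction, optional zone.
_DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def parse_xs_boolean(text: str) -> bool:
    """Return the boolean value for one of: true, false, 1, 0."""
    try:
        return _BOOLEAN_FORMS[text.strip()]
    except KeyError:
        raise ValueError(f"not an xs:boolean: {text!r}") from None


def parse_xs_decimal(text: str) -> Decimal:
    """Return the decimal value; leading and trailing zeroes do not change it."""
    text = text.strip()
    if not _DECIMAL.fullmatch(text):
        raise ValueError(f"not an xs:decimal: {text!r}")
    return Decimal(text)


def parse_xs_datetime(text: str) -> datetime:
    """Return a datetime; tzinfo is None when the value has no zone indicator."""
    m = _DATETIME.fullmatch(text.strip())
    if not m:
        raise ValueError(f"not an xs:dateTime (simplified): {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    fraction, zone = m.group(7), m.group(8)
    # Keep at most six fractional digits (Python's microsecond resolution).
    micro = int((fraction[1:] + "000000")[:6]) if fraction else 0
    if zone is None:
        tz = None  # no time zone information at all
    elif zone == "Z":
        tz = timezone.utc
    else:
        # A signed offset from UTC, not a named time zone.
        sign = -1 if zone[0] == "-" else 1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tz = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


if __name__ == "__main__":
    print(parse_xs_boolean("1"))  # True
    print(parse_xs_decimal("+012.50") == parse_xs_decimal("12.5"))  # True
    local = parse_xs_datetime("2002-08-23T15:27:00-05:00")
    print(local.astimezone(timezone.utc).isoformat())  # 2002-08-23T20:27:00+00:00
```

Note that the offset is kept as a plain timezone(timedelta) object rather than a named zone, which mirrors the point above: -05:00 alone cannot tell you whether the sender meant Central Daylight or Eastern Standard Time.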

To reiterate, there are many other built-in data types, but the short list above encompasses those most commonly used. If you encounter a type that isn't on this list, look for the definition in the Schema Recommendation.

Don't let talk of value space, canonical representations, lexical space, lexical representations, constraining facets, and order relations put you off. Here is what the terms you most need to know mean.

  • Value space: This is the logical set of values that a data type might have, independent of how it might look in a data stream. For example, the concept of positive integers represents a value space.

  • Lexical representation, lexical space: These deal with how the values look in an XML instance document. For the value space of positive integers, the lexical space includes the strings 1, 2, 3, 4, and so on. A single value can have more than one lexical representation: the integer value 1 might appear as 01 or 001 in addition to 1. The canonical representation is the one preferred lexical form of a value, in this case 1.

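  • Constraining facets: Schema designers apply these to the built-in types, within a restriction, when they want to limit the allowable values, for example by setting a minimum or maximum value, a maximum length, or an enumerated list of values. We'll talk about some of the most commonly used facets.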
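The relationship between lexical representations, values, and facets can be sketched in Python. This is an illustrative model, not how any particular schema processor is implemented: integer_value, canonical_integer, and IntegerRestriction are hypothetical names, and the 1 to 99 range is an invented example. In an actual schema, the same limits would be expressed with the minInclusive and maxInclusive facets inside a restriction of the integer type.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Lexical space of xs:integer: optional sign followed by decimal digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical: str) -> int:
    """Map a lexical representation to its value in the value space."""
    text = lexical.strip()
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not in the xs:integer lexical space: {lexical!r}")
    return int(text)


def canonical_integer(value: int) -> str:
    """Canonical representation: no '+' sign and no leading zeroes."""
    return str(value)


@dataclass(frozen=True)
class IntegerRestriction:
    """Hypothetical model of a restriction using minInclusive/maxInclusive facets."""

    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None

    def is_valid(self, lexical: str) -> bool:
        # Facets constrain the value, so parse the lexical form first.
        try:
            value = integer_value(lexical)
        except ValueError:
            return False
        if self.min_inclusive is not None and value < self.min_inclusive:
            return False
        if self.max_inclusive is not None and value > self.max_inclusive:
            return False
        return True


if __name__ == "__main__":
    # Four lexical representations, one value in the value space.
    print({integer_value(f) for f in ["1", "01", "001", "+1"]})  # {1}
    print(canonical_integer(integer_value("001")))  # 1
    quantity = IntegerRestriction(min_inclusive=1, max_inclusive=99)
    print(quantity.is_valid("001"), quantity.is_valid("100"))  # True False
```

Because the facet check runs on the parsed value rather than the raw string, 001 passes a minimum of 1 even though it looks different from 1 in the instance document.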

Note that despite the rich set of types that schema language provides, you will still see people using constructs that don't take advantage of them. For example, instead of using the conventional boolean, I have seen a Yes No Indicator used with the values Y and N enumerated. I have also seen date types created to accommodate alphabetic month abbreviations or slash separators, such as 23-Aug-2002 or 8/23/02, respectively. Another example is a derived type for a date range to express concepts such as June 1, 2002, through July 31, 2002. Using the built-in date data type, you could easily express this concept as:

<OnSaleRange>
  <BeginDate>2002-06-01</BeginDate>
  <EndDate>2002-07-31</EndDate>
</OnSaleRange>

However, to save themselves a few bytes in instance documents, some schema authors create their own derived types that allow them to say:

<OnSaleRange>2002-06-01 - 2002-07-31</OnSaleRange>

Schema language makes it very easy to do things like this. As I said earlier, there are a thousand different ways to hang yourself.
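The processing cost of that shortcut is easy to see in code. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree; the function names are hypothetical, and date.fromisoformat stands in for an xs:date parser (it handles the plain CCYY-MM-DD form used here, not time zone suffixes). The structured version reads each child as an ordinary date, while the compact version depends on an ad hoc " - " separator rule that every consumer has to know about and reimplement.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Tuple

STRUCTURED = """<OnSaleRange>
  <BeginDate>2002-06-01</BeginDate>
  <EndDate>2002-07-31</EndDate>
</OnSaleRange>"""

COMPACT = "<OnSaleRange>2002-06-01 - 2002-07-31</OnSaleRange>"


def range_from_structured(xml_text: str) -> Tuple[date, date]:
    """Each child holds one built-in date, so a standard date parser suffices."""
    root = ET.fromstring(xml_text)
    begin = date.fromisoformat(root.findtext("BeginDate").strip())
    end = date.fromisoformat(root.findtext("EndDate").strip())
    return begin, end


def range_from_compact(xml_text: str) -> Tuple[date, date]:
    """The compact form needs a private splitting rule before any date parsing."""
    root = ET.fromstring(xml_text)
    # The dates themselves contain hyphens, so split on " - " with spaces.
    begin_text, separator, end_text = (root.text or "").strip().partition(" - ")
    if not separator:
        raise ValueError("expected 'begin - end'")
    return date.fromisoformat(begin_text), date.fromisoformat(end_text)


if __name__ == "__main__":
    print(range_from_structured(STRUCTURED) == range_from_compact(COMPACT))  # True
```

Both functions return the same pair of dates, but only the structured form lets a schema validator check each date against the built-in type; in the compact form, a typo such as a missing space around the hyphen surfaces only when some application tries to split the string.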

What is much more common than such practices, however, is restricting the range of values allowed for the built-in data types. There are very good business reasons for doing this. We'll talk next about some of the types of restrictions you're likely to see.
