Chapter 2 briefly introduced the use of types in XQuery. This chapter delves deeper into the XQuery type system and its set of built-in types. It explains the automatic type conversions performed by the processor and describes the expressions that are relevant to types, namely type constructors, cast and castable expressions, and instance of expressions.
XQuery is a strongly typed language, meaning that each function and operator is expecting its arguments or operands to be of a particular type. This means, for example, that you cannot perform arithmetic operations on strings, without explicitly telling the processor to treat your strings like numbers. This is similar to some common object-oriented programming languages, like Java and C#. It is in contrast to most scripting languages, like JavaScript, which will automatically coerce values to the appropriate type.
There are several advantages to a strong type system. One of them is the early and reliable identification of errors in a query. Potential errors in the query can be determined statically before the query is even executed. For example, if you are trying to double a value that is a string (e.g., a product name), there is probably an error in the query. In addition, a strict type system allows for the identification of errors in the values of input data. This identification of errors can make queries easier to debug, and results in more reliable queries that are able to handle a variety of input data. This is especially true if schemas are used, because schema types can help identify possible errors. A schema allows the processor to tell you that the product name is a string and that you should not be trying to double it. Based on a schema, the processor can also tell you when you’ve specified a path that will never return any elements—for example, because of a misspelling or an invalid chain of steps.
Another advantage of a strong type system is optimization. Implementations can optimize performance if they know more about the types of data. This too is especially true if schemas are used, because schema types can help a processor find specific elements. If your schema says that all number elements appear as children of product elements, your processor only has to look in one place for the number elements you have requested in your query. If it knows that there is always only one number per product, it can further optimize certain comparison operations.
A strong type system has its disadvantages, too. One is that it can complicate query authoring, because more attention must be paid to types. For example, if you know you want to treat a numeric value like a string, you have to explicitly cast it to xs:string in order to perform string-related operations. Also, supporting an extensive type system can put a burden on implementers of the standard. This is why the more complex features, schema awareness and static typing, are optional features of the standard that are not available in all implementations.
If you do not use schemas, your input data will be untyped. Usually, this means that you, as a query author, do not need to be especially concerned about types. Because of the type conversions described in “Automatic Type Conversions”, the processor will usually “do the right thing” with your data.
For example, you may pass an untyped price element to the round function, or multiply it by two. In these cases, the processor will automatically assume that the content of the price element is numeric, and convert it to a numeric type. Likewise, when you call the substring function with a name element, the processor will assume that name contains a string.
There is the occasional “gotcha,” though. One example is comparing two untyped values by using general comparison operators (e.g., < or =). If the values are untyped, they are compared as strings. Therefore, if you compare the untyped price element <price>123.99</price> with the untyped price element <price>99.99</price>, the second will be considered greater because its string value starts with a greater digit. Similarly, order by clauses in FLWORs assume that untyped values are strings rather than numbers. In both of these cases, the prices need to be explicitly converted to numbers in order to be sorted or compared as numbers. Casting is described in “Constructors and Casting”.
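As a minimal sketch of this gotcha, the following query compares two untyped price elements first as they are and then after an explicit cast. The variable names $a and $b are illustrative, and the inline elements stand in for nodes from an unvalidated input document.

```xquery
let $a := <price>123.99</price>
let $b := <price>99.99</price>
return (
  $a < $b,                         (: true: both values are untyped, so they are compared as strings :)
  xs:decimal($a) < xs:decimal($b)  (: false: the values are compared as numbers :)
)
```

The same kind of cast can be applied to the sort key of an order by clause when prices should be sorted numerically.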
With untyped text values, you need to be careful when using the max and min functions. These two functions treat untyped data as if it is numeric. Therefore, the expression:
max(doc("catalog.xml")//name)
will raise error FORG0001. Instead, you need to cast the names to xs:string. One way to do this is to use the string function, as in:
max(doc("catalog.xml")//name/string())
If you do use schemas, you will be able to get more of the benefits of strong typing, but you will need to pay more attention to types when writing your query. Unlike some weakly typed languages, XQuery will not automatically convert values of one type to an unrelated type (for example, a string to a number). So, if your schema for some reason declares the price element to be of type xs:string, you will not be able to perform arithmetic operations, or call functions like round, on your price without explicitly casting it to a numeric type.
A wide array of simple types is built into XQuery. All the built-in types are covered individually in detail in Appendix B, with a description, lexical representations, and examples. In practice, you are likely to need only a handful of these built-in types. The XML Schema type system divides simple types into three varieties: atomic types, list types, and union types.
The atomic types, shown in Figure 11-1, represent common datatypes such as strings, dates, and times. The built-in types are identified by qualified names that are prefixed with xs, because they are defined in the XML Schema namespace. You can use all of these built-in types in your queries regardless of whether the implementation is actually schema-aware, and whether or not you are using schemas to validate your source or result documents.
Nineteen of the built-in types are primitive, meaning that they are the top level of the type hierarchy. Each primitive type has a value space, which describes all its valid values, and a set of lexical representations for each value in the value space. There is one lexical representation, the canonical representation, that maps one-to-one with each value in the value space. The canonical representation is important because it is the format used when a value is serialized or cast as a string.
For example, the type xs:integer, which is derived from the primitive type xs:decimal, has a value that is equal to 12 in its value space. This value has multiple lexical representations that map to the same value, such as 12, +12, and 012. The canonical representation is 12. Some primitive types, such as xs:date, only have one lexical representation, which becomes, by default, the canonical representation.
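The relationship between lexical representations and the canonical representation can be illustrated with a short sketch. It relies only on the xs:integer constructor and the string function; the particular input strings are illustrative.

```xquery
(
  xs:integer("+12") eq xs:integer("12"),  (: true: both strings map to the same value :)
  xs:integer("012") eq xs:integer("12"),  (: true :)
  string(xs:integer("+12"))               (: "12", the canonical representation :)
)
```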
The rest of the built-in types are derived (directly or indirectly) from one of the primitive types. The derived built-in types (and indeed, user-defined types) inherit the qualities of the primitive type from which they are derived, including their value space (possibly restricted), lexical representations, and canonical representations. Values of a derived type can also be used wherever a value of its base type is expected. For example, the insert-before function expects a value of type xs:integer for its second argument. Nevertheless, it accepts a value of any type derived from xs:integer, such as xs:positiveInteger or xs:long.
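A brief sketch of this behavior, using an illustrative sequence of strings, passes an xs:positiveInteger where insert-before expects an xs:integer:

```xquery
insert-before(("a", "b", "c"), xs:positiveInteger(2), "x")
(: returns ("a", "x", "b", "c") :)
```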
At the top of the built-in atomic type hierarchy is xs:anyAtomicType. This type encompasses all the other atomic types. No values ever actually have the type xs:anyAtomicType; they always have a more specific type. However, this type name can be used as a placeholder for all other atomic types. For example, the distinct-values function signature specifies that its first argument is xs:anyAtomicType*. This means that atomic values of any type can be passed to this function.
A list type represents a list of possibly multiple atomic values of a particular type, known as its item type. There are three list types built into the type system: xs:IDREFS, xs:NMTOKENS, and xs:ENTITIES. They are defined as list types with item types xs:IDREF, xs:NMTOKEN, and xs:ENTITY, respectively. It is also possible to define new list types in a schema. List types are treated somewhat differently from atomic types in XQuery because, for example, they cannot appear in sequence types. However, other type-related features such as type constructors and casting are available for list types.
A union type allows a value to be a choice among several different types, known as its member types. There is one union type built into the type system, xs:numeric, which is defined as the union of the three primitive numeric types: xs:double, xs:float, and xs:decimal. This is a convenient way of allowing certain functions and operators, for example the round function, to accept values of any numeric type. It is also possible to define new union types in a schema. Union types whose members are atomic types (like xs:numeric) are treated much like atomic types in XQuery.
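The following sketch tests a few literal values against xs:numeric. It assumes a processor that supports XQuery 3.1, where xs:numeric is available as a type name.

```xquery
(
  3 instance of xs:numeric,                (: true: xs:integer is derived from xs:decimal :)
  xs:float("1.5") instance of xs:numeric,  (: true :)
  "3" instance of xs:numeric               (: false: xs:string is not a member type :)
)
```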
Element and attribute nodes, as well as atomic values, all have types associated with them. Sequences don’t technically have types, although they can be matched to sequence types, as described later in this chapter.
All element and attribute nodes have type annotations, which indicate the type of their content. An element or attribute can come to be annotated with a specific type when it is validated against a schema. This might occur when the document is first opened, or as the result of a validate expression. Schema validation is discussed further in Chapter 14.
If an element or attribute has not been validated and does not have a specific type, it is automatically assigned a generic type, namely xs:untyped (for elements) or xs:untypedAtomic (for attributes). Sometimes these nodes are referred to as untyped, despite the fact that they do have a type, albeit a generic one.
Attributes, and most elements, also have a typed value. This typed value is an atomic value extracted from the node, taking into account the node’s type annotation. For example, if the number element has been validated and given the type xs:integer, its typed value is 784 (type xs:integer). If the number element is untyped, its typed value is 784 (type xs:untypedAtomic). The data function allows you to retrieve the typed value of a node.
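For the untyped case, a minimal sketch might look like the following. The inline number element stands in for a node from an input document that has not been validated.

```xquery
let $number := <number>784</number>
return (
  data($number),                               (: the atomic value 784 :)
  data($number) instance of xs:untypedAtomic,  (: true :)
  data($number) instance of xs:integer         (: false: no schema validation has taken place :)
)
```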
Every atomic value has a type. An atomic value might have a specific type because:
It is extracted from an element or attribute that has a type annotation. This can be done explicitly using the data function, or automatically using many functions and operators.
It is the result of a constructor function or a cast expression.
It is the value of a literal expression. Literals surrounded by single or double quotes are considered to have the type xs:string, whereas non-quoted numeric values have the type xs:integer, xs:decimal, or xs:double, depending on their format.
It is the result of an expression or function that returns a value of a particular type. For example, a comparison expression returns an xs:boolean value, and the count function returns an xs:integer.
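The types of literals and of some common results can be checked with instance of, as in this short sketch. Every test in it returns true.

```xquery
(
  "12" instance of xs:string,              (: quoted literal :)
  12 instance of xs:integer,               (: no decimal point :)
  12.5 instance of xs:decimal,             (: decimal point, no exponent :)
  1.25E1 instance of xs:double,            (: exponent notation :)
  (1 = 1) instance of xs:boolean,          (: comparison expression :)
  count((1, 2, 3)) instance of xs:integer  (: result of the count function :)
)
```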
A value might not have a specific type if it was extracted from an untyped element or attribute. In this case, it is automatically assigned the generic type xs:untypedAtomic. Untyped atomic values can be used wherever a typed value can be used, and they are usually cast to the required type automatically. This is because every function and expression has rules for casting untyped values to an appropriate type.
Because XQuery is a strongly typed language, an XQuery processor verifies that all items are of the appropriate type and raises type errors when they are not. There are two phases to processing a query: the static analysis phase and the dynamic evaluation phase, both of which have type-checking components.
During the static analysis phase, the processor checks the query itself, along with any related schemas, for static errors, without regard to the input documents. It is roughly equivalent to compiling the query; that is, checking for syntax errors and other errors that will occur regardless of the input document. The processor raises static errors during the static analysis phase. Examples of static errors include:
Syntax errors, such as invalid keywords or mismatched brackets
Referring to a variable or calling a function that has not been declared
Using namespace prefixes that are not declared
Some implementations support an optional static typing feature, which means that they evaluate the types of expressions in a query during the static analysis phase. This allows errors in the query to be caught early and more reliably, and can help optimize queries. A number of expressions, functions, and syntactic constructs are available solely to support static typing. These are discussed in Chapter 15.
Implementations that don’t claim to support the static typing feature might also do static analysis in order to reduce the amount of runtime type checking needed. It’s always a good idea to declare the types of your variables, function parameters, and function return types to give the processor as much information as possible.
During the dynamic evaluation phase, the processor checks the query again, this time with the data from the input document. Some expressions that did not result in errors during the analysis phase will in fact result in errors during the evaluation phase. For example, the expression:
sum(doc("catalog.xml")//number)
might pass the static analysis phase if number is untyped; the processor has no way of knowing whether all the contents of the number elements will be numeric values. However, it will raise a dynamic error in the evaluation phase if any of the number elements contains a value that cannot be cast to a numeric type, such as the string abc.
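A dynamic error of this kind can be observed with a try/catch expression, as in the sketch below. It assumes a processor that supports XQuery 3.0 or later, and it uses inline number elements in place of catalog.xml. The err prefix is declared explicitly because not every processor predeclares it.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

try {
  (: the second element cannot be cast to a numeric type :)
  sum((<number>784</number>, <number>abc</number>))
} catch err:FORG0001 {
  "a number element does not contain a numeric value"
}
```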
In XQuery, each function and operator expects its arguments to be of a particular type. However, this is not as rigid as it may sound because there are a number of type conversions that happen automatically. They are discussed in this section.
Functions and operators that expect a value of a particular type also accept a value of one of its derived types. This is known as subtype substitution. For example, the upper-case function expects an xs:string as an argument, but you can pass a value whose type is derived by restriction from xs:string, such as xs:NMTOKEN. This also works for complex types defined in schemas. A function expecting an element of type ProductType also accepts an element of type UmbrellaType, if UmbrellaType is derived by restriction from ProductType. Note that the value retains its original type; it is not actually cast to another type.
When two values of different numeric types are compared or used in the same operation, one is promoted to the type of the other. An xs:decimal value can be promoted to the xs:float or xs:double type, and an xs:float value can be promoted to the xs:double type. For example, the expression 1.0 + 1.2E0 adds an xs:decimal value (1.0) to an xs:double value. The xs:decimal value is promoted to xs:double before the expression is evaluated. Numeric type promotion happens automatically in arithmetic expressions, comparison expressions, and function calls.
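The effect of numeric type promotion on the result type can be checked with a sketch like this one:

```xquery
(
  (1.0 + 1.2E0) instance of xs:double,          (: true: xs:decimal is promoted to xs:double :)
  (xs:float("1.5") + 1.0) instance of xs:float  (: true: xs:decimal is promoted to xs:float :)
)
```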
In addition, values of type xs:anyURI are automatically promoted to xs:string in comparison expressions and function calls. Unlike subtype substitution, type promotion results in the type of a value changing.
In some cases, an untyped value is automatically cast to a specific type. This occurs in function calls, as well as in comparison and arithmetic expressions. For example, if you call the upper-case function with an untyped value, it is automatically cast to xs:string. If you add an untyped value to a number, as in <a>3</a> + 2, the untyped value 3 is cast to xs:double, and the expression returns 5.
Note that typed values are not automatically cast. For example, "3" + 2 will not automatically cast the string 3 to the number 3, even though this is theoretically possible. One exception is the concat function, which automatically casts its arguments to strings. But that’s special behavior of this particular function, not something that happens implicitly on the function call.
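The difference between untyped and typed operands can be sketched as follows. The failing case is left in a comment because a processor may report it before the query runs.

```xquery
(
  <a>3</a> + 2,         (: 5: the untyped value is cast automatically :)
  xs:integer("3") + 2,  (: 5: a typed string has to be cast explicitly :)
  concat("3", 2)        (: "32": concat casts its arguments to strings :)
)
(: By contrast, "3" + 2 raises a type error. :)
```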
Atomization occurs when a function or operator expects an atomic value and receives a node instead. Specifically, it is used in:
Arithmetic operations
Comparisons
Function calls and returns
Cast expressions and constructors
Name expressions in computed constructors
Switch expressions
Atomization involves extracting the typed value of one or more elements or attributes to return one or more atomic values. For example:
<e1>3</e1> + 5
returns the value 8 because the value 3 is extracted from the e1 element during atomization. Also:
substring(<e2>query</e2>, 2, 3)
returns uer because the string query is extracted from the e2 element. These two examples work if e1 and e2 are untyped, because their so-called typed values would be instances of xs:untypedAtomic, and would be cast to the type required by the operation. They would work equally well if e1 had the type annotation xs:integer, and e2 had the type annotation xs:string, in which case no casting would need to take place.
It is often useful to treat a sequence as a Boolean value. For example, if you want to determine whether your catalog element contains any products whose price is less than 20, you might use the expression:
if (doc("prices.xml")//prod[price < 20]) then <bargain-bin>...</bargain-bin> else ()
In this case, the result of the path expression doc("prices.xml")//prod[price < 20] is a sequence of elements that match the criteria. However, the test expression (after if) simply needs a true/false answer regarding whether there are any elements that match the criteria. Here, the sequence is automatically converted to its effective boolean value, which essentially indicates whether it is empty.
Sequences are automatically interpreted as Boolean values in:
Conditional (if-then-else) expressions
Logical (and/or) expressions
where clauses of FLWORs
Quantified (some/every) expressions
The argument to the not function
The predicates of path expressions
In addition, the boolean function can be used to explicitly convert a sequence to its effective boolean value. The effective boolean value of a sequence is false if it is:
The empty sequence
A single, atomic value of type xs:boolean that is equal to false
A single, atomic value of type xs:string or xs:untypedAtomic that is a zero-length string ("")
A single, atomic value with a numeric type that is equal to 0 or NaN
The effective boolean value cannot be determined on a sequence of more than one item whose first item is an atomic value, and on individual atomic values whose type is not numeric, untyped, xs:boolean, or xs:string. It is also not defined for function items, including maps and arrays. If the processor attempts to evaluate the effective boolean value in these cases, error FORG0006 is raised.
In all other cases, the effective boolean value is true. This includes a sequence of one or more items whose first item is a node or a single atomic value other than those described in the preceding list. Table 11-1 shows some examples.
Example | Effective boolean value |
---|---|
() | false |
false() | false |
true() | true |
"" | false |
"false" | true |
"x" | true |
0 | false |
xs:float("NaN") | false |
(false() or false()) | false |
doc("prices.xml")/* | true |
<a>false</a> | true |
<a>{xs:boolean("false")}</a> | true |
(false(), false(), false()) | Error FORG0006 |
1, 2, 3 | Error FORG0006 |
xs:date("2015-01-15") | Error FORG0006 |
[true()] | Error FORG0006 |
data( [true()] ) | true |
Note that a node that contains a false atomic value is not the same thing as a false atomic value by itself. In the <a>false</a> example in Table 11-1, the effective boolean value is true because a is an element node, not an atomic value of type xs:boolean. This is true even if the a element is declared to be of type xs:boolean.
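When the content of such an element is what matters, one option is to convert it to an xs:boolean value explicitly before testing it. The following is a sketch with an untyped element; the variable name $a is illustrative.

```xquery
let $a := <a>false</a>
return (
  boolean($a),              (: true: $a is an element node :)
  boolean(xs:boolean($a)),  (: false: the content is first cast to an xs:boolean value :)
  boolean(data($a))         (: true: the untyped value is a string that is not zero-length :)
)
```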
When you call a function, sometimes the type of an argument differs from the type specified in the function signature. For example, you can pass an xs:integer to a function that expects an xs:decimal. Alternatively, you can pass an element that contains a string to a function that expects just a string. XQuery defines rules, known as function conversion rules, for converting arguments to the expected type. These function conversion rules apply only if the function expects an atomic value (or sequence of atomic values).
In fact, these function conversion rules use the various methods of type conversion and matching that are described in the preceding sections. They are put together here to show the sequential process that takes place for each argument when a function is called.
1. Atomization is performed on the argument sequence, resulting in a sequence of atomic values.
2. Casting of untyped values is performed. For example, the untyped value 12 can be cast to xs:integer. As noted above, typed values are not cast to other types.
3. If the expected type is numeric or xs:string, type promotion may be performed. This means that a value of type xs:decimal can be promoted to xs:float, and xs:float can be promoted to xs:double. A value of type xs:anyURI can be promoted to xs:string.
Note that these rules do not cover converting a value to the base type from which its type is derived. For example, if an xs:unsignedInt value is passed to a function that expects an xs:integer, the value is not converted to xs:integer. However, subtype substitution does occur, and the function accepts this value.
The reverse is not true; you cannot pass an xs:integer value to a function that expects an xs:unsignedInt, even if the integer you pass meets all the tests for an xs:unsignedInt. The value must be explicitly cast to xs:unsignedInt.
As an example of the function conversion rules, if a function expects an argument of type xs:decimal?, it accepts any of the following:
An atomic value of type xs:decimal
The empty sequence, because the occurrence indicator (?) allows for it
An atomic value of type prod:myDecimal (derived from xs:decimal) because the sequence type xs:decimal? matches derived types as well
An atomic value of type xs:integer (derived from xs:decimal) because the sequence type xs:decimal? matches derived types as well
An atomic value of type prod:myInteger (derived from xs:integer) because the sequence type xs:decimal? matches derived types as well
An untyped atomic value, whose value is 12.5, because it is cast to xs:decimal (step 2)
An element of type xs:decimal, because its value is extracted (step 1)
An untyped attribute, whose value is 12, because its value is extracted (step 1) and cast to xs:decimal (step 2)
An untyped element whose only content is 12.5, because its value is extracted (step 1) and cast to xs:decimal (step 2)
A function expecting xs:decimal* accepts a sequence of any combination of the above items. On the other hand, a function expecting xs:decimal? does not accept:
An atomic value of type xs:string, even if its value is 12.5. This value must be explicitly cast to xs:decimal or type error XPTY0004 is raised.
An atomic value of type xs:float, because type promotion only works in one direction.
An untyped element whose only content is abc, because its value cannot be cast to xs:decimal.
An untyped element with no content, because its value "" (not the empty sequence) cannot be cast to xs:decimal.
A typed element whose type allows element-only content even if it has no children, because step 1 raises an error.
A sequence of multiple xs:decimal values; only one item is allowed.
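Some of the accepted cases can be tried with a small user-defined function, as in the sketch below. The function local:half and the element name n are hypothetical and exist only for illustration.

```xquery
(: hypothetical function whose parameter has the sequence type xs:decimal? :)
declare function local:half($d as xs:decimal?) as xs:decimal? {
  $d div 2
};

(
  local:half(5),                (: 2.5: xs:integer is accepted through subtype substitution :)
  local:half(<n>12.5</n>),      (: 6.25: the untyped element is atomized, then cast :)
  local:half(())                (: the empty sequence is accepted and returned :)
)
(: By contrast, local:half("12.5") raises type error XPTY0004. :)
```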
A sequence type is used in a query to specify the expected type of a sequence of zero, one, or more items. When declaring functions, sequence types are used to specify the types of the parameters as well as the return value. For example, the function declaration:
declare function local:getProdNums ($catalog as element()) as xs:integer* {$catalog/product/xs:integer(number)};
uses two sequence types:
element(), to specify that the $catalog parameter must be one (and only one) element
xs:integer*, to specify that the return type of the function is zero to many xs:integer values
Sequence types are also used in many type-related expressions, such as the cast as, treat as, and instance of expressions. The syntax of a sequence type is shown in Figure 11-2. The detailed syntax of some sequence types is diagrammed elsewhere, where the related component is described.
<element-attribute-test> is shown in Figure 14-4
<function-test> is shown in Figure 23-1
<map-test> is shown in Figure 24-2
<array-test> is shown in Figure 24-3
An occurrence indicator can be used at the end of a sequence type to indicate how many items can be in a sequence. The occurrence indicators are ? (zero or one items), * (zero, one, or more items), and + (one or more items).
If no occurrence indicator is specified, it is assumed that the sequence can have one and only one item. For example, a sequence type of xs:integer matches one and only one atomic value of type xs:integer. A sequence type of xs:string* matches a sequence that is either the empty sequence or contains one or more atomic values of type xs:string. A sequence type of node()? matches either the empty sequence or a single node.
Remember that there is no difference between an item and a sequence that contains only that item. If a function expects xs:string* (a sequence of zero to many strings), it is perfectly acceptable to pass it a single string without attempting to enclose it in a sequence in any way.
The empty sequence, which is a sequence containing zero items, only matches sequence types that use the occurrence indicator ? or *, or empty-sequence().
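These matching rules can be checked with instance of, which is described later in this chapter. The following sketch uses illustrative literal values.

```xquery
(
  3 instance of xs:integer,         (: true :)
  (3, 4) instance of xs:integer,    (: false: more than one item :)
  (3, 4) instance of xs:integer+,   (: true :)
  () instance of xs:integer?,       (: true :)
  () instance of xs:integer+,       (: false: at least one item is required :)
  () instance of empty-sequence(),  (: true :)
  "a" instance of xs:string*        (: true: a single item is a sequence of one :)
)
```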
Following are some generic sequence types:
item()
Matches any item (node, atomic value of any type, function, map, array)
node()
Matches any node
empty-sequence()
Matches the empty sequence
xs:anyAtomicType
Matches any atomic value
Table 11-2 shows some examples of the generic sequence types.
Example | Meaning |
---|---|
node()* | A sequence of one or more nodes, or the empty sequence |
item()? | One item of any kind, or the empty sequence |
xs:anyAtomicType+ | A sequence of one or more atomic values (of any type) |
These generic sequence types are useful because it is not possible to specify, for example, “one or more xs:string values or nodes.” In this case, you would instead need to specify a more generic sequence type, namely item()+. They’re also useful when defining generic functions such as reverse or count.
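As a sketch of a generic function, the following declaration accepts any single item and uses a typeswitch expression to report what kind of item it received. The function name local:kind is hypothetical.

```xquery
(: hypothetical helper that accepts any one item :)
declare function local:kind($i as item()) as xs:string {
  typeswitch ($i)
    case node() return "node"
    case xs:anyAtomicType return "atomic value"
    default return "function item"
};

for $i in (<a/>, "x", 3)
return local:kind($i)
(: returns "node", "atomic value", "atomic value" :)
```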
The sequence type can also be the qualified name of a specific built-in atomic type, such as xs:integer, xs:double, xs:date, or xs:string. This matches atomic values of that type or any type derived (directly or indirectly) from it. For example, the sequence type xs:integer also matches an atomic value of type xs:unsignedInt, because xs:unsignedInt is indirectly derived by restriction from xs:integer in the type hierarchy. The reverse is not true; the sequence type xs:unsignedInt does not match an xs:integer value; it must be explicitly cast.
These sequence types match atomic values only, not nodes that contain atomic values of the specified type. However, in function calls, nodes can be passed to functions expecting these kinds of atomic sequence types, because of atomization. An element that contains an integer would match element(*, xs:integer) (described in the next section), but would also be acceptable as an argument being passed to a function expecting an xs:integer, for example. Table 11-3 shows some examples.
Example | Meaning |
---|---|
xs:integer | One atomic value of type xs:integer (or any type derived by restriction from xs:integer) |
xs:integer? | One atomic value of type xs:integer (or any type derived by restriction from xs:integer), or the empty sequence |
prod:NameType* | A sequence of one or more atomic values of type prod:NameType, or the empty sequence |
List type names cannot be used in sequence types, but it is possible to specify their item type with an occurrence indicator. For example, instead of trying to specify xs:NMTOKENS, which is an illegal sequence type, you could specify xs:NMTOKEN*. A union type name such as xs:numeric can be used in a sequence type, as long as the union type has no list types among its members.
User-defined types such as prod:SizeType can be used in sequence type expressions, but they must have been imported from a schema.
The sequence types element() and attribute() can be used to match any one element or attribute (respectively). An alternate syntax, with the same meaning, uses an asterisk, as in element(*) and attribute(*).
It is also possible to test for a specific name. For example, the sequence type:
element(prod:product)
matches any element whose name is prod:product.
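A few name tests are sketched below. To keep the example self-contained, it uses illustrative element and attribute names that are in no namespace, rather than the prod prefix.

```xquery
(
  <product/> instance of element(),                    (: true :)
  <product/> instance of element(*),                   (: true :)
  <product/> instance of element(product),             (: true :)
  <product/> instance of element(item),                (: false: the name does not match :)
  attribute dept {"ACC"} instance of attribute(dept)   (: true :)
)
```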
When schemas are used, it is also possible to test elements and attributes based on their type annotations in addition to their names. This is described in “Sequence Types and Schemas”.
Sequence types can be used to test for other node kinds, using document-node(), text(), comment(), and processing-instruction(). These sequence types are discussed in Chapter 22.
Sequence types can be used to test for function items, including maps and arrays. These sequence types are discussed in “Functions and Sequence Types”, “Maps and Sequence Types”, and “Arrays and Sequence Types”, respectively.
Sequence type matching is the process of determining whether a sequence of zero or more items matches a specified sequence type, according to the rules specified in the preceding sections. Several kinds of expressions perform sequence type matching, such as the instance of expression described in this section.
Additional static-typing-related expressions, described in Chapter 15, also use the rules for sequence type matching. The typeswitch expression uses sequence type matching to control which expressions are evaluated. Other expressions, namely FLWOR expressions and quantified expressions, allow a sequence type to be specified to test whether values bound to variables match a particular sequence type.
instance of Expression
To determine whether a sequence of one or more items matches a particular sequence type, you can use an instance of expression, whose syntax is shown in Figure 11-3.
The instance of expression does not cast a value to the specified sequence type. It simply returns true or false, indicating whether the value matches that sequence type. Table 11-4 shows some examples of the instance of expression.
Example | Return value |
---|---|
3 instance of xs:integer | true |
3 instance of xs:decimal | true, because xs:integer is derived by restriction from xs:decimal |
<x>{3}</x> instance of xs:integer | false, because x is an element node rather than an atomic value, even though it happens to contain an integer |
<x>{3}</x> instance of element() | true |
<x>{3}</x> instance of node() | true |
<x>{3}</x> instance of item() | true |
(3, 4, 5) instance of xs:integer | false |
(3, 4, 5) instance of xs:integer* | true |
xs:float(3) instance of xs:double | false |
Sequence type matching does not include numeric type promotion. For this reason, the last example in the table returns false.
There are two mechanisms in XQuery for explicitly changing values from one type to another: constructors and casting.
Constructors are functions used to construct atomic values with given types. For example, the constructor xs:date("2015-05-03") constructs an atomic value whose type is xs:date. The signature of this xs:date constructor function is:
xs:date($arg as xs:anyAtomicType?) as xs:date?
There is a constructor function for each of the built-in simple types (both primitive and derived). The qualified name of the constructor is the same as the qualified name of the type. For the built-in types, constructor names are prefixed with xs to indicate that they are in the XML Schema namespace.
All the constructor functions have a similar signature, in that they accept an atomic value and return an atomic value of the appropriate type. Because function arguments are atomized, you can pass a node to a constructor function, and its typed value is extracted. If you pass an empty sequence to a constructor, the result will be the empty sequence.
Unlike most other functions, constructor functions will accept arguments of any type and attempt to cast them to the appropriate type. The argument value must have a type that can be cast to the new type; otherwise, type error XPTY0004 is raised. Values of almost all types can be cast to and from xs:string and xs:untypedAtomic. The specific rules for casting among types are described in “Casting Rules”.
In addition, the value must also be valid for the new type, or error FORG0001 is raised. For example, although the rules allow you to cast an xs:string value to xs:date, the expression xs:date("2015-13-02") raises an error because the month 13 is invalid.
For list types, constructor functions only accept a single value, but may return multiple atomic values, each an instance of the item type. The value passed to the constructor function must be a string or untyped value that is a space-separated list of values. For example, the expression xs:NMTOKENS("a b c") will return a sequence of three xs:NMTOKEN values.
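The constructor behaviors described so far can be pulled together in one query. This is a minimal sketch that assumes a processor supporting XQuery 3.1 (for try/catch and the list type constructor); the element name qty and the returned message string are made up for the example. The err prefix is declared explicitly in case the processor does not predeclare it.

```xquery
xquery version "3.1";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(
  xs:date("2015-05-03"),           (: a single xs:date value :)
  xs:date(()),                     (: empty sequence in, empty sequence out :)
  xs:integer(<qty>12</qty>),       (: the node is atomized, then its value is cast :)
  count(xs:NMTOKENS("a b c")),     (: 3: one xs:NMTOKEN per space-separated token :)
  (: the type can be cast, but the value is not a valid date :)
  try { xs:date("2015-13-02") }
  catch err:FORG0001 { "not a valid xs:date" }
)
```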
Constructors also exist for all named user-defined simple types that are in the in-scope schema definitions. If, in a schema, you have defined a type prod:SizeType that is derived from xs:integer by setting minInclusive to 0 and maxInclusive to 24, you can construct a value of this type using, for example:
prod:SizeType("10")
The qualified names must match, so the prefix prod must be bound to the target namespace of the schema containing the SizeType definition. If the type name is in no namespace (the schema in which it is defined has no target namespace), you cannot use a constructor (unless you change the default function namespace, which is not recommended). You must use a cast expression instead.
Casting is the process of changing a value from one type to another. The cast expression can be used to cast a value to another type. It has the same meaning as a call to the constructor function; it is simply a different syntax. The differences are that it can be used with a type name that is in no namespace, and that it accepts the empty sequence only when an occurrence indicator is used, as described later in this section. For example:
$myNum cast as xs:integer
casts the value of $myNum to the type xs:integer. It is equivalent to xs:integer($myNum). The syntax of a cast expression is shown in Figure 11-4.
The cast expression consists of the expression to be cast, known as the input expression, followed by the keywords cast as, followed by the qualified name of the target type. Only a named simple type can be specified: either a built-in type or a user-defined simple type whose definition is among the in-scope schema definitions.
The type name may optionally be followed by a question mark as an occurrence indicator. In this case, the cast expression evaluates to the empty sequence if the input expression evaluates to the empty sequence. If no question mark is used, the input expression cannot evaluate to the empty sequence, or type error XPTY0004 is raised. This is in contrast to constructors, which always allow the empty sequence.
You cannot use the other occurrence indicators + and * because you cannot cast a sequence of more than one item using a cast expression. If you attempt to do this, type error XPTY0004 is raised. To cast more than one value, you could place your cast expression as the last step of a path, as in:
doc("catalog.xml")//number/(. cast as xs:string)
The input expression can evaluate to a single atomic value, or a single node, in which case it is atomized to retrieve its typed value. As with constructors, the value must have a type that allows casting to the target type, and it must also be a valid value of the target type.
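The following sketch illustrates the occurrence indicator and one way of casting several atomic values one at a time. The variable names $missing and $values are assumptions made for the example. The failing case is left in a comment because a processor may report that type error before the query runs.

```xquery
let $missing := ()
let $values := ("7", "8", "9")
return (
  "25" cast as xs:integer,          (: 25 :)
  $missing cast as xs:integer?,     (: the empty sequence, allowed because of the ? :)
  (: each item is cast on its own, giving 7, 8, 9 :)
  (for $v in $values return $v cast as xs:integer)
  (: $missing cast as xs:integer  (no ?)  would raise XPTY0004 :)
  (: $values cast as xs:integer?          would raise XPTY0004 :)
)
```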
The castable expression is used to determine whether a value can be cast to another specified simple type. It is sometimes useful to determine this before the cast takes place to avoid dynamic errors, or to determine how the expression should be processed. For example:
if ($myNum castable as xs:integer) then $myNum cast as xs:integer else ()
evaluates to $myNum cast to xs:integer if that is valid, otherwise the empty sequence. If the castable expression had not been used to test this, and $myNum was not castable as an xs:integer, an error would have been raised: XPTY0004 if the type of $myNum cannot be cast to xs:integer at all, or FORG0001 if the type can be cast but the specific value is not valid. The syntax of a castable expression is shown in Figure 11-5.
The castable expression consists of an expression, followed by the keywords castable as, followed by the qualified name of the target type. It evaluates to a Boolean value. As with the cast expression, you can use the question mark as an occurrence indicator. The castable expression determines not only whether one type can be cast to the other, but also whether that specific value is valid for the target type.
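The test-then-cast pattern is convenient to wrap in a function. The function local:to-integer below is hypothetical, a sketch of one way to do it rather than a standard function. It returns the empty sequence for anything that cannot be cast, including the empty sequence itself, because castable as without a question mark is false for empty input.

```xquery
declare function local:to-integer($value as xs:anyAtomicType?) as xs:integer? {
  if ($value castable as xs:integer)
  then $value cast as xs:integer
  else ()
};

(
  local:to-integer("12"),     (: 12 :)
  local:to-integer("12.1"),   (: empty: "12.1" is not a valid xs:integer lexical form :)
  local:to-integer(12.9),     (: 12: casting xs:decimal to xs:integer truncates :)
  local:to-integer(())        (: empty: castable as xs:integer is false for () :)
)
```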
This section describes the rules for casting atomic values between specific types. These rules are used in cast expressions and constructors. In this section, the source type refers to the type of the original value that is being cast, and the target type refers to the type to which the value is being cast.
Specific rules exist for casting between each combination of two primitive types. These rules are discussed, along with the types themselves, in Appendix B. The rules can be summarized as follows:
Values of any simple type can be cast to and from xs:string and xs:untypedAtomic if the value is valid for the target type. See the next two sections for more information.
A value of a numeric type can be cast to any other numeric type if the value is in the value space of the target type.
A value of a date, time, or duration type can sometimes be cast to another date, time, or duration type.
Other types (xs:boolean, xs:QName, xs:NOTATION, xs:anyURI, xs:hexBinary, and xs:base64Binary) have limited casting ability to and from types other than xs:string and xs:untypedAtomic. See Appendix B for more information on each type.
Casting from xs:string or xs:untypedAtomic
A value of type xs:string or xs:untypedAtomic can be cast to any other primitive type. For example, xs:integer("12") casts the xs:string value 12 to xs:integer. Of course, the string must represent a valid lexical representation of the target type. For example, xs:integer("12.1") raises error FORG0001 because the lexical representation of xs:integer does not allow fractional parts.
When a value is cast from xs:string to another primitive type, whitespace is collapsed. Specifically, this means that every tab, carriage return, and line-feed character is replaced by a single space; consecutive spaces are collapsed into one space; and leading and trailing spaces are removed. Therefore, xs:integer(" 12 ") is valid, even with the leading and trailing whitespace.
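A few comparisons make the effect of whitespace collapsing and lexical validity concrete. This is only an illustrative sketch; the character reference &#10; stands for a line feed inside the string literal.

```xquery
(
  xs:integer("  12  ") eq 12,                             (: true :)
  xs:date(" 2015-05-03&#10;") eq xs:date("2015-05-03"),   (: true: the line feed is collapsed away :)
  "12.1" castable as xs:integer,                          (: false: fractional part not allowed :)
  "12.1" castable as xs:decimal                           (: true :)
)
```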
Casting to xs:string or xs:untypedAtomic
An atomic value of any type can be cast to xs:string or to xs:untypedAtomic. Some types have special rules about how their values are cast to xs:string. For example, integers have their leading zeros stripped. The rules (if any) for each type are described in Appendix B. Table 11-5 shows some examples of casting to xs:string and xs:untypedAtomic.
Example | Return value |
---|---|
xs:string("012") | "012" |
xs:string(012) | "12" |
xs:string(xs:float(12.3E2)) | "1230" |
xs:untypedAtomic(xs:float(12)) | 12 (of type xs:untypedAtomic) |
xs:string(true()) | "true" |
Now that you have seen casting among the primitive types, let’s look at derived types. There are three different cases.
The first case is that the source type is derived by restriction from the target type. In this case, the cast always succeeds because the source type is a subset of the target type. For example, an xs:byte value can always be cast to xs:integer.
The second case is that the source type and the target type are derived by restriction from the same primitive type. In this case, the cast succeeds as long as the value is in the value space of the target type. For example, xs:unsignedInt("60") can be cast to xs:byte, but xs:unsignedInt("6000") cannot, because 6000 is too large for xs:byte. This case also applies when the target type is derived by restriction from the source type. For example, xs:integer("25") can be cast to xs:unsignedInt, which is derived from it.
The third case is that the source type and the target type are derived by restriction from different primitive types, for example, if you want to cast a value of xs:unsignedInt to prod:myFloat, which is derived by restriction from xs:float. In this case, the casting process has three steps:
The value is cast to the primitive type from which it is derived, e.g., from xs:unsignedInt to xs:decimal.
The value is cast from that primitive type to the primitive type from which the target type is derived, e.g., from xs:decimal to xs:float.
The value is cast from that primitive type to the target type, e.g., from xs:float to prod:myFloat.
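The three cases can be tried out with built-in types alone. In this sketch the third case stops at xs:float, because the user-defined type prod:myFloat would require a schema import that is not shown here; the final step to a derived target type is therefore left out.

```xquery
(
  (: case 1: source derived from target, always succeeds :)
  xs:byte(100) cast as xs:integer,
  (: case 2: same primitive type, succeeds only if the value fits :)
  xs:unsignedInt("60") castable as xs:byte,      (: true :)
  xs:unsignedInt("6000") castable as xs:byte,    (: false: too large for xs:byte :)
  xs:integer("25") cast as xs:unsignedInt,
  (: case 3: different primitive types, xs:decimal to xs:float :)
  xs:unsignedInt("60") cast as xs:float
)
```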