Errors in a query can be raised in either the static analysis phase or the dynamic evaluation phase. These two phases are roughly analogous to compiling and running program code. Certain XQuery implementations take a more aggressive approach to finding type-related errors in the static analysis phase. These implementations are said to support static typing.
Static typing, as the term is used in XQuery, refers to raising all possible type errors at analysis (compile) time rather than evaluation (run) time. This is sometimes referred to as pessimistic static typing, where the philosophy is to raise any errors that could possibly happen, not just those that it knows will happen. The static typing feature of XQuery is optional; implementations are not required to support static typing, and many do not fully support it.
The fact that a processor doesn’t support this feature doesn’t mean that it is doing no compile-time analysis. It might use the analysis transparently for optimization purposes, or it might raise some errors at compile time; but in this case, it will raise errors optimistically. It will only raise errors when it can see that there is definitely something wrong, like in the expression "x" + 3, and not simply in cases of ambiguity, as in $x + 3, where the value of $x depends on some input data.
Static typing has the advantage of catching type errors earlier and more reliably, and it can help some implementations optimize queries. However, as you will see in this chapter, it can also irritate query authors by raising many “false” errors.
As part of the static typing process, the processor checks all expressions in a query and assigns them a static type, which is the supplied type of the expression. For example, the expression "abc" is assigned the static type of xs:string. The expression count(doc("catalog.xml")//product) is assigned the static type of xs:integer, which is the return type of the count function.
A type error is raised when a function or operator expects a value of a certain type and a parameter or operand of an incompatible type is used.
A number of obvious type errors can be caught in the analysis phase, for example:
Passing an integer to a function that expects a string, as in upper-case(2)
Attempting a cast between two types that do not allow casting, as in current-date() cast as xs:integer
Attempting to add two strings, as in "abc" + "def"
Passing a sequence of multiple values to a function or operation that expects a single atomic value, as in substring( ("a", "b"), 3)
All of these examples will raise type error XPTY0004, no matter what the input document contains. Many implementations that do not support the static typing feature will also raise these errors at analysis time, since they will always result in an error.
Static typing also takes into account any in-scope schema definitions. A schema can make static typing much more useful by providing the processor with extra information about the input documents. For example:
If the number element is declared to be of type xs:string, it is obviously an error to try to multiply it by 2.
If a product can have more than one name child, you don’t want to use the expression product/name as an argument to the substring function, because the substring function only accepts a single string (or the empty sequence), not a sequence of multiple strings.
If you refer to an element produt (misspelled) in your query, it must be an error, because no produt element is declared in the schema.
The path catalog/number contains an error because the schema does not allow number to be a child of catalog, even if both of those elements are declared in the schema.
None of the above errors could be caught during the static analysis phase if no schema were present. Sometimes this type of feedback can be extremely useful. For one thing, it can lead to queries that are more robust. You may not have envisioned a product with more than one name in your test data, but you would have found this error the hard way later when querying some new input data that happened to have that characteristic.
Static typing can also make query debugging and testing much easier. If you get an error message saying that catalog/number will always return the empty sequence (and therefore is not a valid path), it is much more useful than getting no results from your entire query and wondering why. It can relieve the burden on you to come up with test data that addresses every single possible combination of elements and values in an input document.
The downside of static typing is that sometimes the errors raised are less useful. Sometimes you know that an error situation would never arise in your input document, even if the schema might allow it. Suppose you want to substring the name of a single product, based on its product number. You might use the expression:
substring(doc("catalog.xml")//product[number = 557]/name, 1, 10)
However, if static typing is in effect, this expression causes a static error. This is because, as far as the processor knows, there could be more than one name element that matches that criterion, but the substring function’s signature requires that only zero or one item be provided as the first argument. You may know for sure that no two products will have the same product number (perhaps because you are familiar with the application that generates the XML documents), but the processor doesn’t know that.
This particular error can be avoided by calling the zero-or-one function, described in “The zero-or-one, one-or-more, and exactly-one Functions”.
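As a minimal sketch of that workaround, the substring call from above could be rewritten with the path expression wrapped in zero-or-one. It assumes the same catalog.xml document and product number used in the earlier expression; nothing else is changed.

```xquery
(: The path may still return any number of name elements at run time,
   but zero-or-one tells a static typing processor to expect at most one.
   If more than one is returned, a dynamic error is raised instead. :)
substring(zero-or-one(doc("catalog.xml")//product[number = 557]/name), 1, 10)
```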
The typeswitch expression provides a convenient syntax for performing special processing based on the type of an expression. An example is shown in Example 15-1. This example uses user-defined schema types, which are explained in the previous chapter.
declare namespace prod = "http://datypic.com/prod";
typeswitch ($myProd)
  case element(*, prod:HatType)
    return xs:string($myProd/size)
  case element(*, prod:ShirtType)
    return xs:string(concat($myProd/size/@system, ": ", $myProd/size))
  case element(*, prod:UmbrellaType)
    return "none"
  default
    return "n/a"
The example assumes that $myProd is bound to a product element. In the schema, assume that product elements are declared to have type prod:ProductType. However, in the input document, a product element may carry an xsi:type attribute that indicates another type that is derived by extension from prod:ProductType, such as prod:HatType, prod:ShirtType, or prod:UmbrellaType. prod:ProductType itself does not allow a size child. Depending on which subtype it has, it may or may not have a size child. The typeswitch expression will return a different value depending on the type annotation of the product element.
The syntax of a typeswitch expression is shown in Figure 15-1. The typeswitch keyword is followed by an expression in parentheses (called the operand expression), which evaluates to the sequence whose type is to be evaluated. This is followed by one or more case clauses, plus a required default clause that indicates the value if none of the case clauses applies.
The processor uses sequence type matching (described in “Sequence Type Matching”) to determine whether a case clause applies. This means that if the type of the items in the sequence is the same as, or is derived from, the type identified by the case clause, it matches. If more than one case clause is applicable, the first one is used. Remember, with sequence type matching, it is not enough that the items are valid according to the specified type; they must actually have been validated and have that type (or a type derived from it) as their type annotation.
Starting in version 3.0, you can specify multiple sequence types, separated by vertical bars, in a single case clause. For example, case xs:float | xs:double could be used to choose that case if the expression matches either sequence type xs:float or xs:double.
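The following is a small illustrative sketch of a case clause with two sequence types. The variable $val and the returned strings are made up for this example; the xs:integer case is listed before xs:decimal because the first applicable case clause is the one used.

```xquery
xquery version "3.0";

(: $val is a hypothetical variable, bound here to an xs:double literal :)
let $val := 1.5e0
return
  typeswitch ($val)
    (: one clause covers both binary floating-point types :)
    case xs:float | xs:double return "binary floating point"
    (: xs:integer is derived from xs:decimal, so it must be tested first :)
    case xs:integer return "integer"
    case xs:decimal return "other decimal"
    default return "not a number"
```

With $val bound as shown, the first case clause applies.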
Each of the case and default clauses can have an optional variable name before the return keyword. That variable is bound to the value of the operand expression. This is useful if the return clause is dependent on the sequence being tested. In Example 15-2, the $h and $s variables are bound to the $myProd value, and the return clause references the variable.
declare namespace prod = "http://datypic.com/prod";
typeswitch ($myProd)
  case $h as element(*, prod:HatType)
    return xs:string($h/size)
  case $s as element(*, prod:ShirtType)
    return xs:string(concat($s/size/@system, ": ", $s/size))
  case element(*, prod:UmbrellaType)
    return "none"
  default
    return "n/a"
The typeswitch expression can serve as shorthand for multiple if-then-else expressions that use instance of expressions to determine the type of the variable. Example 15-3 shows this alternative, which is similar to Example 15-2.
declare namespace prod = "http://datypic.com/prod";
if ($myProd instance of element(*, prod:HatType))
then xs:string($myProd/size)
else if ($myProd instance of element(*, prod:ShirtType))
then xs:string(concat($myProd/size/@system, ": ", $myProd/size))
else if ($myProd instance of element(*, prod:UmbrellaType))
then "none"
else "n/a"
However, there is an important difference between the two: an implementation that supports static typing will raise a type error with Example 15-3. This is because, as far as the processor knows, $myProd is of type prod:ProductType, which does not allow a size child. Even though the processor is aware that there are subtypes that allow a size child, the static typing feature does not extend to parsing out the test expressions (after if) to determine that you would never evaluate the expression $myProd/size on anything that didn’t allow a size child. The typeswitch expression in Example 15-2, on the other hand, assures the processor that the branch that contains $h/size will only ever be evaluated for elements of type prod:HatType. Remember that static typing is pessimistic; it will give you an error if anything could go wrong, not just if it knows that things will go wrong.
The treat expression, like the typeswitch expression, is used to assure the processor that only values of a certain type will participate in a particular function or operation. The syntax of a treat expression is shown in Figure 15-2.
Building on the prod:ProductType/prod:HatType example from the previous section, suppose you would like to display the size of a product, if it is a hat. Although prod:ProductType doesn’t allow a size child, prod:HatType does. You could use the query shown in Example 15-4.
declare namespace prod = "http://datypic.com/prod";
if ($myProd instance of element(*, prod:HatType))
then <p>The size is: {data($myProd/size)}</p>
else ()
It tests to see if the product is a hat, and if it is, constructs a p element that contains its size. Unfortunately, an implementation that supports static typing will raise a type error with this query. This is because, as far as the processor knows, $myProd has type prod:ProductType, which does not allow a size child. As discussed in the previous section, it does not matter that you check the type of $myProd in the enclosing test expression.
Example 15-5 shows a revised query that uses a treat expression to assure the processor that $myProd is indeed an element of type prod:HatType.
declare namespace prod = "http://datypic.com/prod";
if ($myProd instance of element(*, prod:HatType))
then <p>The size is: {data(($myProd treat as element(*, prod:HatType))/size)}</p>
else ()
Unlike a cast expression or a type constructor, the treat expression does not actually change the type of $myProd. It doesn’t need to, because the type of $myProd should already be prod:HatType or some matching type. Like other static-typing-related expressions, it simply postpones any errors to runtime by saying, “I know that all the values are going to be valid prod:HatType values, so don’t raise an error during the analysis phase.”
If it turns out later during the evaluation phase that there is a $myProd value that does not match prod:HatType, the error is raised at that time. The rules of sequence type matching are used to determine whether the value matches. In this particular example, it will never raise this error because it checks the type of $myProd before evaluating the /size path.
If you’re familiar with casts in Java or C#, you’ll recognize that most casts in those languages are assertions (like treat as) rather than actual type conversions. XQuery uses cast as to mean a type conversion, and treat as to mean a type assertion.
Some expressions, namely FLWORs, quantified expressions, typeswitch expressions, and global variable declarations, allow the use of a type declaration to force the static type of an expression. A type declaration uses the keyword as, followed by a sequence type. The sequence types used in these expressions follow the syntax described in “Sequence Types”.
Sequence types can be used in the for, let, window, and group by clauses of FLWORs to declare the type of variable being bound. In this case, the type declaration appears immediately after the variable name, as in Example 15-6.
declare namespace prod = "http://datypic.com/prod";
for $prod as element(*, prod:ProductType) in doc("catalog.xml")/catalog/*
order by $prod/name
return $prod/name
The sequence type element(*, prod:ProductType) is specified as the type of the variable $prod. Without the type declaration, this query might raise a type error when using a processor that implements static typing, if the schema allows the possibility of catalog having children that don’t themselves have name children. The type declaration serves as a way of telling the processor, “I know that all the children of catalog will be valid elements of type prod:ProductType, which all have name children, so don’t raise a static error. If it turns out I’m wrong, you can raise a dynamic error later.”
With the type declaration, this error checking is postponed to evaluation time. When the query is evaluated, if the value of $prod is not of type prod:ProductType, a type error is raised. Note that the purpose of the sequence type specification is not to filter out items that do not conform, but to raise type errors when non-conforming items are encountered.
Unlike type declarations in function signatures, no conversions take place. Untyped atomic data won’t be converted to the required type, and numeric type promotion won’t happen; in fact, you won’t even get atomization. The only difference allowed between the actual type of the value and the declared type is subtype substitution: if the required type is xs:decimal, for example, the supplied value can be xs:integer, but it can’t be a node containing an xs:integer.
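The sketch below illustrates these rules with let clauses. The variable names and the n element are invented for the example, and the failing binding is left commented out so that the query as a whole can run.

```xquery
(: Accepted: xs:integer is derived from xs:decimal (subtype substitution) :)
let $a as xs:decimal := 3

(: Would raise type error XPTY0004 if uncommented, because the element
   node is not atomized to match the declared type:
   let $b as xs:decimal := <n>3</n>
:)

(: Atomization and conversion have to be written out explicitly :)
let $c as xs:decimal := xs:decimal(data(<n>3</n>))

return $a + $c
```

As written, the query returns 6.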
Quantified expressions also allow sequence types to be specified for variables, using a similar syntax, as in Example 15-7.
every $number as element(*, xs:integer) in doc("catalog.xml")//number
satisfies ($number > 0)
In this case, the $number variable is given the sequence type element(*, xs:integer). If any of the items returned by the expression doc("catalog.xml")//number do not match that sequence type, a type error is raised. In order for a number element to match the sequence type, it would have to have been validated by a schema and annotated with the type xs:integer. It is not enough that it contains a value that can be cast to xs:integer.
An optional type declaration can be specified in a global variable declaration, as in:
declare variable $prodCount as xs:decimal
:= count(doc("catalog.xml")//product/number);
which associates $prodCount with the static type xs:decimal. As with other type declarations, this type declaration does not change or cast the value in any way. It is simply used to reassure the processor that the value of $prodCount will always be a decimal number, so it can be used in arithmetic operations, for example. This is especially useful for external variables, since their static type cannot be determined any other way.
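As a brief sketch of the external variable case, the declaration below gives a static type to a value that is supplied from outside the query. The variable name $maxPrice and the multiplier are assumptions made for illustration; how the value is actually passed in depends on the processor.

```xquery
(: $maxPrice is a hypothetical external variable supplied by the
   calling application; the type declaration is the only information
   the processor has about it at analysis time :)
declare variable $maxPrice as xs:decimal external;

(: Because the static type is xs:decimal, arithmetic is accepted :)
$maxPrice * 1.1
```

If the supplied value does not match xs:decimal, a type error is raised rather than the value being converted.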
The zero-or-one, one-or-more, and exactly-one Functions
Three functions relate specifically to static typing: zero-or-one, one-or-more, and exactly-one. These functions are useful when static typing is in effect, to override apparent static type errors.
Each of the functions takes a single argument and either returns the argument as is or raises an error if the argument is a sequence containing the wrong number of items. For example, when calling the zero-or-one function, if the argument is a sequence of zero or one items, it is returned. If it is a sequence of more than one item, error FORG0003 is raised.
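The behavior of all three functions can be sketched with literal sequences. The argument values are chosen only for illustration; the failing calls are listed in a comment so that the query itself runs.

```xquery
(: Each call returns its argument unchanged when the number of items is acceptable :)
zero-or-one(()),           (: returns the empty sequence :)
one-or-more((1, 2, 3)),    (: returns (1, 2, 3) :)
exactly-one("a")           (: returns "a" :)

(: These calls would raise dynamic errors instead:
   zero-or-one((1, 2))   raises FORG0003
   one-or-more(())       raises FORG0004
   exactly-one((1, 2))   raises FORG0005
:)
```

The query as a whole returns the sequence (1, 2, 3, "a").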
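Earlier in this chapter, we saw how a path expression that might return more than one item causes a static error when it is passed to the substring function. Similarly, the expression: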
number(doc("prices.xml")//prod[@num = 557]/price)
will cause a static error when static typing is in effect. This is because, as far as the processor knows, there could be more than one price element that matches that criterion, while the number function’s signature requires that only zero or one item be provided. A static error can be avoided by using the expression:
number (zero-or-one(
doc("prices.xml")//prod[@num = 557]/price))
In this case, no static error is raised. Rather, a dynamic error is raised if more than one price element is returned by the path expression. This is useful if you know that there will only be one product with number 557 in the document, and wish to override the static error.