Chapter 15. Static Typing

Errors in a query can be raised in either the static analysis phase or the dynamic evaluation phase. These two phases are roughly analogous to compiling and running program code. Certain XQuery implementations take a more aggressive approach to finding type-related errors in the static analysis phase. These implementations are said to support static typing.

What Is Static Typing?

Static typing, as the term is used in XQuery, refers to raising all possible type errors at analysis (compile) time rather than evaluation (run) time. This is sometimes referred to as pessimistic static typing, where the philosophy is to raise any errors that could possibly happen, not just those that it knows will happen. The static typing feature of XQuery is optional; implementations are not required to support static typing, and many do not fully support it.

The fact that a processor doesn’t support this feature doesn’t mean that it is doing no compile-time analysis. It might use the analysis transparently for optimization purposes, or it might raise some errors at compile time; but in this case, it will raise errors optimistically. It will only raise errors when it can see that there is definitely something wrong, like in the expression "x" + 3, and not simply in cases of ambiguity, as in $x + 3, where the value of $x depends on some input data.

Static typing has the advantages of allowing type errors to be caught earlier and more reliably, and can help some implementations optimize queries. However, as you will see in this chapter, it can also be an irritation to query authors in that it raises many “false” errors.

As part of the static typing process, the processor checks all expressions in a query and assigns them a static type, which is the supplied type of the expression. For example, the expression "abc" is assigned the static type of xs:string. The expression count(doc("catalog.xml")//product) is assigned the static type of xs:integer, which is the return type of the count function.

A type error is raised when a function or operator expects a value of a certain type and an argument or operand of an incompatible type is supplied.

Obvious Static Type Errors

A number of obvious type errors can be caught in the analysis phase, for example:

  • Passing an integer to a function that expects a string, as in upper-case(2)

  • Attempting a cast between two types that do not allow casting, as in current-date() cast as xs:integer

  • Attempting to add two strings, as in "abc" + "def"

  • Passing a sequence of multiple values to a function or operation that expects a single atomic value, as in substring( ("a", "b"), 3)

All of these examples will raise type error XPTY0004, no matter what the input document contains. Many implementations that do not support the static typing feature will also raise these errors at analysis time, since they will always result in an error.
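As a rough sketch, each of the four expressions above could be repaired along the following lines. The intent assumed for each one is a guess made for illustration; the original expressions do not say what their author wanted.

```xquery
(: One possible repair for each of the four obvious type errors.
   The intended meaning of each expression is an assumption. :)
(
  upper-case(string(2)),            (: convert the integer to a string first :)
  year-from-date(current-date()),   (: extract an integer component instead of casting the date :)
  concat("abc", "def"),             (: concatenate the strings rather than adding them :)
  substring(("a", "b")[1], 3)       (: select a single item before calling substring :)
)
```

None of these depends on an input document, so a processor can check them entirely during the analysis phase.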

Static Typing and Schemas

Static typing also takes into account any in-scope schema definitions. A schema can make static typing much more useful by providing the processor with extra information about the input documents, namely:

Specific types

If the number element is declared to be of type xs:string, it is obviously an error to try to multiply it by 2.

Cardinalities

If a product can have more than one name child, you don’t want to use the expression product/name as an argument to the substring function, because the substring function only accepts a single string (or the empty sequence), not a sequence of multiple strings.

Allowed names

If you refer to an element produt (misspelled) in your query, it must be an error, because no produt element is declared in the schema.

Allowed paths

The path catalog/number contains an error because the schema does not allow number to be a child of catalog, even if both of those elements are declared in the schema.

None of the above errors could be caught during the static analysis phase if no schema were present. Sometimes this type of feedback can be extremely useful. For one thing, it can lead to queries that are more robust. You may not have envisioned a product with more than one name in your test data, but you would have found this error the hard way later when querying some new input data that happened to have that characteristic.

Static typing can also make query debugging and testing much easier. If you get an error message saying that catalog/number will always return the empty sequence (and therefore is not a valid path), it is much more useful than getting no results from your entire query and wondering why. It can relieve the burden on you to come up with test data that addresses every single possible combination of elements and values in an input document.

Raising “False” Errors

The downside of static typing is that sometimes the errors raised are less useful. Sometimes you know that an error situation would never arise in your input document, even if the schema might allow it. Suppose you want to take a substring of the name of a single product, selected by its product number. You might use the expression:

substring(doc("catalog.xml")//product[number = 557]/name, 1, 10)

However, if static typing is in effect, this expression causes a static error. This is because, as far as the processor knows, there could be more than one name element that matches that criterion, but the substring function’s signature requires that only zero or one item be provided as the first argument. You may know for sure that no two products will have the same product number (perhaps because you are familiar with the application that generates the XML documents), but the processor doesn’t know that.

This particular error can be avoided by calling the zero-or-one function, described in “The zero-or-one, one-or-more, and exactly-one Functions”.

Static Typing Expressions and Constructs

It is useful to have expressions and functions that you can use in your query to get around these false static errors. These constructs include treat and typeswitch expressions, type declarations, and the zero-or-one, one-or-more, and exactly-one functions.

The Typeswitch Expression

The typeswitch expression provides a convenient syntax for performing special processing based on the type of an expression. An example is shown in Example 15-1. This example uses user-defined schema types, which are explained in the previous chapter.

Example 15-1. A typeswitch expression
declare namespace prod = "http://datypic.com/prod";
typeswitch ($myProd)
  case element(*, prod:HatType) return xs:string($myProd/size)
  case element(*, prod:ShirtType)
       return xs:string(concat($myProd/size/@system, ": ",
                               $myProd/size))
  case element(*, prod:UmbrellaType) return "none"
  default return "n/a"

The example assumes that $myProd is bound to a product element. In the schema, assume that product elements are declared to have type prod:ProductType. However, in the input document, a product element may carry an xsi:type attribute that indicates another type that is derived by extension from prod:ProductType, such as prod:HatType, prod:ShirtType, or prod:UmbrellaType.

prod:ProductType itself does not allow a size child. Depending on which subtype it has, it may or may not have a size child. The typeswitch expression will return a different value depending on the type annotation of the product element.

The syntax of a typeswitch expression is shown in Figure 15-1. The typeswitch keyword is followed by an expression in parentheses (called the operand expression), which evaluates to the sequence whose type is to be evaluated. This is followed by one or more case clauses, plus a required default clause that indicates the value if none of the case clauses applies.

Figure 15-1. Syntax of a typeswitch expression

The processor uses sequence type matching (described in “Sequence Type Matching”) to determine whether a case clause applies. This means that if the type of the items in the sequence is the same as, or is derived from, the type identified by the case clause, it matches. If more than one case clause is applicable, the first one is used. Remember, with sequence type matching, it is not enough that the items are valid according to the specified type; they must actually have been validated and have that type (or a type derived from it) as their type annotation.

Starting in version 3.0, you can specify multiple sequence types, separated by vertical bars, in a single case clause. For example, case xs:float | xs:double could be used to choose that case if the expression matches either sequence type xs:float or xs:double.
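The following is a minimal sketch of a case clause that lists several sequence types. It uses built-in atomic types so that it does not depend on a schema. The function name local:kind is hypothetical, introduced only for this illustration.

```xquery
xquery version "3.0";

(: local:kind is a hypothetical helper used only for illustration. :)
declare function local:kind($value as item()*) as xs:string {
  typeswitch ($value)
    (: one case clause covering two sequence types :)
    case xs:float | xs:double return "binary floating point"
    (: xs:integer is derived from xs:decimal, so it is listed first;
       the first matching case clause is the one that is used :)
    case xs:integer return "integer"
    case xs:decimal return "decimal"
    case empty-sequence() return "empty"
    default return "something else"
};

(local:kind(1.5e0), local:kind(42), local:kind(4.2), local:kind(()))
```

Placing the xs:integer clause after the xs:decimal clause would make it unreachable, because every xs:integer value also matches xs:decimal.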

Each of the case and default clauses can also bind the value of the operand expression to a variable. In a case clause, the variable name and the keyword as appear before the sequence type; in a default clause, the variable name appears directly after the default keyword. This is useful if the return clause is dependent on the sequence being tested. In Example 15-2, the $h and $s variables are bound to the $myProd value, and the return clause references the variable.

Example 15-2. Binding variables to typeswitch expressions
declare namespace prod = "http://datypic.com/prod";
typeswitch ($myProd)
  case $h as element(*, prod:HatType) return xs:string($h/size)
  case $s as element(*, prod:ShirtType)
       return xs:string(concat($s/size/@system, ": ", $s/size))
  case element(*, prod:UmbrellaType) return "none"
  default return "n/a"
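Example 15-2 binds variables only in its case clauses. As a small sketch of the same idea applied to the default clause, the query below uses built-in atomic types instead of the prod schema types. The function name local:describe is hypothetical.

```xquery
(: local:describe is a hypothetical helper used only for illustration. :)
declare function local:describe($value as item()) as xs:string {
  typeswitch ($value)
    (: $n has static type xs:integer inside this branch, so arithmetic is allowed :)
    case $n as xs:integer return concat("integer, next is ", $n + 1)
    (: $s has static type xs:string inside this branch :)
    case $s as xs:string return concat("string of length ", string-length($s))
    (: in the default clause the variable follows the default keyword directly :)
    default $other return concat("unhandled value: ", string($other))
};

(local:describe(41), local:describe("hat"), local:describe(true()))
```

Within each branch the variable carries the type named in its clause, which is what allows a static typing processor to accept expressions such as $n + 1.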

The typeswitch expression can serve as shorthand for multiple if-then-else expressions that use instance of expressions to determine the type of the variable. Example 15-3 shows this alternative, which is similar to Example 15-2.

Example 15-3. Alternative to a typeswitch expression
declare namespace prod = "http://datypic.com/prod";
if ($myProd instance of element(*, prod:HatType))
then xs:string($myProd/size)
else if ($myProd instance of element(*, prod:ShirtType))
     then xs:string(concat($myProd/size/@system, ": ", $myProd/size))
     else if ($myProd instance of element(*, prod:UmbrellaType))
          then "none"
          else "n/a"

However, there is an important difference between the two: an implementation that supports static typing will raise a type error with Example 15-3. This is because, as far as the processor knows, $myProd is of type prod:ProductType, which does not allow a size child. Even though the processor is aware that there are subtypes that allow a size child, the static typing feature does not extend to parsing out the test expressions (after if) to determine that you would never evaluate the expression $myProd/size on anything that didn’t allow a size child. The typeswitch expression in Example 15-2, on the other hand, assures the processor that the branch that contains $h/size will only ever be evaluated for elements of type prod:HatType. Remember that static typing is pessimistic; it will give you an error if anything could go wrong, not just if it knows that things will go wrong.

The Treat Expression

The treat expression, like the typeswitch expression, is used to assure the processor that only values of a certain type will participate in a particular function or operation. The syntax of a treat expression is shown in Figure 15-2.

Figure 15-2. Syntax of a treat expression

Building on the prod:ProductType/prod:HatType example from the previous section, suppose you would like to display the size of a product, if it is a hat. Although prod:ProductType doesn’t allow a size child, prod:HatType does. You could use the query shown in Example 15-4.

Example 15-4. A query without a treat expression
declare namespace prod = "http://datypic.com/prod";
if ($myProd instance of element(*, prod:HatType))
then <p>The size is: {data($myProd/size)}</p>
else ()

It tests to see if the product is a hat, and if it is, constructs a p element that contains its size. Unfortunately, an implementation that supports static typing will raise a type error with this query. This is because, as far as the processor knows, $myProd has type prod:ProductType, which does not allow a size child. As discussed in the previous section, it does not matter that you check the type of $myProd in the enclosing test expression.

Example 15-5 shows a revised query that uses a treat expression to assure the processor that $myProd is indeed an element of type prod:HatType.

Example 15-5. A query with a treat expression
declare namespace prod = "http://datypic.com/prod";
if ($myProd instance of element(*, prod:HatType))
then
  <p>The size is: {data( ($myProd treat as element(*, prod:HatType))/size)}</p>
else ()

Unlike a cast expression or a type constructor, the treat expression does not actually change the type of $myProd. It doesn’t need to, because the type of $myProd should already be prod:HatType or some matching type. Like other static-typing-related expressions, it simply postpones any errors to runtime by saying, “I know that all the values are going to be valid prod:HatType values, so don’t raise an error during the analysis phase.”

If it turns out later during the evaluation phase that there is a $myProd value that does not match prod:HatType, the error is raised at that time. The rules of sequence type matching are used to determine whether the value matches. In this particular example, it will never raise this error because it checks the type of $myProd before evaluating the /size path.

If you’re familiar with casts in Java or C#, you’ll recognize that most casts in those languages are assertions (like treat as) rather than actual type conversions. XQuery uses cast as to mean a type conversion, and treat as to mean a type assertion.
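The difference between the two can be sketched with atomic values. The variable names below are illustrative only.

```xquery
let $raw as xs:string := "42"
let $item as item() := 42
return (
  (: cast as converts: the xs:string "42" becomes the xs:integer 42 :)
  $raw cast as xs:integer,
  (: treat as asserts: the value is already an xs:integer and is returned unchanged :)
  $item treat as xs:integer
)
(: By contrast, $raw treat as xs:integer would fail with error XPDY0050,
   because an xs:string value does not match xs:integer and treat
   performs no conversion. :)
```

Both items in the result are the integer 42, but only the first one was produced by a conversion.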

Type Declarations

Some expressions, namely FLWORs, quantified expressions, and global variable declarations, allow the use of a type declaration to force the static type of an expression. A type declaration uses the keyword as, followed by a sequence type. The sequence types used in these expressions follow the syntax described in “Sequence Types”.

Type Declarations in FLWORs

Sequence types can be used in the for, let, window, and group by clauses of FLWORs to declare the type of variable being bound. In this case, the type declaration appears immediately after the variable name, as in Example 15-6.

Example 15-6. A FLWOR with a type declaration
declare namespace prod = "http://datypic.com/prod";
for $prod as element(*, prod:ProductType) in doc("catalog.xml")/catalog/*
order by $prod/name
return $prod/name

The sequence type element(*, prod:ProductType) is specified as the type of the variable $prod. Without the type declaration, this query might raise a type error when using a processor that implements static typing, if the schema allows the possibility of catalog having children that don’t themselves have name children. The type declaration serves as a way of telling the processor, “I know that all the children of catalog will be valid elements of type prod:ProductType, which all have name children, so don’t raise a static error. If it turns out I’m wrong, you can raise a dynamic error later.”

With the type declaration, this error checking is postponed to evaluation time. When the query is evaluated, if the value of $prod is not of type prod:ProductType, a type error is raised. Note that the purpose of the sequence type specification is not to filter out items that do not conform, but to raise type errors when non-conforming items are encountered.

Unlike type declarations in function signatures, no conversions take place. Untyped atomic data won’t be converted to the required type, numeric type promotion won’t happen—in fact, you won’t even get atomization. The only difference allowed between the actual type of the value and the declared type is subtype substitution: if the required type is xs:decimal, for example, the supplied value can be xs:integer, but it can’t be a node containing an xs:integer.
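A short sketch of this contrast, assuming a processor that follows the XQuery 3.1 rules for let clauses. The function name local:as-double is hypothetical.

```xquery
(: Function signature: the function conversion rules apply, so the
   xs:integer argument 5 is promoted to xs:double. :)
declare function local:as-double($d as xs:double) as xs:double { $d };

(: let clause: only sequence type matching applies. xs:integer is derived
   from xs:decimal, so this binding is accepted by subtype substitution. :)
let $a as xs:decimal := 5
return ($a, local:as-double(5))

(: By contrast, each of these bindings would raise XPTY0004:
     let $b as xs:double  := 5          (no numeric promotion)
     let $c as xs:integer := <n>5</n>   (no atomization)
:)
```

Where a conversion is wanted in a let clause, it has to be written explicitly, for example with xs:double(5) or data(<n>5</n>).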

Type Declarations in Quantified Expressions

Quantified expressions also allow sequence types to be specified for variables, using a similar syntax, as in Example 15-7.

Example 15-7. A quantified expression with a type declaration
every $number as element(*, xs:integer) in
  doc("catalog.xml")//number satisfies ($number > 0)

In this case, the $number variable is given the sequence type element(*, xs:integer). If any of the items returned by the expression doc("catalog.xml")//number do not match that sequence type, a type error is raised. In order for a number element to match the sequence type, it would have to have been validated by a schema and annotated with the type xs:integer. It is not enough that it contains a value that can be cast to xs:integer.

Type Declarations in Global Variable Declarations

An optional type declaration can be specified in a global variable declaration, as in:

declare variable $prodCount as xs:decimal
  := count(doc("catalog.xml")//product/number);

which associates $prodCount with the static type xs:decimal. As with other type declarations, this type declaration does not change or cast the value in any way. It is simply used to reassure the processor that the value of $prodCount will always be a decimal number, so it can be used in arithmetic operations, for example. This is especially useful for external variables, since their static type cannot be determined any other way.
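For the external case, a minimal sketch might look like the following. The variable name $taxRate is hypothetical, and its value is assumed to be supplied by the calling application.

```xquery
(: $taxRate is a hypothetical external variable; the declared type
   gives it a static type of xs:decimal. :)
declare variable $taxRate as xs:decimal external;

(: Because the static type is known, the arithmetic passes static analysis. :)
100 * (1 + $taxRate)
```

If the supplied value does not match xs:decimal, the type error is raised when the query is evaluated, not during analysis.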

The zero-or-one, one-or-more, and exactly-one Functions

Three functions relate specifically to static typing: zero-or-one, one-or-more, and exactly-one. These functions are useful when static typing is in effect, to override apparent static type errors.

Each of the functions takes a single argument and either returns the argument as is or raises an error if the argument is a sequence containing the wrong number of items. For example, when calling the zero-or-one function, if the argument is a sequence of zero or one items, it is returned. If it is a sequence of more than one item, error FORG0003 is raised.
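The behavior of the three functions can be sketched with literal sequences. The error codes in the comment are those defined for these functions in the functions and operators specification.

```xquery
(
  zero-or-one(()),             (: zero items: returned as is :)
  zero-or-one("a"),            (: one item: returned as is :)
  one-or-more(("a", "b")),     (: two items: returned as is :)
  exactly-one("a")             (: one item: returned as is :)
)
(: Each of the following calls would instead raise a dynamic error:
     zero-or-one(("a", "b"))   FORG0003
     one-or-more(())           FORG0004
     exactly-one(())           FORG0005
:)
```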

Earlier in this chapter, we saw how the expression:

substring(doc("catalog.xml")//product[number = 557]/name, 1, 10)

will cause a static error when static typing is in effect. This is because, as far as the processor knows, there could be more than one name element that matches that criterion, while the substring function’s signature requires that only zero or one item be provided as the first argument. A static error can be avoided by using the expression:

substring(zero-or-one(doc("catalog.xml")//product[number = 557]/name), 1, 10)

In this case, no static error is raised. Rather, a dynamic error is raised if more than one name element is returned by the path expression. This is useful if you know that there will only be one product with number 557 in the document, and wish to override the static error.
