Chapter 5. Query

This chapter covers

  • Query overview
  • CMIS Query syntax
  • Advanced Query functions
  • Full-text search syntax

In the last chapter, we spent a lot of time describing types in order to prepare you for this chapter. As we mentioned at the beginning of chapter 4, without metadata you wouldn’t have an elegant method for narrowing your searches. Remember the example from the beginning of chapter 4, where we were searching for a specific photo of an elephant? Flexible query capabilities might not be a big deal when you’re shuffling through your filing cabinet at home, but wait until you’re searching on the scale of Enterprise Content Management systems, where you might be talking about billions of documents. At that scale, you’d better be packing some powerful tools for query, or have a lot of free time.

Luckily, CMIS defines a powerful and flexible way to describe searches, and it does this using a syntax that you’ve probably already been using for years—SQL. As you get deep into this chapter, you may start to feel a little dizzy, but don’t be discouraged. This chapter is hands-down the most difficult one in part 1, and one of the most difficult in the whole book. The concepts introduced here are equally powerful and complex. The chapter includes a lot of detail that you may not need at this moment, but we’ll cover the subject comprehensively. We packed this chapter with tons of examples so that later, when you need to know the syntax of something tricky, odds are you’ll be able to find something here to copy and paste to get you up and running.

Therefore, don’t worry about absorbing all of this in the first pass. The chapter is broken up into many small chunks so you’ll be able to find what you’re looking for later. But if you read it all the way through, we believe the path we’re taking you on is the best route for a clear understanding. We’ll start with the basics and finish up with the extensions CMIS has added to make certain ECM functions more natural when used as part of a SQL query.

5.1. Query: a familiar face on search

As you may have guessed by now, this chapter will teach you everything you need to know in order to produce an effective CMIS query. Or, stated a different way, you’ll understand how to use CMIS to filter out all of the other noise in order to find the data you’re looking for.

One of the stated goals of the CMIS specification was to take advantage of technologies and standards that were mature and accepted, wherever possible. We don’t want to reinvent the wheel. At the time the CMIS Technical Committee began work on this specification in 2008, SQL had already been around as a standard for decades. It was for this reason that the nearly universally known (at least among developers) SQL syntax was chosen as the way to describe these queries. This is likely one of the reasons that CMIS adoption has been so successful across the industry.

5.1.1. Prerequisite for this chapter: SQL basics

CMIS 1.0 and 1.1 Query is based on SQL-92 (ISO/IEC 9075). In order to avoid droning on about a subject that most readers of this book will consider basic knowledge, we’ll make one assumption: that you have a high-level understanding of SQL query syntax. Nothing advanced is required. As long as you can look at a simple SELECT statement without crossing your eyes, you’ll be OK.

If you’re saying to yourself “SELECT what?” you might want to take a few minutes to read a brief introduction to SQL. A quick internet search will turn up plenty of information, because we’re talking about a standard that’s been firmly established for nearly 30 years. Even the introduction to SQL in Wikipedia (http://en.wikipedia.org/wiki/SQL) will suffice to explain the key concepts.

5.1.2. Exercises in this chapter and the InMemory server

For most of the exercises in this chapter, we’ll continue to use the CMIS InMemory Repository package that you downloaded in chapter 1. You may remember from the previous chapter that quite a bit of sample metadata comes preinstalled with the InMemory server for audio files and other common document types, such as PDFs. We’ll base our queries around these types so you can run the same queries locally, rather than viewing only static examples.

5.2. Introduction to the CMIS Query language

For a quick review, let’s look at the components of a typical database. A relational database is composed of tables, columns, and rows. You can also envision the object type as a spreadsheet grid, with the vertical columns as the properties and the horizontal rows as the individual objects. Finally, the row headings are part of the schema. Figure 5.1 shows such a view.

Figure 5.1. Viewing a list of documents as a spreadsheet with columns as properties and rows as object instances

This table analogy maps easily to the CMIS data model, where object types have property definitions and the data is the instances of objects. By mapping a relational view on the CMIS data model, you can see why the CMIS specification has defined its Query language based on, and extended from, the SQL-92 grammar. It fits perfectly.

CMIS also has extended the Query grammar to make it easier to filter your query results based on multivalued properties, full-text search, and folder membership. Don’t worry about the details of these extensions for now. We’ll go into each one later in the chapter, with examples, and you’ll see how powerful these queries can be.

CMIS SQL is read-only

Only a subset of the SQL-92 grammar related to SELECT is included in the CMIS Query language. Specifically, you won’t be able to do data manipulation to modify the result set data directly.

5.2.1. Reviewing clauses of the SELECT statement

Because we’ll work with examples of all of these, the following list contains the four basic clauses of the SELECT statement. Think of this as a refresher and the start of an agenda for the next few sections.

  • SELECT—The properties that will be returned for each object in the result set; you can call them “virtual columns.”
  • FROM—The queryable object type; you can call it a “virtual table.”
  • WHERE—An optional clause to specify the conditions on the virtual columns.
  • ORDER BY—An optional clause to specify how the objects in the result set will be sorted based on the virtual columns.

Most developers are familiar with these clauses, and that’s the point. If you’re a developer, you’re already familiar with large parts of CMIS before you’ve even read the first page of the specification.

5.2.2. Checking Query capabilities on a service

In the previous chapter, you learned about object type definitions, including their attributes, property definitions, and the inheritance hierarchy. Some of this information is directly applicable to the repository’s ability to support querying on the object type. Before trying to construct a query, though, you’ll need to check two things: the level of the CMIS repository’s Query support, and whether or not the particular object type has been enabled for query.

You might remember that one of the repository’s data fields is capabilityQuery. As long as its value isn’t set to none, the repository supports metadata queries and/or text search. Once you know that Query is supported by your CMIS repository, you’ll need to know a few object type attributes in order to construct a CMIS Query. Here are the attributes you should be aware of:

  • queryable—This Boolean attribute must be true to be able to use the object type in a CMIS Query and have the objects from this type be returned as part of the Query result set. For example, the CMIS specification includes an object type called cmis:relationship, which is used to establish relationships or associations between objects. If you look at the type definition for cmis:relationship, you’ll see that it’s not queryable. Therefore, you can never have a query that says SELECT * FROM cmis:relationship.
  • includedInSuperTypeQuery—If this Boolean attribute is true, then the objects of this object type may be returned when you query against one of its ancestor object types. If this attribute is false, the objects in the object type may still be returned when its queryable attribute (see the previous item in this list) is true. For example, included among the sample object types in the InMemory server that accompanies this book is a type called cmis:lyrics. Its parent type is cmisbook:text, whose parent is cmis:document. Because includedInSuperTypeQuery is set to true for cmis:lyrics and cmisbook:text, queries that select from cmis:document may return instances of cmis:lyrics because cmis:document is a supertype of cmis:lyrics.
  • queryName—The queryName of an object type is equivalent to the table name used in the FROM clause to identify the object type. This is case sensitive. For example, an object type might have an ID of cmisbook:recordLabel, but its queryName might be cmisbook:label. When writing CMIS queries, you must always use the value of the type definition’s queryName, not its type ID, in the FROM clause.
Setting up sample data

If you haven’t done this already, now is a good time to add a few of the audio files into the InMemory server, so you can experiment with more varieties of queries. If you don’t, you can still work with the documents that already exist in the server, but the query results may not be as interesting without the diversity of property values to query on.

In the next section, we’ll look at these attributes in the CMIS Workbench.

5.2.3. Try it—checking the Query capabilities of a CMIS service

For this exercise, take a quick look at the repository info for the InMemory Repository (CMIS Workbench > Repository Info). Under Capabilities, you’ll see that Query is BOTHCOMBINED. That means you can create powerful queries with metadata queries and full-text searches together in one single SQL query statement.

Because you’re working in CMIS Workbench, you can take a look at the attributes of the object types. Go to the CMIS Workbench > Types. Click on Audio File (cmisbook:audio) in the left pane, and you can see its attributes in the upper-right pane, with queryable set to Yes (see figure 5.2). This means that the Audio File object type can be used in your CMIS SQL. Also note the queryName is cmisbook:audio, so that’s the “virtual table” name you’ll use in your SQL query. In the same upper-right pane, you can see that Included in Super Type Queries is set to Yes for object type Audio File. Recall that in chapter 4, you ran code to programmatically examine the attributes for each of the types in the hierarchy. These are the same type attributes you’re looking at now.

Figure 5.2. Examine the type attributes for cmis:audio using the CMIS Workbench Types view.

5.2.4. Try it—your first CMIS Query

Even though you’re starting to see how Query works, we’ll start with the simplest query possible. This will give you a taste of what to expect later as we fill in the blanks.

In this exercise, you’ll query on the base object type, cmis:document. Note that it isn’t a good idea to run this query on a large production-sized system with document objects in the millions, or more.

If you go to the CMIS Workbench and click on Query, you can run the default SQL that’s in the Query pane. Click the Query button, and you’ll see query results with all the document properties:

SELECT * FROM cmis:document

Figure 5.3 shows the output of this query.

Figure 5.3. Simple query results executed in CMIS Workbench

Take a minute to scroll right in the query output to see the object property values for the query results. You may have to widen the columns to see the column names and values. You can also change the order of the columns by dragging them to the right or left. Stop when you get to the cmis:objectTypeId column. You can see that the objects that have been returned are of many different object types, such as cmisbook:note and cmisbook:audio.

Even though you searched for cmis:document objects, because cmisbook:note is a subtype of cmis:document, and its includedInSuperTypeQuery attribute is true, objects of cmisbook:note are also returned. Had includedInSuperTypeQuery been false, the query wouldn’t have returned any cmisbook:note objects. As for cmisbook:audio, it’s a subtype of cmisbook:media, which is itself a sub-subtype of cmis:document, the object type in the SQL query.

Try the following queries to see that you can specify non-CMIS object types. First, try this (results shown in figure 5.4) :

Figure 5.4. Simple Query for cmisbook:note objects executed in CMIS Workbench

SELECT * FROM cmisbook:note

Next, try this one (results shown in figure 5.5):

Figure 5.5. Simple Query for cmisbook:audio objects executed in CMIS Workbench

SELECT * FROM cmisbook:audio

Now that you’re getting comfortable executing these queries from the graphical comfort of CMIS Workbench, let’s move into making queries programmatically.

5.2.5. Try it—running a query from code

We showed you how easy it is to execute a simple query from CMIS Workbench using the Query GUI. But how hard is this to do with OpenCMIS in code? It turns out to be as easy as you’d have hoped. In this example, you’ll run the same query you saw in figure 5.3, but run it in the Groovy console to give you a chance to compare and contrast. You’ll see that you’re still able to submit the query in much the same way in most cases. Listing 5.1 shows this same simple query, but it’s limited to five results to save space.

Listing 5.1. Generating a query with OpenCMIS code in the Groovy console

Figure 5.6 shows the output of this code in the bottom output pane of the Groovy console.

Figure 5.6. Output from the simple query example in listing 5.1

Now you can see the direct correlation between running a query string in the Query GUI and running a query from code. We’ll focus strictly on the query syntax for the rest of this chapter, but you’ll have plenty of opportunities to see query code in part 2 of the book.

Next we’ll dig a bit deeper into the queryable aspects of the properties themselves.

5.2.6. Checking query-related attributes for properties

Now that you’ve played with object types as “virtual tables” in the FROM clause, we can move on to the second set of information that you’ll need to check on—the object type properties and their definitions. The property definitions are involved in the other three clauses in the SELECT statement: SELECT, WHERE, and ORDER BY.

Before using a particular object type property as a virtual column in the query, you’ll need to check whether or not the property can be used in the query. Here are the relevant object type property definitions:

  • queryable—This Boolean attribute must be true to be able to use this property in the WHERE clause and have the values be returned. If this attribute is false, you can still specify the property in the SELECT clause to return the property values, but it can’t be in the WHERE clause.
  • queryName—The queryName of this property. You can think of it as the name of the virtual column from the spreadsheet example at the beginning of this chapter. The property can be directly defined or inherited by the object type in the FROM clause. You can also specify the CMIS properties that are defined in the root object types, such as cmis:name and cmis:creationDate. Note that the name is case sensitive.
  • orderable—This Boolean attribute must be true to be able to use this property in the ORDER BY clause. A common, sometimes required, DBMS practice is that the properties used in the ORDER BY clause must also be in the SELECT clause.

It’s easy to see these attributes from the Workbench. Go back to the Types pane, expand CMIS Document, and click on the Note (cmisbook:note) type. In the bottom-right pane are the object properties ordered alphabetically by their ID and their attribute definitions. Locate one of the predefined CMIS properties, such as cmis:name, that exists for all document object types and descendant types. Examine its attributes to make sure you can use cmis:name in your SQL query. Figure 5.7 shows the Types window displaying the cmis:name information for cmisbook:note.

Figure 5.7. Examining the queryable attribute for cmis:name

Scrolling toward the end of the properties, find the property names with the prefix of cmisbook:—these are the custom properties defined for Note. The other properties are inherited from cmis:document and exist for all documents in the repository. Locate the Archived property from the list, and then scroll to the right to see its property definition attributes. Familiarize yourself with the property and how you can use it in your queries.

That’s it for property attributes. Next we’ll look at the hierarchical relationships between the types and how that relates to the search scope.

5.2.7. Search scope

Now that you understand the basics of which object types and which properties can be queried, you’ll need to determine the scope of your queries. We briefly touched on scoping and inheritance when we described the object type’s includedInSuperTypeQuery attribute. You’ll also recall the type inheritance we discussed in chapter 4. This also applies to Query.

We can elaborate on this concept using the existing document object types in the InMemory server, as shown in figure 5.8.

Figure 5.8. Three Query scopes, A, B, and C, each with more properties

Querying on CMIS Document (A) will return matches from its own object type (A) and also all of its descendant object types, (B) and (C). Querying on Text Document (B) will return matches from Text Document (B) and Lyrics (C) object types.

5.3. Components of a query

Armed with the basics of object types and properties as tables and columns, and knowing when you can use them in a CMIS Query, you’re ready to take a look at the syntax of the supported SQL grammar. Because we’ll be talking in terms of SQL with its relationship database references, we’ll mix the jargon and refer to object types as tables and properties as columns. The CMIS data model does map nicely to the relational model, and it helps to think in terms of tables and columns.

For those of you brave enough to read the Backus-Naur Form (BNF) grammar for the CMIS SQL query syntax, we have it in appendix B for your reference in graphical form. You’ll also find the BNF grammar in section 2.14.2.1 of the CMIS 1.1 specification document as plain text. One look and you’ll agree—it’s not for the faint of heart. In the rest of this chapter, we’ll explain the syntax in more user-friendly terms, along with lots of examples. We hope that you’ll only need to refer to the BNF grammar for the more complex queries. Take your time to explore the query syntax by entering the SQL examples that follow into the CMIS Workbench Query editor.

BNF grammar

BNF (Backus-Naur Form) is a computer science term for a notation technique used to describe the syntax of various languages. It’s necessary for official language specifications like the OASIS CMIS specification, which must be precise in its definitions in order to avoid any misunderstandings among vendors.

5.3.1. The SELECT clause

The SELECT clause describes the virtual columns or properties that will be included in the result set. It can be a comma-separated list of one or more property queryNames, or * to return all single-valued properties. Some CMIS repositories may also return multivalued properties for the *, but it’s not a required implementation.

You can specify properties defined specifically for the object type and also the predefined CMIS properties from which the object type inherits.

Aliases can be defined by adding the string AS and the alias name to the property queryName. As you can see from the following example 4, you can alias the table name (L), and then reference the qualified property with another alias (myTitle). Using aliases makes it easier to refer to tables and properties in later parts of the query.

Here are the examples:

1.  SELECT * FROM cmis:folder

2.  SELECT D.* FROM cmis:document D

3.  SELECT cmisbook:author, cmisbook:songtitle, cmis:objectId FROM cmisbook:lyrics

4.  SELECT L.cmisbook:author, L.cmisbook:songtitle AS myTitle FROM cmisbook:lyrics L

Query result sets

Any time you submit a successful query, a set of zero to many objects (or rows) is returned. These objects only consist of properties that you’ve specified in your SELECT clause. For each of the properties, the name of the property will be the same as the queryName of the property definition. If an alias is used for the SELECT property, the alias will be the name in the result set.

For example, query 4 from the previous list will return cmisbook:author and myTitle as the names of the properties in the result rows, as shown in figure 5.9.

Figure 5.9. Query with aliases on the type and column

FROM clause, queryable, and joinCapability

At this point, you should be familiar with the FROM clause. The FROM clause describes the virtual table(s) or object type(s) against which you want to run your query. The object type must have its queryable attribute set to TRUE in order to use its queryName in the FROM clause. As in the SELECT clause, aliases can be defined for the object type by adding the string AS and the alias name to the table queryName.

If you want to query against data from multiple object types by specifying more than one object type in the FROM clause, you must first check that the joinCapability is supported on your CMIS repository. Not all CMIS repositories support the joinCapability. See section 5.3.4 on the JOIN clause for more details.

5.3.2. WHERE clause

The WHERE clause adds the constraints and conditions that objects must satisfy to be returned as a result for the query. As with the SELECT clause, you must specify the queryNames of the properties defined as queryable.

The CMIS query syntax supports the following restricted set of SQL-92 query predicates for single-valued queryable properties (see table 5.1). But you can’t use all of the predicates for all of the property data types. The rules are logical for each data type. For example, in table 5.1, you can see that Boolean properties can only use the equality (=) comparison test. It’s either equal to TRUE or equal to FALSE.

Table 5.1. Supported SQL-92 predicates, associated operators, and data types

Predicate

Operators

Data types

Comparison =, <>, <, <=, >, >= DateTime, Decimal, Integer
  =, <> ID, String, URI
  = Boolean
IN [NOT] IN DateTime, Decimal, ID, Integer, String, URI
LIKE [NOT] LIKE String, URI
NULL IS [NOT] NULL All data types
Predicates

In case this term is new to you, a predicate is another query condition that evaluates to TRUE or FALSE.

Instead of describing the format of the data type literals, we’ll show them through our cookbook-style example queries in the following sections. They follow the SQL convention as follows:

  • Numeric literals aren’t quoted.
  • Character literals are quoted.

The timestamp literal is a little different and we may need to reference the syntax now and then. What follows now are examples for all of the predicates listed in table 5.1. We’ll start out with the predicate syntax for single-valued properties, and then follow that with multivalued property predicates.

SQL examples in this chapter

It’s important to note that the many examples shown in this chapter are designed to give examples of syntactically correct queries. Many of them will return nonzero results when executed against the sample InMemory server, and others will not. All are valid, however. We encourage you to experiment and add additional objects to the repository to address specific queries that are of interest.

Comparison predicate

You’ve seen in table 5.1 the basic comparison operators that the CMIS Query language supports (=, <>, <, <=, >, >=). They’re the common SQL comparison operators that you can use on single-valued properties.

The following are some additional specifications about the operators with respect to the data types. (Again, the syntax here is for single-valued properties. Multivalued property comparisons will be discussed later in the chapter.)

  • Boolean comparisons are only equality tests, either equal to true or equal to false. The Boolean literal doesn’t need to be quoted and the case of the literals doesn’t matter (TRUE or true, FALSE or false).
  • String, ID, and URI comparisons are case sensitive and limited to equal or not equal. These literals will need to be enclosed in single quotes.
  • DateTime comparisons are chronological, and the granularity of the time portion of the timestamp may be repository-dependent, based on how the timestamp is represented in the database. A DateTime literal has this format: TIMESTAMP 'YYYY-MM-DDThh:mm:ss.SSSZ'. The SSS part of the timestamp is for fractions of a second. The Z stands for Zulu time, otherwise known as GMT. Instead of specifying a time in GMT, the time zone offset can be provided using this syntax: TIMESTAMP 'YYYY-MM-DDThh:mm:ss.SSS{+hh:mm | -hh:mm}'.
Try it—comparison predicate

Please try the following examples in your local CMIS Workbench for the six different types, or play around with your own variations:

  • Boolean: SELECT * from cmisbook:note where cmisbook:noteArchived = true
  • DateTime using GMT time or time zone offset: SELECT * FROM cmis:document WHERE cmis:lastModificationDate >TIMESTAMP '2012-07-27T16:23:02.390Z'SELECT * FROM cmis:document WHERE cmis:creationDate < TIMESTAMP '2013-07-27T16:23:02.390+07:00'
  • Decimal: SELECT cmis:name, cmisbook:videoDuration FROM cmisbook:videoWHERE cmisbook:videoDuration > 120.0
  • ID: SELECT cmis:name, cmis:objectId FROM cmis:folderWHERE cmis:objectId <> '100'
  • Integer: SELECT cmis:name, cmis:contentStreamLength FROM cmis:documentWHERE cmis:contentStreamLength >= 34000
  • String: SELECT cmis:name, cmis:objectId FROM cmis:documentWHERE cmis:name = 'welcome.txt'
IN predicate

The IN predicate is used to specify a set of values for a single-valued property, any of which can be matched, and the owning object is returned as a result.

If you’re familiar with SQL, you know that the IN predicate is different from BETWEEN, which is used to specify a range with a starting and an ending value. CMIS Query doesn’t support BETWEEN directly, but you can construct a query with similar results by using both the less than/equal to (<=) and the greater than/equal to (>=) comparison operators. For example, to return all objects created on a specific date based on the GMT, you can use two comparisons with two timestamps. Depending on the CMIS client, you may see the timestamp property values displayed in current time, and not GMT time:

SELECT * FROM cmis:document WHERE cmis:creationDate >=
     TIMESTAMP '2012-07-27T00:00:00.000Z'
     AND cmis:creationDate < TIMESTAMP '2012-07-28T00:00:00.000Z'
Some additional specifications about the IN operator and the data types

  • Boolean properties can’t be used.
  • String, ID, and URI literals are case sensitive.
  • The NOT operator can be used in conjunction with the IN predicate for a negative test.
Try it—IN predicate examples

Try the following examples in your local CMIS Workbench for these five different types, or play around with your own variations:

  • String: SELECT * FROM cmisbook:text where cmisbook:authorIN ('Jane Taylor', 'Geoffrey Chaucer')
  • DateTime: SELECT * FROM cmisbook:image WHERE cmis:creationDateNOT IN ( TIMESTAMP '2011-06-30T12:00:00.000Z',TIMESTAMP '2012-06-30T12:00:00.000+00:00')
  • ID: SELECT * FROM cmis:document WHERE cmis:objectId IN('130','131','132','133')
  • Integer: SELECT * FROM cmisbook:audio WHERE cmisbook:yearNOT IN (1988, 1990)
  • Decimal: SELECT * FROM cmisbook:video WHERE cmisbook:videoDurationNOT IN (0, 60.0, 120.0)
LIKE predicate

Using wildcards with a LIKE predicate, you can query for specific patterns in String and URI properties. The NOT operator can be used in conjunction with the LIKE predicate for a negative test.

The most commonly used wildcard is the percent symbol (%). In a LIKE predicate, % means zero or more occurrences of any character. Another wildcard is the underscore (_), which matches exactly one character.

Depending on how the String and URI properties are defined in the repository, their values may be padded with spaces, which means you may need to add a trailing wildcard for a match.

Wildcard queries are powerful, but they do incur performance costs, so you shouldn’t overuse the LIKE predicate. You should also try not to have a wildcard at the beginning of your pattern, and try to be as specific as you can.

Escaping rules

Escaping rules for your LIKE operations can be a bit tricky at times. Here are a few cookbook examples that may come in handy next time you’re crafting some tricky WHERE clauses.

To match a percent sign or underscore in a LIKE predicate, the escape character backslash () must precede the % or _. This example returns all of the lyrics for song titles that start with “Sacred_”:

SELECT cmisbook:songtitle FROM cmisbook:lyrics
WHERE cmisbook:songtitle LIKE 'Sacred\_%'

This example returns all of the lyrics in which the author name ends with “%Bleu”:

SELECT cmisbook:author FROM cmisbook:lyrics
WHERE cmisbook:author LIKE '%\%Bleu'

You can add new documents in the InMemory server with string properties that have % or _ in their values, such as in the cmisbook:songtitle or cmisbook:author properties, and try the LIKE queries out. Remember that LIKE queries are case sensitive, so your case must match to get results returned.

In case you’re wondering about matching quotation marks, you don’t need any escaping for double quotes. This example returns all of the lyrics for song titles that start with “My”:

SELECT cmisbook:songtitle FROM cmisbook:lyrics
WHERE cmisbook:songtitle LIKE '"My%'

You’ll need to add an escape character before single quotes. The CMIS specification states that the escape character can be either a backslash or the other common escape character for a single quote—another single quote. The following two examples should both work to return all of the lyrics for documents where the song titles start with “David’s”:

SELECT cmisbook:songtitle FROM cmisbook:lyrics
WHERE cmisbook:songtitle LIKE 'David''s%'

SELECT cmisbook:songtitle FROM cmisbook:lyrics
WHERE cmisbook:songtitle LIKE 'David's%'

Finally, if you want to match the backslash character itself, add another backslash. This example matches “backslash” in the song title:

SELECT cmisbook:songtitle FROM cmisbook:lyrics
WHERE cmisbook:songtitle LIKE 'back\slash%'
Try it—LIKE predicate examples

Try the following examples in your local CMIS Workbench, or play around with your own variations:

  • Percent symbol wildcard (%) example #1: SELECT * FROM cmisbook:media WHERE cmis:contentStreamMimeTypeLIKE 'audio%'
  • Percent symbol wildcard (%) example #2: SELECT * FROM cmis:document WHERE cmis:name LIKE '%Document%'
  • Underscore (_) wildcard: SELECT * FROM cmisbook:lyrics WHERE cmis:createdBy LIKE 'syste_'
  • NOT LIKE: SELECT * FROM cmisbook:note WHERE cmis:versionLabel NOT LIKE 'V 0._'
NULL predicate

The NULL predicate tests whether or not a property’s value has been set. CMIS doesn’t allow properties with a NULL value, so this predicate will only test whether or not the property has been set. The NOT operator can be used in conjunction with the NULL predicate for a negative test.

You can use this predicate for both single- and multivalued properties. Here are some examples of a NULL predicate on a single String property and a multi-DateTime property:

SELECT * FROM cmisbook:note WHERE cmis:checkinComment IS NULL
SELECT * FROM cmisbook:note WHERE cmisbook:noteReminders IS NOT NULL

You can also add a NULL condition in your query for inequality comparison. For example, if you want to see all Note documents that don’t have a link of resource.txt, you’ll probably also expect to see documents that didn’t set this property to any value. Try running the following two SQL queries, and you’ll see what we mean:

SELECT * FROM cmisbook:note WHERE cmis:versionLabel <> 'comment'
SELECT * FROM cmisbook:note WHERE cmis:versionLabel <> 'comment'
     OR cmis:versionLabel IS NULL

Or try testing a Boolean property for FALSE, or not set at all:

SELECT * from cmisbook:note where cmis:isVersionSeriesCheckedOut is null OR
     cmis:isVersionSeriesCheckedOut = FALSE

Some CMIS repositories may have already taken this into account, and you don’t need to add a NULL predicate in those cases, but it’s nice to know that this is how you can accomplish the same effect. Another useful reason to add a NULL condition is to test for an empty String property.

Some repositories allow you to store an empty String as a valid value, whereas others treat it as not set. You can accommodate both implementations with a query like the following:

SELECT * FROM cmis:document WHERE cmis:lastModifiedBy = ''
     OR cmis:lastModifiedBy IS NULL
Multivalue predicate

You may have noticed that the previous sections describe query syntax only for single-valued properties. What if you want to query on a multivalued property? Multivalued properties can have more than one value, and not all CMIS repositories support them. You can always check the cardinality of the object type property definition to find out whether a property is single-valued (single) or multivalued (multi). You saw earlier in the CMIS Workbench Types window how you can select an object type and see the property definition attributes in the lower-right pane.

The Query syntax is a bit more limited for multivalued properties. You can perform equality tests to find a specific value in any of the multiple values of the property. More complex queries for ranges and wildcard searches aren’t applicable here.

CMIS syntax extends the SQL-92 syntax to use the ANY quantifier for multivalue properties (see table 5.2). If you’re already familiar with SQL-92, you’ll recognize the syntax. We’ll discuss the quantified comparison predicate and the quantified IN predicate next. The syntax for the NULL predicate is the same for both single-valued and multivalued properties—please refer to the previous section on the NULL predicate.

Table 5.2. Supported SQL-92 multivalue predicates, associated operators, and data types

Predicate

Operator

Data types

Quantified comparison = ANY Multivalued properties of all data types
Quantified IN [NOT] IN Multivalued properties of all data types except Boolean
NULL IS [NOT] NULL Multivalued properties of all data types
Quantified comparison predicate

The following syntax for the quantified comparison predicate is only used for querying a multivalued property for any of its values matching a literal. In addition, you can only use the equality test (=). Unlike the single-valued property queries, the literal is on the left side of the equal sign, and ANY followed by the property queryName is on the right side of the equal sign. Here are two examples:

SELECT * FROM cmisbook:image WHERE -7 = ANY cmisbook:timeZoneOffset
SELECT * FROM cmisbook:pdf WHERE 'rome' = ANY cmisbook:pdfKeywords
Quantified IN predicate

If you want to compare a multivalued property with a list of values, you can use the quantified IN predicate in your query. This syntax is only used for querying a multivalued property for any of its values matching one of the literal values in the IN list. The SQL does exactly what it says: return the object as a match when any of the multivalued property values is among the specified values.

The same data types that support the IN predicate for single-valued properties are allowed for the multivalued properties—that is, this predicate doesn’t support Boolean multivalued properties. The NOT operator can be used in conjunction with the quantified IN predicate for a negative test, where none of the multivalued property values matched the list of literals. Here are two examples:

SELECT * FROM cmisbook:note WHERE ANY cmisbook:noteLinks
     IN ('http://www.apachecon.eu/','http://www.ibm.com')
SELECT * FROM cmisbook:officeDocument WHERE ANY cmisbook:keywords
     NOT IN ('rome', 'raven', 'cmis') OR cmisbook:keywords IS NULL
Logical operators (), AND, OR, and NOT

You’ve seen the logical operator NOT used for negating the condition that comes next (IN, LIKE, IS, NULL). In one of the examples, we also snuck in the use of the OR operator to expand the returned query result set if the row satisfied either of the two conditions. You can also use the AND logical operator to restrict the result set to rows that satisfy both of the two conditions.

The following example will return all cmisbook:note documents except the ones that have a cmisbook:noteLinks with the value of resource.txt or test.txt, including the ones that didn’t set a property value for cmisbook:noteLinks:

SELECT * FROM cmisbook:note WHERE ANY cmisbook:noteLinks
     NOT IN ( 'resource.txt' , 'test.txt') OR cmisbook:noteLinks IS NULL

The next example uses the AND operator to return all documents that are checked out by user abrown:

SELECT * FROM cmis:document WHERE cmis:isVersionSeriesCheckedOut = true AND
     cmis:versionSeriesCheckedOutBy = 'abrown'

If you want to have more than two conditions that mix the ANDs and ORs, you need to use parentheses to clarify the order in which these conditions are evaluated. CMIS query syntax doesn’t specify any implied order of precedence, although the standard order is parentheses first, then NOT, AND, and OR last. Because it’ll be up to the CMIS server implementation, it’s safest to use parentheses in your SQL to ensure that the conditions are evaluated in the order you’ve specified.

In the first of the following two examples, you might think you’re looking for notes that aren’t 287, but the archived document 287 will be returned because you’ve evaluated the AND operator first. In contrast, the second example won’t return document 287:

SELECT * FROM cmisbook:note where cmisbook:noteArchived = TRUE
     OR cmisbook:noteArchived = FALSE AND cmis:objectId <> '287'
SELECT * FROM cmisbook:note where (cmisbook:noteArchived = TRUE
     OR cmisbook:noteArchived = FALSE) AND cmis:objectId <> '287'

Also note that the objectId values may be different in your own InMemory Repository, so you may have to adjust the queries accordingly.

5.3.3. Ordering and limiting query results

With all the query results that are returned, you probably want to see them in some order that makes sense to you. This calls for adding an ORDER BY clause to your query. The ORDER BY clause comes at the end of the query, after the WHERE clause. It consists of tuples of sorting information—namely, what property you want to sort by, and how you want the results to be sorted, either in ascending or descending order. You can have more than one sorting property in the ORDER BY clause. The first tuple is the primary sort specification, the next tuple is the secondary, and so on.

The properties in the ORDER BY clause must have their attribute orderable set to TRUE, and they must also be specified in the SELECT clause. Some CMIS server implementations may be more lenient about these two requirements.

Because the orderable attribute for a property is supposed to apply to all queries, getChildren, and getCheckedOutDocs, the orderable attribute might be set to false if the CMIS implementation doesn’t support sorting on the property in getChildren (for example).

As for requiring sorting properties to be in the SELECT clause, some CMIS implementations may allow the sorting of CMIS properties and/or custom properties without returning their values in SELECT. But it’s a good practice to have the sorting property returned, as you’re probably interested in seeing the values of the property anyway.

You can order in ascending (ASC) order or in descending (DESC) order. The collation order is repository-specific, and the repository determines the ascending and descending rules. If the collation order isn’t specified, the repository will use the default sort order.

The CMIS Workbench shows the orderable attribute in the object type property definitions. You can also find out programmatically by requesting the type definition for an object type. Recall that we’ve done this using the Groovy console in chapter 4. Only single-valued properties of all data types can be orderable. It makes sense that multivalued properties aren’t orderable.

Here are two ORDER BY examples:

SELECT cmis:name, cmis:contentStreamLength FROM cmisbook:media
     ORDER BY cmis:contentStreamLength ASC
SELECT cmis:name, cmis:objectId FROM cmis:document
     ORDER BY cmis:name ASC, cmis:objectId DESC

5.3.4. Joins and determining repository support

A powerful query feature we’ve yet to cover is the capability to join object types based on a common property key value. Using relational database table jargon, we can say that a join allows you to combine and associate tables dynamically during a SELECT query, so that the rows from multiple tables can be treated as if from the same table, and a single set of query results can be returned.

This is the SQL JOIN feature, but not all CMIS repositories support JOIN queries. That’s why you have to check for the support in the repository’s capabilities list. You’ll even find a couple of levels of support within the list of those that support JOINs.

In our earlier exercises viewing InMemory capabilities, recall that the simple InMemory server doesn’t support JOINs (capabilityJoin = NONE). Therefore, you won’t be able to run any JOIN queries on the InMemory server. But we’ll continue to use the familiar object types, such as cmisbook:media and cmisbook:text and their properties in our JOIN examples.

If you do have access to a CMIS server that supports JOIN queries, it’s a good idea to learn more about this advanced topic of SQL JOINs in relational databases. Here, we’ll assume you have a basic knowledge of JOINs, and we’ll go through the CMIS-specific syntax, which is more limited in features than the variations allowed in SQL-92. Once you’re familiar with the JOIN syntax and the data model specific to your CMIS server, you can use the CMIS Workbench to create documents and objects that can be joined, and test your JOIN queries.

The descriptions in this section will use the relational database jargon, such as rows and tables, because it’s easier to visualize joining tables, as opposed to joining object types and objects.

The next three subsections will iterate through the three levels of repository JOIN support you’re likely to encounter. These three levels are called none, inneronly, and innerandouter.

capabilityJoin = none

The JOIN clause isn’t allowed in a query when a server has capabilityJoin set to none. If you try to run a JOIN query, the server will return an error.

capabilityJoin = inneronly

Only INNER JOINs are allowed in the query if capabilityJoin is set to inneronly. For INNER JOINs, only the rows that satisfy the JOIN condition are included in the results. You can abbreviate INNER JOIN to JOIN in the SQL. Here’s an example:

SELECT M.*, T.cmis:name textname FROM cmisbook:media
AS M JOIN cmisbook:text AS T ON M.cmis:createdBy = T.cmis:lastModifiedBy
capabilityJoin = innerandouter

Both INNER JOINs and LEFT OUTER JOINs are supported when capabilityJoin is set to innerandouter. For LEFT JOIN queries, all of the rows from the left table are returned, regardless of whether or not the JOIN condition (ON) is true. When a row has unmatched columns, these columns will still be included in the result set if they’re SELECTed but with a NULL value. You can abbreviate LEFT OUTER JOIN to LEFT JOIN in the SQL.

In the following example, we’ll change our previous INNER JOIN SQL to a LEFT OUTER JOIN. More results will be returned, and they’ll now include all the cmisbook: media documents in the system, even if their creator never modified a cmisbook: text file (ON condition). In those cases, the text name for the resulting row will be NULL. Look at this example:

SELECT M.*, T.cmis:name textname FROM cmisbook:media
AS M LEFT JOIN cmisbook:text AS T ON M.cmis:createdBy
    = T.cmis:lastModifiedBy
Multiple joins

As in relational database SQL, you can have more than one JOIN in your SELECT query to JOIN with more than one table. The syntax rule for nested JOINs follows the basic SQL rules. But parentheses are required around the JOIN-ON syntax (for example, table2 JOIN table3 ON t2.A = t3.B), as in this example:

SELECT M.cmis:name AS mName, M.cmis:objectId AS mID, T.cmis:createdBy
     AS creatorName, N.cmis:name AS noteName FROM (cmisbook:media AS M JOIN
     cmisbook:text AS T ON M.cmis:createdBy = T.cmis:lastModifiedBy) INNER
     JOIN cmisbook:note AS N ON N.cmis:createdBy = T.cmis:createdBy WHERE
     N.cmisbook:noteArchived = TRUE
Performance of joins

JOINs can be resource intensive and may degrade your system performance, which means you should always try to minimize the number of tables you JOIN, particularly in frequently run queries.

General join limitations in CMIS

Here are some more notes and limitations you should know for the CMIS JOIN syntax:

  • Only explicit JOINs are supported, using the JOIN ... ON syntax. Don’t use the implicit JOIN syntax, where you only specify multiple tables in the FROM clause; for example, SELECT * FROM Object1, Object2. The implicit JOIN syntax isn’t supported.
  • Only equijoin is supported, where the JOIN condition in the ON clause can only be an equality test between the object properties. The object properties can be of any data type. The object properties in the JOIN condition don’t have to have the same name, but the comparison operator must be the equal sign (=). Here’s an example: SELECT M.*, T.cmis:name textname FROM cmisbook:media AS M JOINcmisbook:text AS T ON M.cmis:createdBy = T.cmis:lastModifiedBy WHEREM.cmis:createdBy <> 'unknown'
  • The object properties used in the ON clause to JOIN the tables can only be single-valued properties. You can’t specify a multivalued property. It wouldn’t make sense anyway.
  • RIGHT JOIN and FULL JOIN aren’t supported.

That’s it for all of the portions of CMIS SQL that are part of the standard SQL-92. Up to this point, if you’re experienced with using SQL in general, this should all have felt familiar—we hope even natural. Now that we’ve finished covering the standard parts of CMIS SQL, all we have left are a few small parts that have been extended for ECM. Hang on, we’re almost finished.

5.4. CMIS SQL extension functions

As we mentioned earlier in the chapter, CMIS extends SQL-92 in a few ways that make sense for ECM systems. Specifically, these extensions are CONTAINS(), SCORE(), IN_FOLDER(), and IN_TREE(). This section will cover each of them with examples. We’ll start with CONTAINS() and full-text searching.

5.4.1. CONTAINS(): full-text search

One of the most powerful CMIS query features is the ability to search against the document content, sometimes called full-text search. The CONTAINS() function is used to express the text-search conditions for the query. You can search for words or phrases with wildcards for matches on substrings. It’s much more powerful than the = and LIKE predicates, which require exact patterns and are case sensitive.

About CMIS InMemory Repository and CONTAINS()

Although InMemory reports BOTHCOMBINED, its ability to do full-text search is greatly exaggerated. It’s more of a test/static implementation than the type of full-text search you’ll find in any enterprise-level content management system. If you want to exercise all of the stuff you’ll be learning in this section, it’ll be better to try the examples with a real server. Consult table 1.1 in chapter 1 for a list of available CMIS ECM servers. The same is true for the static nature of the InMemory’s SCORE() function, which we’ll cover shortly.

Repository-level full-text search capabilities

Full-text search capabilities require the CMIS repository to have a text-search engine to perform the indexing of the documents, and to search against the index. Not all CMIS repositories support full-text search. You should check the repository capability, capabilityQuery, to see what types of queries are supported.

At the beginning of this chapter, you checked the InMemory Repository information and determined that your server supports BOTHCOMBINED. Here are those capabilityQuery attribute values as they relate to full-text search:

  • capabilityQuery = none—The CONTAINS() function isn’t supported and can’t be used in the CMIS SQL queries for this repository.
  • capabilityQuery = metadataonly—The CONTAINS() function isn’t supported and can’t be used in the CMIS SQL queries for this repository.
  • capabilityQuery = fulltextonly—The CONTAINS() function is the only condition allowed in the WHERE clause. The CMIS queries are limited to full-text search of document contents. Example: SELECT cmis:name, cmis:objectId FROM cmis:document WHERE CONTAINS('document')
  • capabilityQuery = bothseparate—The repository supports full-text searching against the document content and querying against object properties, but they can’t be in the same SQL query. Somehow the CMIS client must manage the query results separately with separate SQL queries. Example: SELECT * FROM cmis:document WHERE CONTAINS('document') SELECT cmis:name, cmis:objectId FROM cmis:document WHERE cmis:name LIKE 'update%'
  • capabilityQuery=bothcombined—The repository supports full-text searching against the document content and querying against object properties, and they can be in the same SQL query, joined together with AND. Example: SELECT cmis:name, cmis:objectId FROM cmis:document WHERE CONTAINS('document') AND cmis:name LIKE 'update%'
Type-level full-text search support

Besides checking that the CMIS repository supports full-text search, you’ll need to know whether the particular object type that you want to search on has been defined to be text-searchable. This information has been set in the object type definition attribute, fulltextindexed. If the value of this Boolean attribute is TRUE, the document content is text-indexed and can be searched using the CONTAINS() function.

The CMIS Workbench shows the fulltextindexed attribute for all document object types, but you can also check this value programmatically using the techniques we covered in chapter 4.

Depending on the implementation and support of the repository, some repositories may also text-index the object properties along with the document content. This means that you can use the CONTAINS() function and the powerful text-search engine to search on the property values (mostly String properties).

The text-search grammar defined in the CMIS query is deliberately small and generic to account for the many text-search engines and their varying levels of search capabilities. You should find that the syntax is sufficient for the average user who’s accustomed to the Google keyword search.

About text-search engine implementations

Because the CMIS specification is meant to be generic, the text-search results returned from different repositories are dependent on the underlying text-search server, how it’s configured, and how the CMIS server has chosen to implement the CMIS text-search syntax. If you want to learn more about text search in relational databases, and how it uses linguistic processing to determine the matches, see the documentation for your particular search engine for the details.

CONTAINS() syntax

Because CONTAINS() is a function, we’ll start off with a normative description of its input and output, and then we’ll follow up with plenty of examples.

Here’s the syntax:

CONTAINS ( [ <qualifier> ,] ' <text search expression> ' )

In this statement, qualifier is an optional parameter for the name of the “virtual table” or object type’s queryName. Usually the table is implied from the FROM clause of the SQL. If the query is a JOIN, you must specify in which table the CONTAINS() function is to be applied.

The text-search expression is a character string enclosed in single quotes that specifies the text-search criteria. You enter words (or terms) in order to find documents that contain the words. You can also refine your searches with some additional options:

  • Phrases are denoted by enclosing words in double quotes.
  • Terms separated by whitespace are ANDed together. AND is implied, and it has a higher precedence than OR.
  • Terms separated by OR are ORed together. OR is a reserved word and shouldn’t be used as a search term.
Use of OR in searches

“Or” shouldn’t be used as a search term. But even if you think you want to search for the word “or,” you probably wouldn’t find it because text indexers often filter out common words to improve storage and performance.

  • Use the minus sign (-) as a modifier to exclude documents that contain the word. You can prefix a word or a phrase with the minus sign.
  • Terms can contain wildcards. The wildcard character * substitutes for zero or more characters. The wildcard character ? substitutes for exactly one character.
  • Use the backslash () as the escape character when you want to search for special characters, such as the following, in your text-search SQL:

    • Minus sign (-)
    • Asterisk sign (*)
    • Question mark (?)
    • Double quote (")
    • Single quote (')
    • Backslash ()

Now let’s look at some examples.

CONTAINS() examples

For the following examples, you can create a few documents using the Workbench, or update the content stream of existing documents with a file of your own (in the CMIS Workbench main window, click on a document, click on the Actions tab, and then specify your own local file to be used for Set Content Stream). Then experiment with the text-search syntax by adding modifiers and operators (see table 5.3) to your own terms in the CONTAINS() SQL query.

Table 5.3. CONTAINS() modifiers and operators

Modifiers and operators

Example

Query returns documents that contain the following

Implied AND CONTAINS('document folder') Both terms, “document” and “folder”
OR CONTAINS('document OR folder') Either “document” or “folder”
- CONTAINS('document –folder') “Document” but not “folder”
* CONTAINS('class*') Words matching the combinations of the wildcard pattern, such as “class” or classic”
* CONTAINS('c*ss') Words matching the combinations of the wildcard pattern, such as “class” and cross”
* CONTAINS('*lass') Words matching the combinations of the wildcard pattern, such as “lass” and class”
? CONTAINS('clas?') Words matching the combinations of the wildcard pattern, such as “class” and clasp”
? CONTAINS('temp?r') Words matching the combinations of the wildcard pattern, such as “temper”
? CONTAINS('?olor') Words matching the combinations of the wildcard pattern, such as “color” and dolor”
Double-quoted phrase CONTAINS(' "class hierarchy" ') ' The exact phrase, “class hierarchy”
CONTAINS() escaping

Escape characters are needed in a text-search string whenever you want to search on a particular character that has a special use in text search. For example, we talked about using the minus sign as an exclusion character. If you want to search for the minus sign, you’ll need to add the escape character (the backslash) before the minus sign.

Between CMIS specification versions 1.0 and 1.1, the list of characters that need escaping, and the requirements for constructing the text-search string with respect to escaping, have changed. Even in version 1.0–compliant servers, there may be implementation differences for the use of escape characters in text search because of the generality of the specifications.

Another consideration when searching for a special character that needs escaping is that depending on the configuration of the text-search server, these characters might be considered delimiters and might not be text-indexed at all, resulting in no match even if it’s properly escaped.

In version 1.0, only two characters need escaping in a text-search string: the single quote and the backslash. Any other occurrence of the backslash is an error. It’s left to the individual CMIS server implementations to interpret how to handle other special characters used in CONTAINS(), such as the minus sign.

In version 1.1, you’ll need to think of the entire CONTAINS() SQL as having two separate grammars: a query statement–level grammar, and a text-search expression–level grammar. The statement-level grammar will parse through SQL, identifying the SELECT, the FROM, the WHERE, and the CONTAINS() functions, and their parameters. At this level, the grammar knows about single-quoted character strings. Like the CMIS 1.0 specifications says, you’ll need escape characters for single quotes and backslashes at this level.

The second level is the text-search expression. Once the text-search expression is isolated, you’ll realize that this expression has some more special characters that will need to be escaped, including *, ?, -, plus the original and '.

Now it’s definitely time for some examples. Let’s look in table 5.4 at the list of special characters that need escaping, and their corresponding syntax according to CMIS 1.0 and CMIS 1.1. Some of the characters don’t have CMIS 1.0 examples because it depends on the CMIS server implementation.

Table 5.4. Table of CONTAINS() escape examples

Special character

Query result contains

CONTAINS() syntax

Single quote (') d'Aconia CMIS 1.0: CONTAINS('d'Aconia') CMIS 1.1: CONTAINS('d'Aconia')
Backslash () oot CMIS 1.0: CONTAINS('\root') CMIS 1.1: CONTAINS('\\root')
Asterisk (*) *atlas CMIS 1.1: CONTAINS('\*atlas')
Question mark (?) shrugged? CMIS 1.1: CONTAINS('shrugged\?')
Dash (-) value-for-value CMIS 1.1: CONTAINS('value\-for\-value')
Additional constraints on CONTAINS()

The CMIS specification doesn’t dictate whether the text search is case sensitive or not (whether searching for “TEST” and “test” will return different matches). Most of the text-search servers in the market aren’t case sensitive, but it will be up to the repository implementation.

The CONTAINS() function returns TRUE when the document object is considered relevant with respect to the text-search expression, and it returns FALSE when the object isn’t relevant.

The CONTAINS() function call can only be ANDed with the combined result of all the other conditions. Here’s an example:

SELECT * FROM cmis:document WHERE CONTAINS('documents') AND (cmis:createdBy = 'system' OR cmis:lastModifiedBy =
     'system')

In one SQL query statement, you can only have one CONTAINS() function call. One of the reasons for this is because of the syntax of the CMIS Score() function. Because the Score() function doesn’t take any parameters, it’s implicitly tied to one and only one CONTAINS() function in the same query. This is the perfect lead-in to our next topic, the Score() function.

5.4.2. Score()

The Score() function allows you to quantify how relevant your search result is in matching the criteria in the CONTAINS() text-search function.

This function returns a floating point relevance score between 0 and 1 to show how well the document satisfies the text-search portion of the query. How the score is calculated depends on the repository and text-search server. If the Score() for a particular document is 0, then it didn’t satisfy the CONTAINS() function’s criteria. In practice, you’ll never see any documents with a 0 text-search score in your query results.

The Score() function doesn’t take any parameters, and it returns a numeric representation of the relevance of all the documents that satisfy the CONTAINS() function in the query. There can only be one CONTAINS() function, which means there can be at most one Score() function call in a text-search query, too.

The CMIS specification has a limitation on how the Score() function can be used in a query. It can only be part of the SELECT clause. This doesn’t mean you can’t use the document score in other parts of the query, though, such as the ORDER BY clause. After all, this is probably the most common way to use the relevance score—to order the query results so that the most relevant results are returned first in the results. To do that, you need to define an alias for the Score() function, and use the alias in the ORDER BY clause.

Try this example query:

SELECT cmis:name, cmis:objectTypeId, SCORE() AS myscore
     FROM cmis:document WHERE CONTAINS('row') ORDER BY myscore DESC
Score() implementation in the CMIS InMemory Repository

You may notice that the scores all come back with the same number in your InMemory server. Again, this is repository-specific, and the ability of the CMIS repository to perform text searches doesn’t always mean that they’re also able to return a conversion of the back-end text-search engine’s relevance score into a range from 0 to 1. Because InMemory is only a test server, it returns a static value for all cases. Also note that not all text-search engines use the range of 0 to 1; some use 0 to 100, and others may use 0 to 1,000. It’s possible that the scores are therefore implemented to always return the full score for all of the matches in the query results. But be assured that the query results you receive will be sorted by their relevance, as you specified in the ORDER BY clause.

The default queryName for the Score() function is SEARCH_SCOPE, so if you don’t specify an alias, the scores will be returned under the alias name of SEARCH_SCOPE. Here’s an example:

SELECT SCORE() FROM cmis:document WHERE CONTAINS('row')

The CMIS specification doesn’t specifically prohibit the use of the alias of Score() in the WHERE clause.

5.4.3. Navigational functions

We’re getting to the end now—only two more functions left to discuss. Both of these are CMIS extensions, like CONTAINS() and Score(), but these deal with folder containment. They’re In_Folder() and In_Tree().

In_Folder()

In_Folder() is an interesting extension to the SQL syntax, tailored to the content management crowd. The In_Folder() function can be used to return all matches that reside in a specific folder. This is a powerful scoping feature for querying under a particular folder.

Here’s the syntax:

IN_FOLDER( [ <qualifier>, ] <folder id> )

The first input parameter, <qualifier>, is an optional parameter that indicates the virtual table to which the In_Folder() function should be applied. This is the queryName of the type of objects you want to return, which should be one of the object types in the FROM clause. The same table alias should be used here as when it’s specified in the FROM clause.

For example, if you want the first-level documents of the folder /texts, whose cmis:objectId is 118, the following three SQL examples will return the same results. (Again, please note that the objectId values may be different in your local InMemory Repository.)

SELECT * FROM cmis:document WHERE IN_FOLDER('118')
SELECT * FROM cmis:document WHERE IN_FOLDER(cmis:document,'118')
SELECT D.* FROM cmis:document AS D WHERE IN_FOLDER(D,'118')

The <qualifier> becomes a mandatory parameter when the query is a JOIN query and the SQL has more than one virtual table. For JOINs, you’ll get an error message if you don’t specify the table name in the In_Folder() function:

SELECT D.* FROM cmis:document AS D JOIN cmis:folder AS F ON D.cmis:createdBy
     = F.cmis:createdBy WHERE IN_FOLDER(D,'118')

In the previous example, IN_FOLDER(D, '118') will return cmis:document objects residing in folder 118. If you change the qualifier to the other table, IN_FOLDER(F, '118') will return cmis:folders in folder 118.

Note that In_Folder() isn’t only limited to returning base cmis:document or cmis:folder object types. You can restrict the object type to any object type in FROM, and type inheritance still applies. The following example SQL will return all cmisbook: text objects and their descendant object types, including cmisbook:lyrics and cmisbook:poem in the folder /texts.

SELECT cmis:name, cmis:objectTypeId FROM cmisbook:text WHERE IN_FOLDER('118')

The second parameter for In_Folder() is the <folder id>. This should be the cmis:objectId of a folder. Remember that this is an ID parameter, and not the folder name or the path name.

Even though this is a useful function extension and it has a simple syntax, as always, you should take care to construct a concise query. The In_Folder() function isn’t necessarily easy for a CMIS repository to implement, and it may be performance intensive—like our next CMIS extension, the In_Tree() predicate function.

In_Tree()

The In_Tree() function is even more powerful than its In_Folder() cousin. This function will return all descendant objects under the specified folder tree. For example, if the specified folder has three more levels of subfolders, In_Tree() will return matches from all three levels. Here’s the syntax:

IN_TREE( [ <qualifier>, ] <folder id> )

In_Tree() has the same syntax as In_Folder(), with two parameters. <qualifier> is the optional virtual table queryName or alias, and <folder id> is the cmis:objectId of the relative root folder that you want to scope your query to. You can go back to the In_Folder() section to review the details of the parameters and the examples.

Let’s compare the two functions In_Folder() and In_Tree() against the InMemory server. The first of the two following SQL statements calls In_Folder() to return all folders in the root folder, /, which has a cmis:objectId equal to 100. The second changes the function call to In_Tree(), and it returns all folders and subfolders under the same root folder:

SELECT cmis:path FROM cmis:folder WHERE IN_FOLDER('100') ORDER BY cmis:path
SELECT cmis:path FROM cmis:folder WHERE IN_TREE('100') ORDER BY cmis:path

One thing you might’ve noticed about In_Folder() and In_Tree() is that you can only return query results of the same object type, such as all folders or all documents. This means SQL can’t return both folders and documents. You’ll have to make separate SQL statements to get each object type.

This same limitation affects all CMIS queries, where the results are bound by the same object type and its descendant object types. It’s more obvious with these folder function calls because you’re used to browsing a directory structure, opening folders, and seeing all their content. For those simple scenarios, you should use the folder API calls, such as getChildren(), to get all the object type instances in the folder.

Again, we’ll repeat our warning about the performance implications of the In_Tree() function call. This is one of the more powerful query capabilities that all CMIS repositories must implement, and it could also require more database processing. Please be careful and monitor your use of the In_Folder() and In_Tree() function calls.

5.5. Summary

In this chapter, you were introduced to the key high-level concepts of SQL queries, and along the way you were shown how these ideas map to CMIS SQL concepts. We then dove down into all of the details of Query, from the main clauses that make up a CMIS query, to grinding through all of the predicates, to reviewing the extension functions that were added for CMIS. We also used the CMIS Workbench’s query panel to interactively execute queries and view their results, and we ran queries from the Groovy console.

If you understood the ideas presented in this chapter, you’re now competent in CMIS Query functionality. Congratulations! We know this chapter was no cakewalk. The concepts you’ve learned here, when added to the repository basics you learned in chapters 1 through 4, make you fully prepared to do some real work (and have some real fun) in part 2, where we’ll build a music server.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset