The Hibernate Search DSL

Chapter 1, Your First Application, introduced the Hibernate Search DSL, which is the most straightforward approach for writing search queries. When using the DSL, method calls are chained together in such a way that the series resembles a programming language in its own right. If you have worked with criteria queries in Hibernate ORM, then this style will appear very familiar.

Whether you use the traditional FullTextSession object or the JPA-style FullTextEntityManager object, you pass it a Lucene query generated by the QueryBuilder class. This class is the starting point for the Hibernate Search DSL, and it offers several Lucene query types.

Keyword query

The most basic form of search, which we have already glimpsed, is the keyword query. As the name suggests, this query type searches for one or more particular words.

The first step is to obtain a QueryBuilder object, configured for searching on a given entity:

...
QueryBuilder queryBuilder =
   fullTextSession.getSearchFactory().buildQueryBuilder()
      .forEntity(App.class).get();
...

From there, the following diagram describes the possible flows. Dotted gray arrows represent optional side paths:

Keyword query flow (dotted gray arrows represent optional paths)

In the actual Java code, the DSL for a keyword query would look similar to the following:

...
org.apache.lucene.search.Query luceneQuery =
   queryBuilder
   .keyword()
   .onFields("name", "description", "supportedDevices.name",
         "customerReviews.comments")
   .matching(searchString)
   .createQuery();
...

The onField method takes the name of a field that is indexed for the relevant entity. If that field is not included in the entity's Lucene index, the query will fail with an error. Fields of associated or embedded objects may also be searched, using the format "[container-field-name].[field-name]" (for example, supportedDevices.name).
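To picture where these dotted names come from, it may help to think of each indexed entity as a flat set of named fields. The following Java sketch is a simplified model under that assumption, not Hibernate Search's actual indexing code: the fields of an embedded object are copied into the parent's document under the container field's name followed by a dot. EmbeddedFieldNames, addEmbedded, and the second device name "xTablet" are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (assumption, not Hibernate Search internals): a document is
// a flat map of field name -> values, and embedded fields are stored under
// "containerField.fieldName".
class EmbeddedFieldNames {

    // Hypothetical helper: copies an embedded object's fields into the parent
    // document, prefixing each name with the container field name and a dot.
    static void addEmbedded(Map<String, List<String>> doc, String containerField,
                            Map<String, String> embeddedFields) {
        for (Map.Entry<String, String> e : embeddedFields.entrySet()) {
            String path = containerField + "." + e.getKey();
            doc.computeIfAbsent(path, k -> new ArrayList<>()).add(e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> appDoc = new LinkedHashMap<>();
        appDoc.put("name", new ArrayList<>(List.of("Athena Internet Radio")));
        // Two supported devices contribute two values to the same dotted field
        addEmbedded(appDoc, "supportedDevices", Map.of("name", "xPhone"));
        addEmbedded(appDoc, "supportedDevices", Map.of("name", "xTablet"));
        System.out.println(appDoc);
        // {name=[Athena Internet Radio], supportedDevices.name=[xPhone, xTablet]}
    }
}
```

Under this model, a query on "supportedDevices.name" simply targets one more field of the App document, which is why embedded fields can be listed alongside the entity's own fields in onFields.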

Optionally, one or more andField methods may be used to search multiple fields. Each takes a parameter that works in exactly the same way as onField's. Alternatively, you can declare multiple fields all in one step with onFields, as shown in the preceding code snippet.

The matching method takes the keyword(s) to search for. This value will generally be a string, although technically the parameter type is a generic Object in case you use a field bridge (discussed in the next chapter). Assuming that you pass a string, it may be a single keyword or a series of keywords separated by whitespace. By default, Hibernate Search will tokenize the string and search for each keyword individually.
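The tokenizing step can be sketched in plain Java. This sketch assumes a deliberately simple analyzer that only lowercases and splits on whitespace (real Lucene analyzers may also strip punctuation, remove stop words, and so on), and it models "search for each keyword individually" as a field matching when any one keyword is present. KeywordTokenizer and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for an analyzer (assumption: lowercase + whitespace split).
class KeywordTokenizer {

    static List<String> tokenize(String searchString) {
        List<String> tokens = new ArrayList<>();
        for (String t : searchString.trim().split("\\s+")) {
            if (!t.isEmpty()) {                  // split("") yields one empty string
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Each keyword is checked individually: the field value matches if it
    // contains at least one of them (roughly OR semantics).
    static boolean matchesAny(String fieldValue, List<String> keywords) {
        List<String> fieldTokens = tokenize(fieldValue);
        for (String k : keywords) {
            if (fieldTokens.contains(k)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> keywords = tokenize("Internet  Radio");
        System.out.println(keywords);                                       // [internet, radio]
        System.out.println(matchesAny("Athena Internet Radio", keywords));  // true
        System.out.println(matchesAny("Weather Station", keywords));        // false
    }
}
```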

Finally, the createQuery method terminates the DSL and returns a Lucene query object. That object may then be used by FullTextSession (or FullTextEntityManager) to create the final Hibernate Search FullTextQuery object:

...
FullTextQuery hibernateQuery =
   fullTextSession.createFullTextQuery(luceneQuery, App.class);
...

Fuzzy search

When we use a search engine today, we take for granted that it will be smart enough to fix our typos when we are "close enough" to the correct spelling. One way to add this intelligence to Hibernate Search is by making plain keyword queries fuzzy.

With a fuzzy search, keywords match against fields even when they are off by one or more characters. The query runs with a threshold value between 0 and 1, where 0 means that everything matches and 1 means that only exact matches are acceptable. The closer to zero you set the threshold, the fuzzier the query becomes.

The DSL starts with the same keyword method and eventually resumes the keyword query flow with onField or onFields. However, in between are some new flow possibilities, shown as follows:

Fuzzy search flow (dotted gray arrows represent optional paths)

The fuzzy method simply makes a normal keyword query "fuzzy", with a default threshold value of 0.5 (that is, balanced between the two extremes). You can proceed from there with the regular keyword query flow, and that would be perfectly fine.

However, you have the option of calling withThreshold to specify a different fuzziness value. In this chapter, versions of the VAPORware Marketplace application add fuzziness to the keyword query, with a threshold value of 0.7. This is strict enough to avoid too many false positives, but fuzzy enough that a misspelled search for "rodio" will now match against the "Athena Internet Radio" app.

...
luceneQuery = queryBuilder
   .keyword()
   .fuzzy()
   .withThreshold(0.7f)
   .onFields("name", "description", "supportedDevices.name",
      "customerReviews.comments")
   .matching(searchString)
   .createQuery();
...

In addition to (or instead of) withThreshold, you may also use withPrefixLength to adjust the query's fuzziness. This integer value is the number of characters at the beginning of each word to exclude from the fuzziness calculation; those leading characters must match exactly.
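The interplay of the threshold and the prefix length can be illustrated with an edit-distance calculation. The sketch below is an approximation under stated assumptions: it uses the Levenshtein distance and divides by the length of the longer remainder of the two words, which is close in spirit to Lucene's fuzzy matching but is not its exact formula. FuzzyMatch and its methods are hypothetical.

```java
// Simplified edit-distance similarity (assumption: NOT Lucene's exact formula).
class FuzzyMatch {

    // Classic Levenshtein distance: insertions, deletions, and substitutions.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]. The first prefixLength (>= 0) characters must match
    // exactly and are excluded from the fuzziness calculation.
    static float similarity(String keyword, String term, int prefixLength) {
        if (!keyword.regionMatches(0, term, 0, prefixLength)) {
            return 0f;                    // prefixes differ (or a word is too short)
        }
        String k = keyword.substring(prefixLength);
        String t = term.substring(prefixLength);
        int longer = Math.max(k.length(), t.length());
        if (longer == 0) return 1f;
        return 1f - (float) levenshtein(k, t) / longer;
    }

    static boolean fuzzyMatches(String keyword, String term, float threshold, int prefixLength) {
        return similarity(keyword, term, prefixLength) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(similarity("rodio", "radio", 0));         // ~0.8
        System.out.println(fuzzyMatches("rodio", "radio", 0.7f, 0));  // true
        System.out.println(fuzzyMatches("rodio", "radio", 0.7f, 2));  // false: "ro" != "ra"
    }
}
```

Under this approximation, "rodio" and "radio" differ by one character out of five, so their similarity of about 0.8 clears the 0.7 threshold. With a prefix length of 2, the mismatched "ro" and "ra" prefixes rule the match out entirely.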

Wildcard search

The second variation on a keyword query doesn't involve any higher math algorithms. If you have ever used a pattern like *.java to list all files in a directory, then you already have the basic idea.

Adding the wildcard method causes a normal keyword query to treat a question mark (?) as a valid substitute for any single character. For example, the keyword 201? would match the field values 2010, 2011, 2012, and so on.

The asterisk (*) becomes a substitute for any sequence of zero or more characters. The keyword down* matches download, downtown, and so on.
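One way to picture these two wildcard characters is to translate a wildcard keyword into a regular expression. The following sketch does exactly that, quoting every other character so it is matched literally. It only illustrates the matching rules described above; it is not how Lucene evaluates wildcard queries internally, and WildcardMatch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: '?' matches exactly one character, '*' matches zero or more.
class WildcardMatch {

    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?' || c == '*') {
                // Flush pending literal text, quoted so '.' etc. are not special
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Keeps only the terms that match the whole wildcard pattern.
    static List<String> filter(String wildcard, List<String> terms) {
        Pattern p = toRegex(wildcard);
        List<String> result = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("2009", "2010", "2011", "20100",
                "download", "downtown", "down", "sundown");
        System.out.println(filter("201?", terms));   // [2010, 2011]
        System.out.println(filter("down*", terms));  // [download, downtown, down]
    }
}
```

Note that "201?" rejects "20100", because "?" stands for exactly one character, while "down*" accepts "down" itself, because "*" may match zero characters.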

The Hibernate Search DSL for a wildcard search is the same as that for a regular keyword query, only with the zero-parameter wildcard method added right after keyword.

Wildcard search flow (dotted gray arrows represent optional paths)

Exact phrase query

When you type a string of keywords into a search engine, you expect to see results matching one or more of those keywords. Not all of the keywords might be present in each result, and they might not appear in the same order that you typed them.

However, it has become customary that placing double quotes around a string means you expect the search results to contain that exact phrase.

The Hibernate Search DSL offers a phrase query flow for searches of this type.

Exact phrase query flow (dotted gray arrows represent optional paths)

The onField and andField methods behave in the same way as they do with keyword queries. The sentence method differs from matching only in that its input must be a String.

A primitive form of fuzziness is available to phrase queries, by using the optional withSlop clause. This method takes an integer parameter, representing the number of "extra" words that can be found within a phrase before it is no longer considered a match.
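The idea of slop can be sketched as follows, under a simplifying assumption: the phrase's words must appear in their original order, and slop counts the extra words interleaved between them. Lucene's real slop is a positional edit distance that can also tolerate reordered words at higher values, so treat this as an approximation. PhraseSlop is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified phrase matching with slop (assumption: in-order words only).
class PhraseSlop {

    // True if the phrase words occur in order in the text with at most
    // `slop` extra words interleaved between them.
    static boolean matches(String text, String phrase, int slop) {
        List<String> words = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        String[] terms = phrase.toLowerCase(Locale.ROOT).split("\\s+");
        for (int start = 0; start < words.size(); start++) {
            if (!words.get(start).equals(terms[0])) continue;
            int pos = start;
            boolean complete = true;
            // Greedily take the earliest later occurrence of each next term,
            // which gives the tightest in-order window for this start
            for (int t = 1; t < terms.length && complete; t++) {
                int next = -1;
                for (int i = pos + 1; i < words.size(); i++) {
                    if (words.get(i).equals(terms[t])) { next = i; break; }
                }
                if (next < 0) complete = false; else pos = next;
            }
            int extraWords = pos - start + 1 - terms.length;
            if (complete && extraWords <= slop) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Athena Internet Radio";
        System.out.println(matches(text, "internet radio", 0)); // true: exact phrase
        System.out.println(matches(text, "athena radio", 0));   // false: one extra word
        System.out.println(matches(text, "athena radio", 1));   // true with slop 1
    }
}
```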

This chapter's version of the VAPORware Marketplace application now checks for double quotes around the user's search string. When the input is quoted, the application replaces the keyword query with a phrase query instead:

...
luceneQuery = queryBuilder
   .phrase()
   .onField("name")
   .andField("description")
   .andField("supportedDevices.name")
   .andField("customerReviews.comments")
   .sentence(searchStringWithQuotesRemoved)
   .createQuery();
...
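The quote check itself is not shown in this excerpt. A minimal sketch might look like the following; isQuoted and stripQuotes are hypothetical helpers, and the application's actual logic may differ (for example, in how it treats unbalanced quotes).

```java
// Hypothetical helpers: detect input wrapped in double quotes and strip the
// quotes before passing the text to sentence().
class QuotedSearch {

    static boolean isQuoted(String searchString) {
        String s = searchString.trim();
        // A lone '"' is not treated as a quoted phrase
        return s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
    }

    static String stripQuotes(String searchString) {
        String s = searchString.trim();
        return s.substring(1, s.length() - 1).trim();
    }

    public static void main(String[] args) {
        String input = "\"internet radio\"";
        if (isQuoted(input)) {
            // A phrase query would be built from this value
            System.out.println("phrase: " + stripQuotes(input));  // phrase: internet radio
        } else {
            // Otherwise the fuzzy keyword query would be used
            System.out.println("keyword: " + input);
        }
    }
}
```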

Range query

Phrase queries and the various keyword search types are all about matching fields to a search term. A range query is a bit different, in that it looks for fields that are bounded by one or more search terms. In other words, is a field greater than or less than a given value, or in between two values?

Range query flow (dotted gray arrows represent optional paths)

When the above method is used, the queried field(s) must have values greater than or equal to the input parameter. That parameter is of the generic Object type for flexibility. Dates and numeric values are typically used, although strings are perfectly fine and will be compared alphabetically.

As you might guess, the below method is its counterpart, in which values must be less than or equal to the input parameter. To declare that matches must fall between two parameters, inclusively, you use the from and to methods (they must be used together).

An excludeLimit clause may be applied to any of these clauses. It has the effect of making the range exclusive rather than inclusive. In other words, from(5).to(10).excludeLimit() matches a range of 5 <= x < 10. The modifier could have been placed on the from clause rather than the to, or on both of them.
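The inclusive and exclusive bounds can be modeled with a small generic class over Comparable values, which also shows why numbers, dates, and strings can all serve as range parameters. This is a hypothetical illustration of the semantics, not the Hibernate Search API; RangeCheck and its constructor arguments are invented for the example.

```java
// Simplified model of range bounds (assumption, not the Hibernate Search API).
// A null bound means "unbounded" on that side; each bound is inclusive unless
// excluded, mirroring excludeLimit().
class RangeCheck<T extends Comparable<T>> {
    private final T from;
    private final boolean excludeFrom;
    private final T to;
    private final boolean excludeTo;

    RangeCheck(T from, boolean excludeFrom, T to, boolean excludeTo) {
        this.from = from;
        this.excludeFrom = excludeFrom;
        this.to = to;
        this.excludeTo = excludeTo;
    }

    boolean contains(T value) {
        if (from != null) {
            int c = value.compareTo(from);
            // Below the lower bound, or equal to an excluded lower bound
            if (c < 0 || (excludeFrom && c == 0)) return false;
        }
        if (to != null) {
            int c = value.compareTo(to);
            // Above the upper bound, or equal to an excluded upper bound
            if (c > 0 || (excludeTo && c == 0)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Like from(5).to(10).excludeLimit(): 5 <= x < 10
        RangeCheck<Integer> fiveToTen = new RangeCheck<>(5, false, 10, true);
        System.out.println(fiveToTen.contains(5));   // true
        System.out.println(fiveToTen.contains(10));  // false

        // Like above(3).excludeLimit(): x > 3, so 4- and 5-star reviews
        RangeCheck<Integer> aboveThree = new RangeCheck<>(3, true, null, false);
        System.out.println(aboveThree.contains(3));  // false
        System.out.println(aboveThree.contains(4));  // true

        // Strings compare lexicographically by character code
        RangeCheck<String> aToM = new RangeCheck<>("a", false, "m", false);
        System.out.println(aToM.contains("hibernate"));  // true
    }
}
```

Because Java's String.compareTo orders by character code, uppercase letters sort before lowercase ones in this model; keep that in mind when reasoning about alphabetical string ranges.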

In our VAPORware Marketplace application, we previously declined to annotate CustomerReview.stars for indexing. However, if we had annotated it with @Field, then we could search for all 4- and 5-star reviews with a query similar to the following:

...
luceneQuery = queryBuilder
   .range()
   .onField("customerReviews.stars")
   .above(3).excludeLimit()
   .createQuery();
...

Boolean (combination) queries

What if you have an advanced use case where a keyword, phrase, or range query is not enough by itself, but two or more of them together could meet your requirements? Hibernate Search allows you to mix queries in any combination with boolean logic:

Boolean query flow (dotted gray arrows represent optional paths)

The bool method declares that this will be a combination query. It is followed by at least one must or should clause, each of which takes a Lucene query object of one of the previously discussed varieties.

When a must clause is used, a field must match the nested query in order to match the overall query as a whole. Multiple must clauses may be applied, which operate in a logical-AND fashion. All of them must succeed or else there is no match.

The optional not method serves to logically negate a must clause. The effect is that the overall query will only match if that nested query doesn't.

The should clause roughly approximates a logical-OR operation. When a combination consists only of should clauses, a field need not match all of them. However, at least one must match in order for the query as a whole to match.

Note

You can combine must and should clauses. However, if you do so, then the should nested queries become completely optional. If the must clause succeeds, the overall query succeeds no matter what. If the must clause fails, the overall query fails no matter what. When the two clause types are used together, should clauses serve only to help rank the search results by relevance.
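The matching rules above can be condensed into a few lines of Java. The sketch assumes you already know whether each nested query matched a given document, and then applies the must, not, and should rules to decide the overall result. BooleanCombination is a hypothetical name, and counting matched should clauses is only a crude stand-in for relevance scoring.

```java
import java.util.List;

// Simplified model of the boolean combination rules for one document
// (an illustration, not Lucene's scorer).
class BooleanCombination {

    static boolean matches(List<Boolean> must, List<Boolean> mustNot, List<Boolean> should) {
        for (boolean m : must) {
            if (!m) return false;          // every must clause has to match (AND)
        }
        for (boolean n : mustNot) {
            if (n) return false;           // a negated must (not) clause must not match
        }
        if (must.isEmpty()) {
            // With no must clauses, at least one should clause has to match (OR)
            return should.contains(Boolean.TRUE);
        }
        return true;                       // alongside must, should clauses are optional
    }

    // Crude ranking stand-in: more matching should clauses, higher rank.
    static int shouldMatches(List<Boolean> should) {
        int count = 0;
        for (boolean s : should) {
            if (s) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Keyword matches but the range does not: no match
        System.out.println(matches(List.of(true, false), List.of(), List.of())); // false
        // Only should clauses, one of which matches: match
        System.out.println(matches(List.of(), List.of(), List.of(false, true))); // true
        // must succeeds, should fails: still a match
        System.out.println(matches(List.of(true), List.of(), List.of(false)));   // true
    }
}
```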

This example combines a keyword query and a range query to look for "xPhone" apps with 5-star customer reviews:

...
luceneQuery = queryBuilder
   .bool()
   .must(
      queryBuilder.keyword().onField("supportedDevices.name")
      .matching("xphone").createQuery()
   )
   .must(
      queryBuilder.range().onField("customerReviews.stars")
      .above(5).createQuery()
   )
   .createQuery();
...