So far, we have learned the basics of mapping objects to Lucene indexes. We have seen how to handle relationships with associated entities and embedded objects. However, the searchable fields have mostly been simple string data.
In this chapter, we will look at how to effectively map other data types. We will explore the process by which Lucene analyzes entities for indexing, and the Solr components that can customize that process. We will see how to adjust the importance of each field, to make sorting by relevance more meaningful. Finally, we will conditionally determine whether or not to index an entity at all, based on its state at runtime.
The member variables in a Java class may be of an infinite number of custom types. It is usually possible to create custom types in your database as well. With Hibernate ORM, there are dozens of basic types from which more complex types can be constructed.
However, in a Lucene index, everything ultimately boils down to a string. When you map fields of any other data type for searching, the field is converted to a string representation. In Hibernate Search terminology, the code behind this conversion is called a bridge. Default bridges handle most common situations for you transparently, although you have the ability to write your own bridges for custom scenarios.
The most common mapping scenario is where a single Java property is tied to a single Lucene index field. String variables obviously don't require any conversion. With most other common data types, how they would be expressed as strings is fairly intuitive.
Date values are adjusted to GMT, and then stored as a string with the format yyyyMMddHHmmssSSS.
Although this all happens automatically, you do have the option to explicitly annotate the field with @DateBridge. You would do so when you don't want to index down to the exact millisecond. This annotation has one required element, resolution, which lets you choose a level of granularity from YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, or MILLISECOND (the normal default).
The downloadable chapter4 version of the VAPORware Marketplace application now adds a releaseDate field to the App entity. It is configured such that Lucene will only store the day, and not any particular time of day.
...
@Column
@Field
@DateBridge(resolution=Resolution.DAY)
private Date releaseDate;
...
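To make the effect of the resolution setting more concrete, the following sketch formats the same instant at millisecond and at day granularity. It uses only the JDK's SimpleDateFormat, not Hibernate Search itself. The DateIndexSketch class, its toIndexString method, and the assumption that DAY resolution corresponds to the pattern yyyyMMdd are illustrative choices, not code from the library.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Illustrative only: mimics how a date might be rendered as an index string. */
final class DateIndexSketch {

    /**
     * Formats a date in GMT using the given pattern, for example
     * "yyyyMMddHHmmssSSS" (full precision) or "yyyyMMdd" (assumed DAY resolution).
     */
    static String toIndexString(Date date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }
}
```

With the shorter pattern, two release dates that fall on the same GMT day produce the same string, which is the practical consequence of choosing a coarser resolution.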
By default, fields with null values are not indexed, regardless of their type. However, you can customize this behavior. The @Field annotation has an optional element, indexNullAs, which controls the handling of null values for that mapped field.
...
@Column
@Field(indexNullAs=Field.DEFAULT_NULL_TOKEN)
private String description;
...
The default setting for this element is Field.DO_NOT_INDEX_NULL, which causes null values to be omitted from Lucene indexing. However, when Field.DEFAULT_NULL_TOKEN is used, Hibernate Search will index the field with a globally configured value.
The name for this value is hibernate.search.default_null_token, and it is set within hibernate.cfg.xml (for traditional Hibernate ORM) or persistence.xml (for Hibernate configured as a JPA provider). If this value is not configured, then null fields will be indexed with the string "_null_".
You may use this mechanism to apply null-substitution on some fields, and keep the default behavior on other fields. However, the indexNullAs element only works with that one substitute value, configured at the global level. If you want to use different null substitutes for different fields or in different scenarios, you must implement that logic through a custom bridge (discussed in the following subsection).
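The substitution rule described above can be sketched in plain Java. This is not Hibernate Search code: the NullTokenSketch class and its indexedValue method are hypothetical, and a java.util.Properties object stands in for the settings that would normally live in hibernate.cfg.xml or persistence.xml.

```java
import java.util.Properties;

/** Illustrative only: models the null-token fallback rule with plain JDK types. */
final class NullTokenSketch {

    static final String TOKEN_PROPERTY = "hibernate.search.default_null_token";
    static final String FALLBACK_TOKEN = "_null_";

    /** Returns the text that would stand in for the value in the index. */
    static String indexedValue(String value, Properties config) {
        if (value != null) {
            return value; // non-null values are indexed as they are
        }
        // null values use the configured token, or "_null_" when none is set
        return config.getProperty(TOKEN_PROPERTY, FALLBACK_TOKEN);
    }
}
```

Because the token is read from a single global setting, every field that opts in shares the same substitute, which is why per-field substitutes need a custom bridge.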
Sometimes you need more flexibility in converting a field to a string value. Rather than relying on the built-in bridge to handle it automatically, you can create your own custom bridge.
To map a single Java property to a single index field, your bridge can implement one of two interfaces offered by Hibernate Search. The first of these, StringBridge, is for a one-way translation from a property to a string value.
Let's say that our App entity has a currentDiscountPercentage member variable, representing any promotional discount being offered for that app (for example, 25 percent off!). For easier math operations, this field is stored as a float (0.25f). However, if we ever wanted to make discounts searchable, we would want them indexed in a more human-readable percentage format (25).
To provide that mapping, we would start by creating a bridge class, implementing the StringBridge interface. The bridge class must implement an objectToString method, which expects to take our currentDiscountPercentage property as an input parameter:
import org.hibernate.search.bridge.StringBridge;

/** Converts values from 0-1 into percentages (e.g. 0.25 -> 25) */
public class PercentageBridge implements StringBridge {

    public String objectToString(Object object) {
        try {
            float fieldValue = ((Float) object).floatValue();
            if (fieldValue < 0f || fieldValue > 1f) return "0";
            // round rather than truncate, so that floating-point error
            // cannot drop a percentage point
            int percentageValue = Math.round(fieldValue * 100);
            return Integer.toString(percentageValue);
        } catch (Exception e) {
            // default to zero for null values or other problems
            return "0";
        }
    }

}
The objectToString method converts the input as desired, and returns its String representation. This will be the value indexed by Lucene.
To invoke this bridge class at index-time, add a @FieldBridge annotation to the currentDiscountPercentage property:
...
@Column
@Field
@FieldBridge(impl=PercentageBridge.class)
private float currentDiscountPercentage;
...
This entity field is a primitive float, yet the bridge class is working with a Float wrapper object. For flexibility, objectToString takes a generic Object parameter that must be cast to the appropriate type. However, thanks to autoboxing, primitives are converted into their object wrappers for us seamlessly.
The second interface for mapping single variables to single fields, TwoWayStringBridge, provides bidirectional translation between a value and its string representation.
You implement TwoWayStringBridge in a manner similar to what we just saw with the regular StringBridge interface. The only difference is that this bidirectional version also requires a stringToObject method, for conversions going the other way:
...
public Object stringToObject(String stringValue) {
return Float.parseFloat(stringValue) / 100;
}
...
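To see both directions working together, here is a minimal sketch of the two conversions as plain static methods. It deliberately avoids the Hibernate Search interfaces so that it can be run on its own. The PercentageRoundTripSketch class is hypothetical, and the use of Math.round for the forward conversion is an assumption made for this sketch.

```java
/** Illustrative only: the two conversion directions as plain static methods. */
final class PercentageRoundTripSketch {

    /** Forward direction: 0.25f becomes "25". Out-of-range values become "0". */
    static String objectToString(Object object) {
        float fieldValue = (Float) object;
        if (fieldValue < 0f || fieldValue > 1f) {
            return "0";
        }
        return Integer.toString(Math.round(fieldValue * 100));
    }

    /** Reverse direction: "25" becomes 0.25f (boxed as a Float). */
    static Object stringToObject(String stringValue) {
        return Float.parseFloat(stringValue) / 100;
    }
}
```

Note that the round trip is lossy by design: the index holds whole percentages, so a value with finer precision comes back rounded to the nearest hundredth.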
For even greater flexibility, it is possible to pass configuration parameters to a bridge class. To do so, your bridge should implement the ParameterizedBridge interface, in addition to StringBridge or TwoWayStringBridge. The class must then implement a setParameterValues method for receiving the extra parameters.
For the sake of argument, let's say that we wanted our example bridge to be able to write percentages with a greater level of precision, rather than rounding to a whole number. We could pass it a parameter specifying the number of decimal places to use:
import java.util.Map;
import org.hibernate.search.bridge.ParameterizedBridge;
import org.hibernate.search.bridge.StringBridge;

public class PercentageBridge implements StringBridge, ParameterizedBridge {

    public static final String DECIMAL_PLACES_PROPERTY = "decimal_places";
    private int decimalPlaces = 2; // default

    public String objectToString(Object object) {
        // "%.Nf" prints exactly N digits after the decimal point
        String format = "%." + decimalPlaces + "f";
        try {
            float fieldValue = ((Float) object).floatValue();
            if (fieldValue < 0f || fieldValue > 1f) return "0";
            return String.format(format, fieldValue * 100f);
        } catch (Exception e) {
            // default to zero for null values or other problems
            return String.format(format, 0f);
        }
    }

    public void setParameterValues(Map<String, String> parameters) {
        try {
            this.decimalPlaces = Integer.parseInt(
                parameters.get(DECIMAL_PLACES_PROPERTY));
        } catch (Exception e) {
            // missing or malformed parameter: keep the default
        }
    }

}
This version of our bridge class expects to receive a parameter named decimal_places. Its value is stored in the decimalPlaces member variable, and then used inside the objectToString method. If no such parameter is passed, then a default of two decimal places will be used to build percentage strings.
The mechanism for actually passing one or more parameters is the params element of the @FieldBridge annotation:
...
@Column
@Field
@FieldBridge(
    impl=PercentageBridge.class,
    params=@Parameter(
        name=PercentageBridge.DECIMAL_PLACES_PROPERTY, value="4")
)
private float currentDiscountPercentage;
...
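As a rough illustration of what a decimal_places value of 4 would mean for the indexed string, the sketch below isolates the formatting step. It assumes fixed-point formatting with the %f conversion and a locale-neutral decimal separator. The PercentageFormatSketch class and formatPercentage method are hypothetical names used only here.

```java
import java.util.Locale;

/** Illustrative only: the formatting step of a parameterized percentage bridge. */
final class PercentageFormatSketch {

    /** Formats a 0-1 fraction as a percentage with a fixed number of decimals. */
    static String formatPercentage(float fraction, int decimalPlaces) {
        // "%.4f" prints four digits after the decimal point, and so on
        String format = "%." + decimalPlaces + "f";
        // Locale.ROOT keeps the decimal separator a dot on every machine
        return String.format(Locale.ROOT, format, fraction * 100f);
    }
}
```

Pinning the locale matters for indexing, because a string written with a comma separator on one server would not match a query built with a dot on another.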
The bridge types covered so far are the easiest and most straightforward way to map a Java property to a string index value. However, sometimes you need even greater flexibility, so there are a few field bridge variations supporting a free-form approach.
Occasionally, the desired relationship between a class property and Lucene index fields may not be one-to-one. For example, let's say that one property represents a filename. However, we would like the ability to search not only by filename, but also by file type (that is, the file extension). One approach is to parse the file extension from the filename property, and thereby use that one variable to create both fields.
The FieldBridge interface allows us to do this. Implementations must provide a set method, which in this example parses the file type from the filename property, and stores the two values separately:
import org.apache.lucene.document.Document;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class FileBridge implements FieldBridge {

    public void set(String name, Object value,
            Document document, LuceneOptions luceneOptions) {
        String file = ((String) value).toLowerCase();
        // the file type is whatever follows the last dot in the name
        String type = file.substring(file.lastIndexOf(".") + 1);
        luceneOptions.addFieldToDocument(name + ".file", file, document);
        luceneOptions.addFieldToDocument(name + ".file_type", type, document);
    }

}
The luceneOptions parameter is a helper object for interacting with Lucene, and document represents the Lucene data structure to which we are adding fields. We use luceneOptions.addFieldToDocument() to add fields to the index, without having to fully understand the Lucene API details.
The name parameter passed to set represents the name of the field being indexed. Notice that we use this as a base when declaring the names of the two index fields being added (that is, name+".file" for the filename, and name+".file_type" for the file type).
Finally, the value parameter is the value of the property currently being mapped. Just as with the StringBridge interface seen in the Bridges section, the method signature here uses a generic Object for flexibility. The value must be cast to its appropriate type.
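The one-property-to-two-fields idea can be tried out without Lucene on the classpath. The sketch below collects the derived field names and values in a map instead of adding them to a Lucene document. The FileFieldsSketch class and toIndexFields method are hypothetical, and taking the text after the last dot as the file type is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: derives two index field values from one filename. */
final class FileFieldsSketch {

    /** Returns the index field names and values derived from a single property. */
    static Map<String, String> toIndexFields(String name, Object value) {
        // the generic Object parameter must be cast to its real type
        String file = ((String) value).toLowerCase(Locale.ROOT);
        String type = file.substring(file.lastIndexOf('.') + 1);

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(name + ".file", file);
        fields.put(name + ".file_type", type);
        return fields;
    }
}
```

For a property named file holding Report.PDF, this yields the entries file.file and file.file_type, mirroring the naming scheme used by the bridge.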
To apply a FieldBridge implementation, use the @FieldBridge annotation just as we've already seen with the other custom bridge types:
...
@Column
@Field
@FieldBridge(impl=FileBridge.class)
private String file;
...
A custom bridge implementing the FieldBridge interface may also be used for the reverse purpose, to combine more than one property into a single index field. To gain this degree of flexibility, the bridge must be applied to the class level rather than the field level. When the FieldBridge interface is used in this manner, it is known as a class bridge, and replaces the usual mapping mechanism for the entire entity class.
For example, consider an alternate approach we could have taken with the Device entity in our VAPORware Marketplace application. Instead of indexing manufacturer and name as separate fields, we could have combined them into one fullName field. The class bridge for this would still implement the FieldBridge interface, but it would concatenate the two properties into one index field as follows:
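import org.apache.lucene.document.Document;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class DeviceClassBridge implements FieldBridge {

    public void set(String name, Object value,
            Document document, LuceneOptions luceneOptions) {
        // for a class bridge, the value is the entity itself
        Device device = (Device) value;
        String fullName = device.getManufacturer() + " " + device.getName();
        luceneOptions.addFieldToDocument(name + ".name", fullName, document);
    }

}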
Rather than applying an annotation to any particular fields within the Device class, we would instead apply a @ClassBridge annotation at the class level. Notice that the field-level Hibernate Search annotations have been completely removed, as the class bridge will be responsible for mapping all index fields in this class.
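A class-level mapping along these lines might look like the following fragment. It is a sketch rather than the application's actual Device entity: the id property, the getters, and the absence of any other members are assumptions, and the fragment needs the JPA and Hibernate Search libraries plus the DeviceClassBridge class in order to compile.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.hibernate.search.annotations.ClassBridge;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
@ClassBridge(impl = DeviceClassBridge.class)
public class Device {

    @Id
    @GeneratedValue
    private Long id; // assumed identifier property

    // No @Field annotations here: DeviceClassBridge builds the index field
    @Column
    private String manufacturer;

    @Column
    private String name;

    public String getManufacturer() { return manufacturer; }

    public String getName() { return name; }
}
```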
Earlier we saw that the simple StringBridge interface has a TwoWayStringBridge counterpart, providing bidirectional mapping capability for document ID fields. Likewise, the FieldBridge interface has a TwoWayFieldBridge counterpart for the same reason. When you apply a field bridge interface to a property used by Lucene as an ID (that is, annotated with @Id or @DocumentId), then you must use the two-way variant.
The TwoWayFieldBridge interface requires the same objectToString method as StringBridge, and the same set method as FieldBridge. However, this two-way version also requires a get counterpart, for retrieving the string representation from Lucene and converting it if the true type is different:
...
public Object get(String name, Document document) {
    // return the full file name field... the file type field
    // is not needed when going back in the reverse direction
    return document.get(name + ".file");
}

public String objectToString(Object object) {
    // "file" is already a String, otherwise it would need conversion
    return (String) object;
}
...