Chapter 4. Advanced Mapping

So far, we have learned the basics of mapping objects to Lucene indexes. We have seen how to handle relationships with associated entities and embedded objects. However, the searchable fields have mostly been simple string data.

In this chapter, we will look at how to effectively map other data types. We will explore the process by which Lucene analyzes entities for indexing, and the Solr components that can customize that process. We will see how to adjust the importance of each field, to make sorting by relevance more meaningful. Finally, we will conditionally determine whether or not to index an entity at all, based on its state at runtime.

Bridges

The member variables in a Java class may be of an infinite number of custom types. It is usually possible to create custom types in your database as well. With Hibernate ORM, there are dozens of basic types from which more complex types can be constructed.

However, in a Lucene index, everything ultimately boils down to a string. When you map fields of any other data type for searching, the field is converted to a string representation. In Hibernate Search terminology, the code behind this conversion is called a bridge. Default bridges handle most common situations for you transparently, although you have the ability to write your own bridges for custom scenarios.

One-to-one custom conversion

The most common mapping scenario is where a single Java property is tied to a single Lucene index field. The String variables obviously don't require any conversion. With most other common data types, how they would be expressed as strings is fairly intuitive.

Mapping date fields

Date values are converted to GMT, and then stored as a string with the format yyyyMMddHHmmssSSS.

Although this all happens automatically, you do have the option to explicitly annotate the field with @DateBridge. You would do so when you don't want to index down to the exact millisecond. This annotation has one required element, resolution, which lets you choose a level of granularity from YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, or MILLISECOND (the normal default).

The downloadable chapter4 version of the VAPORware Marketplace application now adds a releaseDate field to the App entity. It is configured such that Lucene will only store the day, and not any particular time of day.

...
@Column
@Field
@DateBridge(resolution=Resolution.DAY)
private Date releaseDate;
...
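To make the effect of the resolution element more concrete, the following standalone sketch formats one instant in GMT, first down to the millisecond and then down to the day. It uses only java.text.SimpleDateFormat rather than Hibernate Search itself. The class name DateResolutionSketch and the yyyyMMdd pattern for day resolution are assumptions made for illustration, so treat it as a picture of the resulting strings and not as the library's implementation.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateResolutionSketch {

    // Formats a date in GMT using the given pattern.
    public static String format(Date date, String pattern) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern);
        formatter.setTimeZone(TimeZone.getTimeZone("GMT"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        // 1234 milliseconds after midnight GMT on January 1, 1970
        Date date = new Date(1234L);

        // millisecond resolution, as in the default format
        System.out.println(format(date, "yyyyMMddHHmmssSSS")); // 19700101000001234

        // day resolution (assumed pattern): the time of day is dropped
        System.out.println(format(date, "yyyyMMdd")); // 19700101
    }
}
```

With the coarser pattern, every value from the same GMT day produces the same string, which is why a search on such a field can only distinguish days.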

Handling null values

By default, fields with null values are not indexed regardless of their type. However, you can also customize this behavior. The @Field annotation has an optional element, indexNullAs, which controls the handling of null values for that mapped field.

...
@Column
@Field(indexNullAs=Field.DEFAULT_NULL_TOKEN)
private String description;
...

The default setting for this element is Field.DO_NOT_INDEX_NULL, which causes null values to be omitted from Lucene indexing. However, when Field.DEFAULT_NULL_TOKEN is used, Hibernate Search will index the field with a globally configured value.

The name for this value is hibernate.search.default_null_token, and it is set within hibernate.cfg.xml (for traditional Hibernate ORM) or persistence.xml (for Hibernate configured as a JPA provider). If this value is not configured, then null fields will be indexed with the string "_null_".

Note

You may use this mechanism to apply null-substitution on some fields, and keep the default behavior on other fields. However, the indexNullAs element only works with that one substitute value, configured at the global level. If you want to use different null substitutes for different fields or in different scenarios, you must implement that logic through a custom bridge (discussed in the following subsection).

Custom string conversion

Sometimes you need more flexibility in converting a field to a string value. Rather than relying on the built-in bridge to handle it automatically, you can create your own custom bridge.

StringBridge

To map a single Java property to a single index field, your bridge can implement one of two interfaces offered by Hibernate Search. The first of these, StringBridge, is for a one-way translation from a property to a string value.

Let's say that our App entity has a currentDiscountPercentage member variable, representing any promotional discount being offered for that app (for example, 25 percent off!). For easier math operations, this field is stored as a float (0.25f). However, if we ever wanted to make discounts searchable, we would want them indexed in a more human-readable percentage format (25).

To provide that mapping, we would start by creating a bridge class, implementing the StringBridge interface. The bridge class must implement an objectToString method, which expects to take our currentDiscountPercentage property as an input parameter:

import org.hibernate.search.bridge.StringBridge;

/** Converts values from 0-1 into percentages (e.g. 0.25 -> 25) */
public class PercentageBridge implements StringBridge {
   public String objectToString(Object object) {
      try {
         float fieldValue = ((Float) object).floatValue();
         // values outside the 0-1 range are indexed as zero
         if (fieldValue < 0f || fieldValue > 1f) return "0";
         int percentageValue = (int) (fieldValue * 100);
         return Integer.toString(percentageValue);
      } catch (Exception e) {
         // default to zero for null values or other problems
         return "0";
      }
   }
}

The objectToString method converts the input as desired, and returns its String representation. This will be the value indexed by Lucene.

Note

Notice that this method returns a hardcoded "0" when given a null value, or when it encounters any other sort of problem. Custom null-handling is another possible reason for creating a field bridge.
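As a quick check of that behavior, the sketch below runs the same conversion logic outside of Hibernate Search. The nested StringBridge interface is only a local stand-in for org.hibernate.search.bridge.StringBridge so that the example compiles on its own, and the PercentageBridgeDemo class name is hypothetical.

```java
public class PercentageBridgeDemo {

    /** Local stand-in for org.hibernate.search.bridge.StringBridge. */
    interface StringBridge {
        String objectToString(Object object);
    }

    /** Same conversion logic as the PercentageBridge discussed above. */
    static class PercentageBridge implements StringBridge {
        @Override
        public String objectToString(Object object) {
            try {
                float fieldValue = ((Float) object).floatValue();
                if (fieldValue < 0f || fieldValue > 1f) return "0";
                return Integer.toString((int) (fieldValue * 100));
            } catch (Exception e) {
                // null values and unexpected types both end up here
                return "0";
            }
        }
    }

    public static void main(String[] args) {
        StringBridge bridge = new PercentageBridge();
        System.out.println(bridge.objectToString(0.25f)); // 25
        System.out.println(bridge.objectToString(1.5f));  // 0 (out of range)
        System.out.println(bridge.objectToString(null));  // 0 (null value)
        System.out.println(bridge.objectToString("25"));  // 0 (not a Float)
    }
}
```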

To invoke this bridge class at index-time, add a @FieldBridge annotation to the currentDiscountPercentage property:

...
@Column
@Field
@FieldBridge(impl=PercentageBridge.class)
private float currentDiscountPercentage;
...

Note

This entity field is a primitive float, yet the bridge class is working with a Float wrapper object. For flexibility, objectToString takes a generic Object parameter that must be cast to the appropriate type. However, thanks to autoboxing, primitives are converted into their object wrappers for us seamlessly.

TwoWayStringBridge

The second interface for mapping single variables to single fields, TwoWayStringBridge, provides bidirectional translation between a value and its string representation.

You implement TwoWayStringBridge in a manner similar to what we just saw with the regular StringBridge interface. The only difference is that this bidirectional version also requires a stringToObject method, for conversions going the other way:

...
public Object stringToObject(String stringValue) {
   return Float.parseFloat(stringValue) / 100;
}
...

Tip

A bidirectional bridge is only necessary when the field will be an ID field within a Lucene index (that is, annotated with @Id or @DocumentId).
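The round trip can be sketched in isolation as follows. The nested TwoWayStringBridge interface is a local stand-in for the Hibernate Search interface of the same name, and TwoWayPercentageBridge is a hypothetical class that pairs the earlier objectToString logic with the stringToObject method shown above.

```java
public class PercentageRoundTrip {

    /** Local stand-in for org.hibernate.search.bridge.TwoWayStringBridge. */
    interface TwoWayStringBridge {
        String objectToString(Object object);
        Object stringToObject(String stringValue);
    }

    static class TwoWayPercentageBridge implements TwoWayStringBridge {
        @Override
        public String objectToString(Object object) {
            float fieldValue = ((Float) object).floatValue();
            return Integer.toString((int) (fieldValue * 100));
        }

        @Override
        public Object stringToObject(String stringValue) {
            return Float.parseFloat(stringValue) / 100;
        }
    }

    public static void main(String[] args) {
        TwoWayStringBridge bridge = new TwoWayPercentageBridge();

        String indexed = bridge.objectToString(0.25f);
        Object restored = bridge.stringToObject(indexed);

        System.out.println(indexed);  // 25
        System.out.println(restored); // 0.25
    }
}
```

Note that this particular conversion discards anything beyond whole percentage points, so a value such as 0.255f would come back as 0.25. A bridge used for an ID field should presumably be lossless, which makes the percentage example illustrative only.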

ParameterizedBridge

For even greater flexibility, it is possible to pass configuration parameters to a bridge class. To do so, your bridge should implement the ParameterizedBridge interface, in addition to StringBridge or TwoWayStringBridge. The class must then implement a setParameterValues method for receiving the extra parameters.

For the sake of argument, let's say that we wanted our example bridge to be able to write percentages with a greater level of precision, rather than rounding to a whole number. We could pass it a parameter specifying the number of decimal places to use:

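import java.util.Locale;
import java.util.Map;

import org.hibernate.search.bridge.ParameterizedBridge;
import org.hibernate.search.bridge.StringBridge;

public class PercentageBridge implements StringBridge,
      ParameterizedBridge {

   public static final String DECIMAL_PLACES_PROPERTY =
         "decimal_places";
   private int decimalPlaces = 2;  // default

   public String objectToString(Object object) {
      // "%.Nf" prints exactly N digits after the decimal point
      String format = "%." + decimalPlaces + "f";
      try {
         float fieldValue = ((Float) object).floatValue();
         if (fieldValue < 0f || fieldValue > 1f) return "0";
         // fixed locale, so the decimal separator is always "."
         return String.format(Locale.ROOT, format, fieldValue * 100f);
      } catch (Exception e) {
         // default to zero for null values or other problems
         return String.format(Locale.ROOT, format, 0f);
      }
   }

   public void setParameterValues(Map<String, String> parameters) {
      try {
         this.decimalPlaces = Integer.parseInt(
            parameters.get(DECIMAL_PLACES_PROPERTY) );
      } catch (Exception e) {
         // missing or malformed parameter: keep the default
      }
   }

}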

This version of our bridge class expects to receive a parameter named decimal_places. Its value is stored in the decimalPlaces member variable, and then used inside the objectToString method. If no such parameter is passed, then a default of two decimal places will be used to build percentage strings.
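The effect of the parameter can be tried out without Hibernate Search by calling setParameterValues by hand. In the sketch below, the two nested interfaces are local stand-ins for the Hibernate Search interfaces of the same names, the DecimalPlacesDemo class name is hypothetical, and the fixed-point "%.Nf" format with Locale.ROOT is one reasonable way to get a predictable number of decimal places.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

public class DecimalPlacesDemo {

    /** Local stand-ins for the Hibernate Search bridge interfaces. */
    interface StringBridge {
        String objectToString(Object object);
    }
    interface ParameterizedBridge {
        void setParameterValues(Map<String, String> parameters);
    }

    static class PercentageBridge implements StringBridge, ParameterizedBridge {
        static final String DECIMAL_PLACES_PROPERTY = "decimal_places";
        private int decimalPlaces = 2; // default

        @Override
        public String objectToString(Object object) {
            String format = "%." + decimalPlaces + "f";
            try {
                float fieldValue = ((Float) object).floatValue();
                if (fieldValue < 0f || fieldValue > 1f) return "0";
                return String.format(Locale.ROOT, format, fieldValue * 100f);
            } catch (Exception e) {
                return String.format(Locale.ROOT, format, 0f);
            }
        }

        @Override
        public void setParameterValues(Map<String, String> parameters) {
            try {
                decimalPlaces = Integer.parseInt(parameters.get(DECIMAL_PLACES_PROPERTY));
            } catch (Exception e) {
                // missing or malformed parameter: keep the default
            }
        }
    }

    public static void main(String[] args) {
        PercentageBridge defaults = new PercentageBridge();
        System.out.println(defaults.objectToString(0.25f)); // 25.00

        PercentageBridge precise = new PercentageBridge();
        precise.setParameterValues(Collections.singletonMap("decimal_places", "4"));
        System.out.println(precise.objectToString(0.25f)); // 25.0000
    }
}
```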

The mechanism for actually passing one or more parameters is the params element of the @FieldBridge annotation:

...
@Column
@Field
@FieldBridge(
   impl=PercentageBridge.class,
   params=@Parameter(
      name=PercentageBridge.DECIMAL_PLACES_PROPERTY, value="4")
)
private float currentDiscountPercentage;
...
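The same params mechanism could also cover the per-field null substitutes mentioned in the Handling null values section. The following is only a sketch under stated assumptions: NullTokenBridge and its null_token parameter are hypothetical names, and the nested interfaces are local stand-ins for the Hibernate Search ones so that the example runs on its own.

```java
import java.util.Collections;
import java.util.Map;

public class NullTokenDemo {

    /** Local stand-ins for the Hibernate Search bridge interfaces. */
    interface StringBridge {
        String objectToString(Object object);
    }
    interface ParameterizedBridge {
        void setParameterValues(Map<String, String> parameters);
    }

    /** Hypothetical bridge that indexes null values with a per-field token. */
    static class NullTokenBridge implements StringBridge, ParameterizedBridge {
        static final String NULL_TOKEN_PROPERTY = "null_token";
        private String nullToken = "_null_"; // used when no parameter is passed

        @Override
        public String objectToString(Object object) {
            return object == null ? nullToken : object.toString();
        }

        @Override
        public void setParameterValues(Map<String, String> parameters) {
            String configured = parameters.get(NULL_TOKEN_PROPERTY);
            if (configured != null) {
                nullToken = configured;
            }
        }
    }

    public static void main(String[] args) {
        NullTokenBridge bridge = new NullTokenBridge();
        bridge.setParameterValues(Collections.singletonMap("null_token", "none"));

        System.out.println(bridge.objectToString(null));       // none
        System.out.println(bridge.objectToString("Some text")); // Some text
    }
}
```

Each mapped field could then pass its own null_token value through the params element of @FieldBridge, while the bridge itself keeps no state beyond what it received as a parameter.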

Note

Be aware that all implementations of StringBridge or TwoWayStringBridge must be thread-safe. Generally, you should avoid any shared resources, and only take additional information through the ParameterizedBridge parameters.

More complex mappings with FieldBridge

The bridge types covered so far are the easiest and most straightforward way to map a Java property to a string index value. However, sometimes you need even greater flexibility, so there are a few field bridge variations supporting a free-form approach.

Splitting a single variable into multiple fields

Occasionally, the desired relationship between a class property and Lucene index fields may not be one-to-one. For example, let's say that one property represents a filename. However, we would like the ability to search not only by filename, but also by file type (that is, the file extension). One approach is to parse the file extension from the filename property, and thereby use that one variable to create both fields.

The FieldBridge interface allows us to do this. Implementations must provide a set method, which in this example parses the file type from the file name field, and stores them separately:

import org.apache.lucene.document.Document;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class FileBridge implements FieldBridge {

   public void set(String name, Object value,
         Document document, LuceneOptions luceneOptions) {
      String file = ((String) value).toLowerCase();
      // everything after the last "." is treated as the file type
      String type = file.substring(file.lastIndexOf(".") + 1);
      luceneOptions.addFieldToDocument(name + ".file", file, document);
      luceneOptions.addFieldToDocument(name + ".file_type", type,
         document);
   }

}

The luceneOptions parameter is a helper object for interacting with Lucene, and document represents the Lucene data structure to which we are adding fields. We use luceneOptions.addFieldToDocument() to add fields to the index, without having to fully understand the Lucene API details.

The name parameter passed to set represents the name of the field being indexed. Notice that we use this as a base when declaring the names of the two index fields being added (that is, name+".file" for the filename, and name+".file_type" for the file type).

Finally, the value parameter is the value of the property being mapped. Just as with the StringBridge interface seen in the Bridges section, the method signature here uses a generic Object for flexibility. The value must be cast to its appropriate type.

To apply a FieldBridge implementation, use the @FieldBridge annotation just as we've already seen with the other custom bridge types:

...
@Column
@Field
@FieldBridge(impl=FileBridge.class)
private String file;
...
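To see which index fields such a bridge would produce, the parsing logic can be exercised on its own. In this sketch a plain Map stands in for the Lucene Document, and the FileFieldsSketch class and its split method are hypothetical names. It shows the field names and values only, not the real indexing call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FileFieldsSketch {

    // Mirrors the parsing done in FileBridge.set, collecting the
    // resulting fields in a Map instead of a Lucene Document.
    public static Map<String, String> split(String name, Object value) {
        Map<String, String> fields = new LinkedHashMap<>();
        String file = ((String) value).toLowerCase();
        // everything after the last "." is treated as the file type
        String type = file.substring(file.lastIndexOf('.') + 1);
        fields.put(name + ".file", file);
        fields.put(name + ".file_type", type);
        return fields;
    }

    public static void main(String[] args) {
        // prints {file.file=report.pdf, file.file_type=pdf}
        System.out.println(split("file", "Report.PDF"));
    }
}
```

A query for PDF files would therefore target the file.file_type field in this example, while a query by name would target file.file.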

Combining multiple properties into a single field

A custom bridge implementing the FieldBridge interface may also be used for the reverse purpose, to combine more than one property into a single index field. To gain this degree of flexibility, the bridge must be applied to the class level rather than the field level. When the FieldBridge interface is used in this manner, it is known as a class bridge, and replaces the usual mapping mechanism for the entire entity class.

For example, consider an alternate approach we could have taken with the Device entity in our VAPORware Marketplace application. Instead of indexing manufacturer and name as separate fields, we could have combined them into one fullName field. The class bridge for this would still implement the FieldBridge interface, but it would concatenate the two properties into one index field as follows:

public class DeviceClassBridge implements FieldBridge {

   public void set(String name, Object value,
         Document document, LuceneOptions luceneOptions) {
      Device device = (Device) value;
      String fullName = device.getManufacturer()
         + " " + device.getName();
      luceneOptions.addFieldToDocument(name + ".name",
         fullName, document);
   }

}

Rather than applying an annotation to any particular fields within the Device class, we would instead apply a @ClassBridge annotation at the class level. Notice that the field-level Hibernate Search annotations have been completely removed, as the class bridge will be responsible for mapping all index fields in this class.

@Entity
@Indexed
@ClassBridge(impl=DeviceClassBridge.class)
public class Device {

   @Id
   @GeneratedValue
   private Long id;

   @Column
   private String manufacturer;

   @Column
   private String name;

   // constructors, getters and setters...
}

TwoWayFieldBridge

Earlier we saw that the simple StringBridge interface has a TwoWayStringBridge counterpart, providing bidirectional mapping capability for document ID fields. Likewise, the FieldBridge interface has a TwoWayFieldBridge counterpart for the same reason. When you apply a field bridge interface to a property used by Lucene as an ID (that is, annotated with @Id or @DocumentId), then you must use the two-way variant.

The TwoWayFieldBridge interface requires the same objectToString method as StringBridge, and the same set method as FieldBridge. However, this two-way version also requires a get counterpart, for retrieving the string representation from Lucene and converting it if the true type is different:

...
public Object get(String name, Document document) {
   // return the full file name field... the file type field
   // is not needed when going back in the reverse direction
   return document.get(name + ".file");
}
public String objectToString(Object object) {
   // "file" is already a String, so a cast is all that is needed
   return (String) object;
}
...
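The interplay between set and get can be sketched without Lucene by letting a plain Map stand in for the Document. The FileRoundTripSketch class is hypothetical, and its set method assumes the same lowercasing and extension parsing as the FileBridge example from earlier in this chapter.

```java
import java.util.HashMap;
import java.util.Map;

public class FileRoundTripSketch {

    // Writes the two fields, with a Map standing in for the Lucene Document.
    public static void set(String name, Object value, Map<String, String> document) {
        String file = ((String) value).toLowerCase();
        String type = file.substring(file.lastIndexOf('.') + 1);
        document.put(name + ".file", file);
        document.put(name + ".file_type", type);
    }

    // Reads back only the full file name, as in the get method above.
    public static Object get(String name, Map<String, String> document) {
        return document.get(name + ".file");
    }

    public static void main(String[] args) {
        Map<String, String> document = new HashMap<>();
        set("file", "Report.PDF", document);
        System.out.println(get("file", document)); // report.pdf
    }
}
```

Because set lowercases the filename before storing it, the value that comes back is report.pdf and not the original Report.PDF. If a property like this really were used as a document ID, the bridge would likely need to store the value unchanged so that the round trip is lossless.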