Chapter 6. System Configuration and Index Management

In this chapter, we will look at configuration options for Lucene indexes, and learn how to perform basic maintenance tasks. We will see how to toggle between automatic and manual updates to Lucene indexes. We will examine low-latency write operations, synchronous versus asynchronous updates, and other performance tuning alternatives.

We will cover how to defragment and clean up a Lucene index for better performance, and how to use Lucene without touching hard drive storage at all. Last but not least, we will get exposure to the highly powerful Luke utility for working with Lucene indexes outside of application code.

Automatic versus manual indexing

So far, we haven't had to think much about when entities are indexed. After all, Hibernate Search is tightly integrated with Hibernate ORM. By default, the add-on updates Lucene whenever the core updates the database.

However, you have the option of decoupling these operations, and indexing manually if you like. Some common situations where you might consider a manual approach are as follows:

  • If you can easily live with Lucene being out of sync for limited periods, you might want to defer indexing operations until off-peak hours, to reduce system load during times of peak usage.
  • If you want to use conditional indexing, but are not comfortable with the experimental nature of EntityIndexingInterceptor (refer to Chapter 4, Advanced Mapping), you might use manual indexing as an alternative approach.
  • If your database may be updated directly, by processes that do not go through Hibernate ORM, you must manually update your Lucene indexes regularly to keep them in sync with the database.

To disable automatic indexing, set the hibernate.search.indexing_strategy property to manual in hibernate.cfg.xml (or persistence.xml if using JPA) as follows:

...
<property name="hibernate.search.indexing_strategy">manual</property>
...

Individual updates

When automatic indexing is disabled, manual indexing operations are driven by methods on a FullTextSession object (or on FullTextEntityManager, its counterpart when using JPA).

Adds and updates

The most important of these methods is index, which works with both add and update operations on the database side. This method takes one parameter, an instance of any entity class that is configured for Hibernate Search indexing.

This chapter's version of the VAPORware Marketplace application uses manual indexing. StartupDataLoader calls index for each app, immediately after persisting it in the database:

...
fullTextSession.save(theCloud);
fullTextSession.index(theCloud);
...

On the Lucene side, the index method works within the same transactional context as the save method on the database side. The indexing only occurs when the transaction commits. In the event of a rollback, the Lucene index is untouched.

Note

Using index manually overrides any conditional indexing rules. In other words, the index method ignores any EntityIndexingInterceptor that is registered with that entity class.

This is not the case for mass updates (see the Mass updates section), but is something to bear in mind when considering manual indexing of individual objects. The code that calls index is responsible for checking any conditions first.
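
One way to keep such a check in a single place is a small guard that tests the condition before delegating to the indexing call. The following is a minimal sketch under stated assumptions, not part of the VAPORware Marketplace code: the App class with its active flag, the indexIfAllowed helper, and the class name are all hypothetical, and the indexing call is passed in as a plain Consumer so that the example compiles without Hibernate Search on the classpath.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

public class ConditionalIndexSketch {

    /** Hypothetical entity; "active" stands in for whatever condition applies. */
    public static final class App {
        public final String name;
        public final boolean active;

        public App(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    /**
     * Tests the condition first, and only then hands the entity to the
     * indexing call. Returns true if the entity was passed on for indexing.
     * In application code, indexCall would wrap the session's index method.
     */
    public static <T> boolean indexIfAllowed(
            T entity,
            Predicate<? super T> condition,
            Consumer<? super T> indexCall) {
        if (!condition.test(entity)) {
            return false; // skipped: no interceptor performs this check for us
        }
        indexCall.accept(entity);
        return true;
    }
}
```

In application code, the third argument might be a lambda such as app -> fullTextSession.index(app), with the predicate mirroring whatever rule the interceptor would otherwise apply.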

Deletes

The basic method for removing an entity from a Lucene index is purge. This method is somewhat different from index, in that you do not pass it an object instance to remove. Instead, you pass it the class reference for the entity, and the ID of the particular instance to remove (that is, the value of the field annotated with @Id or @DocumentId):

...
fullTextSession.purge(App.class, theCloud.getId());
fullTextSession.delete(theCloud);
...

Hibernate Search also offers purgeAll, a convenience method for removing all the instances of a particular entity type. This method also takes the entity class reference, although obviously there is no need to pass a specific ID:

...
fullTextSession.purgeAll(App.class);
...

As with index, both purge and purgeAll operate within a transaction. Deletes do not actually occur until the transaction commits. Nothing happens in the event of a rollback.

If you really want to write to a Lucene index before the transaction commits, then the zero-parameter flushToIndexes method allows you to do so. This might be useful if you are processing a large number of entities, and want to free up memory along the way (with the clear method) to avoid an OutOfMemoryError:

...
fullTextSession.index(theCloud);
fullTextSession.flushToIndexes();
fullTextSession.clear();
...
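
When many entities are processed in a loop, the flush and clear calls are typically made once per batch rather than once per entity. The following is a minimal sketch of that batching logic. To keep it compilable on its own, it uses a hypothetical IndexingSession interface that exposes only the three methods named above; in application code, the FullTextSession itself would play that role. The indexInBatches helper and the batch size are likewise illustrative assumptions.

```java
import java.util.List;

public class BatchIndexSketch {

    /** Hypothetical stand-in exposing only the three session methods discussed above. */
    public interface IndexingSession {
        void index(Object entity);
        void flushToIndexes();
        void clear();
    }

    /**
     * Indexes every entity, flushing to the index and clearing the session
     * after each full batch. Returns the number of flushes performed.
     */
    public static int indexInBatches(IndexingSession session, List<?> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int flushes = 0;
        for (Object entity : entities) {
            session.index(entity);
            pending++;
            if (pending == batchSize) {
                session.flushToIndexes(); // write the queued index work now
                session.clear();          // release the entities held by the session
                pending = 0;
                flushes++;
            }
        }
        if (pending > 0) {
            // Flush the final, partially filled batch.
            session.flushToIndexes();
            session.clear();
            flushes++;
        }
        return flushes;
    }
}
```

A larger batch size means fewer flushes but more entities held in memory between them, so the right value depends on the size of the entities and the available heap.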

Mass updates

Adding, updating, and deleting entities individually can be rather tedious, and potentially error-prone if you miss things. Another option is to use MassIndexer, which can be thought of as a compromise of sorts between automatic and manual indexing.

This utility class is still instantiated and used manually. However, when it is called, it automatically rebuilds the Lucene indexes for all mapped entity classes in one step. There's no need to distinguish between adds, updates, and deletes, because the operation wipes out the entire index and recreates it from scratch.

A MassIndexer is instantiated with a FullTextSession object's createIndexer method. Once you have an instance, there are two ways to kick off the mass indexing:

  • The start method indexes asynchronously, meaning that indexing occurs in a background thread while the flow of code in the main thread continues.
  • The startAndWait method runs the indexing in synchronous mode, meaning that execution of the main thread is blocked until the indexing completes.

When running in synchronous mode, you need to wrap the operation with a try-catch block in case the main thread is interrupted while waiting:

...
try {
   fullTextSession.createIndexer().startAndWait();
} catch (InterruptedException e) {
   logger.error("Interrupted while waiting on MassIndexer: "
      + e.getClass().getName() + ", " + e.getMessage());
}
...
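
Logging the interruption is one option. A common Java idiom, independent of Hibernate Search, is to also restore the thread's interrupt flag so that code further up the call stack can still detect the interruption. The following is a minimal sketch of that idiom; the BlockingTask interface and runBlocking helper are hypothetical names introduced here so that the example compiles without any library, with BlockingTask standing in for a blocking call such as startAndWait.

```java
public class InterruptHandlingSketch {

    /** Hypothetical stand-in for any blocking call that may be interrupted. */
    public interface BlockingTask {
        void run() throws InterruptedException;
    }

    /**
     * Runs the blocking task. Returns true if it completed, or false if the
     * current thread was interrupted while waiting. On interruption, the
     * interrupt flag is set again so that callers can still observe it.
     */
    public static boolean runBlocking(BlockingTask task) {
        try {
            task.run();
            return true;
        } catch (InterruptedException e) {
            // Catching InterruptedException clears the flag; restore it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In application code, the call might look like runBlocking(() -> fullTextSession.createIndexer().startAndWait()), with the return value deciding whether to log an error or proceed.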

Tip

If practical, it is best to use mass indexing when the application is offline and not responding to queries. Indexing places the system under heavy load, and while the rebuild is in progress the Lucene index will be inconsistent with the database.

Mass indexing also differs from individual updates in two respects:

  • A MassIndexer operation is not transactional. There is no need to wrap the operation within a Hibernate transaction, and likewise you cannot rely on a rollback if something goes wrong.
  • MassIndexer does respect conditional indexing (refer to Chapter 4, Advanced Mapping). If you have an EntityIndexingInterceptor registered for that entity class, it will be invoked to determine whether or not to actually index particular instances.

    Note

    MassIndexer support for conditional indexing was added in the 4.2 generation of Hibernate Search. If you are working with an application that uses an older version, you will need to migrate to 4.2 or higher in order to use EntityIndexingInterceptor and MassIndexer together.
