In this chapter, we will look at configuration options for Lucene indexes, and learn how to perform basic maintenance tasks. We will see how to toggle between automatic and manual updates to Lucene indexes. We will examine low-latency write operations, synchronous versus asynchronous updates, and other performance tuning alternatives.
We will cover how to defragment and clean up a Lucene index for better performance, and how to use Lucene without touching hard drive storage at all. Last but not least, we will get exposure to the highly powerful Luke utility for working with Lucene indexes outside of application code.
So far, we really haven't had to think much about the timing of when entities are indexed. After all, Hibernate Search is tightly integrated with Hibernate ORM. By default, the add-on updates Lucene whenever the core updates the database.
However, you have the option of decoupling these operations, and indexing manually if you like. Some common situations where you might consider a manual approach are as follows:
EntityIndexingInterceptor (refer to Chapter 4, Advanced Mapping), you might use manual indexing as an alternative approach.
To disable automatic indexing, set the hibernate.search.indexing_strategy property to manual in hibernate.cfg.xml (or persistence.xml if using JPA) as follows:
...
<property name="hibernate.search.indexing_strategy">manual</property>
...
When automatic indexing is disabled, manual indexing operations are driven by methods on a FullTextSession object (or on its JPA counterpart, FullTextEntityManager).
The most important of these methods is index, which works with both add and update operations on the database side. This method takes one parameter, an instance of any entity class that is configured for Hibernate Search indexing.
This chapter's version of the VAPORware Marketplace application uses manual indexing. StartupDataLoader calls index for each app, immediately after persisting it in the database:
...
fullTextSession.save(theCloud);
fullTextSession.index(theCloud);
...
On the Lucene side, the index method works within the same transactional context as the save method on the database side. The indexing only occurs when the transaction commits. In the event of a rollback, the Lucene index is untouched.
Using index manually overrides any conditional indexing rules. In other words, the index method ignores any EntityIndexingInterceptor that is registered with that entity class.
This is not the case for mass updates (see the Mass updates section), but it is something to bear in mind when considering manual indexing of individual objects. The code that calls index is responsible for checking any conditions first.
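Because the caller has to check conditions itself, it can help to keep that check in one place. The following is a minimal sketch under stated assumptions: ConditionalIndexer is a hypothetical helper that is not part of Hibernate Search, and the condition and the indexing action are passed in as plain functions so that the sketch compiles without any library on the classpath.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical helper, not part of Hibernate Search: applies a condition
// before delegating to whatever performs the actual indexing.
final class ConditionalIndexer<T> {
    private final Predicate<T> shouldIndex;  // the rule an interceptor would otherwise enforce
    private final Consumer<T> indexAction;   // for example, a reference to the index method

    ConditionalIndexer(Predicate<T> shouldIndex, Consumer<T> indexAction) {
        this.shouldIndex = shouldIndex;
        this.indexAction = indexAction;
    }

    // Indexes the entity only if it passes the condition.
    // Returns true if the indexing action was invoked.
    boolean indexIfEligible(T entity) {
        if (!shouldIndex.test(entity)) {
            return false;
        }
        indexAction.accept(entity);
        return true;
    }
}
```

With a FullTextSession at hand, this might be wired up as new ConditionalIndexer<App>(App::isActive, fullTextSession::index), where isActive is an assumed getter on the App entity.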
The basic method for removing an entity from a Lucene index is purge. This method is somewhat different from index, in that you do not pass it an object instance to remove. Instead, you pass it the class reference for the entity, and the ID of a particular instance to remove (that is, corresponding to @Id or @DocumentId):
...
fullTextSession.purge(App.class, theCloud.getId());
fullTextSession.delete(theCloud);
...
Hibernate Search also offers purgeAll, a convenient method for removing all the instances of a particular entity type. This method also takes the entity class reference, although obviously there is no need to pass a specific ID:
...
fullTextSession.purgeAll(App.class);
...
As with index, both purge and purgeAll operate within a transaction. Deletes do not actually occur until the transaction commits. Nothing happens in the event of a rollback.
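The commit and rollback behavior described for index, purge, and purgeAll can be pictured as a queue of pending work. The following toy model is only an illustration of that idea and makes no claim about how Hibernate Search is implemented internally. The class DeferredIndexWork and all of its members are hypothetical, and a plain set of ID strings stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model (not Hibernate Search code): index changes are queued during a
// transaction and only touch the "index" when the transaction commits.
final class DeferredIndexWork {
    private final Set<String> documents = new LinkedHashSet<>(); // stands in for the Lucene index
    private final List<Runnable> pending = new ArrayList<>();    // work queued by the current transaction

    void index(String id) { pending.add(() -> documents.add(id)); }
    void purge(String id) { pending.add(() -> documents.remove(id)); }
    void purgeAll()       { pending.add(documents::clear); }

    // Commit: apply the queued work in order, then empty the queue.
    void commit() {
        pending.forEach(Runnable::run);
        pending.clear();
    }

    // Rollback: discard the queued work, leaving the index untouched.
    void rollback() {
        pending.clear();
    }

    // Returns a copy of the current index contents.
    Set<String> contents() {
        return new LinkedHashSet<>(documents);
    }
}
```

In this model, calling index and then rollback leaves contents unchanged, while calling purgeAll and then commit empties it.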
If you really want to write to a Lucene index before the transaction commits, then the zero-parameter flushToIndexes method allows you to do so. This might be useful if you are processing a large number of entities, and want to free up memory along the way (with the clear method) to avoid an OutOfMemoryError:
...
fullTextSession.index(theCloud);
fullTextSession.flushToIndexes();
fullTextSession.clear();
...
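When many entities are processed, flushing after every single one is usually unnecessary, and a common approach is to flush and clear once per batch. The following is a rough sketch of that batching logic only. BatchIndexing and indexInBatches are hypothetical names, and the indexing and flushing steps are passed in as functions so that the sketch does not depend on any library.

```java
import java.util.function.Consumer;

// Hypothetical helper, not part of Hibernate Search: invokes a
// flush-and-clear step after every batchSize indexed entities.
final class BatchIndexing {
    private BatchIndexing() { }

    // Indexes each entity and runs flushAndClear after each full batch.
    // Returns the number of times flushAndClear was run. Entities in a
    // trailing partial batch are left for the transaction commit.
    static <T> int indexInBatches(Iterable<T> entities, int batchSize,
                                  Consumer<T> index, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        int flushes = 0;
        for (T entity : entities) {
            index.accept(entity);
            count++;
            if (count % batchSize == 0) {
                flushAndClear.run();
                flushes++;
            }
        }
        return flushes;
    }
}
```

With a FullTextSession, the index argument might be fullTextSession::index, and flushAndClear might be a lambda that calls flushToIndexes followed by clear. In practice the entities would typically be streamed from a query rather than held in a list, since clearing the session frees little memory if the application still holds references to every entity.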
Adding, updating, and deleting entities individually can be rather tedious, and potentially error-prone if you miss things. Another option is to use MassIndexer, which can be thought of as a compromise of sorts between automatic and manual indexing.
This utility class is still instantiated and used manually. However, when it is called, it automatically rebuilds the Lucene indexes for all mapped entity classes in one step. There's no need to distinguish between adds, updates, and deletes, because the operation wipes out the entire index and recreates it from scratch.
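A MassIndexer is instantiated with a FullTextSession object's createIndexer method. Once you have an instance, there are two ways to kick off the mass indexing: the start method runs asynchronously and returns immediately, while the startAndWait method runs synchronously and blocks the calling thread until indexing finishes.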
When running in synchronous mode, you need to wrap the operation with a try-catch block in case the main thread is interrupted while waiting:
...
try {
    fullTextSession.createIndexer().startAndWait();
} catch (InterruptedException e) {
    logger.error("Interrupted while waiting on MassIndexer: "
        + e.getClass().getName() + ", " + e.getMessage());
}
...
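Catching InterruptedException clears the thread's interrupted status, so code further up the call stack cannot tell that an interruption was requested. A common Java idiom is to restore that status inside the catch block. The following sketch shows the idiom on its own. InterruptAware and InterruptibleTask are hypothetical names used for illustration, and nothing here is specific to Hibernate Search.

```java
// Hypothetical helper illustrating the "restore the interrupt flag" idiom.
final class InterruptAware {
    private InterruptAware() { }

    // A task that may block and be interrupted, such as a synchronous mass indexing call.
    interface InterruptibleTask {
        void run() throws InterruptedException;
    }

    // Runs the task. Returns true if it finished normally, or false if it was
    // interrupted, in which case the thread's interrupted status is restored
    // so that callers can still observe the interruption.
    static boolean runToCompletion(InterruptibleTask task) {
        try {
            task.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Assuming a FullTextSession is available, the blocking call could be passed in as a lambda, for example InterruptAware.runToCompletion(() -> fullTextSession.createIndexer().startAndWait()).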
Mass indexing also differs from individual updates in two respects:
The MassIndexer operation is not transactional. There is no need to wrap the operation within a Hibernate transaction, and likewise you cannot rely on a rollback if something goes wrong.
MassIndexer does respect conditional indexing (refer to Chapter 4, Advanced Mapping). If you have an EntityIndexingInterceptor registered for that entity class, it will be invoked to determine whether or not to actually index particular instances.