Defragmenting an index

Changes to a Lucene index slowly make it less efficient over time, in the same way that a hard drive can become fragmented. When new entities are indexed, they go into a file (called a segment) that is separate from the main index file. When an entity is deleted, it actually remains in the index file and is simply marked as inaccessible.

These techniques help Lucene keep its indexes as accessible for queries as possible, but they lead to slower performance over time. Having to open multiple segment files is slow, and can run up against operating system limits on the number of open files. Keeping deleted entities in the index makes the files more bloated than they need to be.

The process of merging all of these segments, and truly purging deleted entities, is called optimization. It is analogous to defragmenting a hard drive. Hibernate Search provides mechanisms for optimizing your indexes on either a manual or an automatic basis.

Manual optimization

The SearchFactory class offers two methods for optimizing Lucene indexes manually. You can call these methods within your application, upon whatever event you like. Alternatively, you might expose them, and trigger your optimizations from outside the application (for example, with a web service called by a nightly cron job).

You can obtain a SearchFactory reference through a FullTextSession object's getSearchFactory method. Once you have an instance, its optimize method will defragment all available Lucene indexes:

...
fullTextSession.getSearchFactory().optimize();
...

Alternatively, you can use an overloaded version of optimize, taking an entity class as a parameter. This method limits the optimization to only that entity's Lucene index, as follows:

...
fullTextSession.getSearchFactory().optimize(App.class);
...
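
For context, here is a minimal sketch of how both calls might fit into application code. It assumes an ordinary Hibernate Session (the session variable below) is already open, and uses the App entity from this book's example application:

...
// Wrap the existing Hibernate Session (assumed to be open) in a FullTextSession
FullTextSession fullTextSession = Search.getFullTextSession(session);

// Defragment every Lucene index managed by Hibernate Search
fullTextSession.getSearchFactory().optimize();

// ...or limit the optimization to the index for the App entity only
fullTextSession.getSearchFactory().optimize(App.class);
...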

Note

Another option is to use a MassIndexer to rebuild your Lucene indexes (refer to the Mass updates section). Rebuilding an index from scratch leaves it in an optimized state anyway, so further optimization would be redundant if you are already performing that kind of maintenance regularly.

A very manual approach is to use the Luke utility, outside your application code altogether. See the section on Luke at the very end of this chapter.

Automatic optimization

An easier, if less flexible, approach is to have Hibernate Search trigger optimization for you automatically. This can be done on a global or a per-index basis. The trigger event can be a threshold number of Lucene changes, or a threshold number of transactions.

The chapter6 version of the VAPORware Marketplace application now contains the following four lines in its hibernate.cfg.xml file:

<property name="hibernate.search.default.optimizer.operation_limit.max">
   1000
</property>
<property name="hibernate.search.default.optimizer.transaction_limit.max">
   1000
</property>
<property name="hibernate.search.App.optimizer.operation_limit.max">
   100
</property>
<property name="hibernate.search.App.optimizer.transaction_limit.max">
   100
</property>

The first two properties, referencing default in their names, establish global defaults for all Lucene indexes. The last two, referencing App, are override values specific to the App entity.

Note

Most of the configuration properties in this chapter may be made index-specific by replacing the default substring with the name of the relevant index.

Normally this is the class name of the entity (for example, App), but it could be a custom name if you set the index element in that entity's @Indexed annotation.
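
For example, a custom index name might be declared on the entity like this (the name CustomAppIndex is purely illustrative):

...
@Entity
@Indexed(index="CustomAppIndex")
public class App implements Serializable {
   ...
}
...

With that annotation in place, the optimizer properties would reference the custom name (for example, hibernate.search.CustomAppIndex.optimizer.operation_limit.max) rather than the App class name.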

Whether you deal at the global or index-specific level, operation_limit.max refers to a threshold number of Lucene changes (that is, adds or deletes). transaction_limit.max refers to a threshold number of transactions.

Overall, this snippet configures the App index for optimization after 100 transactions or Lucene changes. All other indexes will be optimized after 1,000 transactions or changes.

Custom optimizer strategy

You might enjoy the best of both worlds by using the automatic approach with a custom optimizer strategy. This chapter's version of the VAPORware Marketplace application uses a custom strategy to only allow optimization during off-peak hours. This custom class extends the default optimizer strategy, but only allows the base class to proceed with optimization when the current time is between midnight and 6:00 a.m.:

public class NightlyOptimizerStrategy
      extends IncrementalOptimizerStrategy {

   @Override
   public void optimize(Workspace workspace) {
      Calendar calendar = Calendar.getInstance();
      int hourOfDay = calendar.get(Calendar.HOUR_OF_DAY);
      if (hourOfDay >= 0 && hourOfDay <= 6) {
         super.optimize(workspace);
      }
   }

}

Tip

The easiest approach is to extend IncrementalOptimizerStrategy, and override the optimize method with your intercepting logic. However, if your strategy is fundamentally different from the default, then you can start with your own base class. Just have it implement the OptimizerStrategy interface.

To declare your own custom strategy, at either the global or per-index level, add a hibernate.search.X.optimizer.implementation property to hibernate.cfg.xml (where X is either default, or the name of a particular entity index):

...
<property name="hibernate.search.default.optimizer.implementation">
   com.packtpub.hibernatesearch.util.NightlyOptimizerStrategy
</property>
...