Selecting and configuring a directory provider

Both of the built-in index managers are built on the DirectoryBasedIndexManager class. As the name implies, both of them use Lucene's abstract class Directory to manage the form in which indexes are stored.

In Chapter 7, we will look at some special directory implementations geared toward clustered environments. However, in single-server environments, the two built-in choices are filesystem storage and in-memory storage.

Filesystem-based

By default, Lucene indexes are stored on the filesystem, in the current working directory of the Java application. No configuration is necessary for this arrangement, but it has been explicitly set in all versions of the VAPORware Marketplace application so far with this property in hibernate.cfg.xml (or persistence.xml):

...
<property name="hibernate.search.default.directory_provider">
   filesystem
</property>
...

As with the other configuration properties that we've seen in this chapter, you could replace default with a particular index name (for example, App).
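
For instance, the same setting scoped to a single index might look something like the following sketch. It assumes that one of your entities is mapped to an index actually named App, so substitute whatever index name your own mapping uses:

...
<property name="hibernate.search.App.directory_provider">
   filesystem
</property>
...

Any index without its own entry continues to use the default value.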

When using filesystem-based indexes, you probably want to use a known, fixed location rather than the current working directory. You can specify either a relative or an absolute path with the indexBase property. In all of the VAPORware Marketplace versions that we've seen so far, the Lucene indexes have been stored under each Maven project's target directory, so that Maven cleans them up before each fresh build:

...
<property name="hibernate.search.default.indexBase">
   target/lucenceIndex
</property>
...
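
An absolute path is specified in the same way. The location in this sketch is purely hypothetical, and should be replaced with a directory that your application has permission to write to:

...
<property name="hibernate.search.default.indexBase">
   /var/lucene/indexes
</property>
...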

Locking strategy

All Lucene directory implementations lock their indexes when writing to them, to prevent corruption from multiple processes or threads writing to them simultaneously. There are four locking strategies available, and you can specify one by setting the hibernate.search.default.locking_strategy property to one of these strings:

  • native: This is the default strategy for filesystem-based directories, when no locking strategy property is specified. It relies on file locking at the native operating system level, so that if your application crashes the index locks will still be released. However, the downside is that this strategy should not be used when your indexes are stored remotely on a network shared drive.
  • simple: This strategy relies on the JVM to handle file locking. It is safer to use when your Lucene index is on a remote shared drive, but locks will not be cleanly released if the application crashes or has to be killed.
  • single: This strategy does not create a lock file on the filesystem, but rather uses a Java object in memory (similar to a synchronized block in multithreaded Java code). For a single-JVM application, this works well no matter where the index files are, and there is no issue with locks being released after a crash. However, this strategy is only viable if you are sure that no other process outside the JVM might write to your index files.
  • none: This strategy does not use locking at all. It is not recommended.
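
As a minimal sketch, switching every index to the simple strategy (perhaps because the indexes live on a network shared drive) would look something like this in hibernate.cfg.xml or persistence.xml:

...
<property name="hibernate.search.default.locking_strategy">
   simple
</property>
...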

Tip

To remove locks that were not cleanly released, use the Luke utility explored in the Using the Luke utility section of this chapter.

RAM-based

For testing and demo purposes, our VAPORware Marketplace application has used an in-memory H2 database throughout this book. It is recreated every time the application starts, and is destroyed when the application stops, with nothing being persisted to permanent storage along the way.

Lucene indexes can work in exactly the same manner. In this chapter's version of the example application, the hibernate.cfg.xml file has been modified to store its index in RAM rather than on the filesystem:

...
<property name="hibernate.search.default.directory_provider">
   ram
</property>
...

Note

The RAM-based directory provider initializes its Lucene indexes when the Hibernate SessionFactory (or JPA EntityManagerFactory) is created. Be aware that when you close this factory, it destroys all your indexes!

This shouldn't be a problem when using a modern dependency-injection framework, because the framework will keep your factory in memory and available when needed. Even in our vanilla example application, we have stored a singleton SessionFactory in the StartupDataLoader class for this reason.

An in-memory index would seem to offer greater performance, and it may be worth experimenting with in your application tuning. However, it is not generally recommended to use the RAM-based directory provider in production settings.

First and foremost, it is easy to run out of memory and crash the application with a large data set. Also, your application has to rebuild its indexes from scratch upon each and every restart. Clustering is not an option, because only the JVM which created the in-memory index has access to that memory. Last but not least, the filesystem-based directory provider already makes intelligent use of caching, and its performance is surprisingly comparable to the RAM-based provider.

All that being said, the RAM-based provider is a common approach for testing applications. Unit tests are likely to involve fairly small sets of data, so running out of memory is not a concern. Also, having the indexes completely and cleanly destroyed in between each unit test might be more of a feature than a drawback.

Tip

The RAM-based directory provider defaults to the single locking strategy, and it really makes no sense to change this.
