Chapter 5. Search

The purpose of search in any ECM system is to let users find all the content in the repository that they have access to. In Alfresco, search therefore combines content searching with permission control. In most ECM systems, search is provided by a search engine, which indexes the content in the repository and gives users a query capability over it. Some search engines work synchronously: content is indexed as soon as it enters the repository. Others index content asynchronously.

Search is one of the most significant features of Alfresco 5.x, the version that introduced the new Solr4 search engine. Earlier versions of Alfresco used older versions of Solr and Lucene as their search engines. Alfresco allows users to search any content they have access to in the repository, and it supports both full-text and metadata searches.

We will cover the following topics in this chapter:

  • Understanding Solr and Alfresco integration
  • Configuring and managing Solr
  • Troubleshooting Solr

Understanding Solr and Alfresco integration

Searching in Alfresco is supported by the Solr4 search engine. Solr4 works as a standalone enterprise application. It is built in Java and uses Lucene internally for indexing. Solr extends the Lucene library, adding new features around it and packaging it as a standalone application. It exposes a REST API for searching and for submitting content for indexing. Data can be submitted for indexing as JSON, XML, CSV, or binary, and search requests are made with HTTP GET.
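
To make the HTTP GET search concrete, here is a minimal Python sketch that builds a query URL for Solr's standard select handler. The host, port, and core name are placeholders chosen for illustration and are not taken from an Alfresco installation; q, wt, and rows are standard Solr query parameters. The helper names build_select_url and fetch_json are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, query, rows=10):
    """Build the URL for an HTTP GET search against a Solr core.

    base_url and core are assumptions; adjust them to your own setup.
    """
    # q = query string, wt = response format, rows = maximum number of hits
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def fetch_json(url):
    """Send the GET request and decode the JSON response (needs a running Solr)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


# Placeholder host and core name, for illustration only.
url = build_select_url("http://localhost:8983/solr", "example_core", "title:report")
print(url)
```

Calling fetch_json(url) against a reachable Solr instance would return the parsed response. Keeping URL construction separate from the network call makes the query easy to inspect before it is sent.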

Solr4 can be installed alongside Alfresco on the same application server, or it can be installed on a completely separate machine. The latest version of Solr4 also supports clustering and sharding of indexes.

Solr offers several advantages:

  • Scalable
  • Better performance
  • Faceted search and more accurate results
  • Easy monitoring and administration
  • Asynchronous, near-real-time indexing
  • Compact disk formats

Alfresco and Solr communicate with each other asynchronously over HTTP. Solr polls Alfresco at regular intervals to fetch the transaction information it needs for indexing. This transactional data includes node information and permission information. Solr also polls Alfresco for the data model, which it uses to define the schema for indexing. Alfresco has two stores: the workspace store (workspace://SpacesStore), which holds live content, and the archive store (archive://SpacesStore), which holds archived content. Solr creates a separate set of indexes for each store and has a separate configuration for each of them.
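
The polling behaviour described here can be pictured with a small in-memory simulation. This is only a sketch of the general idea, not Alfresco's or Solr's actual tracking code: Transaction, FakeRepository, Tracker, and poll_once are all hypothetical names, and the "indexes" are plain dictionaries, one per store.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    txn_id: int
    store: str      # "workspace" or "archive"
    node_id: str
    readers: list   # authorities allowed to read the node


class FakeRepository:
    """Stand-in for the repository: returns transactions newer than a given id."""

    def __init__(self, transactions):
        self._transactions = sorted(transactions, key=lambda t: t.txn_id)

    def transactions_after(self, last_txn_id):
        return [t for t in self._transactions if t.txn_id > last_txn_id]


@dataclass
class Tracker:
    """Stand-in for the search side: remembers the last transaction it indexed."""

    repository: FakeRepository
    last_txn_id: int = 0
    indexes: dict = field(default_factory=lambda: {"workspace": {}, "archive": {}})

    def poll_once(self):
        """Fetch unseen transactions and index them into the matching store."""
        new_transactions = self.repository.transactions_after(self.last_txn_id)
        for txn in new_transactions:
            # Node and permission information are indexed together.
            self.indexes[txn.store][txn.node_id] = txn.readers
            self.last_txn_id = txn.txn_id
        return len(new_transactions)
```

In a real deployment the poll would run on a schedule. Here, calling poll_once() repeatedly shows the key property: each call picks up only the transactions committed since the previous call.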
