Using scripts

ElasticSearch has several features in which scripts can be used. You've already seen examples such as updating documents, filtering, and searching. Although this may seem like an advanced topic, we will take a look at the possibilities ElasticSearch offers. In any request that uses scripts, we can spot several fields:

  • script: This field contains the script code.
  • lang: This field informs the engine which language is used. If it is omitted, ElasticSearch assumes mvel.
  • params: This is an object containing parameters. Every defined parameter is available to the script by its name. Using parameters lets us write cleaner code, and because compiled scripts are cached, code with parameters performs better than code with embedded constant values.
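The caching argument can be illustrated with a small Java model. This is a simplified sketch, not ElasticSearch's implementation: it assumes that compiled scripts are cached by their source text, and the script doc.price.value * factor, with its price field and factor parameter, is hypothetical. Embedding three different constants produces three distinct sources and therefore three compilations, while passing the value as a parameter lets one compiled script be reused.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model (an assumption, not ElasticSearch code): compiled scripts
// are cached by their source text, so each distinct source is compiled once.
class ScriptCacheDemo {
    private final Map<String, String> cache = new HashMap<>();
    private int compilations = 0;

    // Returns the "compiled" form of a script, compiling it only on a cache miss.
    String compile(String source) {
        return cache.computeIfAbsent(source, s -> {
            compilations++;          // count cache misses
            return "compiled:" + s;  // stand-in for a real compiled script
        });
    }

    // Runs one script per factor, either embedding the value in the source text
    // or keeping the source fixed and passing the value as a parameter.
    // Returns the number of compilations that were needed.
    static int countCompilations(List<Integer> factors, boolean useParams) {
        ScriptCacheDemo engine = new ScriptCacheDemo();
        for (int factor : factors) {
            if (useParams) {
                // Same source every time; the value would travel in "params".
                engine.compile("doc.price.value * factor");
            } else {
                // A different source text for every value.
                engine.compile("doc.price.value * " + factor);
            }
        }
        return engine.compilations;
    }

    public static void main(String[] args) {
        List<Integer> factors = List.of(2, 3, 4);
        System.out.println("embedded constants: " + countCompilations(factors, false));
        System.out.println("params:             " + countCompilations(factors, true));
    }
}
```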

Available objects

During the execution of the script, ElasticSearch exposes several objects. The ones available for operations connected with searching are as follows:

  • doc (also available as _doc): This is an instance of the org.elasticsearch.search.lookup.DocLookup object. It gives us access to the current document found, including its calculated score and field values.
  • _source: This is an instance of org.elasticsearch.search.lookup.SourceLookup. It provides access to the source of the current document and the values defined in this source.
  • _fields: This is an instance of org.elasticsearch.search.lookup.FieldsLookup. It also provides access to document field values.

In an update operation, ElasticSearch exposes only the ctx object with the _source property, which provides access to the current document.

As we have seen, there are several ways of reaching document fields and their values. Let's look at several ways of getting the value of the title field (in parentheses you can see what ElasticSearch returns for one of the sample documents from our library index):

  • _doc.title.value (crime)
  • _source.title (Crime and Punishment)
  • _fields.title.value (null)

A bit confusing, isn't it? Let's stop for a moment and recall what we know about fields. During indexing, a field value is sent to ElasticSearch as part of the _source document. The search engine can store this source as a whole in the index (this is the default behavior, but it can be turned off). In addition, the source is parsed, and each field may also be stored in the index on its own if it is marked as stored (that is, if its store property is set to true; by default, fields are not stored). Finally, the field value may be configured as indexed. This means that the value is analyzed, split into tokens, and the tokens are placed in the index as well. To sum up, a single field may be kept in the index as:

  • A part of _source
  • A stored, unparsed value
  • An indexed value, parsed into tokens
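To make the three representations concrete, here is a small, library-free Java model of what happens to the title value of our sample document. It is a sketch under stated assumptions, not ElasticSearch code: the analyzer is modeled as lowercasing plus a tiny, hypothetical stop-word list, the title field is assumed not to be stored, and we assume that _doc.title.value returns the smallest indexed term. Under those assumptions, the model reproduces the three results listed earlier.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model (assumptions: a tiny stop-word list, lowercase tokenizing,
// and that doc.<field>.value returns the smallest indexed term).
class FieldRepresentations {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("and", "the", "of", "a");

    // Indexed form: lowercase tokens with stop words dropped, kept in sorted order.
    static TreeSet<String> analyze(String text) {
        TreeSet<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Object> source = Map.of("title", "Crime and Punishment"); // _source
        Map<String, Object> stored = new HashMap<>(); // title is not stored separately

        System.out.println("_source.title    -> " + source.get("title"));
        System.out.println("_fields.title    -> " + stored.get("title"));
        System.out.println("_doc.title.value -> " + analyze((String) source.get("title")).first());
    }
}
```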

In all scripts except update scripts, we have access to all of these representations. You may wonder which one to use. If we want access to the processed form, the answer is simple: _doc. What about _source and _fields? In most cases, _source is a good choice. It is usually fast and needs fewer disk operations than reading the original field values from the index.

MVEL

ElasticSearch can use several scripting languages; when no language is declared, it assumes that MVEL is used. MVEL is a fast, simple, and easily embeddable, yet powerful, expression language used in open source projects. It allows us to use Java objects, automatically maps properties to getter/setter calls, converts simple types, and maps collections and maps to arrays and associative arrays. For more information, refer to the following link:

http://mvel.codehaus.org/Language+Guide+for+2.0

Other languages

Using MVEL for scripting is a simple and sufficient solution, but if you would like to use something different, you can choose among JavaScript, Python, and Groovy. Before using another language, we must install the appropriate plugin. For JavaScript, we run the following command from the ElasticSearch directory:

bin/plugin -install elasticsearch/elasticsearch-lang-javascript/1.1.0

The only changes we need to make in the request are to specify which language the script is written in and, of course, to write the script itself correctly in that language. Look at the following example:

{
  "query" : {
    "match_all" : { }
  },
  "sort" : {
    "_script" : {
      "script" : "doc.tags.values.length > 0 ? doc.tags.values[0] : 'u19999';",
      "lang" : "javascript",
      "type" : "string",
      "order" : "asc"
    }
  }
}

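As you can see, we used JavaScript for scripting instead of the default MVEL. The script returns the first value of the tags field, or the constant 'u19999' for documents that have no tags.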
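To see what the sort script computes, here is a plain-Java model of it. The documents and tags are made up for illustration, and the model assumes each tag list is given in the order in which doc.tags.values would return it. Each document's sort key is its first tag, or the fallback 'u19999' when it has no tags, and "type" : "string" with "order" : "asc" is modeled as ascending string order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the JavaScript sort script (illustrative data only).
class TagSortDemo {
    // A document reduced to its id and the values of its tags field.
    record Doc(String id, List<String> tags) {}

    // Same logic as the script: first tag value, or the fallback for untagged documents.
    static String sortKey(Doc doc) {
        return doc.tags().size() > 0 ? doc.tags().get(0) : "u19999";
    }

    // "type" : "string" and "order" : "asc" modeled as ascending String order.
    static List<String> sortedIds(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(TagSortDemo::sortKey));
        List<String> ids = new ArrayList<>();
        for (Doc doc : sorted) {
            ids.add(doc.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("1", List.of("novel")),
                new Doc("2", List.of()),
                new Doc("3", List.of("classics")));
        System.out.println(sortedIds(docs)); // prints [3, 1, 2]
    }
}
```

The untagged document ends up last here only because 'u19999' happens to sort after both sample tags; a tag such as "zoology" would sort after it.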

Script library

Usually, scripts are small, and it is quite convenient to put them in the request. But sometimes applications grow, and you want to give developers something they can reuse in their modules. If the scripts are large and complicated, it is generally better to place them in files and only refer to them in API requests. The first step is to put our script in the proper place with a proper name: our tiny script goes into the config/scripts directory of the ElasticSearch installation. Let's name our example file text_sort.js; the file extension indicates the language used for scripting. The content of this example file is very simple:

doc.tags.values.length > 0 ? doc.tags.values[0] :'u19999';

The query that uses this script becomes a little simpler:

{
  "query" : {
    "match_all" : { }
  },
  "sort" : {
    "_script" : {
      "script" : "text_sort",
      "type" : "string",
      "order" : "asc"
    }
  }
}

We refer to the script by its file name without the extension, text_sort. In addition, we can omit the script language; ElasticSearch will figure it out from the file extension.
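The naming convention can be summarized in a few lines of Java. The helper below is hypothetical and only models the rule described above: the script name is the file name without its extension, and the language is derived from the extension. The extension-to-language table is an assumption for illustration, not a list taken from ElasticSearch.

```java
import java.util.Map;

// Hypothetical helper (not ElasticSearch's code) modeling the script-file convention.
class ScriptFileResolver {
    // Assumed extension-to-language mapping, for illustration only.
    static final Map<String, String> LANG_BY_EXTENSION = Map.of(
            "js", "javascript",
            "mvel", "mvel",
            "py", "python",
            "groovy", "groovy");

    // Returns {scriptName, language} for a file such as "text_sort.js".
    static String[] resolve(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("no extension: " + fileName);
        }
        String name = fileName.substring(0, dot);       // "text_sort"
        String extension = fileName.substring(dot + 1); // "js"
        String lang = LANG_BY_EXTENSION.get(extension);
        if (lang == null) {
            throw new IllegalArgumentException("unknown extension: " + extension);
        }
        return new String[] {name, lang};
    }

    public static void main(String[] args) {
        String[] resolved = resolve("text_sort.js");
        System.out.println(resolved[0] + " -> " + resolved[1]);
    }
}
```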

Native code

For occasions when scripts are too slow or when you don't like scripting languages, ElasticSearch allows you to write Java classes and use them instead of scripts.

To create a new native script, we should implement at least two classes. The first one is a factory for our script. Let's focus on it for now and see some sample code:

package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class HashCodeSortNativeScriptFactory implements NativeScriptFactory {

  @Override
  public ExecutableScript newScript(@Nullable Map<String, Object> params) {
    return new HashCodeSortScript(params);
  }

}

This class implements org.elasticsearch.script.NativeScriptFactory. The interface requires us to implement the newScript() method, which takes the parameters defined in the API call and returns an instance of our script.

Now, let's look at the main class, our script. It will be used for sorting: documents will be ordered by the hashCode() value of the chosen field, and documents that lack the field get the value 0, which places them at the front unless some hash codes are negative. We know this logic doesn't make much sense, but it works well for demonstration. The source code of our native script looks like this:

package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;

import org.elasticsearch.script.AbstractSearchScript;

public class HashCodeSortScript extends AbstractSearchScript {
  private String field = "name";

  public HashCodeSortScript(Map<String, Object> params) {
    if (params != null && params.containsKey("field")) {
      this.field = params.get("field").toString();
    }
  }

  @Override
  public Object run() {
    Object value = source().get(field);
    if (value != null) {
      return value.hashCode();
    }
    return 0;
  }

}

First of all, the class extends org.elasticsearch.script.AbstractSearchScript and implements the run() method. This is where we get the appropriate values from the current document, process them according to our strange logic, and return the result. Note the source() call: it gives access to the same _source object that we met in non-native scripts, and doc() and fields() are available as well. Also look at how we use the parameters: we assume that a user may provide the field parameter, which tells us which document field to use, and we provide a default value (name) for the case when it is missing.
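Because the real classes need the ElasticSearch JAR to compile, the following library-free Java sketch models the same logic so that it can be tried in isolation. DocScript and DocScriptFactory are hypothetical stand-ins for ExecutableScript and NativeScriptFactory, a document's _source is represented as a plain Map, and the sample documents are made up. For simplicity, the sketch sorts the script results numerically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for ExecutableScript/NativeScriptFactory; not the real API.
interface DocScript {
    // Computes a value for one document, given its _source as a map.
    Object run(Map<String, Object> source);
}

interface DocScriptFactory {
    DocScript newScript(Map<String, Object> params);
}

class HashCodeSortModel {
    // Mirrors HashCodeSortNativeScriptFactory together with HashCodeSortScript.
    static final DocScriptFactory FACTORY = params -> {
        // Optional "field" parameter, defaulting to "name" as in the Java class above.
        String field = (params != null && params.containsKey("field"))
                ? params.get("field").toString()
                : "name";
        // Mirrors run(): hash code of the field value, or 0 when the field is missing.
        return source -> {
            Object value = source.get(field);
            return value != null ? value.hashCode() : 0;
        };
    };

    // Sorts documents in ascending numeric order of the script's result.
    static List<Map<String, Object>> sortByScript(List<Map<String, Object>> docs,
                                                  Map<String, Object> params) {
        DocScript script = FACTORY.newScript(params);
        List<Map<String, Object>> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt(doc -> (Integer) script.run(doc)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("title", "Book A", "otitle", "x"),
                Map.of("title", "Book B"));
        for (Map<String, Object> doc : sortByScript(docs, Map.of("field", "otitle"))) {
            System.out.println(doc.get("title"));
        }
    }
}
```

Note that String.hashCode() may be negative; such a document would sort ahead of the documents without the field, whose value is 0.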

Now it's time to install our native script. After packaging the compiled classes as a JAR archive, we put it in the ElasticSearch lib directory, which makes our code visible to the class loader. Next, we need to register the script. This can be done by using the settings API or by adding a single line to the elasticsearch.yml configuration file, as shown in the following code:

script.native.native_sort.type: pl.solr.elasticsearch.examples.scripts.HashCodeSortNativeScriptFactory

Note the native_sort fragment. This is the name of our script, which we will pass in the script parameter of our requests. The value of this property is the fully qualified class name of the factory that the server should use to create the script.
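The structure of the setting key can be shown with a tiny, hypothetical Java helper that extracts the script name from a line like the one above. It only illustrates the script.native.<name>.type convention; it is not how ElasticSearch itself reads its configuration.

```java
// Hypothetical helper illustrating the setting's naming convention:
// script.native.<script name>.type: <fully qualified factory class name>
class NativeScriptSetting {
    static final String PREFIX = "script.native.";
    static final String SUFFIX = ".type";

    // Returns the script name embedded in the key, e.g. "native_sort".
    static String scriptName(String key) {
        if (!key.startsWith(PREFIX) || !key.endsWith(SUFFIX)
                || key.length() <= PREFIX.length() + SUFFIX.length()) {
            throw new IllegalArgumentException("not a native script setting: " + key);
        }
        return key.substring(PREFIX.length(), key.length() - SUFFIX.length());
    }

    public static void main(String[] args) {
        String line = "script.native.native_sort.type: "
                + "pl.solr.elasticsearch.examples.scripts.HashCodeSortNativeScriptFactory";
        int colon = line.indexOf(':');
        String key = line.substring(0, colon).trim();
        String factoryClass = line.substring(colon + 1).trim();
        System.out.println(scriptName(key) + " -> " + factoryClass);
    }
}
```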

Finally, we need to restart the ElasticSearch instance, after which we can send our queries. Using our previously indexed data, we can try running the following query:

{
  "query" : {
    "match_all" : { }
  },
  "sort" : {
    "_script" : {
      "script" : "native_sort",
      "params" : {
        "field" : "otitle"
      },
      "lang" : "native",
      "type" : "string",
      "order" : "asc"
    }
  }
}

Note the params part of the query: in this call, we want to sort on the otitle field. We also provide the script name, native_sort, and the script language, native, which is required here. If everything goes well, we should see our results sorted by our custom logic. In the response from ElasticSearch, the documents without the otitle field are in the first positions of the results list, and their sort value is 0.
