In most cases, you are good to go with the default scoring algorithms of Elasticsearch to return the most relevant results. However, some cases require you to have more control over the calculation of a score. This is especially required when implementing domain-specific logic, such as finding the most relevant candidates for a job, where you need to implement a very specific scoring formula. Elasticsearch provides you with the function_score query to take control of all these things.
This chapter covers the code examples only in Java, because the Python client gives you the flexibility to pass the query inside the body parameter of a search function, as you have learned in the previous chapters. Python programmers can simply use the example queries in the same way; no extra module is required to execute these queries. You can still download the Python code for this chapter from the Packt website.
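For instance, with the official elasticsearch-py client, a function_score query is just an ordinary dict passed through the body parameter of search (the index, type, and field names below are the ones used in the examples later in this chapter):

```python
# A function_score query built as a plain Python dict; no extra module is
# needed (this mirrors the weight-function example shown later in the chapter)
query = {
    "query": {
        "function_score": {
            "query": {"term": {"skills": "java"}},
            "functions": [
                {"filter": {"term": {"skills": "python"}}, "weight": 2}
            ],
            "boost_mode": "replace"
        }
    }
}

# With the official elasticsearch-py client, this would be executed as:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# response = es.search(index="profiles", doc_type="candidate", body=query)
```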
The function_score query allows you to take complete control of how a score is calculated for a particular query.
The syntax of a function_score query is as follows:

{
  "query": {
    "function_score": {
      "query": {},
      "boost": "boost for the whole query",
      "functions": [
        {}
      ],
      "max_boost": number,
      "score_mode": "(multiply|max|...)",
      "boost_mode": "(multiply|replace|...)",
      "min_score": number
    }
  }
}
The function_score query has two parts: the first is the base query that finds the overall pool of results you want; the second is a list of functions, which are used to adjust the scoring. These functions can be applied to each document that matches the main query in order to alter or completely replace the original query _score.
The other parameters that can be used with a function_score query are as follows:

- boost_mode: This defines how the combined result of the score functions will influence the final score together with the subquery score. It can be multiply (the query score and the function score are multiplied; the default), replace (only the function score is used; the query score is ignored), max (the maximum of the query score and the function score), min (the minimum of the query score and the function score), sum (the query score and the function score are added), or avg.
- score_mode: This defines how the results of the individual score functions are combined. It can be first (only the first function that has a matching filter is applied), avg, max, sum, min, or multiply.
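To see how these two modes interact numerically, here is a plain-Python sketch (not Elasticsearch code) using hypothetical values: a query _score of 1.5 and two score functions returning 2.0 and 3.0:

```python
# Hypothetical values: the query _score and the results of two score functions
query_score = 1.5
function_scores = [2.0, 3.0]

# score_mode combines the results of the individual functions with each other
combined = {
    "sum": sum(function_scores),                          # 5.0
    "multiply": function_scores[0] * function_scores[1],  # 6.0
    "max": max(function_scores),                          # 3.0
    "min": min(function_scores),                          # 2.0
    "avg": sum(function_scores) / len(function_scores),   # 2.5
    "first": function_scores[0],                          # 2.0
}

# boost_mode then combines that result with the query score;
# here we take the score_mode=sum result (5.0) as the function score
final = {
    "multiply": query_score * combined["sum"],    # 7.5
    "replace": combined["sum"],                   # 5.0
    "sum": query_score + combined["sum"],         # 6.5
    "max": max(query_score, combined["sum"]),     # 5.0
    "min": min(query_score, combined["sum"]),     # 1.5
    "avg": (query_score + combined["sum"]) / 2,   # 3.25
}
```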
The following are the built-in functions that are available to be used with the function_score query:

- weight
- field_value_factor
- script_score
- the decay functions: linear, exp, and gauss

Let's see them one by one, and then you will learn how to combine them in a single query.
A weight function allows you to apply a simple boost to each document without the boost being normalized: a weight of 2 results in 2 * _score. For example:
GET profiles/candidate/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "skills": {
            "value": "java"
          }
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "skills": "python"
            }
          },
          "weight": 2
        }
      ],
      "boost_mode": "replace"
    }
  }
}
The preceding query will match all the candidates who know Java, but will give a higher score to the candidates who also know Python. Please note that boost_mode is set to replace, which causes the _score calculated by the query to be overridden by the weight function for the documents matching our filter clause. The query output will have the candidates who know both Java and Python on top, with a _score of 2.
Java example:
The previous query can be implemented in Java in the following way:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders;

FunctionScoreQueryBuilder functionQuery =
    new FunctionScoreQueryBuilder(QueryBuilders.termQuery("skills", "java"))
        .add(QueryBuilders.termQuery("skills", "python"),
            ScoreFunctionBuilders.weightFactorFunction(2))
        .boostMode("replace");

SearchResponse response = client.prepareSearch().setIndices(indexName)
    .setTypes(docType).setQuery(functionQuery)
    .execute().actionGet();
The field_value_factor function uses the value of a field in the document to alter the _score:
GET profiles/candidate/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "skills": {
            "value": "java"
          }
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "total_experience"
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
The preceding query finds all the candidates with Java in their skills, but influences the total score depending on the total experience of the candidate. So, the more experience a candidate has, the higher the ranking they will get. Please note that boost_mode is set to multiply, which yields the following formula for the final score:

_score = _score * doc['total_experience'].value

However, there are two issues with the preceding approach: first, documents that have a total experience value of 0 will reset the final score to 0; second, the Lucene _score usually falls between 0 and 10, so a candidate with more than 10 years of experience will completely swamp the effect of the full-text search score.
To get rid of these problems, apart from the field parameter, the field_value_factor function provides you with the following extra parameters:

- factor: This is an optional factor to multiply the field value with. It defaults to 1.
- modifier: This is a mathematical modifier to apply to the field value. It can be none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, or reciprocal. It defaults to none.
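To see what these modifiers do numerically, here is a plain-Python sketch applying each of them to a sample total_experience value (following the Elasticsearch reference, log here is the common base-10 logarithm, ln the natural logarithm, and log1p means log(1 + value)):

```python
import math

total_experience = 15  # sample field value

# Each modifier as a plain mathematical function of the field value
modifiers = {
    "none": total_experience,
    "log": math.log10(total_experience),
    "log1p": math.log10(1 + total_experience),  # no domain error at 0
    "log2p": math.log10(2 + total_experience),
    "ln": math.log(total_experience),
    "ln1p": math.log(1 + total_experience),
    "ln2p": math.log(2 + total_experience),
    "square": total_experience ** 2,
    "sqrt": math.sqrt(total_experience),
    "reciprocal": 1.0 / total_experience,
}

# A modifier such as log1p compresses large experience values (15 years
# becomes ~1.2), so they no longer swamp the full-text _score
```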
Java example:
The preceding query can be implemented in Java in the following way:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.*;

FunctionScoreQueryBuilder functionQuery =
    new FunctionScoreQueryBuilder(QueryBuilders.termQuery("skills", "java"))
        .add(new FieldValueFactorFunctionBuilder("total_experience"))
        .boostMode("multiply");

SearchResponse response = client.prepareSearch().setIndices("profiles")
    .setTypes("candidate").setQuery(functionQuery)
    .execute().actionGet();
script_score is the most powerful function available in Elasticsearch. It uses a custom script to take complete control of the scoring logic. You can write a custom script to implement anything from simple to very complex logic. Scripts are also cached, to allow faster execution of repetitive queries. Let's see an example:
{
  "script_score": {
    "script": "doc['total_experience'].value"
  }
}
Look at the special syntax for accessing the field values inside the script parameter. This is how field values are accessed using the Groovy scripting language.
To see some of the power of this function, look at the following example:
GET profiles/candidate/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "skills": {
            "value": "java"
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "params": {
              "skill_array_provided": [
                "java",
                "python"
              ]
            },
            "script": "final_score=0; skill_array = doc['skills'].toArray(); counter=0; while(counter<skill_array.size()){for(skill in skill_array_provided){if(skill_array[counter]==skill){final_score = final_score+doc['total_experience'].value};};counter=counter+1;};return final_score"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}
Let's understand the preceding query:

- params is the placeholder where you can pass parameters to your function, similar to how you use parameters inside a method signature in other languages. Inside the script parameter, you write your complete logic.
- The script stores the skills of each document in the skill_array variable. Then, each skill that we have passed inside the params section is compared with the skills inside skill_array. If a skill matches, the final_score variable is incremented with the value of the total_experience field of that document. The score calculated by the script will be used to rank the documents, because boost_mode is set to replace the original _score value.

Do not try to work with analyzed fields while writing scripts; you might get weird results. This is because, had our skills field contained a value such as "core java", you could not have got an exact match for it inside the script section. So, fields with space-separated values need to be set as not_analyzed or analyzed with the keyword analyzer in advance.
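If you find the Groovy hard to read, the scoring loop is easy to mirror in plain Python (the candidate document below is hypothetical; this is only an illustration of the logic, not code that runs inside Elasticsearch):

```python
# A hypothetical candidate document, as the script would see it
doc = {"skills": ["java", "python", "elasticsearch"], "total_experience": 7}

# The params block of the query
skill_array_provided = ["java", "python"]

# Mirror of the Groovy script: for every requested skill the candidate
# actually has, add the candidate's total experience to the score
final_score = 0
for candidate_skill in doc["skills"]:
    for skill in skill_array_provided:
        if candidate_skill == skill:
            final_score += doc["total_experience"]

# Two matching skills with 7 years of experience -> final_score == 14
```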
To write these script functions, you need to have some command of Groovy scripting. However, if you find it complex, you can write these scripts in other languages, such as Python, using the corresponding language plugin for Elasticsearch. More on this can be found at https://github.com/elastic/elasticsearch-lang-python.
For fast performance, use Groovy or Java functions. Python and JavaScript code requires the marshalling and unmarshalling of values, which kills performance due to higher CPU/memory usage.
Java example:
The previous query can be implemented in Java in the following way:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.*;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptService.ScriptType;

String script = "final_score=0; skill_array = doc['skills'].toArray(); "
    + "counter=0; while(counter<skill_array.size())"
    + "{for(skill in skill_array_provided)"
    + "{if(skill_array[counter]==skill)"
    + "{final_score = final_score+doc['total_experience'].value};};"
    + "counter=counter+1;};return final_score";

ArrayList<String> skills = new ArrayList<String>();
skills.add("java");
skills.add("python");

Map<String, Object> params = new HashMap<String, Object>();
params.put("skill_array_provided", skills);

FunctionScoreQueryBuilder functionQuery =
    new FunctionScoreQueryBuilder(QueryBuilders.termQuery("skills", "java"))
        .add(new ScriptScoreFunctionBuilder(
            new Script(script, ScriptType.INLINE, "groovy", params)))
        .boostMode("replace");

SearchResponse response = client.prepareSearch().setIndices(indexName)
    .setTypes(docType).setQuery(functionQuery)
    .execute().actionGet();
As you can see, the script logic is a simple string that is used to instantiate the Script class constructor inside ScriptScoreFunctionBuilder.
We have seen the problems of restricting the range of experience or distance, which could result in zero results or the exclusion of suitable candidates. Maybe a recruiter would like to hire a candidate from a different province because of a good candidate profile. So, instead of completely restricting results with range filters, we can incorporate sliding-scale values such as geo_location or dates into the _score to prefer documents near a latitude/longitude point or documents published recently.
The function_score query provides three decay functions to work with such sliding scales: linear, exp (that is, exponential), and gauss (that is, Gaussian). All three functions take the same parameters, as shown in the following code, and these are required to control the shape of the curve created for the decay function: origin, scale, decay, and offset.
The origin is the point from which the distance is calculated. For date fields, the default is the current timestamp. The scale parameter defines the distance from the origin at which the computed score will be equal to the decay parameter.
The origin and scale parameters can be thought of as your min and max, which define a bounding box within which the curve will be defined. If we wanted to give more of a boost to the documents that have been published in the past 10 days, it would be best to define the origin as the current timestamp and the scale as 10d.
The offset parameter specifies that the decay curve will only be computed for documents with a distance greater than the defined offset. The default is 0.
Finally, the decay option alters how severely the document is demoted based on its position. The default decay value is 0.5.
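Under the hood, the three curves are just different formulas over the distance from the origin. The plain-Python sketch below follows the decay formulas given in the Elasticsearch reference documentation (it is an illustration, not client code); note how every curve evaluates to exactly the decay value at a distance equal to scale:

```python
import math

def gauss(dist, origin=0.0, scale=100.0, offset=0.0, decay=0.5):
    # Gaussian bell curve: sigma^2 is derived so that score(scale) == decay
    adjusted = max(0.0, abs(dist - origin) - offset)
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-adjusted ** 2 / (2.0 * sigma_sq))

def exp_decay(dist, origin=0.0, scale=100.0, offset=0.0, decay=0.5):
    # Exponential decay: lambda is derived so that score(scale) == decay
    adjusted = max(0.0, abs(dist - origin) - offset)
    return math.exp(math.log(decay) / scale * adjusted)

def linear(dist, origin=0.0, scale=100.0, offset=0.0, decay=0.5):
    # Straight line that hits 0 beyond the bounding distance s
    adjusted = max(0.0, abs(dist - origin) - offset)
    s = scale / (1.0 - decay)
    return max(0.0, (s - adjusted) / s)

# All three give the full score of 1.0 at the origin and exactly the decay
# value (0.5 here) at a distance of scale (100 km here); beyond that, gauss
# and exp tail off smoothly while linear eventually reaches 0.
```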
GET profiles/candidate/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "exp": {
            "geo_code": {
              "origin": {
                "lat": 28.66,
                "lon": 77.22
              },
              "scale": "100km"
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
In the preceding query, we have used the exponential decay function, which tells Elasticsearch to start decaying the score calculation after a distance of 100 km from the given origin. So, candidates who are more than 100 km away from the given origin will be ranked lower, but not discarded. These candidates can still get a higher rank if we combine other function score queries, such as weight or field_value_factor, with the decay function and combine the results of all the functions together.
Java example:
The preceding query can be implemented in Java in the following way:
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.*;

Map<String, Object> origin = new HashMap<String, Object>();
String scale = "100km";
origin.put("lat", "28.66");
origin.put("lon", "77.22");

FunctionScoreQueryBuilder functionQuery = new FunctionScoreQueryBuilder()
    .add(new ExponentialDecayFunctionBuilder("geo_code", origin, scale))
    .boostMode("multiply");
// For the linear decay function, use:
// .add(new LinearDecayFunctionBuilder("geo_code", origin, scale)).boostMode("multiply");
// For the Gauss decay function, use:
// .add(new GaussDecayFunctionBuilder("geo_code", origin, scale)).boostMode("multiply");

SearchResponse response = client.prepareSearch().setIndices(indexName)
    .setTypes(docType).setQuery(functionQuery)
    .execute().actionGet();
In the preceding example, we have used the exp decay function, but the commented lines show how the other decay functions can be used.
Last, as always, remember that Elasticsearch lets you use multiple functions in a single function_score query to calculate a score that combines the results of each function.
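For example, the decay query above could be extended to also reward experience. The sketch below builds such a combined query as a plain dict (the field names are the ones used throughout this chapter; the choice of log1p and the multiply modes is illustrative):

```python
# Combine a decay function with field_value_factor: candidates are scored by
# proximity AND experience. score_mode multiplies the two function results
# together, and boost_mode multiplies that product into the query _score.
combined_query = {
    "query": {
        "function_score": {
            "query": {"term": {"skills": "java"}},
            "functions": [
                {"gauss": {"geo_code": {
                    "origin": {"lat": 28.66, "lon": 77.22},
                    "scale": "100km"}}},
                {"field_value_factor": {
                    "field": "total_experience",
                    "modifier": "log1p"}}
            ],
            "score_mode": "multiply",
            "boost_mode": "multiply"
        }
    }
}
```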