Percolator

Have you ever wondered what would happen if we reversed the traditional model of using queries to find documents? Does it make sense to find the queries that match a given document? It turns out that there is a whole range of solutions where this model is very useful. Whenever you operate on an unbounded stream of input data and search it for occurrences of particular events, you can use this approach. Examples include failure detection in a monitoring system or a "tell me when a product matching the defined criteria becomes available in this shop" functionality. Let's see how the ElasticSearch percolator works and how it can handle this last example.

Preparing the percolator

The percolator looks like an additional index in ElasticSearch. This means that we can store any documents in it and obtain its mappings. We can also search it like an ordinary index. However, what we are interested in is reversing the standard behavior and treating queries as documents. Let's take the library example from Chapter 2, Searching Your Data, and try to index some queries in the percolator. We assume that our users want to be informed when any book matching their defined criteria becomes available. Of course, the challenge is to develop the user interface for defining such complicated queries, but happily, we are only search specialists, and this is not our problem.

Look at the query1.json file that contains the example query generated by the user:

{
 "query" : {
  "bool" : {
   "must" : {
    "term" : {
     "title" : "crime"
    }
   },
   "should" : {
    "range" : {
     "year" : {
      "from" : 1900,
      "to" : 2000
     }
    }
   },
   "must_not" : {
    "term" : {
     "otitle" : "nothing"
    }
   }
  }
 }
}

Queries generated by the user interface can also use filters; the percolator handles them without a problem. The second query should find all the books that were written before the year 2010 and are currently available in our library. This is what the query2.json content looks like:

{
 "query" : {
  "filtered": {
   "query" : {
    "range" : {
     "year" : {
      "from" : 0,
      "to" : 2010
     }
    }
   },
   "filter" : {
    "term" : {
     "available" : true
    }
   }
  }
 }
}

Now let's register both queries in the percolator. In order to do that, we run the following commands:

curl -XPUT 'localhost:9200/_percolator/notifier/1' -d @query1.json
curl -XPUT 'localhost:9200/_percolator/notifier/old_books' -d @query2.json

ElasticSearch assumes that the target index (in our case, notifier) exists, so let's create it now by running the following command:

curl -XPUT 'localhost:9200/notifier'
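An application would usually issue these requests from code rather than from the shell. The following is only a sketch of how the three PUT requests above could be built with Python's standard library; the function names are hypothetical, and the host and URL layout are simply copied from the curl commands in this section. The requests are constructed but not sent.

```python
import json
import urllib.request

# Assumed from the curl commands in the text: a local node on port 9200.
BASE_URL = "http://localhost:9200"


def registration_request(index, query_id, body):
    """Build (but do not send) the PUT that registers a query for an index.

    Mirrors: curl -XPUT 'localhost:9200/_percolator/<index>/<query_id>' -d ...
    """
    url = f"{BASE_URL}/_percolator/{index}/{query_id}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, method="PUT")


def create_index_request(index):
    """Build (but do not send) the PUT that creates the target index."""
    return urllib.request.Request(f"{BASE_URL}/{index}", method="PUT")


# A shortened query body, used here only to keep the example small.
crime_query = {"query": {"term": {"title": "crime"}}}

requests_to_send = [
    registration_request("notifier", "1", crime_query),
    create_index_request("notifier"),
]

for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

Passing each request to urllib.request.urlopen would actually send it, which of course requires a running ElasticSearch node.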

We are now ready to use our percolator. Our application will provide documents to the percolator and check whether ElasticSearch finds the corresponding queries. Let's use an example document that will match both stored queries. It has the required title and release date, and it is currently available. The request can look like the following:

curl -XGET 'localhost:9200/notifier/x/_percolate?pretty' -d '{
 "doc" : {
  "title": "Crime and Punishment",
  "otitle": "Преступлéние и наказáние",
  "author": "Fyodor Dostoevsky",
  "year": 1886,
  "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
  "tags": [],
  "copies": 0,
  "available" : true
 }
}'

As we expected, ElasticSearch responds with the result that lists the identifiers of the matching queries:

{
  "ok" : true,
  "matches" : [ "1", "old_books" ]
}

It works like a charm! Note the endpoint used in this request. The index name in the URL (notifier) corresponds to the type name in the _percolator index. The type (x in our case) is irrelevant; we can use any name just to satisfy the index/type syntax in ElasticSearch.
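It may not be obvious why the example book matches query 1 even though its year, 1886, lies outside the 1900 to 2000 range. The reason is that a should clause is optional when the bool query also has a must clause. The following is a deliberately simplified in-memory model of the two registered queries, not how ElasticSearch implements percolation: the helper names are hypothetical, and the tokens function is only a rough stand-in for the analysis applied to the title field.

```python
import re


def tokens(text):
    # Rough stand-in for an analyzer: lowercase, then split into word tokens.
    return set(re.findall(r"\w+", text.lower()))


def term(field, value):
    def matches(doc):
        actual = doc.get(field)
        if isinstance(actual, str):
            return value in tokens(actual)
        return actual == value  # booleans, numbers, or a missing field
    return matches


def range_query(field, low, high):
    return lambda doc: field in doc and low <= doc[field] <= high


def bool_query(must=(), should=(), must_not=()):
    def matches(doc):
        if any(clause(doc) for clause in must_not):
            return False
        if must:
            # With a must clause present, should clauses are optional.
            return all(clause(doc) for clause in must)
        return any(clause(doc) for clause in should)
    return matches


def filtered(query, filter_):
    return lambda doc: query(doc) and filter_(doc)


# The two queries registered earlier, keyed by their identifiers.
registered = {
    "1": bool_query(
        must=[term("title", "crime")],
        should=[range_query("year", 1900, 2000)],
        must_not=[term("otitle", "nothing")],
    ),
    "old_books": filtered(range_query("year", 0, 2010), term("available", True)),
}


def percolate(doc):
    # Reverse search: return the identifiers of the queries matching the document.
    return [query_id for query_id, query in registered.items() if query(doc)]


book = {
    "title": "Crime and Punishment",
    "otitle": "Преступлéние и наказáние",
    "year": 1886,
    "available": True,
}
print(percolate(book))  # ['1', 'old_books']
```

In this model the should clause never decides whether query 1 matches. In ElasticSearch it would only influence scoring, which plays no role in the list of matching identifiers.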

Getting deeper

Because queries registered in the percolator are in fact documents, we can use a normal query sent to ElasticSearch to choose which queries stored in the _percolator index should be used in the percolate process! It may sound weird, but it really gives us a lot of possibilities. In our library, we can have several groups of users. Let's assume some of them have permission to borrow very rare books. Or we may have several branches in the city, and users can declare which branches they would like to collect the book from. Look at the following query registration command:

curl -XPUT 'localhost:9200/_percolator/notifier/3' -d '{
 "query" : {
  "term" : {
   "title" : "crime"
  }
 },
 "branches" : ["bra", "brb", "brd"]
}'

In this example, the user is interested in any book with "crime" in the title and wants to borrow it from one of the three listed branches. We will search the branches field just as we have already done with ordinary fields. Because this field holds an array of identifiers that we want to match exactly, we must prepare a mapping for it. If you've already read Chapter 1, Getting Started with ElasticSearch Cluster, creating such a mapping shouldn't be a problem. For example, we can do it like this:

{ 
 "notifier" : { 
  "properties" : { 
   "branches" : { 
    "type" : "string", 
    "index" : "not_analyzed"
   }
  }
 }
}

After updating the mappings and indexing our query, we can test matching with our example document. We assume that the book was returned to the brb branch; now let's check whether someone is interested in it:

curl -XGET 'localhost:9200/notifier/x/_percolate?pretty' -d '{
 "doc" : {
  "title": "Crime and Punishment",
  "otitle": "Преступлéние и наказáние",
  "author": "Fyodor Dostoevsky",
  "year": 1886,
  "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
  "tags": [],
  "copies": 0,
  "available" : true
 },
 "query" : {
  "term" : {
   "branches" : "brb"
  }
 }
}'

If everything is right, the answer should be similar to this (we indexed our query with 3 as the identifier):

{
  "ok" : true,
  "matches" : [ "3" ]
}
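The effect of the additional query in the percolate request can be pictured as a two-step process: first narrow down the registered queries by their metadata, then match the document against those that remain. The following in-memory sketch illustrates the idea only; the function names and the query with the identifier 4 are hypothetical. Branch values are compared exactly, which mimics the not_analyzed mapping of the branches field.

```python
def title_contains(word):
    # Simplified stand-in for a term query on an analyzed title field.
    return lambda doc: word in doc.get("title", "").lower().split()


# Registered queries together with their metadata. Query "3" comes from the
# text; query "4" is a hypothetical second subscription used for contrast.
registered = {
    "3": {"branches": ["bra", "brb", "brd"], "matches": title_contains("crime")},
    "4": {"branches": ["brc"], "matches": title_contains("crime")},
}


def percolate_for_branch(doc, branch=None):
    hits = []
    for query_id, entry in registered.items():
        # Step 1: keep only the queries whose metadata matches exactly.
        if branch is not None and branch not in entry["branches"]:
            continue
        # Step 2: match the document against the remaining queries.
        if entry["matches"](doc):
            hits.append(query_id)
    return hits


book = {"title": "Crime and Punishment"}
print(percolate_for_branch(book, "brb"))  # ['3']
print(percolate_for_branch(book, "brB"))  # []
print(percolate_for_branch(book))         # ['3', '4']
```

Because the comparison is exact, a differently cased value such as brB selects no queries at all, so the value sent in the request has to be written exactly as it was registered.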

Note

Please note that there are some limitations when it comes to the query types supported by the percolator functionality. In the current implementation, parent-child and nested queries are not available, so you can't use queries such as has_child, top_children, has_parent, and nested.
