MongoDB has integrated text search features, as we saw in the previous recipe. However, there are several reasons why one might not use the Mongo text search feature and instead fall back to a conventional search engine such as Solr or Elasticsearch. Setting up a dedicated search engine does require additional effort to integrate it with a MongoDB instance. In this recipe, we will see how to integrate a MongoDB instance with the search engine Elasticsearch.
We will be using the Mongo connector for integration purpose. It is an open source project that is available at https://github.com/10gen-labs/mongo-connector.
Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to install and set up Python. The pip tool is used to get the Mongo connector. However, if you are working on the Windows platform, the steps to install pip were not mentioned earlier; visit https://sites.google.com/site/pydatalog/python/pip-for-windows to get pip for Windows.
The prerequisites for starting a single instance are all we need for this recipe. However, for demonstration purposes, we will start the server as a single-node replica set.
Download the BlogEntries.json file from the book's website and keep it on your local drive, ready to be imported.
Download Elasticsearch for your target platform from http://www.elasticsearch.org/overview/elkdownloads/. Extract the downloaded archive, and from the shell, go to the bin directory of the extracted archive.
We will be getting the mongo-connector source from github.com and running it. A Git client is needed for this purpose. Download and install the Git client on your machine: visit http://git-scm.com/downloads and follow the instructions to install Git on your target operating system. If you are not comfortable installing Git on your operating system, there is an alternative that lets you download the source as an archive.
Visit https://github.com/10gen-labs/mongo-connector. Here, you will get an option that lets you download the current source as an archive, which we can then extract on our local drive. The following screenshot shows the download option available on the bottom-right corner of the screen:
Just like in the previous recipe, where we saw text search in Mongo, we will use the same five documents to test our simple search. Download and keep the BlogEntries.json file ready to be imported. Ensure that Python and pip for your operating system platform are installed. We will now get mongo-connector from the source. If you have already installed the Git client, execute the following steps on the operating system shell. If you have decided to download the repository as an archive, you may skip this step. Go to the directory where you would like to clone the connector repository, and execute the following commands:

$ git clone https://github.com/10gen-labs/mongo-connector.git
$ cd mongo-connector
$ python setup.py install
Start the MongoDB server as a single-node replica set, connect to it with the mongo shell, and initiate the replica set:

$ mongod --dbpath /data/mongo/db --replSet textSearch --smallfiles --oplogSize 50
$ mongo
> rs.initiate()
Start the Elasticsearch server by executing the following command from the bin directory of the extracted elasticsearch archive:

$ elasticsearch
To confirm that the server is up, open http://localhost:9200/_nodes/process?pretty in the browser. You should see a response similar to the following:

{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "p0gMLKzsT7CjwoPdrl-unA" : {
      "name" : "Zaladane",
      "transport_address" : "inet[/192.168.2.3:9300]",
      "host" : "Amol-PC",
      "ip" : "192.168.2.3",
      "version" : "1.0.1",
      "build" : "5c03844",
      "http_address" : "inet[/192.168.2.3:9200]",
      "process" : {
        "refresh_interval" : 1000,
        "id" : 5628,
        "max_file_descriptors" : -1,
        "mlockall" : false
      }
    }
  }
}
For the sake of this test, we will be using the user_blog collection in the test database. The field on which we would like to have text search implemented is the blog_text field of the document.
Start the Mongo connector, pointing it at the MongoDB instance and the Elasticsearch endpoint, as follows:

$ python mongo_connector/connector.py -m localhost:27017 -t http://localhost:9200 -n test.user_blog --fields blog_text -d mongo_connector/doc_managers/elastic_doc_manager.py
Import the BlogEntries.json file into the collection using the mongoimport utility as follows. The command is executed with the .json file present in the current directory:

$ mongoimport -d test -c user_blog BlogEntries.json --drop
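mongoimport expects one JSON document per line of the input file. The sketch below illustrates that format with hypothetical sample entries (the field names mirror the blog_text field used in this recipe, but the values are invented, not the actual contents of BlogEntries.json):

```python
import json

# Hypothetical sample in the one-document-per-line format that
# mongoimport consumes; the real BlogEntries.json ships with the book.
sample = """\
{"blog_text": "Facebook is a popular social network", "author": "alice"}
{"blog_text": "MongoDB text search basics", "author": "bob"}
"""

# Parse each line as a separate document, just as mongoimport does.
documents = [json.loads(line) for line in sample.splitlines()]

for doc in documents:
    print(doc["blog_text"])
```

The --drop option in the command above removes any existing documents in the collection before the import.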
Once the documents are imported, execute a search from the browser by opening http://localhost:9200/_search?q=blog_text:facebook in it. Basically, the Mongo connector tails the oplog to find new updates, which it publishes to another endpoint. We used Elasticsearch in our case, but it could even be Solr. You may choose to write a custom DocManager that plugs into the connector. For more details, visit https://github.com/10gen-labs/mongo-connector/wiki. The Readme at https://github.com/10gen-labs/mongo-connector gives some detailed information as well.
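The DocManager is the connector's pluggable piece: it receives documents from the oplog tail and pushes them to the target system. The real interface is defined in the mongo-connector project; the class below is only a minimal in-memory sketch of the idea (the upsert/remove method names follow the project's convention, but treat all details here as illustrative):

```python
class InMemoryDocManager:
    """Illustrative stand-in for a mongo-connector DocManager.

    A real implementation would forward each call to an external
    search engine (Elasticsearch, Solr, ...) instead of a dict.
    """

    def __init__(self):
        self.index = {}  # _id -> document

    def upsert(self, doc):
        # Index (or re-index) a document that appeared in the oplog.
        self.index[doc["_id"]] = doc

    def remove(self, doc_id):
        # Drop a document that was deleted from the source collection.
        self.index.pop(doc_id, None)

    def search(self, field, text):
        # Naive substring match; a search engine would do real scoring.
        return [d for d in self.index.values() if text in d.get(field, "")]


manager = InMemoryDocManager()
manager.upsert({"_id": 1, "blog_text": "posted this on facebook today"})
manager.upsert({"_id": 2, "blog_text": "mongodb oplog internals"})
print(len(manager.search("blog_text", "facebook")))  # prints 1
```

Swapping the dict for HTTP calls to an indexing endpoint is essentially what elastic_doc_manager.py does for Elasticsearch.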
We gave the connector the -m, -t, -n, --fields, and -d options. Their meanings, as used in the command we executed, are as follows:

-m: The host and port of the MongoDB instance whose oplog is tailed (localhost:27017 in our case)
-t: The URL of the target system, the Elasticsearch server in our case (http://localhost:9200)
-n: The namespace, in the <database>.<collection> format, whose changes the connector tracks (test.user_blog)
--fields: The fields of the documents that are sent to the target system (blog_text)
-d: The doc manager that knows how to write to the target system (elastic_doc_manager.py for Elasticsearch)
For more supported options, refer to the readme of the connector's page on GitHub.
Once the insert is executed on the MongoDB server, the connector detects the newly added documents in the collection of its interest, that is, user_blog, and starts sending the data to be indexed from the newly added documents to Elasticsearch. To confirm the addition, we execute a query in the browser to view the results.
Elasticsearch will complain about index names with uppercase characters in them. The Mongo connector doesn't take care of this, and thus the name of the collection has to be in lowercase; if it contains uppercase characters (for example, userBlog), indexing will fail.
We have not done any additional configuration on Elasticsearch, as that was not the objective of this recipe; we were more interested in integrating MongoDB and Elasticsearch. You will have to refer to the Elasticsearch documentation for more advanced configuration options. If integration with Elasticsearch is required, there is also a concept called rivers in Elasticsearch that can be used. Rivers are Elasticsearch's way to get data from another data source. For MongoDB, the code for a river can be found at https://github.com/richardwilly98/elasticsearch-river-mongodb/. The README.md in this repository has steps on how to set it up.
Earlier in this chapter, in the recipe Implementing triggers in Mongo using oplog, we saw how to implement trigger-like functionality using Mongo. This connector and the MongoDB river for Elasticsearch rely on the same logic to get the data out of Mongo as and when it is needed.
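Conceptually, that shared tailing logic looks something like the sketch below: read each new oplog entry, keep only the namespace of interest, and hand the document to whatever endpoint is being fed. The ns, op, and o field names match the shape of real oplog entries; everything else here is deliberately simplified (a real tailer uses a tailable cursor on local.oplog.rs and handles updates and deletes as well):

```python
# Simplified oplog entries: ns = namespace, op = operation ('i' = insert),
# o = the document. Real entries live in the local.oplog.rs collection.
oplog = [
    {"ns": "test.user_blog", "op": "i", "o": {"_id": 1, "blog_text": "hello"}},
    {"ns": "test.other",     "op": "i", "o": {"_id": 2, "x": 1}},
    {"ns": "test.user_blog", "op": "i", "o": {"_id": 3, "blog_text": "world"}},
]

def tail(entries, namespace, publish):
    """Forward inserts on the given namespace to a publish callback."""
    for entry in entries:
        if entry["ns"] == namespace and entry["op"] == "i":
            publish(entry["o"])

indexed = []
tail(oplog, "test.user_blog", indexed.append)
print([doc["_id"] for doc in indexed])  # prints [1, 3]
```

In the recipe, the publish step is the DocManager sending documents to Elasticsearch; in the triggers recipe, it was our own callback code.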