Handle multiple Librarians

The core functionality of Goophr Librarian is to update the index and return relevant DocIDs based on the search terms. As we saw while implementing the codebase for Librarian, this means updating the index, retrieving the relevant DocIDs, and sorting them by relevance before returning the query results. Many operations are involved, and several maps are used for lookups and updates. These operations might seem trivial. However, as the size of the lookup table (map) increases, the performance of operations on it will start to decline. Many approaches can be taken to avoid such a decline in performance.

Our primary goal is to understand distributed systems in the context of Go, and, for this reason, we will split Librarian so that each instance handles only a certain portion of the index. Partitioning is one of the standard techniques used in databases, where the database is split into multiple partitions. In our case, we will have three instances of Librarian running, each responsible for the index of all tokens that fall within the character range assigned to its partition:

  • a_m_librarian: Librarian responsible for tokens starting with character "A" to "M"
  • n_z_librarian: Librarian responsible for tokens starting with character "N" to "Z"
  • others_librarian: Librarian responsible for tokens starting with numbers
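
The partitioning scheme above can be sketched as a small routing function that picks a Librarian from the first character of a token. This is only an illustration, not the book's implementation: the function name librarianFor is hypothetical, and the sketch assumes that matching is case-insensitive and that any token not starting with a letter from "A" to "Z" (digits, but also punctuation or an empty token) is sent to others_librarian.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Partition names taken from the list above.
const (
	aToMLibrarian   = "a_m_librarian"
	nToZLibrarian   = "n_z_librarian"
	othersLibrarian = "others_librarian"
)

// librarianFor is a hypothetical helper that returns the name of the
// Librarian instance assumed to own the given token.
func librarianFor(token string) string {
	if token == "" {
		// Assumption: empty tokens fall into the catch-all partition.
		return othersLibrarian
	}

	// Look only at the first character, ignoring case (an assumption).
	first, _ := utf8.DecodeRuneInString(token)
	first = unicode.ToLower(first)

	switch {
	case first >= 'a' && first <= 'm':
		return aToMLibrarian
	case first >= 'n' && first <= 'z':
		return nToZLibrarian
	default:
		// Numbers, plus anything else outside A-Z (an assumption).
		return othersLibrarian
	}
}

func main() {
	for _, token := range []string{"apple", "Mango", "night", "zebra", "42nd"} {
		fmt.Printf("%-6s -> %s\n", token, librarianFor(token))
	}
}
```

With a function like this, whichever component forwards tokens can decide which of the three Librarian instances should receive each one, so that every instance maintains a smaller map.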