Why use Sphinx for full-text searching?

If you're looking for a good Database Management System (DBMS), there are plenty of options available with support for full-text indexing and searches, such as MySQL, PostgreSQL, and SQL Server. There are also external full-text search engines, such as Lucene and Solr. Let's look at the advantages of using Sphinx over a DBMS's built-in full-text search capabilities and over other external search engines:

  • It has a higher indexing speed. It is 50 to 100 times faster than MySQL FULLTEXT and 4 to 10 times faster than other external search engines.
  • It also has a higher searching speed, although the exact gain depends heavily on the mode (Boolean vs. phrase) and on any additional processing. It is up to 500 times faster than MySQL FULLTEXT in cases involving a large result set with GROUP BY, and more than two times faster in searching than other available external search engines.
  • As mentioned earlier, relevancy is among the key features one expects when using a search engine, and Sphinx performs very well in this area. It has phrase-based ranking in addition to classic statistical BM25 ranking.
  • Last but not least, Sphinx has better scalability. It can be scaled vertically (utilizing many CPUs and many HDDs) or horizontally (utilizing many servers), and this comes out of the box with Sphinx. One of the biggest known Sphinx clusters holds over 3 billion records and is more than 2 terabytes in size.
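To make the BM25 part of the ranking point more concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not Sphinx's internal implementation: the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the three sample documents are all assumptions chosen for illustration. The input is assumed to be a non-empty list of tokenized documents.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook Okapi BM25.

    Illustrative only: k1, b and the IDF smoothing are common defaults,
    not values taken from Sphinx.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample documents, already split into words.
docs = [
    "sphinx is a full text search engine".split(),
    "mysql supports full text indexes".split(),
    "the sphinx guards the pyramid".split(),
]
scores = bm25_scores("full text search".split(), docs)

if __name__ == "__main__":
    for doc, score in zip(docs, scores):
        print(f"{score:.3f}  {' '.join(doc)}")
```

BM25 treats a document as a bag of words, so it rewards matching terms but ignores whether they appear in the same order as in the query. That gap is what a phrase-based ranking component is meant to cover.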

In one of his presentations, Andrew Aksyonoff (the creator of Sphinx) presented the following benchmarking results. The test used approximately 3.5 million records containing around 5 GB of text.

 

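| Metric                  | MySQL | Lucene | Sphinx |
|-------------------------|-------|--------|--------|
| Indexing time, min      | 1627  | 176    | 84     |
| Index size, MB          | 3011  | 6328   | 2850   |
| Match all, ms/q         | 286   | 30     | 22     |
| Match phrase, ms/q      | 3692  | 29     | 21     |
| Match bool top-20, ms/q | 24    | 29     | 13     |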
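The benchmark figures above are easier to compare as ratios. The following short Python sketch divides each engine's figure by the Sphinx figure for the same metric. The numbers are copied from the benchmark results above; the names `benchmarks` and `ratios_vs_sphinx` are hypothetical helpers introduced only for this illustration.

```python
# Benchmark figures copied from the results above (lower is better in every row).
benchmarks = {
    "Indexing time, min": {"MySQL": 1627, "Lucene": 176, "Sphinx": 84},
    "Index size, MB": {"MySQL": 3011, "Lucene": 6328, "Sphinx": 2850},
    "Match all, ms/q": {"MySQL": 286, "Lucene": 30, "Sphinx": 22},
    "Match phrase, ms/q": {"MySQL": 3692, "Lucene": 29, "Sphinx": 21},
    "Match bool top-20, ms/q": {"MySQL": 24, "Lucene": 29, "Sphinx": 13},
}


def ratios_vs_sphinx(row):
    """Return each other engine's figure divided by the Sphinx figure."""
    sphinx = row["Sphinx"]
    return {
        engine: value / sphinx
        for engine, value in row.items()
        if engine != "Sphinx"
    }


if __name__ == "__main__":
    for metric, row in benchmarks.items():
        parts = ", ".join(
            f"{engine}: {ratio:.1f}x"
            for engine, ratio in ratios_vs_sphinx(row).items()
        )
        print(f"{metric:<25} {parts}")
```

A ratio above 1 means the other engine took longer, or used more space, than Sphinx in this particular test. These ratios describe only this data set and these query types, so they should not be read as general speedup factors.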

Apart from basic search, there are many features that make Sphinx a better solution for searching. These features include multi-valued attributes, tokenizing settings, wordforms, HTML processing, geosearching, ranking, and many others. We will take a more detailed look at some of these features in later chapters.
