Making a Choice

As we said at the beginning, data is like oil. We sit upon a vast ocean of data, yet until it’s refined into information, it’s unusable (and, to make a cruder comparison, no pun intended, there’s a lot of money in data these days). The ease of collecting, storing, mining, and ultimately refining the data out there starts with the database you choose.

Deciding which database to choose is often more complex than merely considering which genre maps best to a given domain’s data. Though a social graph may seem to clearly function best with a graph database, if you’re Facebook, you simply have far too much data for a graph database to be practical. You are more likely to choose a “Big Data” implementation, such as HBase or DynamoDB, which will force your hand into choosing a columnar or key-value store. In other cases, though you may believe a relational database is clearly the best option for bank transactions, it’s worth knowing that Neo4j also supports ACID transactions, expanding your options.

These examples serve to point out that there are other avenues beyond genre to consider when choosing which database (or databases) best serves your problem scope. As a general rule, as the size of data increases, the capacity of certain database styles wanes. Column-oriented database implementations are often built to scale across datacenters and support the largest “Big Data” sets, while graph databases generally support the smallest. This is not always the case, however. DynamoDB is a large-scale key-value store meant to automatically shard data across hundreds or thousands of nodes without any need for user administration, while Redis was built to run on a single node, with the possibility of a few master-slave replicas or client-managed shards.
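
To make “client-managed shards” concrete, here is a minimal sketch of one common approach, hash-modulo partitioning, where the client rather than the server decides which node owns a key. The shard addresses and the `shard_for` helper are hypothetical, no Redis client library is involved, and this is not presented as Redis’s own sharding algorithm; it only shows the idea of deterministic key placement done in application code.

```python
import zlib

# Hypothetical shard addresses; in a real deployment these would be
# the host:port pairs of independent Redis servers.
SHARDS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard deterministically by hashing the key.

    zlib.crc32 is used instead of Python's built-in hash() because
    hash() of a str is randomized per process, which would send the
    same key to different shards after a restart.
    """
    index = zlib.crc32(key.encode("utf-8")) % len(shards)
    return shards[index]


if __name__ == "__main__":
    for key in ["user:1001", "user:1002", "session:abc"]:
        print(key, "->", shard_for(key))
```

The trade-off is that plain modulo placement remaps most keys whenever a shard is added or removed; schemes such as consistent hashing reduce that churn, but either way the burden of routing, and of rebalancing, falls on the client rather than on the database, which is exactly the administrative work a system like DynamoDB takes off your hands.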

There are several more dimensions to consider when choosing a database, such as durability, availability, consistency, scalability, and security. You have to decide whether ad hoc queryability is important or whether mapreduce will suffice. Do you prefer to use an HTTP/REST interface, or are you willing to require a driver for a custom binary protocol? Even smaller-scope concerns, such as the existence of bulk data loaders, might be important for you to think about.
