NoSQL Databases

In the previous chapter, we took a closer look at the RDBMS services of GCP: Cloud SQL and Cloud Spanner. These are great for many use cases, but there are also several situations in which they are not quite the right tool. The NoSQL offerings on GCP, Bigtable and Datastore, might come in handy here. Bigtable is similar in many ways to Apache HBase, while Datastore is a document database that competes with alternatives such as MongoDB.

Now, one little bit of fine print: in this chapter, we will use the terms NoSQL and RDBMS as if they were perfect alternatives; that is, it might seem like any storage solution that is not an RDBMS is a NoSQL database. That's not strictly true. BigQuery, for instance, is a SQL-compliant data warehouse, which is certainly not an RDBMS, yet it is queried with SQL, so it is not a NoSQL database either. So, the term NoSQL really only means that the data is not accessed via SQL; the alternative could be either:

  • Product-specific syntax (such as the scan syntax in HBase)
  • Programmatic access (from a programming language such as Java or Python)
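To make the difference between SQL access and programmatic access concrete, here is a minimal sketch in Python. It is an illustration only: the `KeyValueStore` class and its `put` and `get` methods are hypothetical names invented for this example, not the API of Bigtable, Datastore, or HBase, and `sqlite3` simply stands in for any SQL database.

```python
import sqlite3

# SQL access: the request is a declarative query string that the
# database engine parses and executes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("u1", "Ada"))
sql_name = conn.execute(
    "SELECT name FROM users WHERE id = ?", ("u1",)
).fetchone()[0]
conn.close()


# Programmatic access: there is no query language at all; the
# application calls methods on a client object from its own code.
class KeyValueStore:
    """Hypothetical in-memory stand-in for a NoSQL client."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


store = KeyValueStore()
store.put("u1", {"name": "Ada"})
kv_name = store.get("u1")["name"]
```

Both halves retrieve the same value; what differs is whether the request travels as a SQL string or as an ordinary method call.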

Relational databases use tables, columns, rows, and schemas to store and retrieve data. NoSQL databases do not rely on these structures; they opt for more flexible data models, such as documents, rows of key-value pairs, or graphs (for example, graph stores that record social connections). Popular expansions of NoSQL include "not SQL" and "not only SQL". Relational databases have important limitations that make them poorly suited to semi-structured data. Common types of semi-structured data include user and session data; chat, messaging, and log data; time series data such as IoT and device data; and large objects such as video and images.
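A small sketch may help show why semi-structured data fits a document model more naturally than a fixed schema. The records below are made-up examples of session, chat, and device data, and the helper names `field_names` and `find` are assumptions introduced only for this illustration.

```python
# Three semi-structured records: each one carries a different set of
# fields, and some fields hold nested lists or dictionaries.
documents = [
    {"kind": "session", "user": "u1", "device": "phone",
     "pages": ["/home", "/cart"]},
    {"kind": "chat", "user": "u2", "text": "hello"},
    {"kind": "sensor", "device_id": "d7",
     "readings": [{"t": 0, "temp": 21.5}, {"t": 60, "temp": 21.7}]},
]


def field_names(docs):
    """Return every field that appears in at least one document."""
    names = set()
    for doc in docs:
        names.update(doc)
    return sorted(names)


def find(docs, **criteria):
    """Return the documents whose fields match all the given values."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


all_fields = field_names(documents)
chat_messages = find(documents, kind="chat")
```

A single relational table holding these records would need one column for every name in `all_fields`, most of them empty in any given row, plus extra tables for the nested `pages` and `readings` values. A document store simply keeps each record as it is.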

Let's start with an understanding of the internal data representation in a couple of important types of NoSQL databases—but first, a small digression!

The digression above is meant to help us remember all that we really need to know about Datastore. Here is the same text, now annotated to make it relevant to NoSQL databases!

Once we have the essential attributes of Datastore in our heads, remembering the essential characteristics of Bigtable is a lot easier; the table below compares the two:

| Datastore | Bigtable |
|---|---|
| Datastore is great for the small end of big data; data on the order of TB, not PB. | Bigtable is definitely meant for the big end of big data; on the order of several TB or PB. If the data size is under 10 TB, performance is not great. |
| Datastore's big attraction is fast lookup... | Bigtable is best for high-speed scans (all rows, or all rows satisfying a condition) along a single column. |
| ...achieved by indexing along basically every column. | Bigtable effectively only indexes along the row key; what is more, it also sorts data by the row key. |
| Fast lookup is achieved via hash indices, and these have the trade-off that insertion becomes slow. | Bigtable is the best game in town if you need fast and frequent writes; insertion is very fast (updates along the row key are slow, though). |
| Query time is pretty much independent of dataset size. | Both queries and updates specified using the row key are super-fast (on the order of milliseconds!), while operations on other columns are slow. |
| Datastore is a document database, suitable for XML-like hierarchical data. | Bigtable has a data model similar to columnar databases like HBase and works best for very large data with a clear sort order. |
| Datastore supports transactions, but you can also use it in a non-transactional manner. | Bigtable offers ACID guarantees only at the row level, and replication across clusters is eventually consistent by default. |
| Datastore is a lot more economical than Bigtable, the other NoSQL option on GCP. | Bigtable can get costly as the cluster size grows. |
| Datastore is serverless; you never need to provision a server or specify a number of nodes. | Bigtable requires explicit provisioning of a cluster, and choices about the kind of disks in the VMs in that cluster. |
| Scaling down to zero is easy thanks to the serverless nature of the technology. | Scaling down to zero is hard, as with any service that involves a cluster. |
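The indexing trade-offs in the comparison above can be sketched with two toy in-memory stores. This is a simplified model built on the description given in this chapter, not the actual implementation of either service; the class names `SortedRowStore` and `IndexedDocumentStore`, their methods, and the sample keys are all hypothetical.

```python
import bisect
from collections import defaultdict


class SortedRowStore:
    """Toy Bigtable-like store: rows sorted by row key, no other index."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def scan(self, start_key, end_key):
        """Range scan over [start_key, end_key) using the sort order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]

    def filter_by_column(self, column, value):
        """No index on columns, so every row has to be examined."""
        return [k for k in self._keys if self._rows[k].get(column) == value]


class IndexedDocumentStore:
    """Toy Datastore-like store: every property gets a hash index."""

    def __init__(self):
        self._docs = {}
        self._index = defaultdict(set)   # (property, value) -> keys

    def put(self, key, doc):
        # Each write also updates one index entry per property, which is
        # the extra insertion cost described in the comparison.
        # Property values must be hashable in this sketch.
        for prop, value in self._docs.get(key, {}).items():
            self._index[(prop, value)].discard(key)
        self._docs[key] = doc
        for prop, value in doc.items():
            self._index[(prop, value)].add(key)

    def lookup(self, prop, value):
        """Direct index hit; cost does not depend on the store size."""
        return sorted(self._index.get((prop, value), set()))


rows = SortedRowStore()
rows.put("sensor1#2024-01-02", {"temp": 20})
rows.put("sensor1#2024-01-01", {"temp": 19})
rows.put("sensor2#2024-01-01", {"temp": 25})
# "$" sorts immediately after "#", so this range covers every sensor1 row.
sensor1_keys = [key for key, _ in rows.scan("sensor1#", "sensor1$")]

docs = IndexedDocumentStore()
docs.put("a", {"city": "Oslo", "plan": "free"})
docs.put("b", {"city": "Lima", "plan": "free"})
free_plan_keys = docs.lookup("plan", "free")
```

In the first store, a well-chosen row key turns a query into a contiguous range scan, while a filter on any other column touches every row. In the second, any property can be looked up directly, at the price of more index maintenance on every write.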
