How it works...

The partition key for a DynamoDB table is used to spread the data across partitions, which are slices of data of up to 10 GB each, located across a large number of physical machines. It's best to use something like a UUID (GUID), or another value with similarly high cardinality, for the partition key to prevent too much data from being concentrated on one partition. A combination of partition key and sort key can be used when you want buckets of data sorted under your keys, which is useful in many scenarios but very application-specific.
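
As a rough illustration of why cardinality matters, the following Python sketch simulates how keys spread over partitions. DynamoDB's real hash function and partition count are internal to the service, so the MD5 hash, the NUM_PARTITIONS value, and the helper names used here are assumptions made only for this example.

```python
import hashlib
import uuid
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical; DynamoDB decides the real number internally


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a simulated partition using a stand-in hash (MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys):
    """Count how many items land on each simulated partition."""
    return Counter(partition_for(k) for k in keys)


# High cardinality: every item gets its own UUID as the partition key.
uuid_keys = [str(uuid.uuid4()) for _ in range(10_000)]

# Low cardinality: every item shares one of only two key values.
status_keys = ["ACTIVE" if i % 2 else "INACTIVE" for i in range(10_000)]

uuid_spread = distribution(uuid_keys)
status_spread = distribution(status_keys)

print("UUID keys per partition:  ", sorted(uuid_spread.items()))
print("Status keys per partition:", sorted(status_spread.items()))
```

With UUID keys, the items end up roughly evenly divided among the simulated partitions. With only two distinct key values, all of the items pile up on at most two partitions, which is the kind of concentration a high-cardinality key avoids.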

Once a table is created, data records can be retrieved by the key very quickly, but if you want to search by any of the other properties, lookups are very slow because you are forced to scan the entire table. For a table with millions or billions of records, this is obviously not practical. That's where Global Secondary Indexes (GSIs) come in. A GSI functions much like a separate table, holding the same data, but indexed by a different key. The benefit is that you don't have to manage that secondary table yourself. DynamoDB handles it for you, propagating changes made on the table to the index in an eventually consistent manner, so a query against the index may briefly lag behind a recent write.
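
To make the idea concrete, here is a minimal sketch of the request parameters for a table with one GSI, and for a query that targets that index instead of scanning. The table name Orders, the attribute names, and the index name are hypothetical. The dictionaries follow the shape accepted by the low-level boto3 DynamoDB client (create_table and query), but nothing is sent to AWS here.

```python
# Hypothetical table: orders keyed by order_id, also searchable by customer_email.
table_definition = {
    "TableName": "Orders",
    # Only attributes used in a key schema are declared up front.
    "AttributeDefinitions": [
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "customer_email", "AttributeType": "S"},
    ],
    "KeySchema": [{"AttributeName": "order_id", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "customer_email-index",
            "KeySchema": [{"AttributeName": "customer_email", "KeyType": "HASH"}],
            # Copy every attribute into the index, like a full second table.
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# A query against the index rather than a scan of the whole table.
query_request = {
    "TableName": "Orders",
    "IndexName": "customer_email-index",
    "KeyConditionExpression": "customer_email = :email",
    "ExpressionAttributeValues": {":email": {"S": "pat@example.com"}},
}


def key_attributes(definition):
    """Collect every attribute name used by the table and index key schemas."""
    names = {k["AttributeName"] for k in definition["KeySchema"]}
    for index in definition.get("GlobalSecondaryIndexes", []):
        names.update(k["AttributeName"] for k in index["KeySchema"])
    return names


declared = {a["AttributeName"] for a in table_definition["AttributeDefinitions"]}
print("Key attributes all declared:", key_attributes(table_definition) == declared)

# With credentials configured, these could be passed to boto3 roughly as:
#   client = boto3.client("dynamodb")
#   client.create_table(**table_definition)
#   client.query(**query_request)
```

The small key_attributes helper is only a local sanity check that the declared attributes match the ones used as keys; it is not part of any AWS API.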

While primary keys on the primary table must be unique, the same rule is not enforced for index keys. A query operation specifying a key for a GSI can return multiple records. Items in the primary table that lack the index key attribute are not written to the GSI and take up no space in it, which allows for efficient storage of items with different sets of properties.
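
These two behaviors, non-unique index keys and items missing from the index, can be modeled with a small in-memory toy. The ToyTable class below is purely illustrative and is not how DynamoDB is implemented; the attribute names are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with one secondary index (illustration only)."""

    def __init__(self, key_name, index_key_name):
        self.key_name = key_name
        self.index_key_name = index_key_name
        self.items = {}                  # primary key -> item (keys are unique)
        self.index = defaultdict(dict)   # index key -> {primary key -> item}

    def put(self, item):
        pk = item[self.key_name]
        old = self.items.get(pk)
        # Drop the stale index entry if this put overwrites an existing item.
        if old is not None and self.index_key_name in old:
            self.index[old[self.index_key_name]].pop(pk, None)
        self.items[pk] = item
        # Only items that carry the index key attribute are copied to the index.
        if self.index_key_name in item:
            self.index[item[self.index_key_name]][pk] = item

    def query_index(self, value):
        """Return every item sharing this index key; there may be several."""
        return list(self.index.get(value, {}).values())

    def index_size(self):
        """Number of items currently stored in the index."""
        return sum(len(bucket) for bucket in self.index.values())


table = ToyTable(key_name="order_id", index_key_name="customer_email")
table.put({"order_id": "o-1", "customer_email": "pat@example.com"})
table.put({"order_id": "o-2", "customer_email": "pat@example.com"})
table.put({"order_id": "o-3"})  # no customer_email, so it never reaches the index

matches = table.query_index("pat@example.com")
print(len(matches), "items share the same index key")
print(len(table.items), "items in the table,", table.index_size(), "in the index")
```

Two orders share one index key, and the third order, which has no customer_email attribute, exists in the table but occupies no space in the index.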

Another concept that is important to understand is the difference between auto-scaling and on-demand settings for table capacity. It's also possible to manually set the read and write capacity of a table, but since we always want to look for ways to automate, this option should not be your first choice. Choosing auto-scaling can result in lower costs if you carefully choose your settings, and if your application scales slowly and smoothly, without abrupt changes in usage or long periods of no activity at all. On-demand is a relatively new choice that is much more appropriate for bursty workloads, and it handles quick changes better than auto-scaling, but in some scenarios, it can end up being more expensive.
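
The three capacity options can be sketched as request parameters. The capacity_settings helper, the table name, and the numeric limits below are hypothetical. The dictionary shapes follow the boto3 DynamoDB create_table parameters and the Application Auto Scaling register_scalable_target and put_scaling_policy parameters, but no AWS call is made in this example.

```python
def capacity_settings(mode, read_units=None, write_units=None):
    """Return the create_table capacity parameters for a given mode (hypothetical helper)."""
    if mode == "on-demand":
        # Pay per request; no capacity numbers to choose.
        return {"BillingMode": "PAY_PER_REQUEST"}
    if mode == "provisioned":
        if not read_units or not write_units:
            raise ValueError("provisioned mode needs read and write units")
        return {
            "BillingMode": "PROVISIONED",
            "ProvisionedThroughput": {
                "ReadCapacityUnits": read_units,
                "WriteCapacityUnits": write_units,
            },
        }
    raise ValueError(f"unknown capacity mode: {mode}")


# Auto-scaling builds on provisioned mode: a scalable target sets the bounds...
read_scaling_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/Orders",  # hypothetical table name
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "MinCapacity": 5,
    "MaxCapacity": 100,
}

# ...and a target-tracking policy tries to hold utilization near a chosen value.
read_scaling_policy = {
    "PolicyName": "orders-read-target-tracking",  # hypothetical policy name
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/Orders",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # percent utilization; an assumed starting point
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
}

on_demand = capacity_settings("on-demand")
provisioned = capacity_settings("provisioned", read_units=5, write_units=5)
print(on_demand)
print(provisioned)
```

Auto-scaling reacts to utilization after the fact, which is why it suits gradual changes in load, while on-demand mode has no capacity numbers to tune at all. A similar target and policy pair would be needed for write capacity.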
