In Chapter 3, Atomicity, Consistency, Isolation, and Durability (ACID), we learned what ACID is, why we need it, and how it's supported in CockroachDB. Here, we will learn all about geo-partitioning, which is one of the most important reasons to use a distributed SQL database such as CockroachDB.
In this chapter, you will get a basic understanding of what geo-partitioning is and why this feature is useful for you. We will also go over some cloud jargon and some of the options provided by various cloud providers to distribute your data geographically for better resiliency, performance, and availability. At the end of the chapter, we will go over different ways of geo-partitioning your data in CockroachDB.
The following topics will be covered in this chapter:
To execute the examples in this chapter, you will need to install CockroachDB. If you haven't done so already, please refer to the Technical requirements section in Chapter 2, How Does CockroachDB Work Internally? All the queries in this chapter are available at https://github.com/PacktPublishing/Getting-Started-with-CockroachDB.
As the term geo-partitioning suggests, data is partitioned based on geographical location. Geo-partitioning refers to the mechanism of storing data in various geographical locations, based on where that data is consumed.
For example, let's say you are maintaining a database for an airline that serves international and domestic travelers on every continent. Since the airline has a global presence, it would be beneficial to keep each user's data close to where they live. This helps serve the data locally and quickly.
Figure 4.1 shows an example of a table whose rows are partitioned based on geo-location across three different continents. Rows are stored in specific databases based on their locality. This locality can be mapped to the user's location based on their activity:
Geo-partitioning can be useful even within smaller regions, where you provide services locally. For example, let's say you are managing a database for a company that has to serve lots of data very frequently, and that data is highly localized to users living in a particular area. This fits companies such as Instacart, DoorDash, Uber Eats, Uber, Lyft, and Ola.
For the sake of discussion, let's consider DoorDash. DoorDash users search for restaurants and grocery stores, usually within 10–15 miles of where they live, and place orders in its app. In this case, we can house the data regarding user purchase orders and delivery status close to the users. This not only helps us serve the data faster for local users, but also lets us easily apply local compliance rules to only a specific subset of CockroachDB clusters.
If we consider the state of California for this example, we can strategically deploy CockroachDB nodes in San Francisco and Los Angeles. For restaurants, menus, and grocery items, which don't change often, we can create duplicate indexes that will help reduce read latencies when browsing.
Also, now, if you want to incorporate the California Consumer Privacy Act of 2018 (CCPA) for all the purchase orders that happen within California, it would be much easier to restrict it to the databases in California that manage this information. The following figure illustrates this aspect:
Some of the main advantages of geo-partitioning are as follows:
Let's now learn about cloud regions and availability zones, which are essential to understanding how geo-partitioning can be realized on various cloud providers.
In this section, we will learn about some jargon related to the cloud. We will also learn about regions and zones, and how they have been realized by various cloud providers. These concepts are important for deciding how you want to distribute your data and what sort of guarantees you want to provide:
A region refers to an actual physical location where your cloud resources are housed. Each cloud provider has different notions of a region.
It's very important to understand how regions are implemented by different cloud providers, as it determines the following things:
A region consists of multiple zones. A zone refers to a more specific location within a given region.
An availability zone is an isolated data center that doesn't share any resources with other zones within the same region. All communication between availability zones happens over a high-speed network. A region typically has at least two availability zones, which help implement redundancy, failover, and high availability.
Now, let's look at how region and zone are defined by some of the top cloud providers.
In this section, we will briefly go over what region and zone mean on the top four cloud providers. We will be covering the following cloud providers in this section:
Let's get started!
Region: A Region is a physical location that consists of multiple data centers.
Availability Zone: One or more discrete data centers with redundant power and networking within a Region is called an Availability Zone. Availability Zones help in implementing features such as high availability, fault tolerance, reliability, and scale.
AWS Local Zone: Local Zones provide resources that are located close to your end users. These will be useful in services such as gaming and streaming, which require low latency, high throughput, and elastic services.
Region: A region is a collection of zones.
Zone: A zone is a deployment area within a region. You should use multiple zones to provide high availability and fault tolerance.
Region: A region is a set of data centers deployed within a latency-defined perimeter and connected through a fast network.
Geography: An area of the world containing at least one Azure region. A geography spans multiple regions and is fault-tolerant, even in the event of a complete regional failure.
Availability zones: Unique physical locations within a region. Each availability zone comprises one or more data centers with resources that are not shared with other zones.
Region: A region is a localized area and is made of several availability domains.
Availability domains: Availability domains consist of one or more data centers; they do not share any resources amongst themselves and are connected through a fast network.
Each availability domain has three fault domains. Fault domains group hardware within an availability domain so that resources placed in different fault domains do not share physical infrastructure, which offers improved resiliency.
Next, we will learn about how to achieve geo-partitioning with CockroachDB.
CockroachDB provides two families of topology patterns, single-region and multi-region, which offer different levels of data resiliency, latency, and availability.
Here, all the data resides in a single region.
CockroachDB defines two variations of single-region topology, development and production, as follows:
As part of your Continuous Integration/Continuous Deployment (CI/CD) pipeline, you can have a dedicated stage in which you provision a single-node cluster and then run a suite of system tests that interact with a real database. Since the clients will be local to the data, reads and writes will be much faster, although there is no resiliency.
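For instance, such a single-node cluster can be started locally with the cockroach CLI. This is a minimal sketch; the store path and ports are illustrative and should be adjusted for your environment:

```shell
# Start an insecure, single-node cluster for development and testing.
cockroach start-single-node \
  --insecure \
  --store=/tmp/crdb-dev \
  --listen-addr=localhost:26257 \
  --http-addr=localhost:8080 \
  --background

# Open the built-in SQL shell against it.
cockroach sql --insecure --host=localhost:26257
```

Because the cluster runs with --insecure and a throwaway store directory, it is suitable only for local testing, never for production.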
The following is an example of a single-region deployment:
This topology can withstand a single-node failure. If two nodes fail, some ranges will lack a quorum of replicas and therefore cannot have a leaseholder, making them unavailable. You will also need a load balancer to spread client traffic evenly across the three nodes.
The following is a single-region deployment with three nodes:
In basic production topology, client requests hit a load balancer first, and later they are forwarded to one of the nodes, which acts as a gateway node. The gateway node then coordinates with relevant leaseholders, gathers all the data, and serves the data back to the client.
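To make this concrete, a three-node cluster like the one described above can be brought up locally as follows. This is a sketch; the store paths and ports are illustrative, and in a real deployment a load balancer (for example, HAProxy) would sit in front of the three SQL ports:

```shell
# Start three nodes that will join to form one cluster.
cockroach start --insecure --store=/tmp/node1 \
  --listen-addr=localhost:26257 --http-addr=localhost:8080 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=/tmp/node2 \
  --listen-addr=localhost:26258 --http-addr=localhost:8081 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background
cockroach start --insecure --store=/tmp/node3 \
  --listen-addr=localhost:26259 --http-addr=localhost:8082 \
  --join=localhost:26257,localhost:26258,localhost:26259 --background

# Initialize the cluster exactly once, through any node.
cockroach init --insecure --host=localhost:26257
```

Any of the three nodes can then act as a gateway node for client connections.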
Here, the data is spread across multiple regions. You can use row-level partitioning to distribute the rows geographically. The following figure shows a multi-region deployment that spans three different regions in the US:
You can choose the following topologies based on your requirement:
Geo-partitioning options are only available with the enterprise license of CockroachDB. If you try to use enterprise features without the enterprise license, you will see the following message:
ERROR: use of partitions requires an enterprise license. see https://cockroachlabs.com/pricing?cluster=65244c3a-2d63-432c-a8b4-c70a53459ca1 for details on how to enable enterprise features
SQLSTATE: XXC02
You can visit the URL mentioned in the error message and get an enterprise license for a specific trial period.
Applying an enterprise license to your cluster involves the following steps:
root@localhost:26258/defaultdb> SET CLUSTER SETTING cluster.organization = 'Self';
SET CLUSTER SETTING
Time: 166ms total (execution 165ms / network 0ms)
root@localhost:26258/defaultdb> SET CLUSTER SETTING enterprise.license = 'crl-0-EKD0mYsGGAIiBFNlbGY';
SET CLUSTER SETTING
Time: 191ms total (execution 191ms / network 0ms)
Now, let's take a deeper look at each of these topologies. All of them require a multi-region deployment. The easiest way to create a multi-region cluster for experimentation is to create an account at https://cockroachlabs.cloud and request a three-region, nine-node cluster.
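If you only want to experiment locally, the cockroach demo command can also simulate a multi-region cluster in memory (recent versions include a temporary license for enterprise features). This is a sketch; the region and availability zone names are illustrative:

```shell
# Start an in-memory 9-node demo cluster spread across three simulated regions.
# --demo-locality takes one locality per node, separated by colons.
cockroach demo \
  --nodes 9 \
  --demo-locality=region=us-west,az=a:region=us-west,az=b:region=us-west,az=c:\
region=us-central,az=a:region=us-central,az=b:region=us-central,az=c:\
region=us-east,az=a:region=us-east,az=b:region=us-east,az=c
```

Demo clusters are ephemeral and exist purely for experimentation; all data is lost when the shell exits.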
In the case of geo-partitioned replicas, you need a column for the geolocation. It is then combined with the table's unique identifier to form a compound primary key. For example, let's say you have an ID that is a Universally Unique Identifier (UUID); you can combine it with the city or the state to form a compound primary key. Then, you partition the table and all of its secondary indexes on that column, and each partition gets its own replicas. Once you have this, you can ask CockroachDB to place the data belonging to each partition in a specific region.
For example, let's say you have users from California and Ohio. All the rows belonging to users from California can be stored in the US-West region and users in Ohio can be stored in the US-East region. Here, the assumption is that data will be consumed locally, so both read and write latencies will be reduced.
For example, if you operate a food delivery service, your users will place orders from the city they live in, and the food will be delivered within the same city. So, in this case, it makes total sense to use geo-partitioned replicas for your users. However, since all the data and its replicas are housed in the same region, if an entire region goes down, that data will become unavailable.
Let's see an example in which we are going to create a geo-partitioned replicas topology. Basically, the replicas will be pinned to a particular region so that local reads and writes are faster:
CREATE TABLE users (
id UUID NOT NULL DEFAULT gen_random_uuid(),
first_name STRING NOT NULL,
last_name STRING NOT NULL,
city STRING NOT NULL,
PRIMARY KEY (city ASC, id ASC) );
CREATE INDEX users_first_name_last_name_index ON users (city, first_name, last_name);
ALTER TABLE users PARTITION BY LIST (city) (
PARTITION sfo VALUES IN ('san francisco'),
PARTITION aus VALUES IN ('austin'),
PARTITION ny VALUES IN ('new york')
);
ALTER INDEX users_first_name_last_name_index PARTITION BY LIST (city) (
PARTITION sfo VALUES IN ('san francisco'),
PARTITION aus VALUES IN ('austin'),
PARTITION ny VALUES IN ('new york')
);
ALTER PARTITION sfo OF INDEX users@* CONFIGURE ZONE USING constraints = '{"+region=us-west":1}', num_replicas=3;
ALTER PARTITION aus OF INDEX users@* CONFIGURE ZONE USING constraints = '{"+region=us-central":1}', num_replicas=3;
ALTER PARTITION ny OF INDEX users@* CONFIGURE ZONE USING constraints = '{"+region=us-east":1}', num_replicas=3;
SHOW CREATE TABLE users;
The sample output is as follows:
SHOW CREATE TABLE users;
table_name | create_statement
-------------+------------------------------------------------------------------------------------------------------------------
users | CREATE TABLE public.users (
| id UUID NOT NULL DEFAULT gen_random_uuid(),
| first_name STRING NOT NULL,
| last_name STRING NOT NULL,
| city STRING NOT NULL,
| CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
| INDEX users_first_name_last_name_index (city ASC, first_name ASC, last_name ASC) PARTITION BY LIST (city) (
| PARTITION sfo VALUES IN (('san francisco')),
| PARTITION aus VALUES IN (('austin')),
| PARTITION ny VALUES IN (('new york'))
| ),
| FAMILY "primary" (id, first_name, last_name, city)
| ) PARTITION BY LIST (city) (
| PARTITION sfo VALUES IN (('san francisco')),
| PARTITION aus VALUES IN (('austin')),
| PARTITION ny VALUES IN (('new york'))
| );
| ALTER TABLE defaultdb.public.users CONFIGURE ZONE USING
| num_replicas = 3;
| ALTER PARTITION sfo OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-west: 1}';
| ALTER PARTITION sfo OF INDEX defaultdb.public.users@users_first_name_last_name_index CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-west: 1}';
| ALTER PARTITION aus OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-central: 1}';
| ALTER PARTITION aus OF INDEX defaultdb.public.users@users_first_name_last_name_index CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-central: 1}';
| ALTER PARTITION ny OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-east: 1}';
| ALTER PARTITION ny OF INDEX defaultdb.public.users@users_first_name_last_name_index CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-east: 1}'
(1 row)
Time: 278ms total (execution 278ms / network 0ms)
Here, you can see that the replicas are constrained to the relevant regions. So, if an entire region goes down, the partition in that region becomes unavailable.
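To double-check how the partitions and their zone configurations were applied, you can also use SHOW PARTITIONS; its output (not reproduced here) lists each partition with its values and zone constraints:

```sql
-- Inspect all partitions of the users table.
SHOW PARTITIONS FROM TABLE users;

-- Inspect the partitions of a single index.
SHOW PARTITIONS FROM INDEX users@users_first_name_last_name_index;
```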
Next, we will go through the geo-partitioned leaseholders topology.
As with geo-partitioned replicas, you still need a column that holds the geolocation, and a compound primary key that combines a unique ID with that geolocation.
Here, the main difference is that you only pin the leaseholder to a specific region, while the replicas can be stored in other regions. Since only the leaseholder is pinned, local reads will always be fast, but writes take longer, because data is replicated across regions and replication involves consensus.
Let's see an example, in which we are going to create a geo-partitioned leaseholders topology. Basically, the leaseholder will be pinned to a particular region, so that local reads are faster:
CREATE TABLE users (
id UUID NOT NULL DEFAULT gen_random_uuid(),
first_name STRING NOT NULL,
last_name STRING NOT NULL,
city STRING NOT NULL,
PRIMARY KEY (city ASC, id ASC) );
CREATE INDEX users_first_name_last_name_index ON users (city, first_name, last_name);
ALTER TABLE users PARTITION BY LIST (city) (
PARTITION sfo VALUES IN ('san francisco'),
PARTITION aus VALUES IN ('austin'),
PARTITION ny VALUES IN ('new york')
);
ALTER INDEX users_first_name_last_name_index PARTITION BY LIST (city) (
PARTITION sfo VALUES IN ('san francisco'),
PARTITION aus VALUES IN ('austin'),
PARTITION ny VALUES IN ('new york')
);
ALTER PARTITION sfo OF INDEX users@*
CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+region=us-west":1}',
lease_preferences = '[[+region=us-west]]';
ALTER PARTITION aus OF INDEX users@*
CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+region=us-central":1}',
lease_preferences = '[[+region=us-central]]';
ALTER PARTITION ny OF INDEX users@*
CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+region=us-east":1}',
lease_preferences = '[[+region=us-east]]';
SHOW CREATE TABLE users;
The sample output is as follows:
SHOW CREATE TABLE users;
table_name | create_statement
-------------+------------------------------------------------------------------------------------------------------------------
users | CREATE TABLE public.users (
| id UUID NOT NULL DEFAULT gen_random_uuid(),
| first_name STRING NOT NULL,
| last_name STRING NOT NULL,
| city STRING NOT NULL,
| CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
| INDEX users_first_name_last_name_index (city ASC, first_name ASC, last_name ASC) PARTITION BY LIST (city) (
| PARTITION sfo VALUES IN (('san francisco')),
| PARTITION aus VALUES IN (('austin')),
| PARTITION ny VALUES IN (('new york'))
| ),
| FAMILY "primary" (id, first_name, last_name, city)
| ) PARTITION BY LIST (city) (
| PARTITION sfo VALUES IN (('san francisco')),
| PARTITION aus VALUES IN (('austin')),
| PARTITION ny VALUES IN (('new york'))
| );
| ALTER TABLE defaultdb.public.users CONFIGURE ZONE USING
| num_replicas = 3;
| ALTER PARTITION sfo OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-west: 1}',
| lease_preferences = '[[+region=us-west]]';
| ALTER PARTITION sfo OF INDEX defaultdb.public.users@users_first_name_last_name_index CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-west: 1}',
| lease_preferences = '[[+region=us-west]]';
| ALTER PARTITION aus OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-central: 1}',
| lease_preferences = '[[+region=us-central]]';
| ALTER PARTITION aus OF INDEX defaultdb.public.users@users_first_name_last_name_index CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-central: 1}',
| lease_preferences = '[[+region=us-central]]';
| ALTER PARTITION ny OF INDEX defaultdb.public.users@primary CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-east: 1}',
| lease_preferences = '[[+region=us-east]]';
| ALTER PARTITION ny OF INDEX defaultdb.public.users@users_first_name_last_name_index CONFIGURE ZONE USING
| num_replicas = 3,
| constraints = '{+region=us-east: 1}',
| lease_preferences = '[[+region=us-east]]'
(1 row)
Time: 37ms total (execution 37ms / network 0ms)
Here, you can see that the lease preference is restricted to relevant regions, which will ensure that the leaseholders are always pinned to a specific region.
Next, we will go through the duplicate indexes topology.
The duplicate indexes topology is useful in cases where you write once and read from various locations. For example, let's say you are managing the credit cards of users who travel throughout the US very often. If you pin the data to a single region, whenever a user moves out of that region, reads slow down. Duplicate indexes come in handy to solve this issue.
Unlike the previous two cases, you don't need to partition the table; you simply pin the table's leaseholder to a specific region. In our example, this could be the region of the user's primary address. The credit card information is still replicated in different regions to cover the entire US, but since only the leaseholder coordinates writes and strongly consistent reads, reads from other regions are routed to the leaseholder's pinned region. This again introduces latency.
Now, you can create duplicate secondary indexes for the credit card data. For example, assuming the table's leaseholder is pinned to the west coast, you can create secondary indexes, such as id_creditcard_east and id_creditcard_central, constrained to the US-East and US-Central regions respectively. This guarantees that there are local leaseholders for the secondary indexes in all the regions, which drastically reduces read latency, as reads are always served locally.
Since we already have multiple copies of the original data and we are creating secondary indexes that are also replicated, now there are a lot of copies of the same data in multiple regions. So, this increases the write latencies, as all these copies have to be updated and a multi-region consensus has to be reached:
CREATE TABLE local_attractions (
id UUID NOT NULL DEFAULT gen_random_uuid(),
name STRING NOT NULL,
address STRING NOT NULL,
city STRING NOT NULL,
PRIMARY KEY (id ASC)
);
ALTER TABLE local_attractions
CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+region=us-central":1}',
lease_preferences = '[[+region=us-central]]';
CREATE INDEX idx_west ON local_attractions (city)
STORING (name);
CREATE INDEX idx_east ON local_attractions (city)
STORING (name);
ALTER INDEX local_attractions@idx_west
CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+region=us-west":1}',
lease_preferences = '[[+region=us-west]]';
ALTER INDEX local_attractions@idx_east
CONFIGURE ZONE USING
num_replicas = 3,
constraints = '{"+region=us-east":1}',
lease_preferences = '[[+region=us-east]]';
SHOW CREATE TABLE local_attractions;
The sample output is as follows:
SHOW CREATE TABLE local_attractions;
table_name | create_statement
--------------------+--------------------------------------------------
local_attractions | CREATE TABLE public.local_attractions (
| id UUID NOT NULL DEFAULT gen_random_uuid(),
| name STRING NOT NULL,
| address STRING NOT NULL,
| city STRING NOT NULL,
| CONSTRAINT "primary" PRIMARY KEY (id ASC),
| INDEX idx_west (city ASC) STORING (name),
| INDEX idx_east (city ASC) STORING (name),
| FAMILY "primary" (id, name, address, city)
| )
(1 row)
Time: 28ms total (execution 28ms / network 0ms)
Here, you can see that there are multiple identical indexes for multiple regions. So, whenever there are queries involving city and name, they can be served locally, hence reducing the latency of reads. Since we are maintaining identical indexes in multiple regions, the writes are much slower. This topology is useful where the data doesn't change much but is accessed frequently in all the regions.
Next, we will go through the follower reads topology.
If you want low read latency but don't mind reading slightly older data, you can use this topology. In this case, you add the AS OF SYSTEM TIME clause to your reads, which avoids the round trip to the leaseholder, so data is served locally. Writes still need a multi-region consensus. You should not use this topology if you need strong consistency. Please refer to the CAP theorem section in Chapter 1, CockroachDB – A Brief Introduction, if you want to understand what consistency means here.
Let's create a sample table to understand how this works:
CREATE TABLE local_attractions (
id UUID NOT NULL DEFAULT gen_random_uuid(),
name STRING NOT NULL,
address STRING NOT NULL,
city STRING NOT NULL,
PRIMARY KEY (id ASC)
);
Here, in the SELECT query, you should use AS OF SYSTEM TIME follower_read_timestamp(). The follower_read_timestamp() function returns a TIMESTAMP that is slightly in the past (roughly statement_timestamp() minus 4.8 seconds), which is far enough behind the present for nearby follower replicas to serve the read:
SELECT city FROM local_attractions
AS OF SYSTEM TIME follower_read_timestamp()
WHERE city = 'san francisco';
Since the data is retrieved locally, without involving the leaseholder, you might get slightly stale data. Once again, do not use this topology if you need strong consistency.
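If you prefer to bound the staleness yourself rather than rely on follower_read_timestamp(), AS OF SYSTEM TIME also accepts an explicit interval. Note that a timestamp that is not far enough in the past may still be routed to the leaseholder:

```sql
-- Read the table as it was 10 seconds ago.
SELECT city FROM local_attractions
AS OF SYSTEM TIME '-10s'
WHERE city = 'san francisco';
```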
Next, we will look at the follow-the-workload topology.
This is the default topology if you don't use any of the previous ones. It works well when a given table is active in a single region, meaning most clients read and write from the same region. The read latency will be low in the active region and higher in non-active regions, as the leaseholder resides in the active region. Writes still need a multi-region consensus and can be slower.
The following is a table that should help you to decide which topology might be relevant for your database workload:
In this section, we learned about various geo-partitioning topologies and how to configure them in CockroachDB. Based on latency, data consistency, and resiliency requirements, we should select the appropriate topology.
In this chapter, we learned what geo-partitioning is and why it is useful to geo-partition your data. Then, we covered all the important jargon in the cloud world, especially how each major cloud provider has defined regions and availability zones. We later discussed how to configure various multi-region topologies based on your application requirements.
In the next chapter, we will go over fault tolerance and auto-recovery with CockroachDB.