In the previous two chapters, we learned about important topics regarding networking and compute services. Now, we will continue by looking at another fundamental offering from GCP – storage. Every application needs to store data, but not all data is the same. Data can be structured, unstructured, relational, or transactional, and GCP offers services to meet these requirements. Your application may even need to combine multiple storage services to meet its needs.
In this chapter, we will look at the different types of storage offered by Google and explain what should be considered before selecting a particular service. Google provides numerous storage options, and covering them all in a single chapter would be a lot to digest. Therefore, we have divided this chapter into two parts.
In this part, we will consider which storage option we should choose and look into the following topics:
Although BigQuery can be considered a storage option, we will look at that product in Chapter 13, Analyzing Big Data Options.
Let's start this chapter by looking at when we should select a specific storage service. After that, we will look at each service individually while concentrating on exam topics.
As we mentioned at the beginning of this chapter, various storage services are offered by Google, and you must understand which one is correct for your requirements. Google provides a nice flow chart that we can walk through. It is easy to follow and shows how requirements dictate which storage option to select. Take some time to look at the following decision tree. This diagram works through some key questions to make sure that the storage service we select will fulfill our requirements:
With the preceding diagram in mind, let's walk through an example:
As shown in the preceding diagram, there are various outcomes, depending on our requirements. There are a lot of storage services offered by GCP, but if we follow this flow chart, we should be able to choose the best option.
Additionally, Google offers the following information, which may help you understand the product you require:
Exam Tip
Understanding these options is critical. You may be presented with a scenario where relational databases are needed over non-relational databases, or where you need to select an option that is suitable for structured data. You also need to understand the limits of storage; as an example, let's say you are told that you need to assess a service that offers a NoSQL database and can scale to terabytes of data. After reading this chapter, you should be able to map this scenario to a service.
In the next section, we will discuss data consistency.
Data consistency refers to how the storage service handles transactions and how data is written to a database. As you go through this chapter, make sure that you review the consistency of each service, as this is one of the main design factors when choosing a product or service.
We will also take this chance to explain ACID. This is a key term in data consistency, and we will refer to this several times throughout this chapter.
ACID is an acronym for the following attributes:
Exam Tip
It's also important for you to be able to identify storage services that support structured data, low latency, or horizontal scaling. Likewise, you should be able to identify storage services that offer strong consistency or support ACID properties.
Now, let's look more closely at each of these services. We will begin with Cloud Storage.
Google Cloud Storage is a service for storing objects in Google Cloud. It is fully managed and can scale dynamically. An object is an immutable piece of data consisting of a file of any format. Some use cases are video transcoding, video streaming, static web pages, and backups. It is designed to provide secure and durable storage while also offering optimal pricing and performance for your requirements through different storage classes.
Cloud Storage uses the concept of buckets; you may be familiar with this term if you have used AWS S3 storage. A bucket is a basic container where your data will reside and, like other GCP services, is attached to a GCP project. Each bucket name is globally unique and cannot be changed once the bucket is created. There is no minimum or maximum storage size, and we only pay for what we use. Access to our buckets can be controlled in several ways. We will learn more about security in Chapter 15, Security and Compliance, but as an overview, we have the following main methods:
We will also mention encryption in Chapter 15, Security and Compliance. By default, Cloud Storage will always encrypt your data on the server side before it is written to disk. There are three options available for server-side encryption:
There is also the client-side encryption option, where encryption occurs before data is sent to Cloud Storage and additional encryption takes place at the server side.
Bucket locations allow us to specify a location for storing our data. There are the following different location types:
Bucket locations are permanent, so some consideration should be taken before you select your preferred option. Data should be stored in a location that is convenient for your business users. For example, if you wish to optimize latency and network bandwidth for users in the same region, then region-based locations are ideal. However, if you also have additional requirements for higher availability, then consider dual-region to take advantage of this geo-redundancy.
We previously mentioned storage classes, and we should make it clear that it's vital to understand the different offerings to be successful in this exam. Let's take a look at these in more detail:
These different classes are summarized in the following diagram. Please note that the SLA and typical monthly availability may show different percentage values:
You can edit Cloud Storage buckets to move between these storage classes. Note that there are additional storage classes that cannot be set using Cloud Console:
Buckets get a default storage class upon their creation, and any objects inside it will inherit this class unless otherwise specified. We can change the default storage class of our bucket, but it will only apply to new objects after the change; the existing objects will remain in the original class. We can also effectively move data between buckets by using the Cloud Storage Transfer Service. However, note that moving data between locations incurs network usage costs. Additionally, we should be aware of extra costs that may be applicable. As Nearline, Coldline, and Archive storage are used to store infrequently accessed data, Google may apply retrieval or early deletion fees. A retrieval cost will be applied when we copy, read, or rewrite data or metadata that is stored in one of these storage classes.
Cloud Storage provides strong consistency in most circumstances and eventual consistency in others.
Strong consistency can be expected in the following circumstances:
Eventual consistency can be expected in the following circumstance:
Cloud Storage cannot be mounted to a Google Compute Engine instance. However, if this is something that you would like to explore, GCP currently offers third-party integration using an open source FUSE adapter that will allow you to mount a storage bucket as a filesystem on Linux. This is available free of charge and is not officially supported; however, normal Cloud Storage charges are applicable.
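As a sketch of how such a mount works (the bucket name and mount point here are illustrative, and the exact installation steps for the gcsfuse tool depend on your distribution), mounting and unmounting a bucket on Linux looks something like this:

```shell
# Create a mount point and mount the bucket with the FUSE adapter.
# gcsfuse must be installed first; credentials come from gcloud auth
# or a service account. Bucket and path names are examples only.
mkdir -p /mnt/cloudarchitect001
gcsfuse cloudarchitect001 /mnt/cloudarchitect001

# The bucket's objects now appear as regular files.
ls /mnt/cloudarchitect001

# Unmount when finished.
fusermount -u /mnt/cloudarchitect001
```

Remember that reads and writes through the mount are still normal Cloud Storage operations, so the usual operation and storage charges apply.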
Now that we have a good understanding of storage classes, let's take a look at how to create a bucket. We will also learn how to copy files into our bucket and change the storage policy of an object.
We have two options to create a bucket – either through a console or using gsutil. We will learn more about gsutil in Chapter 16, Google Cloud Management Options, but for the context of this chapter, it's enough for you to know that it is a dedicated command-line tool that will make RESTful API calls to your Cloud Storage service. Let's look at how to create a bucket from the console by going through the following steps:
As you can see, this is a pretty simple process. Let's make it even simpler by creating it from our dedicated command-line tool, gsutil, which we can access from Cloud Shell:
gsutil mb -c <storage class> -l <location> gs://<bucket name>
Let's look at the code in a little more detail:
gsutil mb -c regional -l us-east1 gs://cloudarchitect001
gsutil cp <filename> gs://<bucketname>
Here, <filename> is the name of the file you wish to copy, while <bucketname> is the destination bucket.
gsutil cp conf.yaml gs://cloudarchitect001
gsutil rewrite -s <storage class> gs://<bucket name>/<file name>
gsutil rewrite -s coldline gs://cloudarchitect001/conf.yaml
Great! So, we now know how to manage our buckets. In the next section, we will take a look at some key features of Cloud Storage.
Now, let's take a look at versioning and life cycle management. These are important features offered by Cloud Storage and are of particular significance for the exam.
Exam Tip
Cloud Storage offers an array of features. To succeed in the exam, you must understand the versioning and life cycle management features of Cloud Storage. Like the storage classes, you should review these and make sure you are comfortable before moving on to future sections of this chapter.
Versioning can be enabled to retrieve objects that we have deleted or overwritten. Note that this will increase storage costs, but it may be worth it to be able to roll back to previous versions of important files. The additional costs come from the archived version of an object that is created each time we overwrite or delete the live version. The archived version retains the original name, appended with a generation number that identifies it. Let's go through the process of versioning:
gsutil versioning set on gs://<bucket name>
Here, <bucket name> is the bucket that we want to enable versioning on.
Let's look at the following screenshot, which shows how this looks. In this example, we are enabling versioning on our cloudarchitect bucket and then confirming that versioning is indeed enabled:
gsutil ls -a gs://<bucket name>/<filename>
Let's run this command on our file, named vm.yaml, which resides in our cloudarchitect bucket. We can see that we have several archived versions available:
We can turn versioning on and off whenever we want. Any available archived versions will remain when we turn it off, but we should remember that we will not have any archived versions until we enable this option.
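For example, an archived version can be restored by copying it over the live object, referencing the generation number that gsutil ls -a returns (the bucket, filename, and generation number below are illustrative):

```shell
# List live and archived versions; archived versions are suffixed
# with a generation number, for example vm.yaml#1622720433123456.
gsutil ls -a gs://cloudarchitect001/vm.yaml

# Copy an archived generation over the live object to roll back.
gsutil cp gs://cloudarchitect001/vm.yaml#1622720433123456 \
    gs://cloudarchitect001/vm.yaml
```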
Archived files could quickly get out of hand and impact our billing unnecessarily. To avoid this, we can utilize life cycle management policies. A life cycle management configuration can be assigned to a bucket. The configuration is a set of rules that can be applied to current or future objects in the bucket. When any of the objects meet the conditions of the configuration, Cloud Storage performs the corresponding actions automatically.
One of the most common use cases for using life cycle management policies is when we want to downgrade the storage class of objects after a set period. For example, there may be a scenario where data needs to be accessed frequently up to 30 days after it is moved to Cloud Storage, and after that, it will only be accessed once a year. It makes no sense to keep this data in regional or multi-regional storage, so we can use life cycle management to move this to Coldline storage after 30 days. Perhaps objects are not needed at all after 1 year, in which case we can use a policy to delete these objects. This helps keep costs down. Let's learn how we can apply a policy to a bucket.
The first thing we need to do is create a life cycle configuration file. As we mentioned previously, these files contain a set of rules that we want to apply to our bucket. If the policy contains more than one rule, then an object has to match all of the conditions before the action will be taken. It might be possible that a single object could be subject to multiple actions; in this case, Cloud Storage will perform only one of the actions before re-evaluating any additional actions.
We should also note that a Delete action will take precedence over a SetStorageClass action. Additionally, if an object has two rules applied to it to move the object to Nearline or Coldline, then the object will always move to the Coldline storage class if both rules use the same condition. A configuration file can be created in JSON or XML.
In the following example, we will create a simple configuration file in JSON format that will delete files after 60 days. We will also delete archived files after 30 days. Let's save this as lifecycle.json:
{
"lifecycle": {
"rule": [{
"action": {
"type": "Delete"
},
"condition": {
"age": 60,
"isLive": true
}
}, {
"action": {
"type": "Delete"
},
"condition": {
"age": 30,
"isLive": false
}
}]
}
}
Exam Tip
If the value of isLive is true, then this lifecycle condition will only match objects in a live state; however, if the value is set to false, the lifecycle condition will only match archived objects. We should also note that objects in non-versioned buckets are considered live.
We can also use gsutil to enable this policy on our buckets using the following syntax:
gsutil lifecycle set <lifecycle policy name> gs://<bucket name>
Here, <lifecycle policy name> is the file we have created with our policy, while <bucket name> is the unique bucket name we want to apply the policy to. The following code shows how to apply our lifecycle.json file to our cloudarchitect bucket:
gsutil lifecycle set lifecycle.json gs://cloudarchitect
We mentioned earlier that a common use case for lifecycle would be to move objects to a cheaper storage class after a set period. Let's look at an example that would map to a common scenario. Suppose we have objects in our cloudarchitect bucket that we want to access frequently for 90 days. After this time, we only need to access the data monthly. Finally, after 180 days, we will only access the data yearly. We can translate these requirements into storage classes and create the following policy, which will help keep our costs to a minimum.
Let's break this down a little since we now have a policy with two rules that each have two conditions. Any storage object with an age greater than 90 days and that has the REGIONAL storage class applied to it will be moved to NEARLINE storage. Any object with an age greater than 180 days and that has the NEARLINE storage class applied to it will be moved to ARCHIVE storage:
{
"lifecycle": {
"rule": [{
"action": {
"type": "SetStorageClass",
"storageClass": "NEARLINE"
},
"condition": {
"age": 90,
"matchesStorageClass": ["REGIONAL"]
}
},
{
"action": {
"type": "SetStorageClass",
"storageClass": "ARCHIVE"
},
"condition": {
"age": 180,
"matchesStorageClass": ["NEARLINE"]
}
}]
}
}
Exam Tip
At least one condition is required. If you enter an incorrect action or condition, then you will receive a 400 bad request error response.
Next, let's check out retention policies and locks.
Bucket locks allow us to configure data retention policies for a Cloud Storage bucket. This means we can govern how long objects in the bucket must be retained and prevent the policy from being reduced or removed.
We can create a retention policy by running the following command:
gsutil retention set 60s "gs://<bucketname>"
This sets the policy for 60 seconds. We can also set it to 60d for 60 days, 1m for 1 month, or 10y for 10 years. The policy can then be viewed with gsutil retention get; the output will be similar to the following:
Retention Policy (UNLOCKED):
Duration: 60 Second(s)
Effective Time: Thu, 01 Jul 2021 05:49:05 GMT
Note that if a bucket does not have a retention policy, then we can delete or replace objects at any time. However, if a policy is applied, then the objects in the bucket can only be replaced or deleted once their age is older than the retention policy period.
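As a sketch (the bucket name is illustrative), we can read the policy back and, if we are certain about it, lock it; note that locking is permanent and cannot be undone:

```shell
# Review the current (still UNLOCKED) retention policy.
gsutil retention get gs://cloudarchitect001

# Permanently lock the policy. After this, the retention period can
# never be reduced or removed for this bucket, so use with care.
gsutil retention lock gs://cloudarchitect001
```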
Exam Tip
Retention policies and object versioning are mutually exclusive features, which means that, for a given bucket, only one of these can be enabled at a time.
Object holds are metadata flags that can be placed on individual objects, which means they cannot be replaced or deleted. We can, however, still edit the metadata of an object that has been placed on hold.
There are two types of holds and an object can have one, both, or neither hold applied. When an object is stored in a bucket without a retention policy, both hold types behave in the same way. However, if a retention policy has been applied to the bucket, the holds have different effects on an object when the hold is released:
Let's look at an example to make things a little clearer. Let's say we have two objects – Object A and Object B – that we add to a storage bucket that has a 1-month retention period. We apply an event-based hold on Object A and a temporary hold on Object B. Due to the retention period alone, we would usually be able to delete both objects after 1 month. However, because we applied an object hold to each object, we are unable to do so. If we release the hold on Object A, then due to the event-based hold, the object must remain in the bucket for another 1 month before we can delete it. If we release the hold on Object B, we can immediately delete this due to the temporary hold not affecting the object's time in the bucket.
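The holds in the example above can be set and released with gsutil; here is a sketch, using illustrative bucket and object names:

```shell
# Place an event-based hold on Object A and a temporary hold on Object B.
gsutil retention event set gs://cloudarchitect001/objectA
gsutil retention temp set gs://cloudarchitect001/objectB

# Releasing the event-based hold restarts Object A's retention clock;
# releasing the temporary hold leaves Object B's retention time unchanged.
gsutil retention event release gs://cloudarchitect001/objectA
gsutil retention temp release gs://cloudarchitect001/objectB
```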
There will be cases where we want to transfer files to a bucket from on-premises or another cloud-based storage, and Google offers many ways to do so. There are some considerations that we should bear in mind to make sure that we select the right option. The size of the data and the available bandwidth will be the deciding factors. Let's take a quick look at a chart to show how long it would take to transfer some set data sizes over a specific bandwidth:
Next, let's look at some methods of transferring data.
Cloud Storage Transfer Service provides us with the option to transfer or back up our data from online sources to a data sink. Let's say, for example, that we want to move data from our AWS S3 bucket into our Cloud Storage bucket. S3 would be our source and the destination Cloud Storage bucket would be our data sink. We can also use Cloud Storage Transfer Service to move data from an HTTP/HTTP(s) source or even another Cloud Storage bucket. To make these data migrations or synchronizations easier, Cloud Storage Transfer Service offers several options:
Exam Tip
You can also use gsutil to transfer data between Cloud Storage and other locations. You can speed up the process of transferring files on-premises by using the -m option to enable a multi-threaded copy. If you plan to upload larger files, gsutil will split these files into several smaller chunks and upload these in parallel.
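A sketch of such a multi-threaded upload from an on-premises directory (the directory and bucket names are illustrative):

```shell
# -m enables parallel (multi-threaded/multi-process) operation;
# -r recurses into the local directory.
gsutil -m cp -r ./backups gs://cloudarchitect001/backups

# Optionally lower the size threshold above which large files are
# split into chunks and uploaded in parallel (composite uploads).
gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
    cp large-backup.tar gs://cloudarchitect001/backups
```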
It is recommended that you use Cloud Storage Transfer Service when transferring data from Amazon S3 to Cloud Storage.
If we look at Figure 9.13, we can see that there are some pretty large transfer times. If we have petabytes of storage to transfer, then we should use Transfer Appliance. This is a hardware storage device that allows secure offline data migration. It can be set up in our data center and is rack-mountable. We simply fill it with data and then ship it to an ingest location where Google will upload it to Cloud Storage.
Google offers the option of a 40 TB or 300 TB transfer appliance. The following is a quote from https://cloud.google.com/transfer-appliance/docs/4.0/security-and-encryption:
A common use case for Transfer Appliance is to move large amounts of existing backups from on-premises to cheaper Cloud Storage using the Coldline storage class, in situations where performing the upload over the network would take more than a week.
Now, we will have a look at IAM.
Access to Google Cloud Storage is secured with IAM. Let's have a look at the following list of predefined roles and their details:
Google Cloud Storage comes with predefined quotas. These default quotas can be changed via the Navigation menu, under the IAM & Admin | Quotas section. From this menu, we can review the current quotas and request an increase to these limits. We recommend that you familiarize yourself with the limits of each service as this can have an impact on your scalability. For Cloud Storage, we should be aware of the following limits:
Pricing for Google Cloud Storage is based on four components:
It's wise to note early on in this section that Google rebranded its Cloud Datastore product as a newer major version called Cloud Firestore. However, it is important to understand the relationship between the two: Firestore gives the option of Native mode and Datastore mode, with the latter using the traditional Datastore system's behaviors on top of the Firestore storage layer, which eliminates some of Datastore's limitations.
Cloud Datastore is a NoSQL database that is built to ease application development. It uses a distributed architecture that is highly scalable, and because it is serverless, we do not have to worry about the underlying infrastructure. Cloud Datastore distributes our data over several machines and uses masterless, synchronous replication over a wide geographical area.
Firestore builds on this but also accelerates the development of mobile, IoT, and web applications. It offers built-in live synchronization, as well as an offline mode that increases the efficiency of developing real-time applications. This includes workloads consisting of live asset tracking, real-time analytics, social media profiles, and gaming leaderboards.
Firestore also offers very high availability (HA) for our data. With automatic multi-region replication and strong consistency, it offers a 99.999% availability guarantee.
Before we go any further into the details, we don't want to make any assumptions that you know what a NoSQL database is. SQL databases are far more well-known by non-database administrators in the industry. A SQL database is primarily a relational database that is based on tables consisting of several rows of data that have predefined schemas.
Compare that to the concept of a NoSQL database, which is a non-relational database based around key-value pairs that do not have to adhere to any schema definitions. Ultimately, this allows us to scale efficiently, as it becomes a lot easier to add or remove resources as we need to. This elasticity makes Cloud Datastore a good fit for transactional information, such as real-time inventories, ACID transactions that always provide valid data, or user profiles that keep a record of the user experience based on past activities or preferences. However, NoSQL databases may not be the perfect fit for every scenario, and consideration should be taken when deciding which database to select, particularly regarding the efficiency of querying your data.

Now that we know what a NoSQL database is, let's look at what makes up a Datastore database, as relationships between data objects are addressed differently than in other databases you may be familiar with. A traditional SQL database is made up of tables, rows, and columns. In Cloud Datastore, an entity is the equivalent of a row, and a kind is the equivalent of a table. Entities are data objects that can have one or more named properties.
Each entity has a key that identifies it. Let's look at the following table to compare the terminology that is used with SQL databases and Cloud Datastore:
Now, let's look at the different types of modes that are offered in more detail.
When we create a new database, we must select one of the modes. However, we cannot use a mixture of modes within the same project. So, it is recommended to think about your design and what your application is built for.
Native mode offers all the new features of the revamped product. It takes the best of Cloud Datastore and the Firebase real-time database. Firebase was originally designed for mobile applications that required synced states across clients in real time. As such, Firestore takes advantage of this and offers a new, strongly consistent storage layer, as well as real-time updates. Additionally, it offers a collection and document data model and mobile and web client libraries. Native mode can scale automatically to millions of concurrent clients.
Exam Tip
While Firestore is backward compatible with Datastore, the new data model, real-time updates, and mobile and web client libraries are not. You must use Native mode to gain access to these features.
Datastore mode supports the Datastore API, so there is no requirement to change any existing Datastore applications. If your architecture requires the use of an established Datastore server architecture, then Datastore mode allows for this and removes some of the limitations from Datastore, mainly that Datastore mode offers strong consistency unless eventual consistency is specified. Datastore mode can scale automatically to millions of writes per second.
Exam Tip
Data objects in Firestore in Datastore mode are known as entities.
New projects that require a Datastore database should use Firestore in Datastore mode. However, starting in 2021, existing Datastore databases will be automatically upgraded without the need to update your application code. Google will notify users of the schedule for the upgrade, which does not require downtime.
The first thing we need to do is select a mode before we can utilize Cloud Firestore. Let's look at creating a database in Datastore mode and run some queries on the entities that we create. Before we do so, let's remind ourselves that Datastore is a managed service. As we go through this process, you will note that we are directly creating our entities without provisioning any underlying infrastructure first. You should also note that, like all storage services from GCP, Cloud Datastore automatically encrypts all data before it's written to disk. Follow these steps to create a Firestore in Datastore mode:
We can now give our entity a namespace, a Kind (a table, in SQL terminology), and a key identifier, which will be auto-generated. Let's leave our namespace as the default and create a new kind called exam:
With that, we have seen how easy it is to create a Datastore database and query the entities we create. Now, let's move on and look at some of the differences between Datastore and Firestore.
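As an aside, recent versions of the gcloud CLI can also perform the database creation step from the command line. The exact flags and available locations may vary by gcloud release, so treat this as a sketch:

```shell
# Create a Firestore database in Datastore mode
# (the location shown is illustrative).
gcloud firestore databases create \
    --type=datastore-mode \
    --location=nam5
```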
Cloud Firestore in Native mode is the complete re-branded product. It takes the best of Cloud Datastore and the Firebase real-time database to offer the same conceptual NoSQL database as Datastore, but as Firebase is built for low-latency mobile applications, it also includes the following new features:
Cloud Firestore in Datastore mode uses the same Datastore system behavior but will access the Firestore storage layer, which will remove some limitations, such as the following:
Access to Google Cloud Datastore is secured with IAM. Let's have a look at the list of predefined roles and their details:
Google Cloud Datastore comes with predefined quotas. These default quotas can be changed by going to the Navigation menu, under the IAM & Admin | Quotas section. From this menu, we can review the current quotas and request an increase to these limits. We recommend that you familiarize yourself with the limits for each service as this can have an impact on your scalability. Cloud Firestore offers a free quota that allows us to get started at no cost:
With the free tier, there are the following standard limits:
Pricing for Google Cloud Firestore offers a free quota to get us started, as mentioned previously.
If these limits are exceeded, there will be a charge that will fluctuate depending on the location of your data. General pricing is per 100,000 documents for reads, writes, and deletes. Stored data is charged per GB per month.
Given the name, it won't be a major surprise to hear that Cloud SQL is a database service that makes it easy to set up, maintain, and manage your relational PostgreSQL, MySQL, or SQL Server database on Google Cloud.
Although we are provisioning the underlying instances, it is a fully managed service that is capable of handling up to 64 TB of storage. Cloud SQL databases are relational, which means that they are organized into tables, rows, and columns. As an alternative, we can install the SQL Server application image onto a Compute Engine instance, but Cloud SQL offers many benefits that come from being fully managed by Google – for example, scalability, patching, and updates are applied automatically, automated backups are provided, and it offers HA out-of-the-box. Again, it is important to highlight that although these benefits may look great, we should also be mindful that there are some unsupported functions and that consideration should be taken before committing to Cloud SQL.
Exam Tip
Cloud SQL can accommodate up to 64 TB of storage. If you need to handle larger amounts of data, then you should look at alternative services, such as Cloud Spanner.
Now, let's look at some of the features offered by Cloud SQL.
Cloud SQL for MySQL offers us the following:
Cloud SQL for PostgreSQL offers us the following:
Cloud SQL for SQL Server offers us the following:
Now that we know the main features, let's look at some of the things to bear in mind when we create a new instance:
Selecting the correct capacity to fit your database size is also extremely important because once you have created your instance, you cannot decrease the capacity. If we over-spec our storage, then we will be paying for unused space. If we under-spec our storage, then we can cause issues with availability. One method to avoid this is through the automatic storage increase setting. If this is enabled, your storage is checked every 30 seconds to ensure that the storage has not fallen below a set threshold size. If it has, then additional storage will be automatically added to your instance. The threshold's size depends on the amount of storage your instance currently has provisioned, and it cannot be larger than 25 GB. This sounds great, but we should also be mindful that if we have spikes in demand, then we will suffer a permanent increase in storage cost. We should also be mindful that creating or increasing storage capacity to 30 TB or greater could lead to increased latency for operations such as backups:
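Assuming an existing instance (the instance name here is illustrative), the automatic storage increase setting can be toggled with gcloud:

```shell
# Enable automatic storage increases on an existing instance.
gcloud sql instances patch cloudarchitect001 --storage-auto-increase

# Optionally cap how far storage may grow automatically (value in GB),
# to guard against runaway costs from spikes in demand.
gcloud sql instances patch cloudarchitect001 \
    --storage-auto-increase-limit=100
```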
We discussed VPC peering in more detail in Chapter 10, Networking Options in GCP. Using a private IP will help lower network latency and improve security as traffic will never be exposed to the internet; however, we should also note that to access a Cloud SQL instance on its private IP address from other GCP resources, the resource must be in the same region.
Important Note
It is recommended that you use a private IP address if you wish to access a Cloud SQL instance from an application that is running in Google's container service; that is, Google Kubernetes Engine.
Cloud SQL is a relational database that offers strong consistency and support for ACID transactions.
Now that we have an understanding of the concepts, let's look at how to create a new MySQL instance:
We can also set the storage capacity and encryption options to enable the use of customer-managed encryption keys. In the preceding screenshot, we are using the default settings.
Like other services, we can create an instance by using the command-line tools available to us. Earlier in this chapter, we touched on some command-line tools, and likewise, we can use gcloud to create a new instance. We can run the following syntax to create a new database:
gcloud sql instances create <instance name> --database-version=<DATABASE version> --cpu=<cpu size> --memory=<RAM size> --region=<region name>
Let's look at this code in a little more detail:
Let's create a new PostgreSQL DB with version 9.6 called cloudarchitect001 and allocate 4 CPUs and 3,840 MiB of RAM. We will deploy this instance in the us-central1 region. Please note that this command may ask you to enable the API. Click Yes:
gcloud sql instances create cloudarchitect001 --database-version=POSTGRES_9_6 --cpu=1 --memory=3840MiB --region=us-central
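Once the command completes, we can confirm that the instance is up and inspect its configuration. The exact output columns may vary between gcloud versions:

```shell
# List all Cloud SQL instances in the current project
gcloud sql instances list

# Show full details of the new instance, including its state and IP address
gcloud sql instances describe cloudarchitect001
```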
Exam Tip
You can directly map your on-premises MySQL, SQL Server, or PostgreSQL database to Cloud SQL without having to convert it into a different format. This makes it one of the common options for companies looking to experiment with public cloud services.
Now that we have created a new instance by console and command line, let's look at some important features in more detail – read replicas and failover replicas.
Cloud SQL offers the ability to replicate our master instance to one or more read replicas. A read replica is essentially a copy of the master: changes made to the master are reflected in the replica in almost real time.
Tip
We should highlight here that read replicas do not provide HA, and we cannot fail over to a read replica from a master. They also do not fall in line with any maintenance window that we may have specified when creating the master instance.
There are several scenarios that Cloud SQL offers for read replication:
Enabling read replicas is a simple process. Let's learn how to do this from our GCP Console:
Exam Tip
Before we can create a read replica, the following requirements must be met: binary logging must be enabled and at least one backup must have been created since binary logging was enabled.
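For a MySQL instance (binary logging is a MySQL concept), these prerequisites, and the replica itself, can also be handled from the command line. The following is a sketch; mysql001 is a hypothetical instance name:

```shell
# Enable binary logging on the master (required for MySQL read replicas)
gcloud sql instances patch mysql001 --enable-bin-log

# Take an on-demand backup so that at least one backup exists
# since binary logging was enabled
gcloud sql backups create --instance=mysql001

# Create the read replica, pointing it at the master instance
gcloud sql instances create mysql001-replica \
    --master-instance-name=mysql001 \
    --region=us-central1
```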
Simply give the instance a name, select a region, and click Create. We will see configuration options similar to those we used when we created a master instance:
In this section, we looked at read replicas and the various options available. In the next section, we will look at failover replicas.
I am sure we are all familiar with the term HA. Its purpose is, of course, to reduce the downtime of a service; in the case of Cloud SQL, this means the downtime caused when a zone or an instance becomes unavailable. Public clouds are not immune to downtime: zonal outages can occur, and instances can become corrupted. By enabling HA, our data remains available to client applications.
As we discussed in the previous section, when deploying a new Cloud SQL instance, we can select a multiple zone configuration to enable HA. This means that two separate instances are created within the same region but in different zones. The first instance is configured as the primary, while the second acts as a standby. Through synchronous replication to each zone's persistent disk, every write made to the primary instance is replicated to the disks in both zones before the transaction is confirmed as complete.
In the event of a zone or instance failure, the persistent disk is attached to the standby instance, which then becomes the new primary instance, and users are routed to it. This process is described as a failover. It is important to note that this configuration remains in place even when the original primary instance comes back online: once it becomes available again, it is destroyed, recreated, and becomes the new standby instance. If there is a need to have the primary instance back in its original zone, a failback can be performed. This is the same process as a failover, except that we initiate it deliberately.
Please note that until Q1 2021, legacy MySQL offered an HA option based on failover replicas. This option is no longer available from the Cloud Console.
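The current approach is to request HA at creation time via the instance's availability type, and a failover can also be triggered manually. The following is a sketch with a hypothetical instance name:

```shell
# Create a regional (HA) instance: a standby is maintained in a second zone
gcloud sql instances create cloudarchitect-ha \
    --database-version=POSTGRES_9_6 \
    --cpu=2 --memory=3840MiB \
    --region=us-central1 \
    --availability-type=REGIONAL

# Manually trigger a failover (for example, to test HA or to fail back)
gcloud sql instances failover cloudarchitect-ha
```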
We mentioned backup and recovery previously when we created a new instance. We can create a backup schedule and trigger an on-demand backup from our console or the command line whenever required. By editing our instance, we can check the backup schedule that we have configured and change the backup window if necessary:
Backups allow us to restore our data from a certain point in time in the event of corruption or data loss. Cloud SQL gives us the option to restore data to the original instance or to a different instance. Both methods will overwrite all of the current data on the target instance.
Let's look at how we can restore from the console:
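The same restore can be performed from the command line. The following is a sketch; the backup ID shown is hypothetical, and the list command reveals the real IDs:

```shell
# List available backups for the instance, noting the ID of the one we need
gcloud sql backups list --instance=cloudarchitect001

# Restore that backup onto the instance (this overwrites its current data)
gcloud sql backups restore 1234567890 --restore-instance=cloudarchitect001
```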
In the next section, we will look at how to migrate data.
GCP now offers a service that allows us to easily migrate databases to Cloud SQL from on-premises, Compute Engine, or even other clouds. In essence, this is a lift and shift for MySQL and PostgreSQL workloads into Cloud SQL. This is an ideal first step for organizations looking to move existing services into a public cloud, as it means less infrastructure to manage while also gaining the HA, DR, and performance benefits of Google Cloud.
We previously mentioned that we can clone our instances. This will create a copy of our source instance, but it will be completely independent. This does not mean that we have a cluster or read-only replica in place. Once the cloning is complete, any changes to the source instance will not be reflected in the new cloned instance and vice versa. Instance IP addresses and replicas are not copied to the new cloned instance and must be configured again on the cloned instance. Similarly, any backups that are taken from the source instance are not replicated to the cloned instance.
This can be done from our GCP Console or by using our gcloud command-line tool:
gcloud sql instances clone <instance source name> <target instance name>
Here, <instance source name> is the name of the instance you wish to clone, while <target instance name> is the name you wish to assign to the cloned instance.
gcloud sql instances clone cloudarchitect001 cloudarchitect001cloned
We will call the cloned instance cloudarchitect001cloned.
Access to Google Cloud SQL is secured with IAM. Let's have a look at the list of predefined roles and their details:
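Granting one of these predefined roles follows the standard IAM pattern. As a sketch, the project ID and user below are hypothetical, and roles/cloudsql.client is the predefined role that allows connecting to instances:

```shell
# Allow a user to connect to Cloud SQL instances in the project
gcloud projects add-iam-policy-binding my-project \
    --member="user:alice@example.com" \
    --role="roles/cloudsql.client"
```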
Google Cloud SQL comes with predefined quotas. These default quotas can be changed via the Navigation menu, under the IAM & Admin | Quotas section. From this menu, we can review the current quotas and request an increase to these limits. We recommend that you familiarize yourself with the limits for each service as this can have an impact on your scalability. For Cloud SQL, we should be aware of some important limits.
Exam Tip
MySQL (second generation) and PostgreSQL have a storage limit of 30 TB.
Other limits to be aware of include the following:
Pricing for Cloud SQL is based on CPU and memory pricing, storage and networking pricing, and instance pricing.
For CPU and memory, it is dependent on location, the number of CPUs, and the amount of memory you plan to use. Read replicas are charged at the same rate as standalone instances.
For storage and networking, it is dependent on location, per GB of storage used per month, and the amount of egress networking from Cloud SQL.
Instance pricing is also dependent on where the instance is located and the shared-core machine type.
In this chapter, we covered the storage options available to us by looking at which ones were most appropriate for common requirements. We also learned about Google Cloud Storage, Cloud Firestore, and Cloud SQL.
With Google Cloud Storage, we looked at use cases, storage classes, and some main features of the service. We also covered some considerations to bear in mind when transferring data. It's clear that Cloud Storage offers a lot of flexibility, but when we need to store more structured data, we need to look at alternatives.
With Cloud Firestore, we learned that this service is a NoSQL database that is ideal when your application relies on highly available, structured data. Cloud Firestore can scale from zero up to terabytes of data with ease and supports ACID transactions. We also learned that it offers eventual or strong consistency; however, if we have different requirements and need a relational database with full SQL support for Online Transaction Processing (OLTP), then we should consider Cloud SQL.
For Cloud SQL, we learned that we have different offerings – MySQL, SQL Server, and PostgreSQL. We reviewed how to create and manage our instances, how to create replicas, and how to restore our instances.
In the next chapter, we will continue looking at GCP storage options by covering Cloud Spanner and Bigtable.
Read the following articles for more information on the topics that were covered in this chapter: