Introducing Amazon Redshift

Amazon Redshift is one of the database as a service (DBaaS) offerings from AWS. It provides a massively scalable data warehouse as a managed service, at a significantly lower cost than a traditional on-premises data warehouse. The data warehouse is based on the open source PostgreSQL database technology; however, not all features offered in PostgreSQL are present in Amazon Redshift. Here's a look at some of the essential concepts and terms that you ought to keep in mind when working with Amazon Redshift:

  • Clusters: Just like Amazon EMR, Amazon Redshift relies on the concept of clusters. A cluster here is a logical container holding one or more instances, or compute nodes, and one leader node that is responsible for the cluster's overall management. Here's a brief look at what each node provides:
    • Leader node: The leader node is a single node in a cluster that is responsible for orchestrating and executing various database operations, as well as facilitating communication between the database and its associated client programs.
    • Compute node: Compute nodes are responsible for executing the code provided by the leader node. Once the code has executed, the compute nodes send their results back to the leader node for aggregation. Amazon Redshift supports two types of compute nodes: dense storage nodes and dense compute nodes. Dense storage nodes provide standard hard disk drives for creating large data warehouses, whereas dense compute nodes provide higher-performance SSDs. You can start off with a single node that provides 160 GB of storage and scale up to petabytes by adding nodes that each provide 16 TB of capacity.
  • Node slices: Each compute node is partitioned into one or more smaller chunks, or slices, by the leader node, based on the cluster's initial size. Each slice is given a portion of the compute node's memory, CPU, and disk resources, and uses them to process the workloads assigned to it. The assignment of workloads is also performed by the leader node.
  • Databases: As mentioned earlier, Amazon Redshift provides a scalable database that you can use for data warehousing as well as for analytics. With each cluster that you spin up in Redshift, you can create one or more databases associated with it. The database is based on the open source relational database PostgreSQL (v8.0.2) and can therefore be used in conjunction with other RDBMS tools and functionalities. Applications and clients can communicate with the database using standard PostgreSQL JDBC and ODBC drivers.
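
To make the division of labor between the leader node, the compute nodes, and their slices more concrete, here is a minimal sketch in plain Python. It is only a toy model of the idea described above and not how Amazon Redshift is actually implemented: the function names, the CRC32-based placement of rows, and the sample sales rows are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32


def slice_for(key, total_slices):
    """Pick a slice for a row by hashing its key (illustrative placement only)."""
    return crc32(str(key).encode("utf-8")) % total_slices


def distribute(rows, key, nodes, slices_per_node):
    """Leader-node step: assign every row to a (node, slice) pair."""
    total_slices = nodes * slices_per_node
    placement = defaultdict(list)
    for row in rows:
        global_slice = slice_for(row[key], total_slices)
        # divmod turns the global slice number into (node index, slice index).
        placement[divmod(global_slice, slices_per_node)].append(row)
    return placement


def partial_sums(placement, group_key, value_key):
    """Compute-node step: each slice aggregates only the rows it holds."""
    partials = []
    for rows in placement.values():
        sums = defaultdict(int)
        for row in rows:
            sums[row[group_key]] += row[value_key]
        partials.append(dict(sums))
    return partials


def merge(partials):
    """Leader-node step: combine the per-slice results into the final answer."""
    final = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            final[group] += value
    return dict(final)


# Hypothetical sample data: total sales amount per region.
sales = [
    {"order_id": 1, "region": "east", "amount": 10},
    {"order_id": 2, "region": "west", "amount": 20},
    {"order_id": 3, "region": "east", "amount": 30},
    {"order_id": 4, "region": "west", "amount": 15},
]

placement = distribute(sales, "order_id", nodes=2, slices_per_node=2)
totals = merge(partial_sums(placement, "region", "amount"))
print(totals)
```

Whichever slices the rows happen to land on, the merged totals are the same. That is the point of the pattern: the slices work on their own portions of the data independently, and the leader node only has to combine small partial results.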

Here is a representational image of a working data warehouse cluster powered by Amazon Redshift:

With this basic information in mind, let's look at some simple, easy-to-follow steps that you can use to set up and get started with your Amazon Redshift cluster.
