Clustering tables

In PostgreSQL, there is a command called CLUSTER that allows us to rewrite a table in the desired order. It is possible to point to an index and store data in the same order as the index:

test=# h CLUSTER  
Command: CLUSTER 
Description: cluster a table according to an index 
Syntax: 
CLUSTER [VERBOSE] table_name [ USING index_name ] 
CLUSTER [VERBOSE]

The CLUSTER command has been around for many years and serves its purpose well. But, there are some things to consider before blindly running it on a production system:

  • The CLUSTER command will lock the table while it is running. You cannot insert or modify data while CLUSTER is running. This might not be acceptable on a production system.
  • Data can only be organized according to one index. You cannot order a table by postal code, name, ID, birthday, and so on, at the same time. It means that CLUSTER will make sense if there are search criteria, which is used most of the time.
  • Keep in mind that the example outlined in this book is more of a worst-case scenario. In reality, the performance difference between a clustered and a non- clustered table will depend on the workload, the amount of data retrieved, cache hit rates, and a lot more.
  • The clustered state of a table will not be maintained as changes are made to a table during normal operations. Correlation will usually deteriorate as time goes by.

Here is an example of how to run the CLUSTER command:

test=# CLUSTER t_random USING idx_random; 
CLUSTER

Depending on the size of the table, the time needed to cluster will vary.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset