Chapter 2. Creating Data Structures

This chapter is dedicated to data structures and their creation. The following topics will be covered:

  • Grouping columns the right way
  • Deciding on the right data type

Grouping columns the right way

Some people might wonder what is meant by the title of this section. Does it make a difference which order columns are aligned in? This might be a bit surprising, but it does. Even if a table contains the same data, its size on the disk might vary depending on the order of columns. Here is an example:

test=# CREATE TABLE t_test (  i1 int,
          i2 int,
          i3 int,
          v1 varchar(100),
          v2 varchar(100),
          v3 varchar(100)
);
CREATE TABLE
test=# INSERT INTO t_test SELECT 10, 20, 30,
  'abcd', 'abcd', 'abcd'
  FROM generate_series(1, 10000000);
INSERT 0 10000000

A table with three columns has been created. First of all, there are three integer columns. Then some varchar columns are added. In the second statement, 10 million rows are added. The generate_series command is a nice way to generate a list of numbers. In this example, the output of generate_series is not used. I am just utilizing the function to repeat the static data in the SELECT clause.

Now the size of the table can be checked:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 574 MB
(1 row)

The pg_relation_size returns the size of a table in bytes. In the case of a large relation, this is somewhat unhandy because users may easily end up with very large, hard-to-read numbers. Therefore, the return value of pg_relation_size should be wrapped into pg_size_pretty, which makes the size a lot more useful, as data is returned in a more human-readable format.

Let's try the same example again. This time, however, the order of the columns is changed. The rest of the example stays exactly the same:

test=# CREATE TABLE t_test (	v1 varchar(100),
          i1 int,
          v2 varchar(100),
          i2 int,
          v3 varchar(100),
          i3 int
);
CREATE TABLE
test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd',
      20, 'abcd', 30
  FROM generate_series(1, 10000000);
INSERT 0 10000000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 651 MB
(1 row)

The table has grown considerably, even though the data inside the table is exactly the same. The reason for this problem is called alignment and can be found deep inside the code. The theory is as follows: a CPU has a hard time if a field does not start at a multiplier of CPU word-size. Therefore, PostgreSQL will accordingly align data physically.

The most important point here is that it can make sense to group columns with similar data types next to each other. Of course, the outcome and the potential difference in size greatly depend on the content. If "abc" was used instead of "abcd" in this example, the results would not have shown any difference; both versions of the table would have had a fixed size of 498 MB.

Tip

Note that different types of CPUs (x86_32, x86_64, Sparc, ARM, and so on.) may not lead to the same results here.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset