Using the TABLESAMPLE clause

Table sampling has long been the real strength of commercial database vendors. Traditional database systems have provided sampling for many years. However, the monopoly has been broken. Since PostgreSQL 9.5, we have also had a solution to the problem of sampling.

Here's how it works:

test=# CREATE TABLE t_test (id int); 
CREATE TABLE
test=# INSERT INTO t_test
SELECT * FROM generate_series(1, 1000000);
INSERT 0 1000000

First, a table containing 1 million rows is created. Then, tests can be executed:

test=# SELECT count(*), avg(id) 
FROM t_test TABLESAMPLE BERNOULLI (1);
count | avg

--------+---------------------
9802 | 502453.220873291165
(1 row)
test=# SELECT count(*), avg(id)
FROM t_test TABLESAMPLE BERNOULLI (1);
count | avg

--------+---------------------
10082 | 497514.321959928586
(1 row)

In this example, the same test is executed twice. A 1% random sample is used in each case. Both average values are pretty close to 5 million, so the result is pretty good from a statistical point of view.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset