Using the TABLESAMPLE clause

Table sampling was long a strength of commercial database vendors, whose systems have offered it for many years. That monopoly has been broken: since version 9.5, PostgreSQL has also supported sampling through the TABLESAMPLE clause.

Here's how it works:

test=# CREATE TABLE t_test (id int); 
CREATE TABLE 
test=# INSERT INTO t_test 
    SELECT * FROM generate_series(1, 1000000); 
INSERT 0 1000000

First, a table containing 1 million rows is created. Then, tests can be executed:

test=# SELECT count(*), avg(id) 
        FROM t_test TABLESAMPLE BERNOULLI (1); 
 count |         avg
-------+---------------------
  9802 | 502453.220873291165
(1 row) 
test=# SELECT count(*), avg(id) 
        FROM t_test TABLESAMPLE BERNOULLI (1); 
 count |         avg
-------+---------------------
 10082 | 497514.321959928586
(1 row)

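In this example, the same test is executed twice, each time on a 1% random sample. The row counts differ slightly between the two runs because BERNOULLI decides for each row independently whether it is part of the sample. Both averages are close to 500,000, the midpoint of the range 1 to 1,000,000, so the result is good from a statistical point of view.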
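
The sampled figures can be checked against the exact values, and a sample can be made reproducible. The following is a minimal sketch, assuming the t_test table created earlier; the seed value 42 is arbitrary and chosen only for illustration. REPEATABLE and the SYSTEM sampling method are part of the TABLESAMPLE syntax available since PostgreSQL 9.5.

```sql
-- Exact baseline for comparison: the mean of 1..1000000 is 500000.5.
SELECT count(*), avg(id) FROM t_test;

-- BERNOULLI scans the whole table and keeps each row with 1% probability.
-- REPEATABLE fixes the seed (42 is an arbitrary choice), so repeated runs
-- return the same sample as long as the table has not changed.
SELECT count(*), avg(id)
    FROM t_test TABLESAMPLE BERNOULLI (1) REPEATABLE (42);

-- SYSTEM samples whole blocks instead of single rows. It is usually
-- faster, but the selected rows arrive in clusters, one block at a time.
SELECT count(*), avg(id)
    FROM t_test TABLESAMPLE SYSTEM (1) REPEATABLE (42);
```

BERNOULLI is the safer choice when the sample has to be evenly spread over the table, while SYSTEM trades some randomness for speed on large tables.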
