Tables constitute the building blocks of relational database storage. Structuring tables so that they form meaningful relationships is the key to relational database design. In PostgreSQL, constraints enforce relationships between tables. To distinguish a table from just a heap of data, we establish indexes. Much like the indexes you find at the end of books or the tenant list at the entrances to grand office buildings, indexes point to locations in the table so you don’t have to scour the table from top to bottom every time you’re looking for something.
In this chapter, we introduce syntax for creating tables and adding rows. We then move on to constraints to ensure that your data doesn’t get out of line. Finally, we show you how to add indexes to your tables to expedite searches.
Indexing a table is as much a programming task as it is an experimental endeavor. A misappropriated index is worse than useless. Not all indexes are created equal. Algorithmists have devised different kinds of indexes for different data types and different query types, all in an attempt to scrape that last morsel of speed from a query.
In addition to ordinary data tables, PostgreSQL offers several kinds of tables that are rather uncommon: temporary, unlogged, inherited, typed, and foreign (covered in Chapter 10).
Example 6-1 shows the table creation syntax, which is similar to what you’ll find in all SQL databases.
CREATE TABLE logs (
    log_id serial PRIMARY KEY,
    user_name varchar(50),
    description text,
    log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp
);
CREATE INDEX idx_logs_log_ts ON logs USING btree (log_ts);
serial is the data type used to represent an incrementing autonumber. Adding a serial column automatically adds an accompanying sequence object to the database schema. A serial data type is always an integer with the default value set to the next value of the sequence object. Each table usually has just one serial column, which often serves as the primary key. For very large tables, you should opt for the related bigserial.
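Under the hood, serial is essentially shorthand for an integer column whose default draws from a sequence. A minimal sketch of the rough longhand equivalent (the sequence name follows PostgreSQL's default naming convention):

```sql
CREATE SEQUENCE logs_log_id_seq;
CREATE TABLE logs (
    log_id integer NOT NULL DEFAULT nextval('logs_log_id_seq')
);
-- Tie the sequence's lifetime to the column, as serial does,
-- so dropping the column drops the sequence with it:
ALTER SEQUENCE logs_log_id_seq OWNED BY logs.log_id;
```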
varchar is shorthand for “character varying,” a variable-length string similar to what you will find in other databases. You don’t need to specify a maximum length; if you don’t, varchar behaves almost identically to the text data type.
text is a string of indeterminate length. It’s never followed by a length restriction.
timestamp with time zone (shorthand timestamptz) is a date and time data type, always stored in UTC. It displays date and time in the server’s own time zone unless you tell it otherwise. See “Time Zones: What They Are and Are Not” for a more thorough discussion.
New in version 10 is the IDENTITY qualifier for a column. IDENTITY is a more standards-compliant way of generating an autonumber for a table column.
You could convert the existing sequence-backed log_id column to the new IDENTITY construct:

DROP SEQUENCE logs_log_id_seq CASCADE;
ALTER TABLE logs ALTER COLUMN log_id ADD GENERATED BY DEFAULT AS IDENTITY;
If we already had data in the table, we’d need to prevent the numbering from starting at 1 with a statement like this:
ALTER TABLE logs ALTER COLUMN log_id RESTART WITH 2000;
If we were starting with a new table, we’d create it as shown in Example 6-2 using IDENTITY instead of serial.
CREATE TABLE logs (
    log_id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    user_name varchar(50),
    description text,
    log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp
);
The structure of Example 6-2 is much the same as what we saw in Example 6-1 but more verbose.
In what cases would you prefer IDENTITY over serial? The main benefit of the IDENTITY construct is that an identity is always tied to a specific table, so incrementing and resetting the value is managed with the table. A serial, on the other hand, creates a sequence object that may or may not be reused by other tables and needs to be dropped manually when it’s no longer needed. If you wanted to reset the numbering of a serial, you’d need to modify the related SEQUENCE object, which means knowing its name.
The serial approach is still useful if you need to reuse an autonumber generator across many tables. In that case, though, you’d create the sequence object separately from the table and set the table column default to the next value of the sequence. Internally, the new IDENTITY construct behaves much the same, creating a sequence object behind the scenes but preventing that sequence object from being edited directly.
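A sketch of that shared-sequence arrangement, assuming two hypothetical tables that should draw IDs from one generator:

```sql
CREATE SEQUENCE shared_id_seq;

-- Both tables default their key to the next value of the same sequence,
-- so IDs are unique across the pair of tables.
CREATE TABLE web_hits (
    hit_id bigint NOT NULL DEFAULT nextval('shared_id_seq') PRIMARY KEY,
    url text
);

CREATE TABLE api_hits (
    hit_id bigint NOT NULL DEFAULT nextval('shared_id_seq') PRIMARY KEY,
    endpoint text
);
```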
PostgreSQL stands alone as the only database product offering inherited tables. When you specify that a table (the child table) inherits from another table (the parent table), PostgreSQL creates the child table with its own columns plus all the columns of the parent table. PostgreSQL will remember this parent-child relationship so that any subsequent structural changes to the parent automatically propagate to its children. Parent-child table design is perfect for partitioning your data. When you query the parent table, PostgreSQL automatically includes all rows in the child tables. Not every trait of the parent passes down to the child. Notably, primary key constraints, foreign key constraints, uniqueness constraints, and indexes are never inherited. Check constraints are inherited, but children can have their own check constraints in addition to the ones they inherit from their parents (see Example 6-3).
CREATE TABLE logs_2011 (PRIMARY KEY (log_id)) INHERITS (logs);
CREATE INDEX idx_logs_2011_log_ts ON logs_2011 USING btree (log_ts);
ALTER TABLE logs_2011 ADD CONSTRAINT chk_y2011 CHECK (
    log_ts >= '2011-1-1'::timestamptz AND log_ts < '2012-1-1'::timestamptz
);
We define a check constraint to limit data to the year 2011. Having the check constraint in place allows the query planner to skip inherited tables that do not satisfy the query condition.
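You can watch this exclusion at work with EXPLAIN. Assuming the logs/logs_2011 setup above and the default constraint_exclusion = partition setting, a query restricted to 2012 dates should produce a plan that never touches logs_2011 (the exact plan output will vary with your data):

```sql
EXPLAIN SELECT * FROM logs WHERE log_ts >= '2012-01-01'::timestamptz;
-- The resulting plan scans the parent and any children whose check
-- constraints admit 2012 dates, but skips logs_2011 entirely.
```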
A new feature in PostgreSQL 9.5 is inheritance between local and foreign tables: each type can now inherit from the other. This is all in pursuit of making sharding easier.
New in version 10 are partitioned tables. Partitioned tables are much like inherited tables in that they allow partitioning of data across many tables and the planner can conditionally skip tables that don’t satisfy a query condition. Internally they are implemented much the same, but use a different DDL syntax.
Although partitioned tables replace the functionality of inherited tables in many cases, they are not complete replacements. Here are some key differences between inherited tables and partitioned tables:
A partitioned table group is created using the declarative partition syntax CREATE TABLE .. PARTITION BY RANGE ...
When partitions are used, data can be inserted into the core table and is rerouted automatically to the matching partition. This is not the case with inherited tables, where you either need to insert data into the child table, or have a trigger that reroutes data to the child tables.
All tables in a partitioned set must have exactly the same columns. This is unlike inherited tables, where child tables are allowed to have additional columns that are not in the parent tables.
Each partition belongs to a single partitioned table group. Internally, that means it can have only one parent table. Inherited tables, on the other hand, can inherit columns from multiple tables.
The parent of the partition can’t have primary keys, unique keys, or indexes, although the child partitions can. This is different from inherited tables, where the parent and each child can have a primary key that needs only to be unique within the table, not necessarily across all the inherited children.
Unlike inherited tables, the parent partitioned table can’t have any rows of its own. All inserts are redirected to a matching child partition, and when no matching child partition is available, an error is thrown.
We’ll re-create the logs table from Example 6-1 as a partitioned table and create the child tables using partition syntax instead of the inheritance shown in Example 6-3.
First, we’ll drop our existing logs table and all its child tables:
DROP TABLE IF EXISTS logs CASCADE;
For a partitioned table set, the parent table must be declared as a partitioned table through the PARTITION BY syntax, as shown in Example 6-4. Contrast that with Example 6-1, where we just start with a regular table definition. Also note that we do not define a primary key, because primary keys are not supported for the parent partitioned table.
CREATE TABLE logs (
    log_id int GENERATED BY DEFAULT AS IDENTITY,
    user_name varchar(50),
    description text,
    log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp
)
PARTITION BY RANGE (log_ts);
Similar to inheritance, we create child tables of the partition, except instead of using CHECK constraints to denote allowed data in the child table, we use the FOR VALUES FROM DDL construct. We repeat the exercise from Example 6-3 in Example 6-5, but using the FOR VALUES FROM construct instead of INHERITS.
CREATE TABLE logs_2011 PARTITION OF logs
    FOR VALUES FROM ('2011-1-1') TO ('2012-1-1');
CREATE INDEX idx_logs_2011_log_ts ON logs_2011 USING btree (log_ts);
ALTER TABLE logs_2011 ADD CONSTRAINT pk_logs_2011 PRIMARY KEY (log_id);
Define the new table as a partition of logs.
Define the set of data to be stored in this partition. Child partitions must not have overlapping ranges, so if you try to define a range that overlaps an existing range, the CREATE TABLE command will fail with an error.
Child partitions can have indexes and primary keys. As with inheritance, the primary key is not enforced across the whole partition set of tables.
Now if we were to insert data as follows:
INSERT INTO logs(user_name, description) VALUES ('regina', 'Sleeping');
We’d get an error such as:
ERROR:  no partition of relation "logs" found for row
DETAIL:  Partition key of the failing row contains (log_ts) = (2017-05-25 02:58:28.057101-04).
If we then create a partition covering 2012 onward:
CREATE TABLE logs_gt_2011 PARTITION OF logs
    FOR VALUES FROM ('2012-1-1') TO (unbounded);
Unlike Example 6-5, we opted to use the PARTITION range keyword unbounded, which allows our partition to be used for future dates.
Repeating our insert now, we can see by SELECT * FROM logs_gt_2011; that our data got rerouted to the new partition.
In the real world, you would need to create indexes and primary keys on the new child for query efficiency.
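Mirroring what we did for logs_2011 in Example 6-5, that would look something like this (the index and constraint names here are our own choosing):

```sql
CREATE INDEX idx_logs_gt_2011_log_ts ON logs_gt_2011 USING btree (log_ts);
ALTER TABLE logs_gt_2011 ADD CONSTRAINT pk_logs_gt_2011 PRIMARY KEY (log_id);
```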
Similar to the way inheritance works, when we query the parent table, all partitions that don’t satisfy the date filter are skipped, as shown in Example 6-6.
EXPLAIN ANALYZE SELECT * FROM logs WHERE log_ts > '2017-05-01';
Append  (cost=0.00..15.25 rows=140 width=162) (actual time=0.008..0.009 rows=1 loops=1)
  ->  Seq Scan on logs_gt_2011  (cost=0.00..15.25 rows=140 width=162) (actual time=0.008..0.008 rows=1 loops=1)
        Filter: (log_ts > '2017-05-01 00:00:00-04'::timestamp with time zone)
Planning time: 0.152 ms
Execution time: 0.022 ms
If you are using the psql packaged with PostgreSQL 10, the describe-table command will give you additional information detailing the partition ranges of the parent table:

\d+ logs

Table "public.logs"
⋮
Partition key: RANGE (log_ts)
Partitions: logs_2011 FOR VALUES FROM ('2011-01-01 00:00:00-05') TO ('2012-01-01 00:00:00-05'),
            logs_gt_2011 FOR VALUES FROM ('2012-01-01 00:00:00-05') TO (UNBOUNDED)
For ephemeral data that could be rebuilt in the event of a disk failure or doesn’t need to be restored after a crash, you might prefer having more speed than redundancy. The UNLOGGED modifier allows you to create unlogged tables, as shown in Example 6-7. These tables will not be part of any write-ahead logs. The big advantage of an unlogged table is that writing data to it is much faster than to a logged table: 10 to 15 times faster in our experience.
If you accidentally unplug the power cord on the server and then turn the power back on, the rollback process will wipe clean all data in unlogged tables. Another consequence of making a table unlogged is that its data won’t be able to participate in PostgreSQL replication. A pg_dump option also allows you to skip the backing up of unlogged data.
CREATE UNLOGGED TABLE web_sessions (
    session_id text PRIMARY KEY,
    add_ts timestamptz,
    upd_ts timestamptz,
    session_state xml
);
There are a few other sacrifices you have to make with unlogged tables. Prior to PostgreSQL 9.3, unlogged tables didn’t support GiST indexes (see “PostgreSQL Stock Indexes”), which are commonly used for more advanced data types such as arrays, ranges, json, full text, and spatial. Unlogged tables in any version will accommodate the common B-Tree and GIN indexes.
Prior to PostgreSQL 9.5, you couldn’t easily convert an UNLOGGED table to a logged one. To do so in version 9.5 or later, enter:

ALTER TABLE some_table SET LOGGED;
PostgreSQL automatically creates a corresponding composite data type in the background whenever you create a new table. The reverse is not true. But you can use a composite data type as a template for creating tables. We’ll demonstrate this by first creating a type with the definition:
CREATE TYPE basic_user AS (user_name varchar(50), pwd varchar(10));
We can then create a table with rows that are instances of this type as shown in Example 6-8.
CREATE TABLE super_users OF basic_user (CONSTRAINT pk_su PRIMARY KEY (user_name));
After creating tables from data types, you can’t alter the columns of the table directly. Instead, add columns to or remove columns from the composite data type, and PostgreSQL will automatically propagate the changes to the table structure. Much like inheritance, the advantage of this approach is that if you have many tables sharing the same underlying structure and you need to make a universal alteration, you can do so by simply changing the underlying composite type.
Let’s say we now need to add a phone number to our super_users table from Example 6-8. All we have to do is execute the following command:
ALTER TYPE basic_user ADD ATTRIBUTE phone varchar(10) CASCADE;
Normally, you can’t change the definition of a type if tables depend on that type. The CASCADE modifier overrides this restriction, applying the same change to all dependent tables.
PostgreSQL constraints are the most advanced (and most complex) of any database we’ve worked with. You can control all facets of how a constraint handles existing data, all cascade options, how to perform the matching, which indexes to incorporate, conditions under which the constraint can be violated, and more. On top of it all, you can pick your own name for each constraint. For the full treatment, we suggest you review the official documentation. You’ll find comfort in knowing that using the default settings usually works out fine. We’ll start off with something familiar to most relational folks: foreign key, unique, and check constraints. Then we’ll move on to exclusion constraints.
Names of primary key and unique key constraints must be unique within a given schema. A good practice is to include the name of the table and column as part of the name of the key. For the sake of brevity, our examples might not abide by this practice.
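For example, rather than a bare name like uq, a unique key on user_name in logs_2011 might encode the table and column so its purpose is evident at a glance (the name below is illustrative of the practice, not a required convention):

```sql
ALTER TABLE logs_2011
    ADD CONSTRAINT uq_logs_2011_user_name UNIQUE (user_name);
```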
PostgreSQL follows the same convention as most databases that support referential integrity. You can specify cascade update and delete rules to avoid pesky orphaned records. We show you how to add foreign key constraints in Example 6-9.
SET search_path = census, public;
ALTER TABLE facts ADD CONSTRAINT fk_facts_1
    FOREIGN KEY (fact_type_id)
    REFERENCES lu_fact_types (fact_type_id)
    ON UPDATE CASCADE ON DELETE RESTRICT;
CREATE INDEX fki_facts_1 ON facts (fact_type_id);
We define a foreign key relationship between our facts and lu_fact_types tables. This prevents us from introducing fact types into the facts table unless they are already present in the lu_fact_types lookup table.
We add a cascade rule that automatically updates the fact_type_id in our facts table should we renumber our fact types.
We restrict deletes from our lookup table so fact types in use cannot be removed. RESTRICT is the default behavior, but we suggest stating it for clarity.
Unlike for primary key and unique constraints, PostgreSQL doesn’t automatically create an index for foreign key constraints; you should add this yourself to speed up queries.
Foreign key constraints are important for data integrity. Newer versions of PostgreSQL can also use them to improve the planner’s thinking. In version 9.6, the planner was revised to use foreign key relationships to infer selectivity for join predicates, thus improving many types of queries.
Each table can have no more than a single primary key. If you need to enforce uniqueness on other columns, you must resort to unique constraints or unique indexes. Adding a unique constraint automatically creates an associated unique index. Similar to primary keys, unique key constraints can serve as the referenced key in foreign key constraints, but unlike primary keys, they can contain null values. A unique index without a unique key constraint can also have null values and, in addition, can use functions in its definition. The following example shows how to add a unique key:
ALTER TABLE logs_2011 ADD CONSTRAINT uq UNIQUE (user_name, log_ts);
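To illustrate the extra latitude of unique indexes, here’s a sketch that enforces case-insensitive uniqueness of user_name per timestamp, something a unique constraint can’t express because constraints can’t use functions (the index name is our own choosing):

```sql
CREATE UNIQUE INDEX uq_logs_2011_lower_user_ts
    ON logs_2011 (lower(user_name), log_ts);
```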
Often you’ll find yourself needing to ensure uniqueness for only a subset of your rows. PostgreSQL does not offer conditional unique constraints, but you can achieve the same effect by using a partial uniqueness index. See “Partial Indexes”.
Check constraints are conditions that must be met for a field or a set of fields for each row. The query planner takes advantage of check constraints by skipping tables that don’t meet the check constraints outright. We saw an example of a check constraint in Example 6-3. That particular example prevents the planner from having to scan rows failing to satisfy the date range specified in a query. You can exercise some creativity in your check constraints, because you can use functions and Boolean expressions to build complicated matching conditions. For example, the following constraint requires all usernames in the logs tables to be lowercase:
ALTER TABLE logs ADD CONSTRAINT chk CHECK (user_name = lower(user_name));
The other noteworthy aspect of check constraints is that, unlike primary key, foreign key, and unique key constraints, they are inherited by child tables.
Exclusion constraints allow you to incorporate additional operators to enforce uniqueness that can’t be satisfied by the equality operator. Exclusion constraints are especially useful in problems involving scheduling.
PostgreSQL 9.2 introduced the range data types that are perfect candidates for exclusion constraints. You’ll find a fine example of using exclusion constraints for range data types at Waiting for 9.2 Range Data Types.
Exclusion constraints are generally enforced using GiST indexes, but you can create compound indexes that incorporate B-Tree as well. Before you do this, you need to install the btree_gist extension. A classic use of a compound exclusion constraint is for scheduling resources.
Here’s an example using exclusion constraints. Suppose you have a fixed number of conference rooms in your office, and groups must book them in advance. See how we’d prevent double-booking in Example 6-10, and how we are able to use the overlap operator (&&) for our temporal comparison and the usual equality operator for the room number.
CREATE TABLE schedules (
    id serial PRIMARY KEY,
    room int,
    time_slot tstzrange
);
ALTER TABLE schedules ADD CONSTRAINT ex_schedules
    EXCLUDE USING gist (room WITH =, time_slot WITH &&);
Just as with uniqueness constraints, PostgreSQL automatically creates a corresponding index of the type specified in the constraint declaration.
Arrays are another popular type where exclusion constraints come in handy. Let’s suppose you have a set of rooms that you need to assign to groups of people. We’ll call these sets of rooms “blocks.” For expediency, you decide to store one record per party, but you want to ensure that two parties are never given the same room. So you set up a table as follows:
CREATE TABLE room_blocks(block_id integer primary key, rooms int[]);
To ensure that no two blocks have a room in common, you can set up an exclusion constraint preventing blocks from overlapping (two blocks having the same room). Exclusion constraints unfortunately work only with GiST indexes, and because GiST indexes don’t exist for arrays out of the box, you need to install an additional extension before you can do this, as shown in Example 6-11.
CREATE EXTENSION IF NOT EXISTS intarray;
ALTER TABLE room_blocks ADD CONSTRAINT ex_room_blocks_rooms
    EXCLUDE USING gist (rooms WITH &&);
The intarray extension provides GiST index support for integer arrays (int4, int8). After intarray is installed, you can then use GiST with arrays and create exclusion constraints on integer arrays.
PostgreSQL comes with a lavish framework for creating and fine-tuning indexes. The art of PostgreSQL indexing could fill a tome all by itself. PostgreSQL is packaged with several types of indexes. If you find these inadequate, you can define new index operators and modifiers to supplement. If still unsatisfied, you’re free to invent your own index type.
PostgreSQL also allows you to mix and match different index types in the same table with the expectation that the planner will consider them all. For instance, one column could use a B-Tree index while an adjacent column uses a GiST index, with both indexes contributing to speed up the queries. To delve more into the mechanics of how the planner takes advantage of indexes, visit Bitmap Index Scan Strategy.
You can create indexes on tables (with the exception of foreign tables) as well as materialized views.
To take full advantage of all that PostgreSQL has to offer, you’ll want to understand the various types of indexes and situations where they will aid or harm. Following is a list of stock indexes:
B-Tree is a general-purpose index common in relational databases. You can usually get by with B-Tree alone if you don’t want to experiment with additional index types. If PostgreSQL automatically creates an index for you or you don’t bother specifying the index method, B-Tree will be chosen. It is currently the only indexing method for primary keys and unique keys.
Block range index (BRIN) is an index type introduced in PostgreSQL 9.4. It’s designed specifically for very large tables where using an index such as B-Tree would take up too much space and not fit in memory. The approach of BRIN is to treat a range of pages as one unit. BRIN indexes are much smaller than B-Tree and other indexes and faster to build. But they are slower to use and can’t be used for primary keys or certain other situations.
Generalized Search Tree (GiST) is an index optimized for FTS, spatial data, scientific data, unstructured data, and hierarchical data. Although you can’t use it to enforce uniqueness, you can create the same effect by using it in an exclusion constraint.
GiST is a lossy index, in the sense that the index itself will not store the value of what it’s indexing, but merely a bounding value such as a box for a polygon. This creates the need for an extra lookup step if you need to retrieve the value or do a more fine-tuned check.
Generalized Inverted Index (GIN) is geared toward the built-in full text search and binary json data type of PostgreSQL. Many other extensions, such as hstore and pg_trgm, also utilize it. GIN is a descendent of GiST but without the lossiness. GIN will clone the values in the columns that are part of the index. If you ever need a query limited to covered columns, GIN is faster than GiST. However, the extra replication required by GIN means the index is larger and updating the index is slower than a comparable GiST index. Also, because each index row is limited to a certain size, you can’t use GIN to index large objects such as large hstore documents or text. If there is a possibility you’ll be inserting a 600-page manual into a field of a table, don’t use GIN to index that column.
You can find a wonderful example of GIN in Waiting for Faster LIKE/ILIKE. As of version 9.3, you can index regular expressions that leverage the GIN-based pg_trgm extension.
Space-Partitioned Generalized Search Tree (SP-GiST) can be used in the same situations as GiST but can be faster for certain kinds of data distribution. PostgreSQL’s native geometric data types, such as point and box, and the text data type, were the first to support SP-GiST. In version 9.3, support extended to range types.
Hash indexes were popular prior to the advent of GiST and GIN. General consensus rates GiST and GIN above hash in terms of both performance and transaction safety. The write-ahead log prior to PostgreSQL 10 did not track hash indexes; therefore, you couldn’t use them in streaming replication setups. Although hash indexes were relegated to legacy status for some time, they got some love in PostgreSQL 10. In that version, they gained transactional safety and some performance improvements that made them more efficient than B-Tree in some cases.
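Creating a hash index uses the same USING clause as any other method; a sketch against our logs table (the index name is hypothetical):

```sql
CREATE INDEX idx_logs_user_name_hash ON logs USING hash (user_name);
```

Keep in mind that hash indexes support only the equality operator, so the planner will consider this index for user_name = ... lookups but not for range or LIKE comparisons.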
If you want to explore indexes beyond what PostgreSQL installs by default, either out of need or curiosity, start with the composite B-Tree-GiST or B-Tree-GIN indexes, both available as extensions and included with most PostgreSQL distributions.
These hybrids support the specialized operators of GiST or GIN but also offer indexability of the equality operator, like B-Tree indexes. You’ll find them indispensable when you want to create a compound index spanning multiple columns that contain both simple and complex types. For example, you can have a compound index that consists of a column of plain text and a column of full text. Normally, complex types such as full-text, ltree, geometric, and spatial types can use only GIN or GiST indexes, and thus can never be combined with simpler types that can use only B-Tree. These combo methods allow you to combine columns indexed with GiST with columns indexed with B-Tree in a single index.
Although not packaged with PostgreSQL, other index types can be found in extensions. The most popular are the VODKA and RUM (a variant based on GIN) index method types, which work with PostgreSQL 9.6 and up. RUM is most suited for work with complex types such as full text and is required if you need index support for full-text phrase searches. It also offers additional distance operators.
Another recent addition is pgroonga, a PostgreSQL extension currently supported for PostgreSQL 9.5 and 9.6. It brings the power of the groonga full-text engine and column store to PostgreSQL. PGRoonga includes with it an index called pgroonga and companion operators. PGRoonga supports indexing of regular text to produce full-text like functionality without needing to have a full-text vector, as the built-in PostgreSQL FTS requires. PGRoonga also makes ILIKE and LIKE '%something%' indexable similar to the pg_trgm extension. In addition, it supports indexing of text arrays and JSONB. There are binaries available for Linux/Mac and Windows.
Most of you will skate through your index-capades without ever needing to know what operator classes (opclasses for short) are and why they matter for indexes. But if you falter, you’ll need to understand opclasses to troubleshoot the perennial question, “Why is the planner not taking advantage of my index?”
Index architects intend for their indexes to work only against certain data types and with specific comparison operators. An expert in indexing ranges could obsess over the overlap operator (&&), whereas an expert in indexing text searches may find little meaning in an overlap. A linguist trying to index logographic languages, such as Chinese, probably has little use for inequalities, whereas a linguist trying to index alphabetic languages would find A-to-Z sorting indispensable.
PostgreSQL groups operators into operator classes. For example, the int4_ops operator class includes the operators < <= = >= > to be applied against the data type int4 (commonly known as an integer). The pg_opclass system table provides a complete listing of available operator classes, both from your original install and from extensions. A particular index will work only against a given set of opclasses. To see this complete list, you can either open up pgAdmin and look under operator classes, or execute the query in Example 6-12 to get a comprehensive view.
SELECT am.amname AS index_method,
    opc.opcname AS opclass_name,
    opc.opcintype::regtype AS indexed_type,
    opc.opcdefault AS is_default
FROM pg_am am
    INNER JOIN pg_opclass opc ON opc.opcmethod = am.oid
WHERE am.amname = 'btree'
ORDER BY index_method, indexed_type, opclass_name;
index_method | opclass_name        | indexed_type | is_default
-------------+---------------------+--------------+------------
btree        | bool_ops            | boolean      | t
⋮
btree        | text_ops            | text         | t
btree        | text_pattern_ops    | text         | f
btree        | varchar_ops         | text         | f
btree        | varchar_pattern_ops | text         | f
⋮
In Example 6-12, we limit our result to B-Tree. Notice that one opclass per indexed data type is marked as the default. When you create an index without specifying the opclass, PostgreSQL chooses the default opclass for the index. Generally, this is good enough, but not always.
For instance, B-Tree against text_ops (aka varchar_ops) doesn’t include the ~~ operator (the LIKE operator), so none of your LIKE searches can use an index in the text_ops opclass. If you plan on doing many wildcard searches on varchar or text columns, you’d be better off explicitly choosing the text_pattern_ops/varchar_pattern_ops opclass for your index. To specify the opclass, just append the opclass after the column name, as in:
CREATE INDEX idx1 ON census.lu_tracts USING btree (tract_name text_pattern_ops);
You will notice that the list contains both varchar_ops and text_ops, but they map only to text. character varying doesn’t have B-Tree operators of its own, because it is essentially text with a length constraint. varchar_ops and varchar_pattern_ops are just aliases for text_ops and text_pattern_ops that exist to satisfy the desire of some to maintain the symmetry of opclasses starting with the name of the type they support.
Finally, remember that each index you create works against only a single opclass. If you would like an index on a column to cover multiple opclasses, you must create separate indexes. To add the default index text_ops to a table, run:
CREATE INDEX idx2 ON census.lu_tracts USING btree (tract_name);
Now you have two indexes against the same column. (There’s no limit to the number of indexes you can build against a single column.) The planner will choose idx2 for basic equality queries and idx1 for comparisons using LIKE.
You’ll find operator classes detailed in the Operator Classes section of the official documentation. We also strongly recommend that you read our article for tips on troubleshooting index issues, Why is My Index Not Used?
PostgreSQL lets you add indexes on functions of columns. Functional indexes prove their usefulness with mixed-case textual data. PostgreSQL is a case-sensitive database. To perform a case-insensitive search, you could create a functional index:
CREATE INDEX idx ON featnames_short USING btree (upper(fullname) varchar_pattern_ops);
This next example uses the same function to uppercase the fullname column before comparing. Since we created the index with the same upper(fullname) expression, the planner will be able to use the index for this query:
SELECT fullname FROM featnames_short WHERE upper(fullname) LIKE 'S%';
Always use the same functional expression when querying to ensure use of the index.
Partial indexes (sometimes called filtered indexes) are indexes that cover only rows fitting a predefined WHERE condition. For instance, if you have a table of 1,000,000 rows but care about a fixed set of only 10,000, you’re better off creating partial indexes. The resulting indexes can be faster because more can fit into RAM, plus you’ll save a bit of disk space on the index itself.
Partial indexes let you place uniqueness constraints on just a subset of the rows. Pretend that you manage newspaper subscribers who signed up in the past 10 years and want to ensure that nobody gets more than one paper delivered per day. With dwindling interest in print media, only about 5% of your subscribers maintain a current subscription. You don't care about duplicates among subscribers who have stopped getting newspapers, because they're not on the carriers' list anyway. Your table looks like this:
CREATE TABLE subscribers (
    id serial PRIMARY KEY,
    name varchar(50) NOT NULL,
    type varchar(50),
    is_active boolean
);
We add a partial index to guarantee uniqueness only for current subscribers:
CREATE UNIQUE INDEX uq ON subscribers USING btree (lower(name)) WHERE is_active;
Functions used in the index's WHERE condition must be immutable. This means you can't use time functions like CURRENT_DATE or data from other tables (or other rows of the indexed table) to determine whether a record should be indexed.
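For example, trying to restrict a partial index to recent signups via CURRENT_DATE is rejected outright. A sketch, assuming a hypothetical signup_date column on subscribers:

```sql
-- Fails: CURRENT_DATE is not immutable, so it can't appear in an
-- index predicate. (The signup_date column is hypothetical.)
CREATE UNIQUE INDEX uq_recent ON subscribers USING btree (lower(name))
WHERE signup_date > CURRENT_DATE - 365;
-- ERROR:  functions in index predicate must be marked IMMUTABLE
```

The usual workaround is exactly what our table does: keep a plain boolean flag such as is_active and have application code or a scheduled job maintain it.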
One warning we stress is that when you query the data, for the index to be considered by the planner, the conditions used when creating the index must be part of your WHERE condition, and any functions used in the index must also appear in the query filter. Our uq index is both partial and functional, because what it indexes is lower(name) (not name). An easy way to avoid having to worry about all this is to use a view. Back to our subscribers example, create a view as follows:
CREATE OR REPLACE VIEW vw_subscribers_current AS
SELECT id, lower(name) AS name FROM subscribers WHERE is_active = true;
Then always query the view instead of the table (many purists advocate never querying tables directly anyway). A view is a saved query that is transparent to the planner: any query against a view folds in the view's WHERE conditions and functional expressions along with whatever conditions the query itself adds. The view we created does two things to make our indexes available to queries. First, it replaces the name column with lower(name), so a query against name through the view is shorthand for lower(name) against the underlying table. Second, it builds in the condition is_active = true, so any query against the view automatically carries that condition and can use the partial index:
SELECT * FROM vw_subscribers_current WHERE name = 'sandy';
You can inspect the plan with EXPLAIN and confirm that the planner indeed used your index.
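A quick way to do that check, assuming the view and the uq index created above:

```sql
-- After the view is expanded, the planner sees the equivalent of:
--   SELECT ... FROM subscribers
--   WHERE is_active = true AND lower(name) = 'sandy';
-- which matches both the index expression and its predicate.
EXPLAIN SELECT * FROM vw_subscribers_current WHERE name = 'sandy';
```

As always, on a very small table the planner may still choose a sequential scan even though the index qualifies.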
You’ve already seen many examples of multicolumn (aka compound) indexes in this chapter, but you can also create functional indexes using more than one underlying column. Here is an example of a multicolumn index:
CREATE INDEX idx ON subscribers USING btree (type, upper(name) varchar_pattern_ops);
The PostgreSQL planner uses a strategy called a bitmap index scan that automatically tries to combine indexes on the fly, often from single-column indexes, to achieve the same goal as a multicolumn index. If you're unable to predict how you'll be querying compound fields in the future, you may be better off creating single-column indexes and letting the planner decide how to combine them during search.
If you have a multicolumn B-Tree index on type and upper(name), there is no need for an index on just type, because the planner can still use the compound index for cases in which you just need to filter by type. Although the planner can use the index even if the columns you are querying are not the first in the index, querying by the first column in an index is much more efficient than querying by just secondary columns.
The planner can also employ a strategy called an index-only scan, which enables the planner to use just the index and not the table if the index contains all the columns needed to satisfy a query. So if you commonly filter by the same set of fields and output those, a compound index can improve speed since it can skip the table. Keep in mind that the more columns you have in an index, the fatter your index and the less of it that can easily fit in RAM. Don’t go overboard with compound indexes.
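To illustrate the covering case with the subscribers table from earlier: if queries routinely filter on type and return only name, a compound index holding both columns lets the planner consider answering from the index alone. The index name and the 'daily' value below are illustrative:

```sql
-- Both the filter column and the output column live in the index,
-- so the planner can consider an index-only scan for this query.
CREATE INDEX idx_subscribers_type_name ON subscribers USING btree (type, name);

EXPLAIN SELECT name FROM subscribers WHERE type = 'daily';
```

Whether an Index Only Scan actually shows up also depends on the table's visibility map, so on a heavily updated table that hasn't been vacuumed recently the planner may still have to visit the heap.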