Chapter 6. Tables, Constraints, and Indexes

Tables constitute the building blocks of relational database storage. Structuring tables so that they form meaningful relationships is the key to relational database design. In PostgreSQL, constraints enforce relationships between tables. To distinguish a table from just a heap of data, we establish indexes. Much like the indexes you find at the end of books or the tenant list at the entrances to grand office buildings, indexes point to locations in the table so you don’t have to scour the table from top to bottom every time you’re looking for something.

In this chapter, we introduce syntax for creating tables and adding rows. We then move on to constraints to ensure that your data doesn’t get out of line. Finally, we show you how to add indexes to your tables to expedite searches.

Indexing a table is as much a programming task as it is an experimental endeavor. A misappropriated index is worse than useless. Not all indexes are created equal. Algorithmists have devised different kinds of indexes for different data types and different query types, all in an attempt to scrape that last morsel of speed from a query.

Tables

In addition to ordinary data tables, PostgreSQL offers several kinds of tables that are rather uncommon: temporary, unlogged, inherited, typed, and foreign (covered in Chapter 10).

Basic Table Creation

Example 6-1 shows the table creation syntax, which is similar to what you’ll find in all SQL databases.

Example 6-1. Basic table creation
CREATE TABLE logs (
log_id serial PRIMARY KEY, 1
user_name varchar(50), 2
description text, 3
log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp 4
);
CREATE INDEX idx_logs_log_ts ON logs USING btree (log_ts);
1

serial is the data type used to represent an incrementing autonumber. Adding a serial column automatically adds an accompanying sequence object to the database schema. A serial data type is always an integer with the default value set to the next value of the sequence object. Each table usually has just one serial column, which often serves as the primary key. For very large tables, you should opt for the related bigserial.

2

varchar is shorthand for “character varying,” a variable-length string similar to what you will find in other databases. You don’t need to specify a maximum length; if you don’t, varchar will be almost identical to the text data type.

3

text is a string of indeterminate length. It’s never followed by a length restriction.

4

timestamp with time zone (shorthand timestamptz) is a date and time data type, always stored in UTC. It displays date and time in the server’s own time zone unless you tell it to otherwise. See “Time Zones: What They Are and Are Not” for a more thorough discussion.
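To make the serial behavior concrete, here is roughly what the log_id serial column in Example 6-1 expands to behind the scenes (a minimal sketch showing just that one column; the sequence name follows PostgreSQL's table_column_seq convention):

CREATE SEQUENCE logs_log_id_seq;
CREATE TABLE logs (
    log_id integer NOT NULL DEFAULT nextval('logs_log_id_seq') PRIMARY KEY
);
ALTER SEQUENCE logs_log_id_seq OWNED BY logs.log_id;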

New in version 10 is the IDENTITY qualifier for a column. IDENTITY is a more standard-compliant way of generating an autonumber for a table column.

You could convert the existing log_id column to the new IDENTITY construct by dropping its sequence object and adding an identity:

DROP SEQUENCE logs_log_id_seq CASCADE;
ALTER TABLE logs
    ALTER COLUMN log_id ADD GENERATED BY DEFAULT AS IDENTITY;

If we already had data in the table, we’d need to prevent the numbering from starting at 1 with a statement like this:

ALTER TABLE logs
    ALTER COLUMN log_id RESTART WITH 2000;

If we were starting with a new table, we’d create it as shown in Example 6-2 using IDENTITY instead of serial.

Example 6-2. Basic table creation using IDENTITY
CREATE TABLE logs (
log_id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
user_name varchar(50),
description text,
log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp
);

The structure of Example 6-2 is much the same as what we saw in Example 6-1 but more verbose.

In what cases would you prefer IDENTITY over serial? The main benefit of the IDENTITY construct is that an identity is always tied to a specific table, so incrementing and resetting the value is managed with the table. A serial, on the other hand, creates a sequence object that may or may not be reused by other tables and must be dropped manually when it's no longer needed. If you wanted to reset the numbering of a serial, you'd need to modify the related sequence object, which means knowing its name.

The serial approach is still useful if you need to reuse an autonumber generator across many tables. In that case, though, you'd create the sequence object separately from the table and set each table column's default to the next value of the sequence. Internally, the new IDENTITY construct behaves much the same, creating a sequence object behind the scenes but preventing that sequence from being edited directly.
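A minimal sketch of the shared-sequence setup (table and sequence names here are hypothetical):

CREATE SEQUENCE shared_seq;
CREATE TABLE tbl_a (id int NOT NULL DEFAULT nextval('shared_seq'), note text);
CREATE TABLE tbl_b (id int NOT NULL DEFAULT nextval('shared_seq'), note text);
-- Resetting the numbering means restarting the sequence by name:
ALTER SEQUENCE shared_seq RESTART WITH 2000;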

Inherited Tables

PostgreSQL stands alone as the only database product offering inherited tables. When you specify that a table (the child table) inherits from another table (the parent table), PostgreSQL creates the child table with its own columns plus all the columns of the parent table. PostgreSQL will remember this parent-child relationship so that any subsequent structural changes to the parent automatically propagate to its children. Parent-child table design is perfect for partitioning your data. When you query the parent table, PostgreSQL automatically includes all rows in the child tables. Not every trait of the parent passes down to the child. Notably, primary key constraints, foreign key constraints, uniqueness constraints, and indexes are never inherited. Check constraints are inherited, but children can have their own check constraints in addition to the ones they inherit from their parents (see Example 6-3).

Example 6-3. Inherited table creation
CREATE TABLE logs_2011 (PRIMARY KEY (log_id)) INHERITS (logs);
CREATE INDEX idx_logs_2011_log_ts ON logs_2011 USING btree(log_ts);
ALTER TABLE logs_2011
    ADD CONSTRAINT chk_y2011
    CHECK (
        log_ts >= '2011-1-1'::timestamptz AND log_ts < '2012-1-1'::timestamptz
    ); 1
1

We define a check constraint to limit data to the year 2011. Having the check constraint in place allows the query planner to skip inherited tables that do not satisfy the query condition.

A new feature in PostgreSQL 9.5 is inheritance between local and foreign tables: each type can now inherit from the other. This is all in pursuit of making sharding easier.

Partitioned Tables

New in version 10 are partitioned tables. Partitioned tables are much like inherited tables in that they allow partitioning of data across many tables and the planner can conditionally skip tables that don’t satisfy a query condition. Internally they are implemented much the same, but use a different DDL syntax.

Although partitioned tables replace the functionality of inherited tables in many cases, they are not complete replacements. Here are some key differences between inherited tables and partitioned tables:

  • A partitioned table group is created using the declarative partition syntax CREATE TABLE ... PARTITION BY RANGE ...

  • When partitions are used, data can be inserted into the core table and is rerouted automatically to the matching partition. This is not the case with inherited tables, where you either need to insert data into the child table, or have a trigger that reroutes data to the child tables.

  • All tables in a partition group must have exactly the same columns. This is unlike inherited tables, where child tables are allowed to have additional columns that are not in the parent tables.

  • Each partitioned table belongs to a single partitioned group. Internally that means it can have only one parent table. Inherited tables, on the other hand, can inherit columns from multiple tables.

  • The parent of the partition can’t have primary keys, unique keys, or indexes, although the child partitions can. This is different from inherited tables, where the parent and each child can have a primary key that needs only to be unique within its own table, not necessarily across all the inherited children.

  • Unlike inherited tables, the parent partitioned table can’t have any rows of its own. All inserts are redirected to a matching child partition, and when no matching child partition is available, an error is thrown.

We’ll re-create the logs table from Example 6-1 as a partitioned table and create the child tables using partition syntax instead of the inheritance shown in Example 6-3.

First, we’ll drop our existing logs table and all its child tables:

DROP TABLE IF EXISTS logs CASCADE;

For a partitioned table set, the parent table must be noted as a partitioned table through the PARTITION BY syntax, as shown in Example 6-4. Contrast that to Example 6-1 where we just start with a regular table definition. Also note that we do not define a primary key because primary keys are not supported for the parent partition table.

Example 6-4. Basic table creation for partition
CREATE TABLE logs (
log_id int GENERATED BY DEFAULT AS IDENTITY,
user_name varchar(50),
description text,
log_ts timestamp with time zone NOT NULL DEFAULT current_timestamp
) PARTITION BY RANGE (log_ts);

Similar to inheritance, we create child tables of the partition, except instead of using CHECK constraints to denote allowed data in the child table, we use the FOR VALUES FROM DDL construct. We repeat the exercise from Example 6-3 in Example 6-5 but using the FOR VALUES FROM construct instead of INHERITS.

Example 6-5. Create a child partition
CREATE TABLE logs_2011 PARTITION OF logs 1
FOR VALUES FROM ('2011-1-1') TO ('2012-1-1'); 2
CREATE INDEX idx_logs_2011_log_ts ON logs_2011 USING btree(log_ts); 3
ALTER TABLE logs_2011 ADD CONSTRAINT pk_logs_2011 PRIMARY KEY (log_id); 4
1

Define the new table as a partition of logs.

2

Define the set of data to be stored in this partition. Child partitions must not have overlapping ranges, so if you try to define a range that overlaps an existing range, the CREATE TABLE command will fail with an error.

3 4

Child partitions can have indexes and primary keys. As with inheritance, the primary key is not enforced across the whole partition set of tables.

Now if we were to insert data as follows:

INSERT INTO logs (user_name, description) VALUES ('regina', 'Sleeping');

We’d get an error such as:

ERROR:  no partition of relation "logs" found for row
DETAIL:  Partition key of the failing row contains 
(log_ts) = (2017-05-25 02:58:28.057101-04).

If we then create a partition covering 2012 onward:

CREATE TABLE logs_gt_2011 PARTITION OF logs
FOR VALUES FROM ('2012-1-1') TO (unbounded);

Unlike Example 6-5, we opted to use the PARTITION range keyword unbounded, which allows our partition to be used for future dates.

Repeating our insert now, we can see by SELECT * FROM logs_gt_2011; that our data got rerouted to the new partition.

In the real world, you would need to create indexes and primary keys on the new child for query efficiency.
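Following the pattern of Example 6-5, that could look like:

CREATE INDEX idx_logs_gt_2011_log_ts ON logs_gt_2011 USING btree (log_ts);
ALTER TABLE logs_gt_2011 ADD CONSTRAINT pk_logs_gt_2011 PRIMARY KEY (log_id);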

Similar to the way inheritance works, when we query the parent table, all partitions that don’t satisfy the date filter are skipped, as shown in Example 6-6.

Example 6-6. Planner skipping other partitions
EXPLAIN ANALYZE SELECT * FROM logs WHERE log_ts > '2017-05-01';
Append  (cost=0.00..15.25 rows=140 width=162) 
(actual time=0.008..0.009 rows=1 loops=1)
  ->  Seq Scan on logs_gt_2011  (cost=0.00..15.25 rows=140 width=162) 
(actual time=0.008..0.008 rows=1 loops=1)
        Filter: (log_ts > '2017-05-01 00:00:00-04'::timestamp with time zone)
Planning time: 0.152 ms
Execution time: 0.022 ms

If you are using the psql packaged with PostgreSQL 10, you will get more information when you use the describe table command, which details the partition ranges of the parent table:

\d+ logs
Table "public.logs"
:
Partition key: RANGE (log_ts)
Partitions: logs_2011
    FOR VALUES FROM ('2011-01-01 00:00:00-05') TO ('2012-01-01 00:00:00-05'),
            logs_gt_2011
            FOR VALUES FROM ('2012-01-01 00:00:00-05') TO (UNBOUNDED)

Unlogged Tables

For ephemeral data that could be rebuilt in the event of a disk failure or doesn’t need to be restored after a crash, you might prefer having more speed than redundancy. The UNLOGGED modifier allows you to create unlogged tables, as shown in Example 6-7. These tables will not be part of any write-ahead logs. The big advantage of an unlogged table is that writing data to it is much faster than to a logged table—10 to 15 times faster in our experience.

If you accidentally unplug the power cord on the server and then turn the power back on, crash recovery will wipe clean all data in unlogged tables. Another consequence of making a table unlogged is that its data can’t participate in PostgreSQL replication. A pg_dump option also allows you to skip the backing up of unlogged data.
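For instance, the --no-unlogged-table-data flag tells pg_dump to back up the structure of unlogged tables but skip their contents (the database and output filenames here are placeholders):

pg_dump --no-unlogged-table-data -d mydb -f mydb_backup.sql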

Example 6-7. Unlogged table creation
CREATE UNLOGGED TABLE web_sessions (
	session_id text PRIMARY KEY,
	add_ts timestamptz,
	upd_ts timestamptz,
	session_state xml);

There are a few other sacrifices you have to make with unlogged tables. Prior to PostgreSQL 9.3, unlogged tables didn’t support GiST indexes (see “PostgreSQL Stock Indexes”), which are commonly used for more advanced data types such as arrays, ranges, json, full text, and spatial. Unlogged tables in any version will accommodate the common B-Tree and GIN indexes.

Prior to PostgreSQL 9.5, you couldn’t easily convert an UNLOGGED table to a logged one. To do so in version 9.5+, enter:

ALTER TABLE some_table SET LOGGED;

TYPE OF

PostgreSQL automatically creates a corresponding composite data type in the background whenever you create a new table. The reverse is not true. But you can use a composite data type as a template for creating tables. We’ll demonstrate this by first creating a type with the definition:

CREATE TYPE basic_user AS (user_name varchar(50), pwd varchar(10));

We can then create a table with rows that are instances of this type as shown in Example 6-8.

Example 6-8. Using TYPE to define a new table structure
CREATE TABLE super_users OF basic_user (CONSTRAINT pk_su PRIMARY KEY (user_name));

After creating tables from data types, you can’t alter the columns of the table. Instead, add or remove columns to the composite data type, and PostgreSQL will automatically propagate the changes to the table structure. Much like inheritance, the advantage of this approach is that if you have many tables sharing the same underlying structure and you need to make a universal alteration, you can do so by simply changing the underlying composite type.

Let’s say we now need to add a phone number to our super_users table from Example 6-8. All we have to do is execute the following command:

ALTER TYPE basic_user ADD ATTRIBUTE phone varchar(10) CASCADE;

Normally, you can’t change the definition of a type if tables depend on that type. The CASCADE modifier overrides this restriction, applying the same change to all dependent tables.

Constraints

PostgreSQL constraints are the most advanced (and most complex) of any database we’ve worked with. You can control all facets of how a constraint handles existing data, all cascade options, how to perform the matching, which indexes to incorporate, conditions under which the constraint can be violated, and more. On top of it all, you can pick your own name for each constraint. For the full treatment, we suggest you review the official documentation. You’ll find comfort in knowing that using the default settings usually works out fine. We’ll start off with something familiar to most relational folks: foreign key, unique, and check constraints. Then we’ll move on to exclusion constraints.

Warning

Names of primary key and unique key constraints must be unique within a given schema. A good practice is to include the name of the table and column as part of the name of the key. For the sake of brevity, our examples might not abide by this practice.

Foreign Key Constraints

PostgreSQL follows the same convention as most databases that support referential integrity. You can specify cascade update and delete rules to avoid pesky orphaned records. We show you how to add foreign key constraints in Example 6-9.

Example 6-9. Building foreign key constraints and covering indexes
SET search_path=census, public;
ALTER TABLE facts ADD CONSTRAINT fk_facts_1 FOREIGN KEY (fact_type_id)
REFERENCES lu_fact_types (fact_type_id) 1
ON UPDATE CASCADE ON DELETE RESTRICT; 2
CREATE INDEX fki_facts_1 ON facts (fact_type_id); 3
1

We define a foreign key relationship between our facts and lu_fact_types tables. This prevents us from introducing fact types into the facts table unless they are already present in the lu_fact_types lookup table.

2

We add a cascade rule that automatically updates the fact_type_id in our facts table should we renumber our fact types. We restrict deletes from our lookup table so fact types in use cannot be removed. RESTRICT is the default behavior, but we suggest stating it for clarity.

3

Unlike for primary key and unique constraints, PostgreSQL doesn’t automatically create an index for foreign key constraints; you should add this yourself to speed up queries.

Foreign key constraints are important for data integrity. Newer versions of PostgreSQL can also use them to improve the planner’s thinking. In version 9.6, the planner was revised to use foreign key relationships to infer selectivity for join predicates, thus improving many types of queries.

Unique Constraints

Each table can have no more than a single primary key. If you need to enforce uniqueness on other columns, you must resort to unique constraints or unique indexes. Adding a unique constraint automatically creates an associated unique index. Similar to primary keys, unique key constraints can serve as the referenced key of a foreign key constraint; unlike primary keys, they allow null values. A unique index without a unique key constraint can also have null values, and in addition can use functions in its definition. The following example shows how to add a unique key:

ALTER TABLE logs_2011 ADD CONSTRAINT uq UNIQUE (user_name, log_ts);

Often you’ll find yourself needing to ensure uniqueness for only a subset of your rows. PostgreSQL does not offer conditional unique constraints, but you can achieve the same effect by using a partial uniqueness index. See “Partial Indexes”.

Check Constraints

Check constraints are conditions that must be met for a field or a set of fields for each row. The query planner takes advantage of check constraints by skipping tables that don’t meet the check constraints outright. We saw an example of a check constraint in Example 6-3. That particular example prevents the planner from having to scan rows failing to satisfy the date range specified in a query. You can exercise some creativity in your check constraints, because you can use functions and Boolean expressions to build complicated matching conditions. For example, the following constraint requires all usernames in the logs tables to be lowercase:

ALTER TABLE logs ADD CONSTRAINT chk CHECK (user_name = lower(user_name));

The other noteworthy aspect of check constraints is that, unlike primary key, foreign key, and unique key constraints, they are inherited from parent tables.

Exclusion Constraints

Exclusion constraints allow you to incorporate additional operators to enforce uniqueness that can’t be satisfied by the equality operator. Exclusion constraints are especially useful in problems involving scheduling.

PostgreSQL 9.2 introduced the range data types that are perfect candidates for exclusion constraints. You’ll find a fine example of using exclusion constraints for range data types at Waiting for 9.2 Range Data Types.

Exclusion constraints are generally enforced using GiST indexes, but you can create compound indexes that incorporate B-Tree as well. Before you do this, you need to install the btree_gist extension. A classic use of a compound exclusion constraint is for scheduling resources.
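Installing the extension takes a single line; note that the equality comparison on the integer room column in the upcoming Example 6-10 is exactly what requires it:

CREATE EXTENSION IF NOT EXISTS btree_gist;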

Here’s an example using exclusion constraints. Suppose you have a fixed number of conference rooms in your office, and groups must book them in advance. See how we’d prevent double-booking in Example 6-10, and how we are able to use the overlap operator (&&) for our temporal comparison and the usual equality operator for the room number.

Example 6-10. Prevent overlapping bookings for the same room
CREATE TABLE schedules(id serial primary key, room int, time_slot tstzrange);
ALTER TABLE schedules ADD CONSTRAINT ex_schedules
EXCLUDE USING gist (room WITH =, time_slot WITH &&);

Just as with uniqueness constraints, PostgreSQL automatically creates a corresponding index of the type specified in the constraint declaration.
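To see the constraint in action, try booking the same room for two overlapping time slots (the times here are made up for illustration):

INSERT INTO schedules (room, time_slot)
    VALUES (1, '[2018-01-01 10:00, 2018-01-01 11:00)'::tstzrange);
INSERT INTO schedules (room, time_slot)
    VALUES (1, '[2018-01-01 10:30, 2018-01-01 11:30)'::tstzrange);

The second insert fails with an error such as:

ERROR:  conflicting key value violates exclusion constraint "ex_schedules"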

Arrays are another popular type where exclusion constraints come in handy. Let’s suppose you have a set of rooms that you need to assign to a group of people. We’ll call these “room blocks.” For expediency, you decide to store one record per party, but you want to ensure that two parties are never given the same room. So you set up a table as follows:

CREATE TABLE room_blocks(block_id integer primary key, rooms int[]);

To ensure that no two blocks have a room in common, you can set up an exclusion constraint preventing blocks from overlapping (two blocks having the same room). Exclusion constraints unfortunately work only with GiST indexes, and because GiST indexes don’t exist for arrays out of the box, you need to install an additional extension before you can do this, as shown in Example 6-11.

Example 6-11. Prevent overlapping array blocks
CREATE EXTENSION IF NOT EXISTS intarray;
ALTER TABLE room_blocks
 ADD CONSTRAINT ex_room_blocks_rooms
 EXCLUDE USING gist(rooms WITH &&);

The intarray extension provides GiST index support for integer (int4) arrays. After intarray is installed, you can then use GiST with integer arrays and create exclusion constraints on them.

Indexes

PostgreSQL comes with a lavish framework for creating and fine-tuning indexes. The art of PostgreSQL indexing could fill a tome all by itself. PostgreSQL is packaged with several types of indexes. If you find these inadequate, you can define new index operators and modifiers to supplement. If still unsatisfied, you’re free to invent your own index type.

PostgreSQL also allows you to mix and match different index types in the same table with the expectation that the planner will consider them all. For instance, one column could use a B-Tree index while an adjacent column uses a GiST index, with both indexes contributing to speed up the queries. To delve more into the mechanics of how the planner takes advantage of indexes, visit Bitmap Index Scan Strategy.

You can create indexes on tables (with the exception of foreign tables) as well as materialized views.

Warning

Index names must be unique within a given schema.

PostgreSQL Stock Indexes

To take full advantage of all that PostgreSQL has to offer, you’ll want to understand the various types of indexes and situations where they will aid or harm. Following is a list of stock indexes:

B-Tree

B-Tree is a general-purpose index common in relational databases. You can usually get by with B-Tree alone if you don’t want to experiment with additional index types. If PostgreSQL automatically creates an index for you or you don’t bother specifying the index method, B-Tree will be chosen. It is currently the only indexing method for primary keys and unique keys.

BRIN

Block range index (BRIN) is an index type introduced in PostgreSQL 9.4. It’s designed specifically for very large tables where an index such as B-Tree would take up too much space and not fit in memory. The approach of BRIN is to treat a range of pages as one unit. BRIN indexes are much smaller than B-Tree and other indexes and are faster to build, but they are slower to use and can’t be used for primary keys or in certain other situations.
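As a sketch, a BRIN index on the timestamp column of a hypothetical, very large sensor_readings table would look like this (pages_per_range is optional and controls how many pages each summarized range covers; 128 is the default):

CREATE TABLE sensor_readings (reading_ts timestamptz, val float8);
CREATE INDEX idx_sensor_readings_brin ON sensor_readings
    USING brin (reading_ts) WITH (pages_per_range = 128);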

GiST

Generalized Search Tree (GiST) is an index optimized for FTS, spatial data, scientific data, unstructured data, and hierarchical data. Although you can’t use it to enforce uniqueness, you can create the same effect by using it in an exclusion constraint.

GiST is a lossy index, in the sense that the index itself will not store the value of what it’s indexing, but merely a bounding value such as a box for a polygon. This creates the need for an extra lookup step if you need to retrieve the value or do a more fine-tuned check.

GIN

Generalized Inverted Index (GIN) is geared toward the built-in full text search and the binary JSON (jsonb) data type of PostgreSQL. Many other extensions, such as hstore and pg_trgm, also utilize it. GIN is a descendant of GiST but without the lossiness. GIN will clone the values in the columns that are part of the index. If you ever need a query limited to covered columns, GIN is faster than GiST. However, the extra copying required by GIN means the index is larger and updating it is slower than a comparable GiST index. Also, because each index row is limited to a certain size, you can’t use GIN to index large objects such as large hstore documents or long text. If there is a possibility you’ll be inserting a 600-page manual into a field of a table, don’t use GIN to index that column.
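As a sketch, GIN indexes over a jsonb column and over a full-text expression on a hypothetical docs table would look like this:

CREATE TABLE docs (doc_id serial PRIMARY KEY, body text, attributes jsonb);
CREATE INDEX idx_docs_attributes ON docs USING gin (attributes);
CREATE INDEX idx_docs_fts ON docs USING gin (to_tsvector('english', body));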

You can find a wonderful example of GIN in Waiting for Faster LIKE/ILIKE. As of version 9.3, you can index regular expressions that leverage the GIN-based pg_trgm extension.

SP-GiST

Space-Partitioned Generalized Search Tree (SP-GiST) can be used in the same situations as GiST but can be faster for certain kinds of data distribution. PostgreSQL’s native geometric data types, such as point and box, and the text data type, were the first to support SP-GiST. In version 9.3, support extended to range types.

hash

Hash indexes were popular prior to the advent of GiST and GIN. General consensus rates GiST and GIN above hash in terms of both performance and transaction safety. The write-ahead log prior to PostgreSQL 10 did not track hash indexes; therefore, you couldn’t use them in streaming replication setups. Although hash indexes were relegated to legacy status for some time, they got some love in PostgreSQL 10. In that version, they gained transactional safety and some performance improvements that made them more efficient than B-Tree in some cases.
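A hash index supports only the equality operator, so a sketch against the web_sessions table from Example 6-7 would be (purely illustrative, since session_id already carries a B-Tree primary key):

CREATE INDEX idx_web_sessions_hash ON web_sessions USING hash (session_id);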

B-Tree-GiST/B-Tree-GIN

If you want to explore indexes beyond what PostgreSQL installs by default, either out of need or curiosity, start with the composite B-Tree-GiST or B-Tree-GIN indexes, both available as extensions and included with most PostgreSQL distributions.

These hybrids support the specialized operators of GiST or GIN, but also offer indexability of the equality operator like B-Tree indexes. You’ll find them indispensable when you want to create a compound index composed of multiple columns containing both simple and complex types. For example, you can have a compound index that consists of a column of plain text and a column of full text. Normally complex types such as full-text, ltree, geometric, and spatial types can use only GIN or GiST indexes, and thus can never be combined with simpler types that can only use B-Tree. These combo methods allow you to combine columns indexed with GiST with columns indexed with B-Tree in a single index.
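Here is a minimal sketch using the btree_gin sibling extension; the articles table and its columns are hypothetical:

CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE TABLE articles (author varchar(50), body text);
CREATE INDEX idx_articles_author_fts ON articles
    USING gin (author, to_tsvector('english', body));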

Although not packaged with PostgreSQL, other index types can be found in extensions. The most popular are the VODKA and RUM (a variant based on GIN) index method types, which work with PostgreSQL 9.6 and up. RUM is most suited for work with complex types such as full text and is required if you need index support for full-text phrase searches. It also offers additional distance operators.

Another recent addition is PGroonga, a PostgreSQL extension currently supported for PostgreSQL 9.5 and 9.6. It brings the power of the Groonga full-text engine and column store to PostgreSQL. PGroonga includes an index method called pgroonga and companion operators. PGroonga supports indexing of regular text to produce full-text-like functionality without needing a full-text vector, as the built-in PostgreSQL FTS requires. PGroonga also makes ILIKE and LIKE '%something%' indexable, similar to the pg_trgm extension. In addition, it supports indexing of text arrays and JSONB. Binaries are available for Linux/Mac and Windows.

Operator Classes

Most of you will skate through your index-capades without ever needing to know what operator classes (opclasses for short) are and why they matter for indexes. But if you falter, you’ll need to understand opclasses to troubleshoot the perennial question, “Why is the planner not taking advantage of my index?”

Index architects intend for their indexes to work only against certain data types and with specific comparison operators. An expert in indexing ranges could obsess over the overlap operator (&&), whereas an expert in indexing text searches may find little meaning in an overlap. A linguist trying to index logographic languages, such as Chinese, probably has little use for inequalities, whereas a linguist trying to index alphabetic languages would find A-to-Z sorting indispensable.

PostgreSQL groups operators into operator classes. For example, the int4_ops operator class includes the operators < <= = >= > to be applied against the data type of int4 (commonly known as an integer). The pg_opclass system table provides a complete listing of available operator classes, both from your original install and from extensions. A particular index will work only against a given set of opclasses. To see this complete list, you can either open up pgAdmin and look under operator classes, or execute the query in Example 6-12 to get a comprehensive view.

Example 6-12. Which data types and operator classes does B-Tree support?
SELECT am.amname AS index_method, opc.opcname AS opclass_name,
opc.opcintype::regtype AS indexed_type, opc.opcdefault AS is_default
FROM pg_am am INNER JOIN pg_opclass opc ON opc.opcmethod = am.oid
WHERE am.amname = 'btree'
ORDER BY index_method, indexed_type, opclass_name;
index_method | opclass_name        | indexed_type | is_default
-------------+---------------------+--------------+------------
btree        | bool_ops            | boolean      | t
⋮
btree        | text_ops            | text         | t
btree        | text_pattern_ops    | text         | f
btree        | varchar_ops         | text         | f
btree        | varchar_pattern_ops | text         | f
 :

In Example 6-12, we limit our result to B-Tree. Notice that one opclass per indexed data type is marked as the default. When you create an index without specifying the opclass, PostgreSQL chooses the default opclass for the index. Generally, this is good enough, but not always.

For instance, B-Tree against text_ops (aka varchar_ops) doesn’t include the ~~ operator (the LIKE operator), so none of your LIKE searches can use an index in the text_ops opclass. If you plan on doing many wildcard searches on varchar or text columns, you’d be better off explicitly choosing the text_pattern_ops/varchar_pattern_ops opclass for your index. To specify the opclass, just append the opclass after the column name, as in:

CREATE INDEX idx1 ON census.lu_tracts USING btree (tract_name text_pattern_ops);
Note

You will notice that the list contains both varchar_ops and text_ops, but they map only to text. character varying doesn’t have B-Tree operators of its own, because it is essentially text with a length constraint. varchar_ops and varchar_pattern_ops are just aliases for text_ops and text_pattern_ops to satisfy the desire of some to maintain this symmetry of opclasses starting with the name of the type they support.

Finally, remember that each index you create works against only a single opclass. If you would like an index on a column to cover multiple opclasses, you must create separate indexes. To add the default index text_ops to a table, run:

CREATE INDEX idx2 ON census.lu_tracts USING btree (tract_name);

Now you have two indexes against the same column. (There’s no limit to the number of indexes you can build against a single column.) The planner will choose idx2 for basic equality queries and idx1 for comparisons using LIKE.

You’ll find operator classes detailed in the Operator Classes section of the official documentation. We also strongly recommend that you read our article for tips on troubleshooting index issues, Why is My Index Not Used?

Functional Indexes

PostgreSQL lets you add indexes on functions of columns. Functional indexes prove their usefulness with mixed-case textual data. PostgreSQL is a case-sensitive database, so to perform a case-insensitive search you could create a functional index:

CREATE INDEX idx ON featnames_short
USING btree (upper(fullname) varchar_pattern_ops);

This next example uses the same function to uppercase the fullname column before comparing. Since we created the index with the same upper(fullname) expression, the planner will be able to use the index for this query:

SELECT fullname FROM featnames_short WHERE upper(fullname) LIKE 'S%';
Warning

Always use the same functional expression when querying to ensure use of the index.

Partial Indexes

Partial indexes (sometimes called filtered indexes) are indexes that cover only rows fitting a predefined WHERE condition. For instance, if you have a table of 1,000,000 rows but query against a fixed set of just 10,000, you’re better off creating a partial index. The resulting index can be faster because more of it can fit into RAM, and you’ll save a bit of disk space on the index itself.

Partial indexes let you place uniqueness constraints only on some rows of the data. Pretend that you manage newspaper subscribers who signed up in the past 10 years and want to ensure that nobody is getting more than one paper delivered per day. With dwindling interest in print media, only about 5% of your subscribers have a current subscription. You don’t care about subscribers being duplicated who have stopped getting newspapers, because they’re not on the carriers’ list anyway. Your table looks like this:

CREATE TABLE subscribers (
    id serial PRIMARY KEY,
    name varchar(50) NOT NULL, type varchar(50),
    is_active boolean);

We add a partial index to guarantee uniqueness only for current subscribers:

CREATE UNIQUE INDEX uq ON subscribers USING btree(lower(name)) WHERE is_active;
Warning

Functions used in the index’s WHERE condition must be immutable. This means you can’t use time functions like CURRENT_DATE or data from other tables (or other rows of the indexed table) to determine whether a record should be indexed.

One warning we stress is that when you query the data, in order for the index to be considered by the planner, the conditions used when creating the index must be part of your WHERE condition, and any functions used in the index must also be used in the query filter. Our index is both partial and functional, because what it indexes is lower(name), not name. An easy way to not have to worry about this is to use a view. Back to our subscribers example, create a view as follows:

CREATE OR REPLACE VIEW vw_subscribers_current AS
SELECT id, lower(name) As name FROM subscribers WHERE is_active = true;

Then always query the view instead of the table (many purists advocate never querying tables directly anyway). A view is a saved query that is transparent to the planner: any query against a view includes the view’s WHERE conditions and functional expressions in addition to whatever conditions the query itself adds. The view we created does two things to make the index available to queries. First, it replaces the name column with lower(name), so a query against name via the view is shorthand for lower(name) against the underlying table. Second, it filters on is_active = true, which means any query against the view automatically has that condition in it and can use the partial index:

SELECT * FROM vw_subscribers_current WHERE name = 'sandy';

You can open up the planner and confirm that the planner indeed used your index.
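For example (a sketch; with only a handful of rows the planner may well choose a sequential scan instead, so plans will vary):

EXPLAIN ANALYZE SELECT * FROM vw_subscribers_current WHERE name = 'sandy';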

Multicolumn Indexes

You’ve already seen many examples of multicolumn (aka compound) indexes in this chapter, but you can also create functional indexes using more than one underlying column. Here is an example of a multicolumn index:

CREATE INDEX idx ON subscribers 
USING btree (type, upper(name) varchar_pattern_ops);

The PostgreSQL planner uses a strategy called a bitmap index scan that automatically tries to combine indexes on the fly, often from single-column indexes, to achieve the same goal as a multicolumn index. If you’re unable to predict how you’ll be querying compound fields in the future, you may be better off creating single-column indexes and letting the planner decide how to combine them during search.
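A sketch of the single-column alternative for the subscribers table follows; with both indexes in place, the planner can combine them in a BitmapAnd when a query filters on both columns:

CREATE INDEX idx_subscribers_type ON subscribers USING btree (type);
CREATE INDEX idx_subscribers_upper_name ON subscribers
    USING btree (upper(name) varchar_pattern_ops);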

If you have a multicolumn B-Tree index on type and upper(name), there is no need for an index on just type, because the planner can still use the compound index for cases in which you just need to filter by type. Although the planner can use the index even if the columns you are querying are not the first in the index, querying by the first column in an index is much more efficient than querying by just secondary columns.

The planner can also employ a strategy called an index-only scan, which enables the planner to use just the index and not the table if the index contains all the columns needed to satisfy a query. So if you commonly filter by the same set of fields and output those, a compound index can improve speed since it can skip the table. Keep in mind that the more columns you have in an index, the fatter your index and the less of it that can easily fit in RAM. Don’t go overboard with compound indexes.
