Structure of a SELECT query

Let's start with a simple example. We will use the sample database of the car web portal, which was described in previous chapters.

To connect to the database, the following command is used:

> psql -h localhost car_portal

There is a table called car that contains information about cars registered in the system. Suppose it is necessary to query the database to get the information about cars that have three doors. They should be sorted by their ID. The output should be limited to five records due to pagination in the user interface. The query will look like this:

SELECT car_id, registration_number, manufacture_year
FROM car_portal_app.car
WHERE number_of_doors=3
ORDER BY car_id
LIMIT 5;

This is the result:

 car_id | registration_number | manufacture_year
--------+---------------------+------------------
2 | VSVW4565 | 2014
5 | BXGK6290 | 2009
6 | ORIU9886 | 2007
7 | TGVF4726 | 2009
8 | JISW6779 | 2013
(5 rows)

The syntax and the logic of the query are the following. The query starts from the keyword SELECT, which determines the type of the statement. Therefore, this keyword is always required. The keyword is followed by the comma-separated list of the fields to be retrieved from the database. Instead of the list, it is possible to use an asterisk *, which would mean that all the fields from the table are selected.

The name of the table is specified after the FROM keyword. It is possible to get the data from several tables at the same time. The filter criteria--predicate--is after the WHERE keyword. The sorting rule is at the end after ORDER BY. The LIMIT keyword makes the database return not more than five rows, even if the number of records in the table is bigger.

These parts of the query—the keywords and the following expressions--are called clauses, such as the FROM clause, the WHERE clause, and so on. All of these clauses have their own purpose and logic. They must follow each other in the specific order. None of them is mandatory. The simplified syntax diagram for the SELECT statement is as follows:

SELECT [DISTINCT | ALL] <expression>[[AS] <output_name>][, …]
[FROM <table>[, <table>… | <JOIN clause>…]
[WHERE <condition>]
[GROUP BY <expression>|<output_name>|<output_number> [,…]]
[HAVING <condition>]
[ORDER BY <expression>|<output_name>|<output_number> [ASC | DESC] [NULLS FIRST | LAST] [,…]]
[OFFSET <expression>]
[LIMIT <expression>];

Some elements were not included here such as the WINDOW clause, the WITH clause, or FOR UPDATE. A complete syntax diagram can be found in the documentation at http://www.postgresql.org/docs/current/static/sql-select.html.

Some of the omitted elements will be described in the next chapters.

There is no part of the SELECT statement that is always mandatory. For example, the query might be simpler if no ordering or filtering is needed:

SELECT * FROM car_portal_app.car;

Even the FROM clause is not mandatory. When one needs to evaluate an expression that does not take any data from the database, the query takes this form:

car_portal=> SELECT 1;
?column?
----------
1

This can be considered as Hello world in SQL.

The FROM clause is optional in PostgreSQL but in other RDBMS such as Oracle, the FROM keyword may be required.

Logically, the sequence of the operations performed by the SELECT query is as follows:

  1. Take all the records from all the source tables. If there are subqueries in the FROM clause, they are evaluated first.
  2. Build all possible combinations of those records and discard the combinations that do not follow the JOIN conditions or set some fields to NULL in case of outer joins.
  3. Filter out the combinations that do not match the condition of the WHERE clause.
  4. Build groups based on the values of the expressions of the GROUP BY list.
  5. Filter the groups that match the HAVING conditions.
  6. Evaluate expressions of the select-list.
  7. Eliminate duplicated rows if DISTINCT is specified.
  8. Apply the set operations UNION, EXCEPT, or INTERSECT.
  9. Sort rows according to the ORDER BY clause.
  10. Discard records according to OFFSET and LIMIT.

In fact, PostgreSQL optimizes that algorithm by performing the steps in a different order or even simultaneously. For example, if LIMIT 1 is specified, then it does not make sense to retrieve all the rows from the source tables, but only one that would match the WHERE condition.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset