Appendix A. SQL fundamentals

A table, with its rows and columns, is a familiar sight to anyone who has worked with an SQL database. Sometimes you'll see tables referred to as relations, rows as tuples, and columns as attributes. This is the language of the relational data model, the mathematical model that SQL databases (imperfectly) implement.

The relational model allows you to define data structures and constraints that guarantee the integrity of your data (for example, by disallowing values that don't accord with your business rules). The relational model also defines the relational operations of restriction, projection, Cartesian product, and relational join [Codd, 1970]. These operations let you do useful things with your data, such as summarizing or navigating it.

Each of the operations produces a new table from a given table or combination of tables. SQL is a language for expressing these operations in your application (therefore called a data language) and for defining the base tables on which the operations are performed.

You write SQL data definition language (DDL) statements to create and manage the tables. We say that DDL defines the database schema. Statements such as CREATE TABLE, ALTER TABLE, and CREATE SEQUENCE belong to DDL.

You write SQL data manipulation language (DML) statements to work with your data at runtime. Let's describe these DML operations in the context of some tables of the CaveatEmptor application.

In CaveatEmptor, you naturally have entities like item, user, and bid. We assume that the SQL database schema for this application includes an ITEM table and a BID table, as shown in figure A.1. The datatypes, tables, and constraints for this schema are created with SQL DDL (CREATE and ALTER operations).

Insertion is the operation of creating a new table from an old table by adding a row. SQL databases perform this operation in place, so the new row is added to the existing table:

insert into ITEM values (4, 'Fum', 45.0)

Figure A-1. Example tables with example data

An SQL update modifies an existing row:

update ITEM set PRICE = 47.0 where ITEM_ID = 4

A deletion removes a row:

delete from ITEM where ITEM_ID = 4

The real power of SQL lies in querying data. A single query may perform many relational operations on several tables. Let's look at the basic operations.

Restriction is the operation of choosing rows of a table that match a particular criterion. In SQL, this criterion is the expression that occurs in the where clause:

select * from ITEM where NAME like 'F%'

Projection is the operation of choosing columns of a table and eliminating duplicate rows from the result. In SQL, the columns to be included are listed in the select clause. You can eliminate duplicate rows by specifying the distinct keyword:

select distinct NAME from ITEM

A Cartesian product (also called cross join) produces a new table consisting of all possible combinations of rows of two existing tables. In SQL, you express a Cartesian product by listing tables in the from clause:

select * from ITEM i, BID b

A relational join produces a new table by combining the rows of two tables. For each pair of rows for which a join condition is true, the new table contains a row with all field values from both joined rows. In ANSI SQL, the join clause specifies a table join; the join condition follows the on keyword. For example, to retrieve all items that have bids, you join the ITEM and the BID table on their common ITEM_ID attribute:

select * from ITEM i inner join BID b on i.ITEM_ID = b.ITEM_ID

A join is equivalent to a Cartesian product followed by a restriction. So, joins are often instead expressed in theta style, with a product in the from clause and the join condition in the where clause. This SQL theta-style join is equivalent to the previous ANSI-style join:

select * from ITEM i, BID b where i.ITEM_ID = b.ITEM_ID

Along with these basic operations, relational databases define operations for aggregating rows (GROUP BY) and ordering rows (ORDER BY):

select b.ITEM_ID, max(b.AMOUNT)
from BID b
group by b.ITEM_ID
having max(b.AMOUNT) > 15
order by b.ITEM_ID asc

SQL was called a structured query language in reference to a feature called subselects. Because each relational operation produces a new table from an existing table or tables, an SQL query may operate on the result table of a previous query. SQL lets you express this using a single query, by nesting the first query inside the second:

select *
from (
    select b.ITEM_ID as ITEM, max(b.AMOUNT) as AMOUNT
    from BID b
    group by b.ITEM_ID
)
where AMOUNT > 15
order by ITEM asc

The result of this query is equivalent to the previous one.

A subselect may appear anywhere in an SQL statement; the case of a subselect in the where clause is the most interesting:

select * from BID b
   where b.AMOUNT >= (select max(c.AMOUNT) from BID c)

This query returns the largest bids in the database. Where clause subselects are often combined with quantification. The following query is equivalent:

select * from BID b
   where b.AMOUNT >= all(select c.AMOUNT from BID c)

An SQL restriction criterion is expressed in a sophisticated expression language that supports mathematical expressions, function calls, string matching, and perhaps even more sophisticated features such as full-text search:

select * from ITEM i
    where lower(i.DESCRIPTION) like '%gc%'
       or lower(i.DESCRIPTION) like '%excellent%'

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset