Images

CHAPTER 10

SQL Joins

Exam Objectives

• 061.6.1    Write SELECT Statements to Access Data from More Than One Table Using Equijoins and Nonequijoins

• 061.6.2    Join a Table to Itself Using a Self-Join

• 061.6.3    View Data that Does Not Meet a Join Condition Using Outer Joins

• 061.6.4    Generate a Cartesian Product of Two or More Tables

The three pillars of relational theory are selection, projection, and joining. This chapter focuses on the practical implementation of joining. Rows from different tables are associated with each other using joins. Support for joining has implications for the way data is stored in database tables. Many data models such as third normal form or star schemas have emerged to exploit this feature.

Tables can be joined in several ways. The most common technique is called an equijoin. A row is associated with one or more rows in another table based on the equality of column values or expressions. Tables can also be joined using a nonequijoin. In this case, a row is associated with one or more rows in another table if its column values fall into a range determined by inequality operators.

A less common technique is to associate rows with other rows in the same table. This association is based on columns with logical and usually hierarchical relationships with each other. This is called a self-join. Rows with null or differing entries in common join columns are excluded when equijoins and nonequijoins collectively known as inner joins are performed. An outer join is available to fetch these one-legged or orphaned rows if necessary.

A cross join or Cartesian product is formed when every row from one table is joined to all rows in another. This join is often the result of missing or inadequate join conditions but is occasionally intentional.

Write SELECT Statements to Access Data from More Than One Table Using Equijoins and Nonequijoins

This certification objective receives extensive coverage in this chapter. It is crucial to learning the concepts and language for performing joins. This section introduces different types of joins in their primitive forms and outlines the broad categories that are available. This section then discusses the various join clauses. The chapter discusses modern American National Standards Institute (ANSI)–compliant and traditional Oracle syntaxes, with emphasis on the modern syntax. This section concludes with a discussion of nonequijoins and additional join conditions. Joining is described by focusing on the following eight areas:

•  Types of joins

•  Joining tables using ANSI SQL syntax

•  Qualifying ambiguous column names

•  The NATURAL JOIN clause

•  The JOIN USING clause

•  The JOIN ON clause

•  N-way joins and additional join conditions

•  Nonequijoins

Types of Joins

Two basic joins are the equijoin and the nonequijoin. Equijoins are probably more frequently used. Joins can be performed between multiple tables, but much of the following discussion will use two hypothetical tables to illustrate the concepts and language of joins. The first table is called the source, and the second is called the target. Rows in the source and target tables comprise one or more columns. As an example, assume that the source and target are the COUNTRIES and REGIONS tables from the HR schema, respectively.

The COUNTRIES table contains three columns named COUNTRY_ID, COUNTRY_NAME, and REGION_ID. The REGIONS table is composed of two columns named REGION_ID and REGION_NAME. The data in these two tables is related to each other based on the common REGION_ID column. Consider the following queries:

Images

The name of the region to which a country belongs may be determined by obtaining its REGION_ID value. This value is used to join it with the row in the REGIONS table with the same REGION_ID. Query 1 retrieves the column values associated with the row from the COUNTRIES table where the COUNTRY_ID=‘CA’. The REGION_ID value of this row is 2. Query 2 fetches the Americas REGION_NAME from the REGIONS table for the row with REGION_ID=2. Equijoining facilitates the retrieval of column values from multiple tables using a single query.

The source and target tables can be swapped, so the REGIONS table could be the source and the COUNTRIES table could be the target. Consider the following two queries:

Images

Query 1 fetches one row with a REGION_ID value of 2. Joining in this reversed manner allows the following question to be asked: What countries belong to the Americas region? The answers from Query 2 are five COUNTRY_NAME values: Argentina, Brazil, Canada, Mexico, and the United States of America. These results can be obtained from a single query that joins the tables together. The language to perform equijoins, nonequijoins, outer joins, and cross joins is introduced next, along with a discussion of the traditional Oracle join syntax.

Inner Joins

The inner join is implemented using three possible join clauses that use the following keywords in different combinations: NATURAL JOIN, USING, and ON.

When the source and target tables share identically named columns, it is possible to perform a natural join between them without specifying a join column. In this scenario, columns with the same names in the source and target tables are automatically associated with each other. Rows with matching column values in both tables are retrieved. The REGIONS and COUNTRIES table both share the REGION_ID column. They may be naturally joined without specifying join columns, as shown in the first two queries in Figure 10-1.

Images

Figure 10-1    Natural joins and other inner joins

The NATURAL JOIN keywords instruct Oracle to identify columns with identical names between the source and target tables. Thereafter, a join is implicitly performed between them. In the first query, the REGION_ID column is identified as the only commonly named column in both tables. REGIONS is the source table and appears after the FROM clause. The target table is therefore COUNTRIES. For each row in the REGIONS table, a match for the REGION_ID value is sought from all the rows in the COUNTRIES table. An interim result set is constructed containing rows matching the join condition. This set is then restricted by the WHERE clause. In this case, because the COUNTRY_NAME value must be Canada, a REGION_NAME of Americas is returned.

The second query shows a natural join where COUNTRIES is the source table. The REGION_ID value for each row in the COUNTRIES table is identified. The set of rows from the REGIONS table to be used in the matching exercise with the COUNTRIES table is first pruned by the WHERE condition to only those with Americas as their REGION_NAME value. The COUNTRY_NAME values from rows with Americas as their REGION_NAME are returned from the COUNTRIES table.

Sometimes more control must be exercised regarding which columns to use for joins. When there are identical column names in the source and target tables you want to exclude as join columns, you can use the JOIN . . . USING format. Remember that Oracle does not impose any rules stating that columns with the same name in two discrete tables must necessarily have any relationship with each other. The third query explicitly specifies that the REGIONS table be joined to the COUNTRIES table based on common values in their REGION_ID columns. This syntax allows inner joins to be formed on specific columns instead of on all commonly named columns.

The fourth query demonstrates the JOIN . . . ON format of the inner join, which allows join columns to be explicitly stated. This format does not depend on the columns in the source and target tables having identical names. This form is more general and is the most widely used inner join format.

Outer Joins

Not all tables share a perfect relationship, where every record in the source table can be matched to at least one row in the target table. It is occasionally required that rows with nonmatching join column values also be retrieved by a query. This may seem to defeat the purpose of joins but has some practical benefits.

Suppose the EMPLOYEES and DEPARTMENTS tables are joined with common DEPARTMENT_ID values. EMPLOYEES records with null DEPARTMENT_ID values are excluded along with values absent from the DEPARTMENTS table. An outer join fetches these rows.

Cross Joins

A cross join or Cartesian product derives its names from mathematics, where it is also referred to as a cross product between two sets or matrices. This join creates one row of output for every combination of source and target table rows.

If the source and target tables have three and four rows, respectively, a cross join between them results in (3 × 4 = 12) rows being returned. Consider the row counts retrieved from the queries in Figure 10-2.

Images

Figure 10-2    Cross join

The first two row counts are performed on the COUNTRIES and REGIONS tables yielding 25 and 4 rows, respectively. Query 3 counts the number of rows returned from a cross join of these tables and yields 100. Query 4 would return 100 records if the WHERE clause was absent. Each of the four rows in the REGIONS table is joined to the one row from the COUNTRIES table. Each row returned contains every column from both tables.

Oracle Join Syntax

A proprietary Oracle join syntax has evolved that is stable and understood by millions of users. This traditional syntax is supported by Oracle and is present in software systems across the world. You will no doubt encounter the traditional Oracle join syntax that is now making way for the standardized ANSI-compliant syntax discussed in this chapter.

The traditional Oracle join syntax supports inner joins, outer joins, and Cartesian joins, as shown in the following queries:

Images

Query 1 performs an inner join by specifying the join as a condition in the WHERE clause. This is the most significant difference between the traditional and ANSI SQL join syntaxes. Take note of the column aliasing using the TABLE.COLUMN_NAME notation to disambiguate the identical column names. This notation is discussed in detail later in this chapter. Query 2 specifies the join between the source and target tables as a WHERE condition. There is a plus (+) symbol enclosed in brackets to the left of the equal sign that indicates to Oracle that a right outer join must be performed. This query returns employees’ LAST_NAME and their matching DEPARTMENT_NAME values. In addition, the outer join retrieves DEPARTMENT_NAME from the rows with DEPARTMENT_ID values not currently assigned to any employee records. Query 3 performs a Cartesian or cross join by excluding the join condition.

Joining Tables Using ANSI SQL Syntax

Prior to Oracle 9i, the traditional join syntax was the only language available to join tables. Since then, Oracle has introduced a new language that is compliant with the latest ANSI standards. It offers no performance benefits over the traditional syntax. Inner, outer, and cross joins can be written using both ANSI SQL and traditional Oracle SQL.

The general form of the SELECT statement using ANSI SQL syntax is as follows:

SELECT table1.column, table2.column
FROM table1
[NATURAL JOIN table2] |
[JOIN table2 USING (column_name)] |
[JOIN table2 ON (table1.column_name = table2.column_name)] |
[LEFT | RIGHT | FULL OUTER JOIN table2 ON (table1.column_name = table2.column_name)] |
[CROSS JOIN table2];

The following sections explain this syntax and provide examples. The general form of the traditional Oracle-proprietary syntax relevant to joins is as follows:

SELECT table1.column, table2.column
FROM table1, table2
[WHERE (table1.column_name = table2.column_name)] |
[WHERE (table1.column_name(+)= table2.column_name)] |
[WHERE (table1.column_name)= table2.column_name) (+)] ;

If no joins or fewer than N-1 joins are specified in the WHERE clause conditions, where N refers to the number of tables in the query, then a Cartesian or cross join is performed. If an adequate number of join conditions is specified, then the first optional conditional clause specifies an inner join, while the second two optional clauses specify the syntax for right and left outer joins.

Qualifying Ambiguous Column Names

Columns with the same names may occur in tables involved in a join. The columns named DEPARTMENT_ID and MANAGER_ID are found in both the EMPLOYEES and DEPARTMENTS tables. The REGION_ID column is present in both the REGIONS and COUNTRIES tables. Listing such columns in a query becomes problematic when Oracle cannot resolve their origin. Columns with unique names across the tables involved in a join cause no ambiguity because Oracle can easily resolve their source table.

The problem of ambiguous column names is addressed with dot notation. A column can be prefixed by its table name and a dot or period symbol to designate its origin. This differentiates it from a column with the same name in another table. Dot notation can be used in queries involving any number of tables. Referencing some columns using dot notation does not imply that all columns must be referenced in this way.

Dot notation is enhanced with table aliases. A table alias provides an alternative, usually shorter name, for a table. A column may be referenced as TABLE_NAME.COLUMN_NAME or TABLE_ALIAS.COLUMN_NAME. Consider the query shown in Figure 10-3.

Images

Figure 10-3    Dot notation

The EMPLOYEES table is aliased with the short name EMP, while the DEPARTMENTS table is not. The SELECT clause references the EMPLOYEE_ID and MANAGER_ID columns as EMP.EMPLOYEE_ID and EMP.MANAGER_ID. The MANAGER_ID column from the DEPARTMENTS table is referred to as DEPARTMENTS.MANAGER_ID. Qualifying the EMPLOYEE_ID column using dot notation is unnecessary because there is only one column with this name between the two tables. Therefore, there is no ambiguity.

The MANAGER_ID column must be qualified to avoid ambiguity because it is featured in both tables. Since the JOIN . . . USING format is applied, only DEPARTMENT_ID is used as the join column. If a NATURAL JOIN was employed, both the DEPARTMENT_ID and MANAGER_ID columns would be used. If the MANAGER_ID column was not qualified, an “ORA-00918: column ambiguously defined” error would be returned. If DEPARTMENT_ID was aliased, an “ORA-25154: column part of USING clause cannot have qualifier” error would be raised.

SQL Developer provides the heading MANAGER_ID to the first reference made in the SELECT clause. The string “_1” is automatically appended to the second reference, creating the heading MANAGER_ID_1.

The NATURAL JOIN Clause

The general syntax for the NATURAL JOIN clause is as follows:

SELECT table1.column, table2.column
FROM table1
NATURAL JOIN table2;

The natural join identifies the columns with common names in table1 and table2 and implicitly joins the tables using all these columns. The columns in the SELECT clause can be qualified using dot notation unless they are one of the join columns. Consider the following queries:

Images

The natural join identifies columns with common names between the two tables. In Query 1, COUNTRY_ID occurs in both tables and becomes the join column. Query 2 is written using traditional Oracle syntax and retrieves the same rows as Query 1. Unless you are familiar with the columns in the source and target tables, natural joins must be used with caution because join conditions are automatically formed between all columns with shared names.

Query 3 performs a natural join between the JOBS and COUNTRIES tables. There are no columns with identical names, resulting in a Cartesian product. Query 4 is equivalent to Query 3, and a Cartesian join is performed using traditional Oracle syntax.

The natural join is simple but prone to a fundamental weakness. It suffers the risk that two columns with the same name might have no relationship and may not even have compatible data types. In Figure 10-4, the COUNTRIES, REGIONS, and SALES_REGIONS tables are described. The SALES_REGIONS table was constructed to illustrate the following important point: Although it has REGION_ID in common with the COUNTRIES table, it cannot be naturally joined to it because their data types are incompatible. The data types of the COUNTRIES.REGION_ID and SALES_REGIONS.REGION_ID columns are NUMBER and VARCHAR2, respectively. The character data cannot be implicitly converted into numeric data, and an “ORA-01722: invalid number” error is raised. The REGIONS.REGION_ID column is of type NUMBER, and its data is related to the data in the COUNTRIES table. Therefore, the natural join between the REGIONS and COUNTRIES table works perfectly.

Images

Figure 10-4    The natural join

Exercise 10-1: Using the NATURAL JOIN    The JOB_HISTORY table shares three identically named columns with the EMPLOYEES table: EMPLOYEE_ID, JOB_ID, and DEPARTMENT_ID. You are required to describe the tables and fetch the EMPLOYEE_ID, JOB_ID, DEPARTMENT_ID, LAST_NAME, HIRE_DATE, and END_DATE values for all rows retrieved using a natural join. Alias the EMPLOYEES table as EMP and the JOB_HISTORY table as JH and use dot notation where possible.

1.  Start SQL*Plus and connect to the HR schema.

2.  The tables are described using the commands DESC EMPLOYEES and DESC JOB_HISTORY, and the columns with identical names and their data types can be examined.

3.  The FROM clause is as follows:

Images

4.  The JOIN clause is as follows:

Images

5.  The SELECT clause is as follows:

Images

6.  Executing this statement returns a single row with the same EMPLOYEE_ID, JOB_ID, and DEPARTMENT_ID values in both tables.

Images

The JOIN USING Clause

The format of the syntax for the JOIN USING clause is as follows:

SELECT table1.column, table2.column
FROM table1
JOIN table2 USING (join_column1, join_column2…);

While the natural join contains the NATURAL keyword in its syntax, the JOIN . . . USING syntax does not. An error is raised if the keywords NATURAL and USING occur in the same join clause. The JOIN . . . USING clause allows one or more equijoin columns to be explicitly specified in brackets after the USING keyword. This avoids the shortcomings associated with the natural join. Many situations demand that tables be joined only on certain columns, and this format caters for this requirement. Consider the following queries:

Images

Query 1 specifies that the LOCATIONS and COUNTRIES tables must be joined on common COUNTRY_ID column values. All columns from these tables are retrieved for the rows with matching join column values. Query 2 shows a traditionally specified query that retrieves the same rows as Query 1. The join columns specified with the JOIN . . . USING syntax cannot be qualified using table names or aliases when they are referenced in the SELECT and JOIN clauses. Since this join syntax potentially excludes some columns with identical names from the join clause, these must be qualified if they are referenced to avoid ambiguity.

As Figure 10-5 shows, the JOB_HISTORY and EMPLOYEES tables were joined based on the presence of equal values in their JOB_ID and EMPLOYEE_ID columns. Rows conforming to this join condition are retrieved. These tables share three identically named columns. In this example, only two of these are specified as join columns. Notice that although the third identically named column is DEPARTMENT_ID, it is qualified with a table alias to avoid ambiguity, while the join columns specified in the SELECT clause cannot be qualified with table aliases.

Images

Figure 10-5    Natural join using the JOIN…USING clause

The JOIN ON Clause

The format of the syntax for the JOIN ON clause is as follows:

SELECT table1.column, table2.column
FROM table1
JOIN table2 ON (table1.column_name = table2.column_name);

The natural join and the JOIN . . . USING clauses depend on join columns with identical column names. The JOIN . . . ON clause allows the explicit specification of join columns, regardless of their column names. This is the most flexible and widely used form of the join clauses. The ON and NATURAL keywords cannot appear together in a join clause. The equijoin columns are fully qualified as table1.column1 = table2.column2 and are optionally specified in brackets after the ON keyword. The following queries illustrate the JOIN . . . ON clause:

Images

Query 1 retrieves all column values from both the DEPARTMENTS and EMPLOYEES tables for the rows that meet an equijoin condition. This condition is fulfilled by EMPLOYEE_ID values matching DEPARTMENT_ID values in the DEPARTMENTS table. The traditional Oracle syntax in Query 2 returns the same results as Query 1. Notice the similarities between the traditional join condition specified in the WHERE clause and the join condition specified after the ON keyword.

The START_DATE column in the JOB_HISTORY table is joined to the HIRE_DATE column in the EMPLOYEES table in Figure 10-6. This equijoin retrieves the details of employees who worked for the organization and changed jobs.

Images

Figure 10-6    Inner join using the JOIN . . . ON clause

Exercise 10-2: Using the NATURAL JOIN . . . ON Clause    Each record in the DEPARTMENTS table has a MANAGER_ID column matching an EMPLOYEE_ID value in the EMPLOYEES table. You are required to produce a report with one column aliased as Managers. Each row must contain a sentence of the following format: FIRST_NAME LAST_NAME is manager of the DEPARTMENT_NAME department. Alias the EMPLOYEES table as E and the DEPARTMENTS table as D and use dot notation where possible.

1.  Start SQL Developer and connect to the HR schema.

2.  The Managers column can be constructed by concatenating the required items and separating them with spaces.

3.  The SELECT clause is as follows:

Images

4.  The FROM clause is as follows:

Images

5.  The JOIN . . . ON clause is as follows:

Images

6.  Executing this statement returns 11 rows describing the managers of each department, as shown in the following illustration:

Images

N-Way Joins and Additional Join Conditions

The joins just discussed were demonstrated using two tables. There is no restriction on the number of tables that can be joined. Third normal form consists of a set of tables connected through a series of primary and foreign key relationships. Traversing these relationships using joins enables consistent and reliable retrieval of data. However, there are instances when primary and foreign key relationships are not defined between tables. These tables can also be joined, but the results do not benefit from referential integrity being enforced by the database. When multiple joins exist in a statement, they are evaluated from left to right. Consider the following query using a mixture of natural joins and Oracle joins:

Images

The natural join between DEPARTMENTS and LOCATIONS creates an interim result set consisting of 27 rows since they are implicitly joined on the LOCATION_ID column. This set is then Cartesian joined to the COUNTRIES table since a join condition is not implicitly or explicitly specified. The 27 interim rows are joined to the 25 rows in the COUNTRIES table, yielding a new interim results set with 675 (27 × 25) rows and three columns: DEPARTMENT_NAME, CITY, and COUNTRY_NAME. This set is then joined to the REGIONS table. Once again, a Cartesian join occurs because the REGION_ID column is absent from any join condition. The final result set contains 2700 (675 × 4) rows and four columns. Using natural joins mixed with Oracle joins is error prone and not recommended since join conditions may sometimes be erroneously omitted.

The JOIN . . . USING and JOIN . . . ON syntaxes are better suited for joining multiple tables. The following query joins four tables using the natural join syntax:

Images

This query correctly yields 27 rows in the final results set since the required join columns are listed in the SELECT clause. The following query demonstrates how the JOIN . . . ON clause is used to fetch the same 27 rows. A join condition can reference only columns in its scope. In the following example, the join from DEPARTMENTS to LOCATIONS may not reference columns in the COUNTRIES or REGIONS tables, but the join between COUNTRIES and REGIONS may reference any column from the four tables involved in the query.

Images

The JOIN . . . USING clause can also be used to join these four tables as follows:

Images

The WHERE clause is used to specify conditions that restrict the results set of a query whether it contains joins or not. The JOIN . . . ON clause is also used to specify conditions that limit the results set created by the join. Consider the following two queries:

Images

Query 1 uses a WHERE clause to restrict the 27 rows created by equijoining the DEPARTMENTS and LOCATIONS tables based on their LOCATION_ID values to the three that contain DEPARTMENT_ID values beginning with the letter P. Query 2 implements the condition within the brackets of the ON subclause and returns the same three rows.

Five tables are joined in Figure 10-7, resulting in a list describing the top-earning employees and geographical information about their departments.

Images

Figure 10-7    N-way joins and additional join conditions

Nonequijoins

Nonequijoins match column values from different tables based on an inequality expression. The value of the join column in each row in the source table is compared to the corresponding values in the target table. A match is found if the expression used in the join, based on an inequality operator, evaluates to true. When such a join is constructed, a nonequijoin is performed.

A nonequijoin is specified using the JOIN . . . ON syntax, but the join condition contains an inequality operator instead of an equal sign.

The format of the syntax for a nonequijoin clause is as follows:

SELECT table1.column, table2.column
FROM table1
[JOIN table2 ON (table1.column_name < table2.column_name)]|
[JOIN table2 ON (table1.column_name > table2.column_name)]|
[JOIN table2 ON (table1.column_name <= table2.column_name)]|
[JOIN table2 ON (table1.column_name >= table2.column_name)]|
[JOIN table2 ON (table1.column BETWEEN table2.col1 AND table2.col2)]

Consider the 16 rows returned by the query in Figure 10-8. The EMPLOYEES table is nonequijoined to the JOBS table based on the inequality join condition (2*E.SALARY < J.MAX_SALARY). The JOBS table stores the salary range for different jobs in the organization. The SALARY value for each employee record is doubled and compared with all MAX_SALARY values in the JOBS table. If the join condition evaluates to true, the row is returned.

Images

Figure 10-8    Nonequijoins

Join a Table to Itself Using a Self-Join

Storing hierarchical data in a single relational table may be accomplished by allocating at least two columns per row. One column stores an identifier of the row’s parent record, and the second stores the row’s identifier. Associating rows with each other based on a hierarchical relationship requires Oracle to join a table to itself. This self-join technique is discussed in the next section.

Joining a Table to Itself Using the JOIN . . . ON Clause

Suppose you need to store a family tree in a relational table; you could take several approaches. One option is to use a table called FAMILY with columns named ID, NAME, MOTHER_ID, and FATHER_ID, where each row stores a person’s name, a unique ID number, and the ID values for their parents.

When two tables are joined, each row from the source table is subjected to the join condition with rows from the target table. If the condition evaluates to true, then the joined row, consisting of columns from both tables, is returned.

When the join columns originate from the same table, a self-join is required. Conceptually, the source table is duplicated to create the target table. The self-join works like a regular join between these tables. Note that, internally, Oracle does not duplicate the table, and this description is merely provided to explain the concept of self-joining. Consider the following three queries:

Images

To identify a person’s father in the FAMILY table, you could use Query 1 to get that person’s ID, NAME, and FATHER_ID value. In Query 2, the FATHER_ID value obtained from the first query could be substituted to obtain the father’s NAME value. Notice that both Queries 1 and 2 source information from the FAMILY table.

Query 3 performs a self-join with the JOIN . . . ON clause by aliasing the FAMILY table as f1 and f2. Oracle treats these as different tables even though they point to the same physical table. The first occurrence of the FAMILY table, aliased as f1, is designated as the source table, while the second occurrence, aliased as f2, is assigned as the target table. The join condition in the ON clause is of the format source.child_id=target.parent_id. Figure 10-9 shows a sample of FAMILY data and demonstrates a three-way self-join to the same table.

Images

Figure 10-9    Self-join

Exercise 10-3: Performing a Self-Join    There is a hierarchical relationship between employees and their managers. For each row in the EMPLOYEES table, the MANAGER_ID column stores the EMPLOYEE_ID of every employee’s manager. Using a self-join on the EMPLOYEES table, you are required to retrieve the employee’s LAST_NAME, EMPLOYEE_ID, MANAGER_ID, manager’s LAST_NAME, and employee’s DEPARTMENT_ID for the rows with DEPARMENT_ID values of 10, 20, or 30. Alias the EMPLOYEES table as E and the second instance of the EMPLOYEES table as M. Sort the results based on the DEPARTMENT_ID column.

1.  Start SQL Developer and connect to the HR schema.

2.  The SELECT clause is as follows:

Images

3.  The FROM clause with source table and alias is as follows:

Images

4.  The JOIN . . . ON clause with aliased target table is as follows:

Images

5.  The WHERE clause is as follows:

Images

6.  The ORDER BY clause is as follows:

Images

7.  Executing this statement returns nine rows describing the managers of each employee in these departments.

Images

View Data That Does Not Meet a Join Condition by Using Outer Joins

Equijoins match rows between two tables based on the equality of the column data stored in each table. Nonequijoins rely on matching rows between tables based on a join condition containing an inequality expression. Target table rows with no matching join column in the source table are usually not required. When they are required, however, an outer join is used to fetch them. Several variations of outer joins can be used depending on whether join column data is missing from the source or target tables or both. These outer join techniques are described in the following topics:

•  Inner versus outer joins

•  Left outer joins

•  Right outer joins

•  Full outer joins

Inner versus Outer Joins

When equijoins and nonequijoins are performed, rows from the source and target tables are matched using a join condition formulated with equality and inequality operators, respectively. These are referred to as inner joins. An outer join is performed when rows, which are not retrieved by an inner join, are returned.

Two tables sometimes share a master-detail or parent-child relationship. In the sample HR schema there are several pairs of tables with such a relationship. One pair is the DEPARTMENTS and EMPLOYEES tables. The DEPARTMENTS table stores a master list of DEPARTMENT_NAME and DEPARTMENT_ID values. Each EMPLOYEES record has a DEPARTMENT_ID column constrained to be either a value that exists in the DEPARTMENTS table or null. This leads to one of the following three scenarios. The fourth scenario could occur if the constraint between the tables was removed.

1.  An employee row has a DEPARTMENT_ID value that matches a row in the DEPARTMENTS table.

2.  An employee row has a null value in its DEPARTMENT_ID column.

3.  There are rows in the DEPARTMENTS table with DEPARTMENT_ID values that are not stored in any employee records.

4.  An employee row has a DEPARTMENT_ID value that is not featured in the DEPARTMENTS table.

The first scenario describes an inner join between the two tables. The second and third scenarios cause many problems. Joining the EMPLOYEES and DEPARTMENTS tables on the DEPARTMENT_ID column may result in rows with null DEPARTMENT_ID values being excluded. An outer join can be used to include these orphaned rows in the results set. The fourth scenario should rarely occur in a well-designed database because foreign key constraints would prevent the insertion of child records with no parent values. Since this row will be excluded by an inner join, it can be retrieved using an outer join.

A left outer join between the source and target tables returns the results of an inner join as well as rows from the source table excluded by that inner join. A right outer join between the source and target tables returns the results of an inner join as well as rows from the target table excluded by that inner join. If a join returns the results of an inner join as well as rows from both the source and target tables excluded by that inner join, then a full outer join has been performed.

Left Outer Joins

The format of the syntax for the LEFT OUTER JOIN clause is as follows:

SELECT table1.column, table2.column
FROM table1
LEFT OUTER JOIN table2
ON (table1.column = table2.column);

A left outer join performs an inner join of table1 and table2 based on the condition specified after the ON keyword. Any rows from the table on the left of the JOIN keyword excluded for not fulfilling the join condition are also returned. Consider the following two queries:

Images

Queries 1 and 2 are identical except for the join clauses, which have the keywords LEFT OUTER JOIN and JOIN, respectively. Query 2 performs an inner join, and seven rows are returned. These rows share identical DEPARTMENT_ID values in both tables. Query 1 returns the same seven rows and one additional row. This extra row is obtained from the table to the left of the JOIN keyword, which is the DEPARTMENTS table. It is the row containing details of the Payroll department. The inner join does not include this row since no employees are currently assigned to the department.

Figure 10-10 shows a left outer join. The inner join produces 27 rows with matching LOCATION_ID values in both tables. There are 43 rows in total, which implies that 16 rows were retrieved from the LOCATIONS table, which is on the left of the JOIN keyword. None of the rows from the DEPARTMENTS table contains any of these 16 LOCATION_ID values.

Images

Figure 10-10    Left outer join

Right Outer Joins

The format of the syntax for the RIGHT OUTER JOIN clause is as follows:

SELECT table1.column, table2.column
FROM table1
RIGHT OUTER JOIN table2
ON (table1.column = table2.column);

A right outer join performs an inner join of table1 and table2 based on the join condition specified after the ON keyword. Rows from the table to the right of the JOIN keyword, excluded by the join condition, are also returned. Consider the following query:

Images

The inner join produces seven rows containing details for the employees with LAST_NAME values that begin with the letter G. The EMPLOYEES table is to the right of the JOIN keyword. Any employee records that do not conform to the join condition are included, provided they conform to the WHERE clause condition. In addition, the right outer join fetches one EMPLOYEE record with a LAST_NAME of Grant. This record currently has a null DEPARTMENT_ID value. The inner join excludes the record since no DEPARTMENT_ID is assigned to this employee.

Figure 10-11 shows a right outer join between the JOB_HISTORY and EMPLOYEES tables. The EMPLOYEES table is on the right of the JOIN keyword. The DISTINCT keyword eliminates duplicate combinations of JOB_ID values from the tables. The results show the jobs that employees have historically left. The jobs that no employees have left are also returned. These have a null value in the “Jobs in JOB_HISTORY” column.

Images

Figure 10-11    Right outer join

Full Outer Joins

The format of the syntax for the FULL OUTER JOIN clause is as follows:

SELECT table1.column, table2.column
FROM table1
FULL OUTER JOIN table2
ON (table1.column = table2.column);

A full outer join returns the combined results of a left and right outer join. An inner join of table1 and table2 is performed before rows excluded by the join condition from both tables are merged into the results set.

The traditional Oracle join syntax does not support a full outer join, which is typically performed by combining the results from a left and right outer join using the UNION set operator described in Chapter 11. Consider the full outer join shown in Figure 10-12. The WHERE clause restricting the results to rows with NULL DEPARTMENT_ID values shows the orphan rows in both tables. There is one record in the EMPLOYEES table that has no DEPARTMENT_ID values, and there are 16 departments to which no employees belong.

Images

Figure 10-12    Full outer join

Exercise 10-4: Performing an Outer Join    The DEPARTMENTS table contains details of all departments in the organization. You are required to retrieve the DEPARTMENT_NAME and DEPARTMENT_ID values for those departments to which no employees are currently assigned.

1.  Start SQL*Plus and connect to the HR schema.

2.  The SELECT clause is as follows:

Images

3.  The FROM clause with source table and alias is as follows:

Images

4.  The LEFT OUTER JOIN clause with aliased target table is as follows:

Images

5.  The WHERE clause is as follows:

Images

6.  Executing this statement returns 16 rows describing the departments to which no employees are currently assigned, as shown in the following illustration:

Images

Generate a Cartesian Product of Two or More Tables

A Cartesian product of two tables may be conceptualized as joining each row of the source table with every row in the target table. The number of rows in the result set created by a Cartesian product is equal to the number of rows in the source table multiplied by the number of rows in the target table. Cartesian products can be formed intentionally using the ANSI SQL cross join syntax. This technique is described in the next section.

Creating Cartesian Products Using Cross Joins

Cartesian product is a mathematical term. It refers to the set of data created by merging the rows from two or more tables together. Cross join is the syntax used to create a Cartesian product by joining multiple tables. Both terms are often used synonymously. The format of the syntax for the CROSS JOIN clause is as follows:

SELECT table1.column, table2.column
FROM table1
CROSS JOIN table2;

It is important to observe that no join condition is specified using the ON or USING keyword. A Cartesian product freely associates the rows from table1 with every row in table2. Conditions that limit the results are permitted in the form of WHERE clause restrictions. If table1 and table2 contain x and y number of rows, respectively, the Cartesian product will contain x times y number of rows. The results from a cross join can be used to identify orphan rows or generate a large dataset for use in application testing. Consider the following queries:

Images

Query 1 takes the 19 rows and 4 columns from the JOBS table and the 10 rows and 5 columns from the JOB_HISTORY table and generates one large set of 190 records with 9 columns. SQL*Plus presents any identically named columns as headings. SQL Developer appends an underscore and number to each shared column name and uses it as the heading. The JOB_ID column is common to both the JOBS and JOB_HISTORY tables. The headings in SQL Developer are labeled JOB_ID and JOB_ID_1, respectively. Query 2 generates the same Cartesian product as the first, but the 190 rows are constrained by the WHERE clause condition, and only 10 rows are returned.

Figure 10-13 shows a cross join between the REGIONS and COUNTRIES tables. There are 4 rows in REGIONS and 25 rows in COUNTRIES. Since the WHERE clause limits the REGIONS table to 2 of 4 rows, the Cartesian product produces 50 (25 × 2) records. The results are sorted alphabetically, first on the REGION_NAME and then on the COUNTRY_NAME. The first record has the pair of values Asia and Argentina. When the REGION_NAME changes, the first record has the following pair of values: Middle East and Africa and Argentina. Notice that the COUNTRY_NAME values are repeated for every REGION_NAME.

Images

Figure 10-13    The cross join

Exercise 10-5: Performing a Cross Join    You are required to obtain the number of rows in the EMPLOYEES and DEPARTMENTS tables as well as the number of records that would be created by a Cartesian product of these two tables. Confirm your results by explicitly counting and multiplying the number of rows present in each of these tables.

1.  Start SQL*Plus and connect to the HR schema.

2.  The SELECT clause to find the number of rows in the Cartesian product is as follows:

Images

3.  The FROM clause is as follows:

Images

4.  The Cartesian product is performed using the following:

Images

5.  Explicit counts of the rows present in the source tables are performed using the following:

Images

6.  Explicit multiplication of the values resulting from the previous queries may be performed by querying the DUAL table.

7.  Executing these statements reveals that there are 107 records in the EMPLOYEES table, 27 records in the DEPARTMENTS table, and 2,889 records in the Cartesian product of these two datasets, as shown in the following illustration:

Images

Two-Minute Drill

Write SELECT Statements to Access Data from More Than One Table Using Equijoins and Nonequijoins

•  Equijoining occurs when one query fetches column values from multiple tables in which the rows fulfill an equality-based join condition.

•  A natural join is performed using the NATURAL JOIN syntax when the source and target tables are implicitly equijoined using all identically named columns.

•  The JOIN . . . USING syntax allows an inner join to be formed on specific columns with shared names.

•  Dot notation refers to qualifying a column by prefixing it with its table name and a dot or period symbol. This designates the table a column originates from and differentiates it from identically named columns from other tables.

•  The JOIN . . . ON clause allows the explicit specification of join columns regardless of their column names. This provides a flexible joining format.

•  The ON, USING, and NATURAL keywords are mutually exclusive and therefore cannot appear together in a join clause.

•  A nonequijoin is performed when the values in the join columns fulfill the join condition based on an inequality expression.

Join a Table to Itself Using a Self-Join

•  A self-join is required when the join columns originate from the same table. Conceptually, the source table is duplicated, and a target table is created. The self-join then works as a regular join between two discrete tables.

•  Storing hierarchical data in a relational table requires a minimum of two columns per row. One column stores an identifier of the row’s parent record, and the second stores the row’s identifier.

View Data that Does Not Meet a Join Condition Using Outer Joins

•  When equijoins and nonequijoins are performed, rows from the source and target tables are matched. These are referred to as inner joins.

•  An outer join is performed when rows, which are not retrieved by an inner join, are included for retrieval in addition to the rows retrieved by the inner join.

•  A left outer join between the source and target tables returns the results of an inner join and the missing rows it excluded from the source table.

•  A right outer join between the source and target tables returns the results of an inner join and the missing rows it excluded from the target table.

•  A full outer join returns the combined results of a left outer join and right outer join.

Generate a Cartesian Product of Two or More Tables

•  A Cartesian product is sometimes called a cross join. It is a mathematical term that refers to the set of data created by merging the rows from two or more tables.

•  The count of the rows returned from a Cartesian product is equal to the number of rows in the source table multiplied by the number of rows in the target table.

•  Joins that specify fewer than N-1 join conditions when joining N tables, or that specify invalid join conditions, inadvertently create Cartesian products.

Self Test

The following questions will help you measure your understanding of the material presented in this chapter. Read all the choices carefully because there may be more than one correct answer. Choose all the correct answers for each question.

1.  The EMPLOYEES and DEPARTMENTS tables have two identically named columns: DEPARTMENT_ID and MANAGER_ID. Which of these statements joins these tables based only on common DEPARTMENT_ID values? (Choose all that apply.)

A.  SELECT * FROM EMPLOYEES NATURAL JOIN DEPARTMENTS;

B.  SELECT * FROM EMPLOYEES E NATURAL JOIN DEPARTMENTS D ON E.DEPARTMENT_ID=D.DEPARTMENT_ID;

C.  SELECT * FROM EMPLOYEES NATURAL JOIN DEPARTMENTS USING (DEPARTMENT_ID);

D.  None of the above

2.  The EMPLOYEES and DEPARTMENTS tables have two identically named columns: DEPARTMENT_ID and MANAGER_ID. Which statements join these tables based on both column values? (Choose all that apply.)

A.  SELECT * FROM EMPLOYEES NATURAL JOIN DEPARTMENTS;

B.  SELECT * FROM EMPLOYEES JOIN DEPARTMENTS USING (DEPARTMENT_ID,MANAGER_ID);

C.  SELECT * FROM EMPLOYEES E JOIN DEPARTMENTS D ON E.DEPARTMENT_ID=D.DEPARTMENT_ID AND E.MANAGER_ID=D.MANAGER_ID;

D.  None of the above

3.  Which join is performed by the following query? (Choose the best answer.)

Images

A.  Equijoin

B.  Nonequijoin

C.  Cross join

D.  Outer join

4.  Which of the following statements are syntactically correct? (Choose all that apply.)

A.  SELECT * FROM EMPLOYEES E JOIN DEPARTMENTS D USING (DEPARTMENT_ID);

B.  SELECT * FROM EMPLOYEES JOIN DEPARTMENTS D USING (D.DEPARTMENT_ID);

C.  SELECT D.DEPARTMENT_ID FROM EMPLOYEES JOIN DEPARTMENTS D USING (DEPARTMENT_ID);

D.  None of the above

5.  Which of the following statements are syntactically correct? (Choose all that apply.)

A.  SELECT E.EMPLOYEE_ID, J.JOB_ID PREVIOUS_JOB, E.JOB_ID CURRENT_JOB FROM JOB_HISTORY J CROSS JOIN EMPLOYEES E ON (J.START_DATE=E.HIRE_DATE);

B.  SELECT E.EMPLOYEE_ID, J.JOB_ID PREVIOUS_JOB, E.JOB_ID CURRENT_JOB FROM JOB_HISTORY J JOIN EMPLOYEES E ON (J.START_DATE=E.HIRE_DATE);

C.  SELECT E.EMPLOYEE_ID, J.JOB_ID PREVIOUS_JOB, E.JOB_ID CURRENT_JOB FROM JOB_HISTORY J OUTER JOIN EMPLOYEES E ON (J.START_DATE=E.HIRE_DATE);

D.  None of the above

6.  Choose one correct statement regarding the following query:

Images

A.  Joining three tables is not permitted.

B.  A Cartesian product is generated.

C.  The JOIN . . . ON clause can be used for joins between multiple tables.

D.  None of the above.

7.  How many rows are returned after executing the following statement? (Choose the best answer.)

Images

The REGIONS table contains the following row data:

Images

A.  2

B.  3

C.  4

D.  None of the above

8.  Choose one correct statement regarding the following query:

Images

A.  No rows in the LOCATIONS table have the COUNTRY_ID values returned.

B.  No rows in the COUNTRIES table have the COUNTRY_ID values returned.

C.  The rows returned represent the COUNTRY_ID values for all the rows in the LOCATIONS table.

D.  None of the above.

9.  Which of the following statements are syntactically correct? (Choose all that apply.)

A.  SELECT JH.JOB_ID FROM JOB_HISTORY JH RIGHT OUTER JOIN JOBS J ON JH.JOB_ID=J.JOB_ID;

B.  SELECT JOB_ID FROM JOB_HISTORY JH RIGHT OUTER JOIN JOBS J ON (JH.JOB_ID=J.JOB_ID);

C.  SELECT JOB_HISTORY.JOB_ID FROM JOB_HISTORY OUTER JOIN JOBS ON JOB_HISTORY.JOB_ID=JOBS.JOB_ID;

D.  None of the above

10.  If the REGIONS table, which contains 4 rows, is cross joined to the COUNTRIES table, which contains 25 rows, how many rows appear in the final results set? (Choose the best answer.)

A.  100 rows

B.  4 rows

C.  25 rows

D.  None of the above

Self Test Answers

1.  Images    D. All the queries are incorrect, so D is the correct choice.
Images    A, B, and C are incorrect. A is incorrect because the query performs a natural join that implicitly joins the two tables on all columns with identical names, which, in this case, are DEPARTMENT_ID and MANAGER_ID. B and C are incorrect because the queries incorrectly contain the NATURAL keyword. If this is removed, they will join the DEPARTMENTS and EMPLOYEES tables based on the DEPARTMENT_ID column.

2.  Images    A, B, and C. These clauses demonstrate different techniques to join the tables on both the DEPARTMENT_ID and MANAGER_ID columns.
Images    D is incorrect because there are correct answers given.

3.  Images    B. The join condition is an expression based on the less than inequality operator. Therefore, this join is a nonequijoin.
Images    A, C, and D are incorrect. A would be correct if the operator in the join condition expression was an equality operator. The CROSS JOIN keywords or the absence of a join condition would result in C being true. D would be true if one of the OUTER JOIN clauses was used instead of the JOIN . . . ON clause.

4.  Images    A. This statement demonstrates the correct usage of the JOIN . . . USING clause.
Images    B, C, and D are incorrect. B is incorrect since only nonqualified column names are allowed in the brackets after the USING keyword. C is incorrect because the column in brackets after the USING keyword cannot be referenced with a qualifier in the SELECT clause. D is incorrect since there is a correct answer.

5.  Images    B demonstrates the correct usage of the JOIN . . . ON clause.
Images    A, C, and D are incorrect. A is incorrect since the CROSS JOIN clause cannot contain the ON keyword. C is incorrect since the OUTER JOIN keywords must be preceded by the LEFT, RIGHT, or FULL keyword. D is incorrect since there is a correct answer.

6.  Images    C. The JOIN…ON clause and the other join clauses can all be used for joins between multiple tables. The JOIN . . . ON and JOIN . . . USING clauses are better suited for N-way table joins.
Images    A, B, and D are incorrect. A is incorrect since you can join as many tables as you want. B is incorrect since a Cartesian product is not created since there are two join conditions and three tables. D is incorrect since there is a correct answer.

7.  Images    B. Three rows are returned. The REGIONS table is being joined to itself. For the row with a REGION_ID value of 2 in the table aliased as R1, a match is found with the row in the table aliased as R2 where the REGION_NAME value is Asia, because half the length of the REGION_NAME value is also 2. Therefore, this row is returned. The same logic results in the rows with REGION_ID values of three and four and matching REGION_NAME values of Europe and Americas being returned.
Images    A, C, and D are incorrect because exactly three rows are returned.

8.  Images    A. The right outer join fetches the COUNTRIES rows that the inner join between the LOCATIONS and COUNTRIES tables has excluded in addition to the inner join results. The WHERE clause then restricts the results by eliminating these inner join results. This leaves the rows from the COUNTRIES table with which no records from the LOCATIONS table records are associated.
Images    B, C, and D are incorrect. B is incorrect because it is rows from the COUNTRIES table that are returned. C is incorrect because these are the rows included by the inner join, but eliminated by the WHERE clause. D is incorrect because A is the correct answer.

9.  Images    A. This statement demonstrates the correct use of the RIGHT OUTER JOIN . . . ON clause.
Images    B, C, and D are incorrect. The JOB_ID column in the SELECT clause in B is not qualified and is therefore ambiguous since the table from which this column comes is not specified. C uses an OUTER JOIN without the keywords LEFT, RIGHT, and FULL. D is incorrect since there is a correct answer.

10.  Images    A. The cross join associates every four rows from the REGIONS table 25 times with the rows from the COUNTRIES table yielding a result set that contains 100 rows.
Images    B, C, and D are incorrect. B and C would be returned if the product of the number of rows in each table were 4 or 25. D is incorrect since A is the correct answer.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset