A subquery places one query inside another one. The second query resides somewhere within the WHERE clause of a SELECT statement. One or more values returned by the subquery are used by the main query to return the results to the user.
subquery
A query that is embedded in a main, or parent, query and used to assist in filtering the result set from a query.
The types of operators allowed in the WHERE clause depend on whether the subquery returns one row or more than one row. If only a single row is returned from a query, the comparison operators =, !=, <, >, >=, <=, and so forth are valid. If more than one row is returned from a subquery, operators such as IN, NOT IN, ANY, and ALL are valid.
The boss, King, wants to do his quarterly salary analysis. He would like to see which employees in the IT department are earning more than the average salary across all employees. Janice, the database analyst and DBA, realizes that this could be written as two queries and decides to take that approach first before using a subquery. The average salary for an employee in the company is retrieved by a query you've seen in previous chapters:
select avg(salary) from employees; AVG(SALARY) ----------- 6461.68224 1 row selected.
Using this information as a starting point, Janice writes a second query to see which employees in the IT department (department 60) have a higher salary than the average. She must cut and paste the number returned from the previous query into this new query:
select employee_id, last_name, first_name, salary from employees where salary > 6461.68224 and department_id = 60; EMPLOYEE_ID LAST_NAME FIRST_NAME SALARY ----------- ------------- ----------------- ---------- 103 Hunold Alexander 9000 1 row selected.
The only employee in the IT department making more than the company average salary is Alexander Hunold, who happens to be the manager of that department.
Janice wants to streamline this reporting function for King. She realizes that this can easily be written as a single-row subquery. She will embed the query she used to calculate the average into the second query, replacing the constant value as follows:
single-row subquery
A subquery that returns a single row and is compared to a single value in the parent query.
select employee_id, last_name, first_name, salary from employees where salary > (select avg(salary) from employees) and department_id = 60; EMPLOYEE_ID LAST_NAME FIRST_NAME SALARY ----------- ------------- ---------------------- ---------- 103 Hunold Alexander 9000 1 row selected.
Not only is the query more readable and easier to maintain than the version with two queries, but the Oracle server also will process it much more efficiently.
|
King is starting to realize that the IT department may need some pay increases in the next fiscal year.
Sometimes, you want to compare a column in a table to a list of results from a subquery, not just a single result. This is where a multiple-row subquery comes in handy. For example, King is following up on his analysis of employee salaries in the IT department, and he wants to see who else in the company is making the same salary as anyone in the IT department.
multiple-row subquery
A subquery that can return more than one row for comparison to the main, or parent, query using operators such as IN.
Janice starts out with the subquery to make sure that she starts with the right set of results to use for the main query. She wants to get the salaries for the employees in the IT department (department 60):
select salary from employees where department_id = 60;
SALARY ---------- 9000 6000 4800 4800 4200 5 rows selected.
So far, so good. She takes this query and makes it a subquery in the query that compares the salaries of all employees to this list by using the IN clause:
select employee_id, last_name, first_name, salary from employees where salary in (select salary from employees where department_id = 60); EMPLOYEE_ID LAST_NAME FIRST_NAME SALARY ----------- ------------- ------------------ ---------- 158 McEwen Allan 9000 152 Hall Peter 9000 109 Faviet Daniel 9000 103 Hunold Alexander 9000 202 Fay Pat 6000 104 Ernst Janice 6000 106 Pataballa Valli 4800 105 Austin David 4800 184 Sarchand Nandita 4200 107 Lorentz Diana 4200 10 rows selected.
But wait, something is not quite right here. King did not want to see the IT employees in this list; he wanted to include everyone but the IT employees. So Janice makes a slight change as follows, removing employees whose job title is not an IT job title:
select employee_id, last_name, first_name, salary from employees where salary in (select salary from employees where department_id = 60) and job_id not like 'IT_%';
EMPLOYEE_ID LAST_NAME FIRST_NAME SALARY ----------- ------------- ------------------ ---------- 158 McEwen Allan 9000 152 Hall Peter 9000 109 Faviet Daniel 9000 202 Fay Pat 6000 184 Sarchand Nandita 4200 5 rows selected.
Note that Janice also could have checked for a department ID other than 60, as you have seen in previous queries.
A correlated subquery looks very much like a garden-variety subquery, with one important difference: The correlated subquery references a column in the main query as part of the qualification process to see if a given row will be returned by the query. For each row in the parent query, the subquery is evaluated to see if the row will be returned. In Janice's situation, the salary of each individual employee is compared to the average salary for that employee's department. The check-marked rows in the parent query are returned.
correlated subquery
A subquery that contains a reference to a column in the main, or parent, query.
Janice knows that King will be asking for more queries regarding salaries, so she comes up with a fairly generic query that will identify employees who are making more than the average salary for their department. As a first step, she builds the subquery that retrieves the average salary for a department:
select avg(salary) from employees where department_id = 60; AVG(SALARY) ----------- 5760 1 row selected.
That query returns the average salary for department 60. In the correlated subquery, she will need to generalize it so that it will correlate with any department in the parent query. Next, she builds the parent query that compares a given employee's salary to the average she just calculated:
select employee_id, last_name, salary from employees where department_id = 60 and salary > 5760; EMPLOYEE_ID LAST_NAME SALARY ----------- ------------------ ---------- 103 Hunold 9000 104 Ernst 6000 2 rows selected.
Notice that there are two queries that can now be linked together into a correlated subquery to return all employees who earn more than the average for their department across all departments. If you're not sure how to link these two queries, the hint is in the column names. Janice joins the two queries using the DEPARTMENT_ID column:
select employee_id, last_name, department_id, salary from employees emp where salary > (select avg(salary) from employees where department_id = emp.department_id); EMPLOYEE_ID LAST_NAME DEPARTMENT_ID SALARY ----------- ------------------ ------------- ---------- 100 King 90 24000
103 Hunold 60 9000 104 Ernst 60 6000 108 Greenberg 100 12000 109 Faviet 100 9000 ... 193 Everett 50 3900 201 Hartstein 20 13000 205 Higgins 110 12000 38 rows selected.
As Janice expected, this query still shows that Hunold and Ernst make more than the average salary for department 60.
There are times when you need to use a subquery that compares more than just one column between the parent query and the subquery. This is known as a multiple-column subquery. Typically, the IN clause is used to compare the outer query's columns to the columns of the subquery.
NOTE
Multiple-column subqueries can be rewritten as a compound WHERE clause with multiple logical operators. However, this approach is not as readable or maintainable as a multiple-column subquery.
The boss, King, wants to be able to identify employees who make the same salaries as other employees with the same job. He wants to specify an employee number and have the query return the other employees who have the same job title and make the same salary. Janice immediately realizes that this could be written as a multiple-column subquery. She decides to try out the query on one of the stock clerks, Hazel Philtanker, who has an employee number of 136:
multiple-column subquery
A subquery in which more than one column is selected for comparison to the main query using the same number of columns.
select employee_id, last_name, job_id, salary from employees where (job_id, salary) in (select job_id, salary from employees where employee_id = 136); EMPLOYEE_ID LAST_NAME JOB_ID SALARY ----------- --------------- ---------- ---------- 128 Markle ST_CLERK 2200 136 Philtanker ST_CLERK 2200 2 rows selected.
The query looks good, except that Hazel is included in the results. If King decides he doesn't want to see the selected employee in the results, Janice can modify the query slightly and change it into a correlated multiple-column subquery:
select employee_id, last_name, job_id, salary from employees emp where (job_id, salary) in (select job_id, salary from employees where employee_id = 136 and employee_id != emp.employee_id); EMPLOYEE_ID LAST_NAME JOB_ID SALARY ----------- --------------- ---------- ---------- 128 Markle ST_CLERK 2200 1 row selected.