Chapter 9 Group Functions

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Images

CHAPTER 9 Group Functions

Exam Objectives

In this chapter, you will learn to

• 061.5.1 Identify the Available Group Functions

• 061.5.2 Describe the Use of Group Functions

• 061.5.3 Group Data by Using the GROUP BY Clause

• 061.5.4 Include or Exclude Grouped Rows by Using the HAVING Clause

Single-row functions, explored in Chapter 8, return a single value for each row in a set of results. Group or aggregate functions operate on multiple rows. They are used to count the number of rows or to find the average of specific column values in a dataset. Many statistical operations, such as calculating standard deviation, medians, and averages, depend on executing functions against grouped data and not just single rows.

You will examine group functions in two stages. A discussion of their purpose and syntax precedes a detailed analysis of the AVG, SUM, MIN, MAX, COUNT, LISTAGG, STDDEV, and VARIANCE functions. Grouping or aggregating data based on one or more column values is examined before the GROUP BY clause is introduced. The WHERE clause restricts rows in a dataset before grouping, while the HAVING clause restricts them after grouping. This chapter concludes with a discussion of the HAVING clause.

Describe the Group Functions

This section defines SQL group functions and discusses the different variants. The syntax and examples demonstrating the selected group functions are provided along with a discussion of their data types and the effect of the DISTINCT keyword and null values.

Definition of Group Functions

Group functions operate on aggregated data and return a single result per group. These groups usually consist of one or more rows of data. Single-row functions are defined with the formula F(x, y, z, . . .) = result, where x, y, z . . . are input parameters. The function F executes on one row of the dataset at a time and returns a result for each row. Group functions can be defined using the following formula:

F(g1, g2, g3, . . . , gn) = result1, result2, result2, . . . , resultn;

The group function executes once for each cluster of rows and returns a single result per group. The rows within these groups are associated using a common value or attribute. If a table is presented as one group to the group function in its entirety, then one result is returned. One or more group functions may appear in the SELECT list as follows:

Consider the EMPLOYEES table. There are 107 rows in this table. Groups may be created based on the common values that rows share. For example, the rows that share the same DEPARTMENT_ID value may be clustered together. Thereafter, group functions are executed separately against each unique group.

Figure 9-1 shows 12 distinct DEPARTMENT_ID values in the EMPLOYEES table, including a null value. The rows are distributed into 12 groups based on common DEPARTMENT_ID values. The COUNT function executes 12 times, once for each group. Notice that the distinct groups do not contain the same number of rows.

Images

Figure 9-1 Group functions operating on 12 groups

Using Group Functions

AVG, SUM, MIN, MAX, COUNT, STDDEV, and VARIANCE demonstrate the practical application of group functions. These group functions all return numeric results. Additionally, the MIN and MAX functions may return character and date results. These functions operate on non-null values, but unlike the others, the COUNT function call also counts rows with null values under certain conditions.

The COUNT Function

The COUNT function counts the number of rows in a group. Its syntax is as follows:

This syntax may be deconstructed into the following forms:

• COUNT(*)

• COUNT(DISTINCT expr)

• COUNT(ALL expr)

• COUNT(expr)

When COUNT(*) is invoked, all rows in the group, including those with nulls or duplicate values, are counted. When COUNT(DISTINCT expr) is executed, only unique occurrences of expr are counted for each group. The ALL keyword is part of the default syntax, so COUNT(ALL expr) and COUNT(expr) are equivalent. If expr is based on named columns, then nulls are ignored, but if expr is based on anything else, it will be evaluated for every row, whether there are null values in the row or not. The data type of expr may be NUMBER, DATE, CHAR, or VARCHAR2. Consider these queries:

Images

Query 1 counts the rows in the EMPLOYEES table and returns the integer 107. Query 2 counts the rows with non-null COMMISSION_PCT values and returns 36. It also counts the literal expression 1, which is not based on a named column and is therefore evaluated for every row, returning 107. Query 3 considers the 107 non-null rows, determines the number of unique SALARY values, and returns 58. Query 4 demonstrates how the COUNT function is used on both a DATE column and a NUMBER column. The integers 107 and 106 are returned, since there are 107 non-null HIRE_DATE values and 106 non-null MANAGER_ID values in the group.

The SUM Function

The SUM function returns the aggregated total of the non-null numeric values in a group. It has this syntax:

This syntax may be deconstructed into the following forms:

• SUM(DISTINCT expr)

• SUM(ALL expr)

• SUM(expr)

SUM(DISTINCT expr) provides a total by adding all the unique values returned after expr is evaluated for each row in the group. SUM(expr) and SUM(ALL expr) provide a total by adding expr for each row in the group. Null values are ignored. The expr parameter must be a numeric value. Consider the following queries:

Images

There are 107 rows in the EMPLOYEES table. Query 1 adds the number 2 across 107 rows and returns 214. Query 2 takes the SALARY column value for every row in the group, which in this case is the entire table, and returns the total salary amount of 721166. Query 3 returns a total of 417158, since many employees get paid the same salary, and the DISTINCT keyword only adds unique values in the column to the total. Query 4 returns 7.9 after adding the non-null COMMISSION_PCT values.

The AVG Function

The average value of a column or expression is obtained by dividing the sum by the number of non-null rows in the group. The AVG function has this syntax:

This syntax may be deconstructed into the following forms:

• AVG(DISTINCT expr)

• AVG(ALL expr)

• AVG(expr)

When AVG(DISTINCT expr) is invoked, the distinct values of expr are summed and divided by the number of unique occurrences of expr. AVG(ALL expr) and AVG(expr) add the non-null values of expr for each row and divide the sum by the number of non-null rows in the group. The expr parameter must be a numeric value. Consider these queries:

Images

There are 107 rows in the EMPLOYEES table. Query 1 adds the number 2 across 107 rows and divides the total by the number of rows to return the number 2. Numeric literals submitted to the AVG function are returned unchanged. Query 2 adds the SALARY value for each row to obtain the total salary amount of 721166, which is divided by the rows with non-null SALARY values (107) to return the average 6739.86916. There are 58 unique salary values, which, when added, yield a total of 417158. Dividing 417158 by 58 returns 7192.37931 as the average of the distinct salary values, which is returned by the third query. Adding the non-null COMMISSION_PCT values produces a total of 7.9. Dividing this by the employee records with non-null COMMISSION_PCT values (36) yields 0.219444444, which is returned by query 4.

The STDDEV and VARIANCE Functions

The standard deviation is a measure of the spread or distribution of the values in a column or expression. It is obtained by calculating the square root of the variance. The variance refers to the sum of the squared differences between the actual value and the average value divided by N-1 or N depending on whether variance is being established for a sample or the entire population. The VARIANCE function establishes variance for a sample and therefore divides the sum of squared differences between the actual and average values by N-1. For more information, look up the VAR_POP and VAR_SAMP functions, which are beyond the scope of this exam.

Variance is calculated by first calculating the average of the values in the set. Then for each number, subtract the average value and square the result. This is the squared difference. The variance is then the sum of all the squared differences divided by N-1. The STDDEV and VARIANCE functions have two forms, an aggregate form and an analytic form that is beyond the scope of the exam. The aggregate form has this syntax:

Consider the queries in Figure 9-2.

Images

Figure 9-2 The VARIANCE and STDDEV functions

There are three employees who belong to department 90 with salary values of 24000, 17000, and 17000. The average salary is 19333.3333 as returned by query 2, which shows the variance and standard deviation. For the statistically inclined, queries 3 and 4 show alternative ways to calculate variance and standard deviation. These are nonexaminable and provided to exemplify the descriptions of these functions. Query 3 uses the power function, which raises (salary – average salary) to power 2, giving you the squared differences, which are summed using the sum function and divided by 2 (N-1). It also uses the sqrt function to calculate standard deviation. Remember, N is 3 because there are three salary values in the sample set. The average salary, 19333.3333, is hard-coded in query 3, while query 4 uses a nested subquery (details in Chapter 11) to dynamically fetch the average salary value.

The MAX and MIN Functions

The MAX and MIN functions return the maximum (largest) and minimum (smallest) expr value in a group. The MAX and MIN functions operate on NUMBER, DATE, CHAR, and VARCHAR2 data types. They return a value of the same data type as their input arguments, which are either the largest or smallest items in the group. When applied to DATE items, MAX returns the latest date, and MIN returns the earliest one. Character strings are converted to numeric representations of their constituent characters based on the NLS settings in the database. When the MIN function is applied to a group of character strings, the word that appears first alphabetically is returned, while MAX returns the word that appears last. The MAX and MIN functions have this syntax:

This syntax may be deconstructed into the following forms:

• MAX(DISTINCT expr); MIN(DISTINCT expr)

• MAX(ALL expr); MIN(ALL expr)

• MAX(expr); MIN(expr);

MAX(expr), MAX(ALL expr), and MAX(DISTINCT expr) examine the values for expr in a group of rows and return the largest value. Null values are ignored. MIN(expr), MIN(ALL expr), and MIN(DISTINCT expr) examine the values for expr in a group of rows and return the smallest value. Consider these queries:

Images

Query 1 returns 0.1 and 0.4 for the minimum and maximum COMMISSION_PCT values in the EMPLOYEES table. Notice that null values for COMMISSION_PCT are ignored. Query 2 evaluates a DATE column and indicates that the earliest START_DATE in the JOB_HISTORY table is 17-SEP-1995 and the latest END_DATE is 07-JAN-2015. Query 3 returns AC_ACCOUNT and ST_MAN as the JOB_ID values appearing first and last alphabetically in the EMPLOYEES table.

The LISTAGG Function

The LISTAGG function returns a string aggregation of column values. If an ORDER BY clause is present, values are sorted and concatenated. This function operates as a single-set aggregate function that operates on all rows and returns a single output row or as a group-set aggregate function that returns an output row for each group in the GROUP BY clause (discussed in the next section). The LISTAGG function has this syntax:

The expr parameter can be any valid expression. Null values are ignored. The delimiter specifies the optional string to separate the expr. The ORDER BY clause determines the order in which the concatenated values are returned. Consider the queries in Figure 9-3.

Images

Figure 9-3 The LISTAGG function

Query 1 returns the familiar output format listing countries sorted by REGION_ID and reverse alphabetically by COUNTRY_NAME. Query 2 demonstrates the single-set aggregate form of the LISTAGG function, and a single row that concatenates the COUNTRY_NAME values in a comma-delimited list is returned. Query 3 shows the group-set aggregate form, and a comma-delimited concatenated list of COUNTRY_NAME values are returned for each grouping by REGION_ID along with a COUNT of the number of items per group.

Exercise 9-1: Use the Group Functions The COUNTRIES table stores a list of COUNTRY_NAME values. You are required to calculate the average length of all the country names. Any fractional components must be rounded to the nearest whole number.

1. Start SQL*Plus or SQL Developer and connect to the HR schema.

2. The length of the country name value for each row is to be calculated using the LENGTH function. The average length may be determined using the AVG function. It may be rounded to the nearest whole number using the ROUND function. The following is a possible solution:

3. Executing this statement shows that the average length of all the country names in the COUNTRIES table is eight characters.

Group Data Using the GROUP BY Clause

The group functions discussed earlier use groups of rows making up the entire table. This section explores partitioning a set of data into groups using the GROUP BY clause. Group functions may be applied to these subsets or clusters of rows.

Creating Groups of Data

A table has at least one column and zero or more rows of data. In many tables, data requires analysis to transform it into useful information. It is a common reporting requirement to calculate statistics from a set of data divided into groups using different attributes. Previous examples using group functions operated against all the rows in a table. The entire table was treated as one large group. Groups of data within a set are created by associating rows with common attributes with each other. Thereafter, group functions can execute against each of these groups. Groups of data include entire rows and not specific columns.

Consider the EMPLOYEES table. It comprises 11 columns and 107 rows. You could create groups of rows that share a common DEPARTMENT_ID value. The SUM function can then be used to create salary totals per department. Another possible set of groups may share common JOB_ID column values. The AVG group function can then be used to identify the average salary paid to employees in different jobs.

A group is defined as a subset of the entire dataset sharing one or more common attributes. These attributes are typically column values but can also be expressions. The number of groups created depends on the distinct values present in the common attribute.

As Figure 9-4 shows, there are 12 unique DEPARTMENT_ID values in the EMPLOYEES table. If rows are grouped using common DEPARTMENT_ID values, there will be 12 groups. If a group function is executed against these groups, there will be 12 values returned because it will execute once for each group.

Images

Figure 9-4 Unique DEPARTMENT_ID values in the EMPLOYEES table

The GROUP BY Clause

The SELECT statement is enhanced by the addition of the GROUP BY clause. This clause facilitates the creation of groups. It appears after the WHERE clause but before the ORDER BY clause, as follows:

Images

The column or expression specified in the GROUP BY clause is also known as the grouping attribute and is the component that rows are grouped by. The dataset is segmented according to the grouping attribute. Consider the following query:

Images

The grouping attribute in this example is the DEPARTMENT_ID column. The dataset, on which the group functions in the SELECT list must operate, is divided into 12 groups, one for each department. For each group (department), the maximum salary value and the number of rows are returned. Since the results are sorted by DEPARTMENT_ID, the third row in the set of results contains the values 11000 and 6. This indicates that 6 employees have the same DEPARTMENT_ID value (which you happen to know is 30). Of these 6, the highest earner has a SALARY value of 11000. This query demonstrates that the grouping attribute does not have to be included in the SELECT list.

It is common to see the grouping attribute in the SELECT list alongside grouping functions. If an item that is not a group function appears in the SELECT list and there is no GROUP BY clause, an “ORA-00937: not a single-group group function” error is raised. If a GROUP BY clause is present but that item is not a grouping attribute, then an “ORA-00979: not a GROUP BY expression” error is returned.

If a group function is placed in a WHERE clause, an “ORA-00934: group function is not allowed here” error is returned. Imposing group-level conditions is achieved using the HAVING clause discussed later in this chapter. Group functions may, however, be used as part of the ORDER BY clause.

The first query in Figure 9-5 raises an error because the END_DATE column is in the SELECT list with a group function and there is no GROUP BY clause. An ORA-00979 error is returned from the second query, since the START_DATE item is listed in the SELECT clause, but it is not a grouping attribute.

Images

Figure 9-5 The GROUP BY clause

The third query divides the JOB_HISTORY rows into groups based on the four-digit year component from the END_DATE column. Four groups are created using this grouping attribute. These represent different years when employees ended their jobs. The COUNT shows the number of employees who quit their jobs during each of these years. The results are listed in descending order based on the “Number of Employees” expression. Note that the COUNT group function is present in the ORDER BY clause.

Grouping by Multiple Columns

A powerful extension to the GROUP BY clause uses multiple grouping attributes. Oracle permits datasets to be partitioned into groups and allows these groups to be further divided into subgroups using a different grouping attribute. Consider the following two queries:

Images

Query 1 restricts the rows returned from the EMPLOYEES table to the 35 rows with non-null COMMISSION_PCT values. These rows are then divided into two groups: 80 and NULL based on the DEPARTMENT_ID grouping attribute. The result set contains two rows, which return the sum of the COMMISSION_PCT values for each group.

Query 2 is similar to the first one except it has an additional item: JOB_ID in both the SELECT and GROUP BY clauses. This second grouping attribute decomposes the two groups by DEPARTMENT_ID into the constituent JOB_ID components belonging to the rows in each group. The distinct JOB_ID values for rows with DEPARTMENT_ID=80 are AC_ACCOUNT, SA_REP, and SA_MAN. The distinct JOB_ID value for rows with a null DEPARTMENT_ID is SA_REP. Therefore, query 2 returns two groupings: one that consists of three subgroups, and the other with only one, as shown in Figure 9-6.

Images

Figure 9-6 The GROUP BY clause with multiple columns

Exercise 9-2: Group Data Based on Multiple Columns Analysis of staff turnover is a common reporting requirement. You are required to create a report that contains the number of employees who left their jobs, grouped by the year in which they left. The jobs they performed is also required. The results must be sorted in descending order based on the number of employees in each group. The report must list the year, the JOB_ID, and the number of employees who left a particular job in that year.

1. Start SQL Developer and connect to the HR schema.

2. The JOB_HISTORY table contains the END_DATE and JOB_ID columns, which constitute the source data for this report.

3. The year component may be extracted using the TO_CHAR function. The number of employees who quit a particular job in each year may be obtained using the COUNT(*) function.

4. Executing the following statement returns the staff turnover report as requested:

Images

Nested Group Functions

Recall that single-row functions may be nested or embedded to any level of depth. Group functions can be nested only two levels deep. Three formats using group functions are shown here:

G1(group_item) = result
G1(G2(group_item) = result
G1(G2(G3(group_item))) is NOT allowed.

Group functions are represented by the letter G followed by a number. The first simple form contains no nested functions. Examples include the SUM(group_item) and AVG(group_item) functions that return a single result per group. The second form supports two nested group functions, like SUM(AVG(group_item)). In this case, a GROUP BY clause is necessary because the average value of the group_item per group is calculated before being aggregated by the SUM function.

The third form is disallowed by Oracle. Consider an expression that nests three group functions. If the MAX function is applied to the previous example, the expression MAX(SUM(AVG(group_item))) is formed. The two inner group functions return a single value representing the sum of a set of average values. This expression becomes MAX(single value), which is not sensible since a group function cannot be applied to a single value.

Images

EXAM TIP Single-row functions may be nested to any level, but group functions may be nested to, at most, two levels deep. The nested function call COUNT(SUM(AVG(X))) returns the error “ORA-00935: group function is nested too deeply.” It is acceptable to nest single-row functions within group functions. Consider the following query: SELECT SUM(AVG(LENGTH(LAST_NAME))) FROM EMPLOYEES GROUP BY DEPARTMENT_ID. It calculates the sum of the average length of LAST_NAME values per department.

Include or Exclude Grouped Rows Using the HAVING Clause

Creating groups of data and applying aggregate functions are useful. A refinement to these features is the ability to include or exclude results based on group-level conditions. This section introduces the HAVING clause. A clear distinction is made between the WHERE clause and the HAVING clause.

Restricting Group Results

WHERE clause conditions restrict rows returned by a query. Rows are included if they fulfill the conditions listed and are sometimes known as row-level results. Clustering rows using the GROUP BY clause and applying an aggregate function to these groups returns results often referred to as group-level results. The HAVING clause restricts group-level results.

The following query limits the rows retrieved from the JOB_HISTORY table by specifying a WHERE condition based on the DEPARTMENT_ID column values.

Images

This query returns seven rows. If the WHERE clause were absent, all ten rows would be retrieved. Suppose you want to know how many employees were previously employed in each of these departments. There are seven rows that can be manually grouped and counted. However, if there are a large number of rows, an aggregate function like COUNT may be used, as shown in the following query:

Images

This query adds to the previous statement. The aggregate function COUNT was added to the SELECT list, and a GROUP BY DEPARTMENT_ID clause was also added. Four rows with their aggregate row count are returned, and it is clear that the original seven rows restricted by the WHERE clause were clustered into four groups based on common DEPARTMENT_ID values, as shown in the following table:

Images

Suppose you wanted to restrict this list to only those departments with more than one employee. The HAVING clause limits or restricts the group-level rows as required.

This query must perform the following steps:

1. Consider the entire row-level dataset.

2. Limit the dataset by any WHERE clause conditions.

3. Segment the data into one or more groups using the grouping attributes specified in the GROUP BY clause.

4. Apply any aggregate functions to create a new group-level dataset. Each row may be regarded as an aggregation of its source row-level data based on the groups created.

5. Limit or restrict the group-level data with a HAVING clause condition. Only group-level results matching these conditions are returned.

The HAVING Clause

The general form of the SELECT statement is further enhanced by the addition of the HAVING clause and becomes the following:

Images

An important difference between the HAVING clause and the other SELECT statement clauses is that it may be specified only if a GROUP BY clause is present. This dependency is sensible since group-level rows must exist before they can be restricted. The HAVING clause can occur before the GROUP BY clause in the SELECT statement. However, it is more common to place the HAVING clause after the GROUP BY clause. All grouping is performed and group functions are executed prior to evaluating the HAVING clause.

The following query shows how the HAVING clause is used to restrict an aggregated dataset. Records from the JOB_HISTORY table are divided into four groups. The rows that meet the HAVING clause condition (contributing more than one row to the group row count) are returned:

Images

Three rows with DEPARTMENT_ID values of 50, 80, and 110, each with a COUNT(*) value of 2, 2, and 3, are returned, respectively.

Figure 9-7 shows three queries. Query 1 divides the 107 records from the EMPLOYEES table into 19 groups based on common JOB_ID values. The average salary for each JOB_ID group and the aggregate row count are computed. Query 2 refines the results by conditionally excluding those aggregated rows where the average salary is less than or equal to 12000, using a HAVING clause. Query 3 demonstrates that the Boolean operators may be used to specify multiple HAVING clause conditions.

Images

Figure 9-7 The HAVING clause

Exercise 9-3: Use the HAVING Clause The company is planning a recruitment drive and wants to identify which days of the week 20 or more staff members were hired. Your report must list the days and the number of employees hired on each of them.

1. Start SQL*Plus or SQL Developer and connect to the HR schema.

2. Divide EMPLOYEES records into groups based on the day component of the HIRE_DATE column. You can obtain the number of employees per group using the COUNT function. Use the HAVING clause to restrict these rows to only those where the count is greater than or equal to 20.

3. A possible solution is the following statement, which returns the days of the week on which 20 or more employees were hired:

Images

Two-Minute Drill

Describe the Group Functions

• Group functions are also known as multiple-row, aggregate, or summary functions. They execute once for each group of data and aggregate the data from multiple rows into a single result for each group.

• Groups can be entire tables or portions of a table grouped together by a common grouping attribute.

• The COUNT of a column or an expression returns an integer value representing the number of rows in a group, where the specified column or expression is not null.

• The SUM function returns an aggregated total of all the non-null numeric values in a group.

• The AVG function divides the sum of a column or expression by the number of non-null rows in a group.

• The MAX and MIN functions operate on NUMBER, DATE, CHAR, and VARCHAR2 data types. They return a value that is either the largest or smallest item in the group.

Group Data Using the GROUP BY Clause

• The GROUP BY clause specifies the grouping attribute rows must have in common for them to be clustered together.

• The GROUP BY clause facilitates the creation of groups within a selected set of data and appears after the WHERE clause but before the ORDER BY clause.

• Any item on the SELECT list that is not a group function must be a grouping attribute.

• Group functions may not be placed in a WHERE clause.

• Datasets may be partitioned into groups and further divided into subgroups based on multiple grouping attributes.

• The LISTAGG function returns a concatenated string of sorted column values, sorted by the ORDER BY expression specified after the WITHIN GROUP clause.

Include or Exclude Grouped Rows Using the HAVING Clause

• Clustering rows using a common grouping attribute with the GROUP BY clause and applying an aggregate function to each of these groups returns group-level results.

• The HAVING clause provides the language to limit the group-level results returned.

• The HAVING clause can be specified only if there is a GROUP BY clause present.

• All grouping is performed and group functions are executed prior to evaluating the HAVING clause.

Self Test

1. What result is returned by the following statement?

(Choose the best answer.)

A. NULL

B. 0

C. 1

D. None of the above

2. Choose one correct statement regarding group functions.

A. Group functions can be used only when a GROUP BY clause is present.

B. Group functions can operate on multiple rows at a time.

C. Group functions operate on only a single row at a time.

D. Group functions can execute multiple times within a single group.

3. What value is returned after executing the following statement?

Assume there are ten employee records and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. (Choose the best answer.)

A. 900

B. 1000

C. NULL

D. None of the above

4. Which values are returned after executing the following statement?

Assume there are ten employee records and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. (Choose all that apply.)

A. 10 and 10

B. 10 and NULL

C. 10 and 9

D. None of the above

5. What value is returned after executing the following statement?

Assume there are ten employee records and each contains a SALARY value of 100, except for one employee, who has a null value in the SALARY field. (Choose the best answer.)

A. NULL

B. 90

C. 100

D. None of the above

6. What value is returned after executing the following statement?

Images

Assume there are ten employee records and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. (Choose the best answer.)

A. An error

B. 3

C. 4

D. None of the above

7. How many rows are returned by the following query?

Assume there are 11 non-null and 1 null unique DEPARTMENT_ID values. All records have a non-null SALARY value. (Choose the best answer.)

A. 12

B. 11

C. NULL

D. None of the above

8. What values are returned after executing the following statement?

Assume that the JOBS table has ten records with the same JOB_ID value of DBA and the same MAX_SALARY value of 100. (Choose the best answer.)

A. One row of output with the values DBA, 100

B. Ten rows of output with the values DBA, 100

C. An error

D. None of the above

9. How many rows of data are returned after executing the following statement?

Assume the EMP table has ten rows and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. The first five rows have a DEPT_ID value of 10, while the second group of five rows, which includes the row with a null SALARY value, has a DEPT_ID value of 20. (Choose the best answer.)

A. Two rows

B. One row

C. Zero rows

D. None of the above

10. How many rows of data are returned after executing the following statement?

Assume the EMP table has ten rows and each contains a SALARY value of 100, except for one, which has a null value in the SALARY field. The first five rows have a DEPT_ID value of 10, while the second five rows, which include the row with a null SALARY value, have a DEPT_ID value of 20. (Choose the best answer.)

A. Two rows

B. One row

C. Zero rows

D. None of the above

11. Choose two statements that are true.

A. The STDDEV function returns the square root of the VARIANCE.

B. The VARIANCE function returns the square root of the STDDEV.

C. The AVG function works on date and numeric data.

D. The LISTAGG function returns a total numeric sum of a list of data.

E. LISTAGG returns the concatenated values of the measure column.

12. How many rows of data are returned after executing the following statement?

Images

Assume the EMPLOYEES table has ten distinct DEPARTMENT_ID values and 107 rows. (Choose the best answer.)

A. 0

B. 1

C. 10

D. 107

E. None of the above

Self Test Answers

1. C. The DUAL table has one row and one column. The COUNT(*) function returns the number of rows in a table or group.
A, B, and D are incorrect. A is incorrect because a table can have zero or more rows, but never NULL rows. B is incorrect since the DUAL table has one row. D is incorrect because the correct answer is C.

2. B. By definition, group functions can operate on multiple rows at a time, unlike single-row functions.
A, C, and D are incorrect. A is incorrect because a group function can be used without a GROUP BY clause. In this case, the entire dataset is operated on as a group. C is incorrect because group functions are often executed against an entire table, which is treated as one group. D is incorrect because once a dataset has been partitioned into different groups, any group functions execute once per group.

3. A. The SUM aggregate function ignores null values and adds non-null values. Since nine rows contain the SALARY value 100, 900 is returned.
B, C, and D are incorrect. B would be returned if SUM(NVL(SALARY,100)) were executed. C is a tempting choice since regular arithmetic with NULL values returns a NULL result. However, the aggregate functions, except for COUNT(*), ignore NULL values. D is incorrect because the correct answer is A.

4. C. COUNT(*) considers all rows, including those with NULL values, while COUNT(SALARY) only considers the non-null rows.
A, B, and D are incorrect.

5. C. The NVL function converts the one NULL value into 100. Thereafter, the average function adds the SALARY values and obtains 1000. Dividing this by the number of records returns 100.
A, B, and D are incorrect. B would be returned if AVG(NVL(SALARY,0)) were selected. It is interesting to note that if AVG(SALARY) were selected, 100 would have also been returned, since the AVG function would sum the non-null values and divide the total by the number of rows with non-null SALARY values. So, AVG(SALARY) would be calculated as 900/9=100.

6. C. The dataset is segmented by the SALARY column. This creates two groups: one with SALARY values of 100 and the other with a null SALARY value. The average length of SALARY value 100 is 3 for the rows in the first group. The NULL salary value is first converted into the number 0 by the NVL function, and the average length of SALARY is 1. The SUM function operates across the two groups adding the values 3 and 1, returning 4.
A, B, and D are incorrect. A seems plausible since group functions may not be nested more than two levels deep. Although there are four functions, only two are group functions, while the others are single-row functions evaluated before the group functions. B would be returned if the expression SUM(AVG(LENGTH(SALARY))) were selected. D is incorrect because the correct answer is C.

7. A. There are 12 distinct DEPARTMENT_ID values. Since this is the grouping attribute, 12 groups are created, including 1 with a null DEPARTMENT_ID value. Therefore, 12 rows are returned.
B, C, and D are incorrect. B is incorrect because the null DEPARTMENT_ID is a valid unique value by which the group is partitioned. C is incorrect because 12 rows are returned. D is incorrect because the correct answer is A.

8. C. For a GROUP BY clause to be used, a group function must appear in the SELECT list.
A, B, and D are incorrect. These are incorrect since the statement is syntactically inaccurate and is disallowed by Oracle. Do not mistake the column named MAX_SALARY for the MAX(SALARY) function.

9. B. Two groups are created based on their common DEPT_ID values. The group with DEPT_ID values of 10 consists of five rows with SALARY values of 100 in each of them. Therefore, the SUM(SALARY) function returns 500 for this group, and it satisfies the HAVING SUM(SALARY) > 400 clause. The group with DEPT_ID values of 20 has four rows with SALARY values of 100 and one row with a NULL SALARY. SUM(SALARY) only returns 400, and this group does not satisfy the HAVING clause.
A, C, and D are incorrect. Beware of the SUM(NVL(SALARY,100)) expression in the SELECT clause. This expression selects the format of the output. It does not restrict or limit the dataset in any way. A would be correct if the HAVING clause determined there to be two groups to which the aggregate function would be applied. C is incorrect because there is one set to be aggregated, which results in one row being returned. D is incorrect because the correct answer is B.

10. A. Two groups are created based on their common DEPT_ID values. The group with DEPT_ID values of 10 consists of five rows with SALARY values of 100 in each of them. Therefore, the SUM(NVL(SALARY,100)) function returns 500 for this group and satisfies the HAVING SUM(NVL(SALARY,100))>400 clause. The group with DEPT_ID values of 20 has four rows with SALARY values of 100 and one row with a null SALARY. SUM(NVL(SALARY,100)) returns 500, and this group satisfies the HAVING clause. Therefore, two rows are returned.
B, C, and D are incorrect. Although the SELECT clause contains SUM(SALARY), which returns 500 and 400 for the two groups, the HAVING clause contains the SUM(NVL(SALARY,100)) expression, which specifies the inclusion or exclusion criteria for a group-level row. B would be correct if the HAVING clause determined there to be one group to which the aggregate function would be applied. C is incorrect because there are two sets to be aggregated, which results in two rows being returned. D is incorrect because the correct answer is A.

11. A and E. The standard deviation is mathematically defined as the square root of its variance. The LISTAGG function sorts the measure using the ORDER BY in the WITHIN GROUP clause and returns an ordered concatenated string.
B, C, and D are incorrect. All these statements are false.

12. B. The dataset is not divided into groups by a GROUP BY clause, and the LISTAGG function therefore operates on the entire table as a single dataset. One long row consisting of a concatenated list on LAST_NAME values is returned.
A, C, D, and E are incorrect. A is incorrect because LISTAGG returns at least one row. C and D would be correct if the dataset was divided into 10 and 107 groups, respectively. E is incorrect because the correct answer is B.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 9 Group Functions

Create new playlist

Sign In

Sign Up

CHAPTER 9

Group Functions

Exam Objectives

Describe the Group Functions

Definition of Group Functions

Using Group Functions

The COUNT Function

The SUM Function

The AVG Function

The STDDEV and VARIANCE Functions

The MAX and MIN Functions

The LISTAGG Function

Group Data Using the GROUP BY Clause

Creating Groups of Data

The GROUP BY Clause

Grouping by Multiple Columns

Nested Group Functions

Include or Exclude Grouped Rows Using the HAVING Clause

Restricting Group Results

The HAVING Clause

Two-Minute Drill

Describe the Group Functions

Group Data Using the GROUP BY Clause

Include or Exclude Grouped Rows Using the HAVING Clause

Self Test

Self Test Answers

Table of Contents for
Chapter 9 Group Functions