Suppose you want to
identify the air travel destinations that experienced the worst delays
in March. You would like your output to show all of the following
data:
Your PROC SQL query
uses an in-line view to calculate all of the new columns except for
the last one:
proc sql;
title "Flight Destinations and Delays";
select destination,
average format=3.0 label='Average Delay',
max format=3.0 label='Maximum Delay',
late/(late+early) as prob format=5.2
label='Probability of Delay'
from (select destination,
avg(delay) as average,
max(delay) as max,
sum(delay > 0) as late,
sum(delay <= 0) as early
from sasuser.flightdelays
group by destination)
order by average;
Consider each clause
of the outer query, starting with the FROM clause, because PROC SQL
evaluates the FROM clause before the SELECT clause.
The FROM clause specifies
an in-line view rather than a table. The in-line view (nested query)
specifies the following columns that are in the table Sasuser.Flightdelays
or are based on a column in that table:
-
the existing column Destination
-
-
-
-
The columns Average,
Max, Late, and Early are all calculated by using summary functions.
In the calculation for
the columns Late and Early, a Boolean expression is used as the argument
for the summary function. A Boolean function resolves either to 1
(true) or 0 (false). For example, Late is calculated by taking the
sum of the Boolean expression delay > 0
.
For every value of Delay that is greater than 0, the Boolean expression
resolves to 1; values that are equal to or less than 0 resolve to
0. The SUM function adds all values of Delay to indicate the number
of delays that occurred for each destination.
The in-line view concludes
with the clause group by destination
, specifying
that the in-line view data should be grouped, and summarized by the
values of Destination.
If you submitted this
in-line view (nested query) as a separate query, it would generate
the following output.
Consider the outer query's
SELECT and ORDER BY clauses:
proc sql;
title "Flight Destinations and Delays";
select destination,
average format=3.0 label='Average Delay',
max format=3.0 label='Maximum Delay',
late/(late+early) as prob format=5.2
label='Probability of Delay'
from (select destination,
avg(delay) as average,
max(delay) as max,
sum(delay > 0) as late,
sum(delay <= 0) as early
from sasuser.flightdelays
group by destination)
order by average;
The outer query's
SELECT clause specifies columns as follows:
-
Destination is an existing column
in the table.
-
Average and Max are calculated
in the in-line view, and are assigned labels and formats in this SELECT
clause.
-
Prob (with the label “Probability
of Delay”) is calculated in this SELECT clause by using two
columns that were calculated in the in-line view: Late and Early.
The outer query's SELECT clause can refer to the calculated columns
Late and Early without using the keyword CALCULATED because PROC SQL
evaluates the in-line view (the outer query's FROM clause) first.
The outer query's
last clause is an ORDER BY clause. The output is sorted by the values
of Average.
This PROC SQL query
generates the following output.
Later in this chapter,
a PROC SQL query that combines multiple tables and uses an in-line
view is explained.