Creating simple aggregates

For this example, the goal is to solve a very simple problem. If a customer takes a taxi, they usually have to pay for getting in the taxi—for example, 2.50. Now, let's assume that for each kilometer, the customer has to pay 2.20. The question now is, what is the total price of a trip? 

Of course, this example is simple enough to solve without a custom aggregate; however, let's see how it works. First, some test data needs to be created:

test=# CREATE TABLE t_taxi (trip_id int, km numeric);  
CREATE TABLE
test=# INSERT INTO t_taxi VALUES (1, 4.0), (1, 3.2), (1, 4.5), (2, 1.9), (2, 4.5);
INSERT 0 5

To create aggregates, PostgreSQL offers the CREATE AGGREGATE command. The syntax of this command has become so powerful and long over time that it doesn't make sense anymore to include the output of it here in this book. Instead, I recommend going to the PostgreSQL documentation, which can be found at https://www.postgresql.org/docs/devel/static/sql-createaggregate.html.

The first thing that's needed when writing an aggregate is a function, which is called for every line. It will take an intermediate value, and data that's taken from the line processed. Here is an example:

test=# CREATE FUNCTION taxi_per_line (numeric, numeric) 
RETURNS numeric AS $$
BEGIN RAISE NOTICE 'intermediate: %, per row: %', $1, $2; RETURN $1 + $2*2.2; END; $$ LANGUAGE 'plpgsql';

CREATE FUNCTION

Now, it is already possible to create a simple aggregate:

test=# CREATE AGGREGATE taxi_price (numeric)  
( 
   INITCOND = 2.5, 
   SFUNC = taxi_per_line,  
   STYPE = numeric 
); 
CREATE AGGREGATE

As we stated previously, every trip starts at €2.50 for getting in the taxi, which is defined by INITCOND (the init condition). It represents the starting value for each group. Then, a function is called for each line in the group. In my example, this function is taxi_per_line and has already been defined. As you can see, it needs two parameters. The first parameter is an intermediate value. Those additional parameters are the parameters that are passed to the function by the user.

The following statement shows what data is passed, when, and how:

test=# SELECT trip_id, taxi_price(km) FROM t_taxi GROUP  BY 1; 
psql: NOTICE: intermediate: 2.5, per row: 4.0 psql: NOTICE: intermediate: 11.30, per row: 3.2 psql: NOTICE: intermediate: 18.34, per row: 4.5 psql: NOTICE: intermediate: 2.5, per row: 1.9 psql: NOTICE: intermediate: 6.68, per row: 4.5 psql: trip_id | taxi_price ---------+------------ 1 | 28.24 2 | 16.58
(2 rows)

The system starts with trip 1 and €2.50 (the init condition). Then, 4 kilometers are added. Overall, the price is now 2.50 + 4 x 2.2. Then, the next line is added, which will add 3.2 x 2.2 and so on. The first trip, therefore, costs €28.24.

Then, the next trip starts. Again, there is a fresh init condition, and PostgreSQL will call one function per line.

In PostgreSQL, an aggregate can automatically be used as a windowing function too. No additional steps are needed—you can use the aggregate directly:

test=# SELECT *, taxi_price(km) OVER (PARTITION BY trip_id ORDER BY km)  
   FROM  t_taxi; 
psql: NOTICE: intermediate: 2.5, per row: 3.2 psql: NOTICE: intermediate: 9.54, per row: 4.0 psql: NOTICE: intermediate: 18.34, per row: 4.5 psql: NOTICE: intermediate: 2.5, per row: 1.9 psql: NOTICE: intermediate: 6.68, per row: 4.5 trip_id | km | taxi_price ---------+-----+------------ 1 | 3.2 | 9.54
1 | 4.0 | 18.34
1 | 4.5 | 28.24
2 | 1.9 | 6.68
2 | 4.5 | 16.58

(5 rows)

What the query does is give us the price up to a given point on the trip.

The aggregate we have defined will call one function per line. However, how would users be able to calculate an average? Without adding a FINALFUNC function, calculations like that are not possible. To demonstrate how FINALFUNC works, we must extend our example. Suppose the customer wants to give the taxi driver a 10% tip as soon as they leave the taxi. That 10% has to be added at the end, as soon as the total price is known. This is the point where FINALFUNC kicks in. Here is how it works:

test=# DROP AGGREGATE taxi_price(numeric);
DROP AGGREGATE

First of all, the old aggregate is dropped. Then, FINALFUNC is defined. It will get the intermediate result as a parameter and do its magic:

test=# CREATE FUNCTION taxi_final (numeric)  
              RETURNS numeric AS 
$$ 
   SELECT $1 * 1.1; 
$$ 
LANGUAGE sql IMMUTABLE;  
CREATE FUNCTION

The calculation is pretty simple, in this case—as we stated previously, 10% is added to the final sum.

Once the function has been deployed, it is already possible to recreate the aggregate:

test=# CREATE AGGREGATE taxi_price (numeric)  
( 
   INITCOND = 2.5, 
   SFUNC  = taxi_per_line,  
   STYPE  = numeric,  
   FINALFUNC = taxi_final 
); 
CREATE AGGREGATE

Finally, the price will simply be a bit higher than before:

test=# SELECT trip_id, taxi_price(km) FROM t_taxi GROUP BY 1; 
psql: NOTICE: intermediate: 2.5, per row: 4.0 ... trip_id | taxi_price ---------+------------ 1 | 31.064 2 | 18.238 (2 rows)

PostgreSQL takes care of all of the grouping and so on automatically.

For simple calculations, simple data types can be used for the intermediate result. However, not all operations can be done by just passing simple numbers and text around. Fortunately, PostgreSQL allows the use of composite data types, which can be used as intermediate results.

Imagine that you want to calculate an average of some data, maybe a time series. An intermediate result might look as follows:

test=# CREATE TYPE my_intermediate AS (c int4, s numeric);
CREATE TYPE

Feel free to compose any arbitrary type that serves your purpose. Just pass it as the first parameter and add data as additional parameters as needed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset