Using cursors to fetch data in chunks

If you execute SQL, the database will calculate the result and send it to your application. Once the entire result set has been sent to the client, the application can continue to do its job. The problem is just this: what happens if the result set is so large that it does not fit into the memory anymore? What if the database returns 10 billion rows? The client application usually cannot handle so much data at once, and actually, it shouldn't. The solution to the problem is a cursor. The idea behind a cursor is that data is generated only when it is needed (when FETCH is called). Therefore, the application can already start to consume data while it is actually being generated by the database. On top of that, the memory required to perform an operation is a lot lower.

When it comes to PL/pgSQL, cursors also play a major role. Whenever you loop over a result set, PostgreSQL will automatically use a cursor internally. The advantage is that the memory consumption of your applications will be reduced dramatically and there is hardly a chance of ever running out of memory due to processing large amounts of data. There are various ways to use cursors. Here is the most simplistic example of a cursor inside a function:

CREATE OR REPLACE FUNCTION c(int) 
  RETURNS setof text AS 
$$ 
DECLARE 
  v_rec record; 
BEGIN 
  FOR v_rec IN SELECT tablename 
               FROM pg_tables 
               LIMIT $1 
  LOOP 
    RETURN NEXT v_rec.tablename; 
  END LOOP; 
  RETURN; 
END; 
$$ LANGUAGE 'plpgsql'; 

This code is interesting for two reasons. First of all, it is a Set Returning Function (SRF). It produces an entire column and not just a single row. The way to achieve this is to use the set of variables instead of just the datatype. The RETURN NEXT clause will build up the result set until we have reached the end. The RETURN clause will tell PostgreSQL that we want to leave the function and that the result is done.

The second important issue is that looping over the cursor will automatically create an internal cursor. In other words, there is no need to be afraid that you could potentially run out of memory. PostgreSQL will optimize the query in a way that it tries to produce the first 10% of the data (defined by the cursor_tuple_fraction variable) as fast as possible. Here is what the query will return:

test=# SELECT * FROM c(3); 
c -------------- t_test pg_statistic pg_type (3 rows)

In this example, there will simply be a list of random tables. If the result differs on your side, this is somewhat expected.

What you have just seen is, in my opinion, the most frequent and most common way to use implicit cursors in PL/pgSQL. The following example shows an older mechanism that many people from Oracle might know:

CREATE OR REPLACE FUNCTION d(int) 
  RETURNS setof text AS 
$$ 
DECLARE 
  v_cur refcursor; 
  v_data text; 
BEGIN 
  OPEN v_cur FOR 
    SELECT tablename 
    FROM pg_tables 
    LIMIT $1; 
  WHILE true LOOP 
    FETCH v_cur INTO v_data; 
    IF FOUND THEN 
      RETURN NEXT v_data; 
    ELSE 
      RETURN; 
    END IF; 
  END LOOP; 
END; 
$$ LANGUAGE 'plpgsql'; 

In this example, the cursor is explicitly declared and opened. Inside, the loop data is then explicitly fetched and returned to the caller. Basically, the query does exactly the same thing. It is merely a matter of taste which syntax developers actually prefer.

Do you still have the feeling that you don't know enough about cursors yet? There is more; here is a third option to do exactly the same thing:

CREATE OR REPLACE FUNCTION e(int) 
  RETURNS setof text AS 
$$ 
DECLARE 
  v_cur CURSOR (param1 int) FOR 
          SELECT tablename 
          FROM pg_tables 
          LIMIT param1; 
  v_data text; 
BEGIN 
  OPEN v_cur ($1); 
  WHILE true LOOP 
    FETCH v_cur INTO v_data; 
    IF FOUND THEN 
      RETURN NEXT v_data; 
    ELSE 
      RETURN; 
    END IF; 
  END LOOP; 
END; 
$$ LANGUAGE 'plpgsql'; 

In this case, the cursor is fed with an integer parameter, which comes directly from the function call ($1).

Sometimes, a cursor is not used up by the stored procedure itself but returned for later use. In this case, you can return a simple use refcursor as the return value:

CREATE OR REPLACE FUNCTION cursor_test(c refcursor) 
  RETURNS refcursor AS 
$$ 
BEGIN 
  OPEN c FOR SELECT * 
             FROM generate_series(1, 10) AS id;
RETURN c; END; $$ LANGUAGE plpgsql;

The logic here is quite simple. The name of the cursor is passed to the function. Then, the cursor is opened and returned. The beauty here is that the query behind the cursor can be created on the fly and compiled dynamically.

The application can fetch from the cursor just like from any other application. Here is how it works:

test=# BEGIN; 
BEGIN test=# SELECT cursor_test('mytest');
cursor_test ------------- mytest (1 row) test=# FETCH NEXT FROM mytest;
id ---- 1 (1 row) test=# FETCH NEXT FROM mytest;
id ---- 2 (1 row)

Note that it works only when a transaction block is used.

In this section, we have learned that cursors will only produce data as it is consumed. This holds true for most queries. However, there is a catch to this example; whenever an SRF is used, the entire result has to be materialized. It is not created on the fly, but instead, at once. The reason is that SQL must be able to rescan a relation, which is easily possible in the case of a normal table. However, for functions, the situation is different. Therefore, an SRF is always calculated and materialized, making the cursor in this example totally useless. In other words, we need to be careful while writing functions. In some cases, danger is hidden in the nifty details.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset