Understanding memory

When a procedure is written, it makes sense to think for a moment about memory consumption. There are three important issues here:

  • Procedures and cursors
  • Handling set returning functions
  • Assigning memory parameters to functions

In this section, you will be guided through all of these topics.

Procedures and cursors

The first thing to be discussed is the way large result sets should be handled inside a procedure. In normal programming, the general rule is that in the case of large amounts of data, a cursor is needed. The idea behind a cursor is that only small portions of data are fetched and processed at a time.

Languages such as PL/pgSQL provide functionality to open, close, and fetch data from cursors. However, in most cases, you don't actually need to rely on this functionality. The beauty of a procedural language is that internally, everything is a cursor! Here is an example:

CREATE FUNCTION process() RETURNS int AS
$$
        DECLARE
                v_rec           RECORD;
        BEGIN
                FOR v_rec IN SELECT * FROM large_table
                LOOP
                        -- do something
                END LOOP;

                RETURN 0;
        END;
$$ LANGUAGE 'plpgsql';

In this example, the procedure loops through a large result set. PL/pgSQL will automatically use an implicit cursor. You don't have to worry about memory at all because internally, data is processed smoothly without ever materializing completely in the memory.

In short, there is no need to fool around with cursor handling and explicit, manual cursor management.

Handling set-returning functions

While loops are processed nicely by the system, there are functions that do need some attention. I am talking about the so-called set-returning functions. These are functions that actually return more than a row. The set-returning function that is most widely used in this book is generate_series:

SELECT * FROM generate_series(1, 10) AS x;

The main issue here is that PostgreSQL has to calculate the entire function, store its output somewhere, and continue from there. This means that if you call generate_series (or a similar function) with a large range, everything has to be kept in the memory as long as things are needed. Dealing with two thousand rows or so is definitely not a problem here, as modern hardware is easily able to handle that. However, if you want to return hundreds of millions of rows, you need to switch to a different technique. You should not return such large result sets with a set-returning function.

In many cases, returning a cursor can help.

Assigning memory parameters to functions

Tuning PostgreSQL also involves modifying memory parameters in postgresql.conf. In many cases, setting things globally is quite sufficient. Most applications seen in the field these days are perfectly fine with global settings. However, in some cases, passing config parameters directly to a function can make sense.

Consider the following example:

CREATE FUNCTION func() RETURNS int AS
$$
        DECLARE
        BEGIN
                -- some code
                RETURN 0;
        END;
$$ LANGUAGE 'plpgsql' SET work_mem = '1 GB';

SELECT func();

The most important issue is the SET statement at the end of the CREATE FUNCTION statement. It tells the system that inside this function, 1 GB of work_mem integer should be available. Outside the function, other values can be used.

You can specify any of the runtime-alterable settings within the function definition. One example is of time zones. Maybe, you want to run your function according to UTC time. No problem! Just tell the function to do so.

Of course, you can change those settings later on with ALTER FUNCTION if it turns out that a change is needed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset