Fetching files with file_fdw

In some cases, it can make sense to read a file from disk and expose it to PostgreSQL as a table. This is exactly what you can achieve with the file_fdw extension. The idea is to have a module that allows you to read data from a disk and query it using SQL.

Installing the module works as expected:

CREATE EXTENSION file_fdw;

Now, we need to create a virtual server:

CREATE SERVER file_server 
FOREIGN
DATA WRAPPER file_fdw;

file_server is based on the file_fdw extension foreign data wrapper, which tells PostgreSQL how to access the file.

To expose a file as a table, use the following command:

CREATE FOREIGN TABLE t_passwd 
(
username text,
passwd text,
uid int,
gid int,
gecos text,
dir text,
shell text
) SERVER file_server
OPTIONS (format 'text', filename '/etc/passwd', header 'false', delimiter ':');

In this example, the /etc/passwd file will be exposed. All the fields have to be listed and the data types have to be mapped accordingly. All of the additional important information is passed to the module using options. In this example, PostgreSQL has to know the type of the file (text), the name and path of the file, as well as the delimiter. It is also possible to tell PostgreSQL whether there is a header. If the setting is true, the first line will be skipped and deemed not important. Skipping headers is especially important if you happen to load a CSV file.

Once the table has been created, it is possible to read data:

SELECT * FROM t_passwd;

Unsurprisingly, PostgreSQL returns the content of /etc/passwd:

test=# x
Expanded display is on.
test=# SELECT * FROM t_passwd LIMIT 1;
-[ RECORD 1 ]-------
username | root
passwd | x
uid | 0
gid | 0
gecos | root
dir | /root
shell | /bin/bash

When looking at the execution plan, you will see that PostgreSQL uses what is known as a foreign scan to fetch the data from the file:

test=# explain (verbose true, analyze true) SELECT * FROM t_passwd;
QUERY PLAN
----------------------------------------------------------------------
Foreign Scan on public.t_passwd (cost=0.00..2.80 rows=18 width=168)
(actual time=0.022..0.072 rows=61 loops=1)
Output: username, passwd, uid, gid, gecos, dir, shell
Foreign File: /etc/passwd
Foreign File Size: 3484
Planning time: 0.058 ms
Execution time: 0.138 ms
(6 rows)

The execution plan also tells us about the file's size and so on. Since we're talking about the planner, there is a side note that is worth mentioning: PostgreSQL will even fetch statistics for the file. The planner checks the file size and assigns the same costs to the file, just like it would to a normal PostgreSQL table of the same size.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset