The most important function in a block driver is the request function, which performs the low-level operations related to reading and writing data. This section discusses the basic design of the request procedure.
When the kernel schedules a data transfer, it queues the request in a list, ordered in such a way that it maximizes system performance. The queue of requests is then passed to the driver’s request function, which has the following prototype:
void request_fn(request_queue_t *queue);
The request function should perform the following tasks for each request in the queue:
Check the validity of the request. This test is performed by the macro INIT_REQUEST, defined in blk.h; the test consists of looking for problems that could indicate a bug in the system’s request queue handling.
Perform the actual data transfer. The CURRENT variable (a macro, actually) can be used to retrieve the details of the current request. CURRENT is a pointer to struct request, whose fields are described in the next section.
Clean up the request just processed. This operation is performed by end_request, a static function whose code resides in blk.h. end_request handles the management of the request queue and wakes up processes waiting on the I/O operation. It also manages the CURRENT variable, ensuring that it points to the next unsatisfied request. The driver passes the function a single argument, which is 1 in case of success and 0 in case of failure. When end_request is called with an argument of 0, an “I/O error” message is delivered to the system logs (via printk).
Loop back to the beginning, to consume the next request.
Based on the previous description, a minimal request function, which does not actually transfer any data, would look like this:
void sbull_request(request_queue_t *q)
{
    while(1) {
        INIT_REQUEST;
        printk("<1>request %p: cmd %i sec %li (nr. %li)\n", CURRENT,
               CURRENT->cmd,
               CURRENT->sector,
               CURRENT->current_nr_sectors);
        end_request(1); /* success */
    }
}
Although this code does nothing but print messages, running this function provides good insight into the basic design of data transfer. It also demonstrates a couple of features of the macros defined in <linux/blk.h>. The first is that, although the while loop looks like it will never terminate, the fact is that the INIT_REQUEST macro performs a return when the request queue is empty. The loop thus iterates over the queue of outstanding requests and then returns from the request function. Second, the CURRENT macro always describes the request to be processed. We get into the details of CURRENT in the next section.
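The control flow described above can be modeled outside the kernel. The following is a minimal user-space sketch, not kernel code: every name in it (mock_request, MOCK_CURRENT, MOCK_INIT_REQUEST, mock_end_request, and so on) is a hypothetical stand-in invented for illustration, and the real macros in <linux/blk.h> do considerably more. It only shows how a return hidden inside a macro terminates an apparently endless loop, and how completing a request advances the "current" pointer.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for struct request (illustration only). */
struct mock_request {
    int cmd;
    unsigned long sector;
    unsigned long current_nr_sectors;
    struct mock_request *next;
};

struct mock_request *mock_queue_head;  /* plays the role of the request queue */
int mock_completed;                    /* requests finished, success or not */
int mock_errors;                       /* requests finished with status 0 */

/* Like CURRENT, this always names the request at the head of the queue. */
#define MOCK_CURRENT (mock_queue_head)

/* Like INIT_REQUEST, this returns from the *calling* function
 * when there is nothing left to process. */
#define MOCK_INIT_REQUEST             \
    do {                              \
        if (MOCK_CURRENT == NULL)     \
            return;                   \
    } while (0)

/* Retire the head request and advance to the next one. */
void mock_end_request(int uptodate)
{
    if (!uptodate)
        mock_errors++;
    mock_queue_head = mock_queue_head->next;
    mock_completed++;
}

/* Append a request to the tail of the queue. */
void mock_enqueue(struct mock_request *req)
{
    struct mock_request **tail = &mock_queue_head;

    while (*tail != NULL)
        tail = &(*tail)->next;
    req->next = NULL;
    *tail = req;
}

/* Same shape as the empty sbull_request: loop until the macro returns. */
void mock_request_fn(void)
{
    while (1) {
        MOCK_INIT_REQUEST;    /* leaves the function when the queue is empty */
        mock_end_request(1);  /* pretend the transfer succeeded */
    }
}
```

Queueing two requests with mock_enqueue and then calling mock_request_fn once retires both and leaves the queue empty; the function returns through the macro rather than through any visible exit from the loop.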
A block driver using the request function just shown will actually work—for a short while. It is possible to make a filesystem on the device and access it for as long as the data remains in the system’s buffer cache.
This empty (but verbose) function can still be run in sbull by defining the symbol SBULL_EMPTY_REQUEST at compile time. If you want to understand how the kernel handles different block sizes, you can experiment with blksize= on the insmod command line. The empty request function shows the internal workings of the kernel by printing the details of each request.
The request function has one very important constraint: it must be atomic. The request function is not usually called in direct response to user requests, and it does not run in the context of any particular process. It can be called at interrupt time, from tasklets, or from any number of other places. Thus, it must not sleep while carrying out its tasks.
To understand how to build a working request function for sbull, let’s look at how the kernel describes a request within a struct request. The structure is defined in <linux/blkdev.h>.
By accessing the fields in the request structure, usually by way of CURRENT, the driver can retrieve all the information needed to transfer data between the buffer cache and the physical block device.[48] CURRENT is just a pointer into blk_dev[MAJOR_NR].request_queue. The following fields of a request hold information that is useful to the request function:
kdev_t rq_dev;
The device accessed by the request. By default, the same request function is used for every device managed by the driver. A single request function deals with all the minor numbers; rq_dev can be used to extract the minor device being acted upon. The CURRENT_DEV macro is simply defined as DEVICE_NR(CURRENT->rq_dev).
int cmd;
This field describes the operation to be performed; it is either READ (from the device) or WRITE (to the device).
unsigned long sector;
The number of the first sector to be transferred in this request.
unsigned long current_nr_sectors;
unsigned long nr_sectors;
The number of sectors to transfer for the current request. The driver should refer to current_nr_sectors and ignore nr_sectors (which is listed here just for completeness). See Section 12.4.2 later in this chapter for more detail on nr_sectors.
char *buffer;
The area in the buffer cache to which data should be written (cmd==READ) or from which data should be read (cmd==WRITE).
struct buffer_head *bh;
The structure describing the first buffer in the list for this request. Buffer heads are used in the management of the buffer cache; we’ll look at them in detail shortly in Section 12.4.1.1.
There are other fields in the structure, but they are primarily meant for internal use in the kernel; the driver is not expected to use them.
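To make the role of these fields concrete, here is a small user-space sketch of how a driver might turn them into a minor number and a byte range. It is an illustration under stated assumptions, not the kernel's code: sketch_request mirrors only the fields listed above, the SKETCH_MAJOR and SKETCH_MINOR macros assume a 16-bit device number with the minor in the low 8 bits, and sketch_describe and its hardsect parameter (the sector size in bytes) are hypothetical names.

```c
/* Assumption: 16-bit device number, minor number in the low 8 bits. */
#define SKETCH_MINORBITS 8
#define SKETCH_MINOR(dev) \
    ((unsigned int)((dev) & ((1U << SKETCH_MINORBITS) - 1)))
#define SKETCH_MAJOR(dev) \
    ((unsigned int)((dev) >> SKETCH_MINORBITS))

/* User-space mirror of the request fields described in the text. */
struct sketch_request {
    unsigned short rq_dev;             /* stand-in for kdev_t */
    int cmd;                           /* read or write */
    unsigned long sector;              /* first sector to transfer */
    unsigned long current_nr_sectors;  /* sectors in this transfer */
    char *buffer;                      /* data area for the transfer */
};

/* What a driver typically derives from a request before moving data. */
struct sketch_span {
    unsigned int minor;    /* which device the request targets */
    unsigned long offset;  /* byte offset of the first sector */
    unsigned long length;  /* number of bytes to move */
};

/* Translate request fields into a device index and a byte range. */
struct sketch_span sketch_describe(const struct sketch_request *req,
                                   unsigned long hardsect)
{
    struct sketch_span span;

    span.minor = SKETCH_MINOR(req->rq_dev);
    span.offset = req->sector * hardsect;
    span.length = req->current_nr_sectors * hardsect;
    return span;
}
```

With 512-byte sectors, a request for 2 sectors starting at sector 10 maps to byte offset 5120 and a length of 1024 bytes.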
The implementation for the working request function in the sbull device is shown here. In the following code, the Sbull_Dev serves the same function as Scull_Dev, introduced in Section 3.6 in Chapter 3.
void sbull_request(request_queue_t *q)
{
    Sbull_Dev *device;
    int status;

    while(1) {
        INIT_REQUEST;  /* returns when queue is empty */

        /* Which "device" are we using? */
        device = sbull_locate_device (CURRENT);
        if (device == NULL) {
            end_request(0);
            continue;
        }

        /* Perform the transfer and clean up. */
        spin_lock(&device->lock);
        status = sbull_transfer(device, CURRENT);
        spin_unlock(&device->lock);
        end_request(status);
    }
}
This code looks little different from the empty version shown earlier; it concerns itself with request queue management and pushes off the real work to other functions. The first, sbull_locate_device, looks at the device number in the request and finds the right Sbull_Dev structure:
static Sbull_Dev *sbull_locate_device(const struct request *req)
{
    int devno;
    Sbull_Dev *device;

    /* Check if the minor number is in range */
    devno = DEVICE_NR(req->rq_dev);
    if (devno >= sbull_devs) {
        static int count = 0;
        if (count++ < 5) /* print the message at most five times */
            printk(KERN_WARNING "sbull: request for unknown device\n");
        return NULL;
    }
    device = sbull_devices + devno;  /* Pick it out of device array */
    return device;
}
The only “strange” feature of the function is the conditional statement that limits it to reporting five errors. This is intended to avoid clobbering the system logs with too many messages, since end_request(0) already prints an “I/O error” message when the request fails. The static counter is a standard way to limit message reporting and is used several times in the kernel.
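The static-counter idiom is easy to try in isolation. The sketch below is a user-space illustration only: report_limited and MAX_REPORTS are hypothetical names, and fprintf stands in for printk. One deliberate difference from the listing above is that this version stops incrementing the counter once the limit is reached, so the counter cannot overflow however many times the function is called.

```c
#include <stdio.h>

#define MAX_REPORTS 5  /* hypothetical limit, matching the five in the text */

/*
 * Print msg at most MAX_REPORTS times over the life of the program.
 * Returns 1 if the message was printed, 0 if it was suppressed.
 */
int report_limited(const char *msg)
{
    static int count = 0;  /* persists across calls, private to this function */

    if (count < MAX_REPORTS) {
        count++;
        fprintf(stderr, "%s\n", msg);
        return 1;
    }
    return 0;
}
```

Because the counter is declared static inside the function, it keeps its value between calls but is invisible to the rest of the program. Each call site that needs its own limit gets its own counter, as with the separate counters in sbull_locate_device and sbull_transfer.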
The actual I/O of the request is handled by sbull_transfer:
static int sbull_transfer(Sbull_Dev *device, const struct request *req)
{
    int size;
    u8 *ptr;

    ptr = device->data + req->sector * sbull_hardsect;
    size = req->current_nr_sectors * sbull_hardsect;

    /* Make sure that the transfer fits within the device. */
    if (ptr + size > device->data + sbull_blksize*sbull_size) {
        static int count = 0;
        if (count++ < 5)
            printk(KERN_WARNING "sbull: request past end of device\n");
        return 0;
    }

    /* Looks good, do the transfer. */
    switch(req->cmd) {
        case READ:
            memcpy(req->buffer, ptr, size); /* from sbull to buffer */
            return 1;
        case WRITE:
            memcpy(ptr, req->buffer, size); /* from buffer to sbull */
            return 1;
        default:
            /* can't happen */
            return 0;
    }
}
Since sbull is just a RAM disk, its “data transfer” reduces to a memcpy call.
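The same bounds check and copy can be exercised in user space. The following sketch assumes a hypothetical ramdisk structure and ramdisk_transfer function that are not part of sbull; SKETCH_READ and SKETCH_WRITE are stand-ins for the kernel's READ and WRITE. It also performs the range check on sector counts rather than on pointers, which is a design choice of this sketch (it keeps the check free of arithmetic overflow), not a description of what sbull does.

```c
#include <stddef.h>
#include <string.h>

enum { SKETCH_READ = 0, SKETCH_WRITE = 1 };  /* stand-ins for READ and WRITE */

/* Hypothetical user-space RAM disk: a flat byte array split into sectors. */
struct ramdisk {
    unsigned char *data;  /* backing storage */
    size_t size;          /* total size in bytes */
    size_t sect_size;     /* bytes per sector */
};

/*
 * Copy nsect sectors starting at sector between the RAM disk and buffer.
 * Returns 1 on success and 0 on failure, like the status that the text
 * passes to end_request.
 */
int ramdisk_transfer(struct ramdisk *rd, int cmd, unsigned long sector,
                     unsigned long nsect, unsigned char *buffer)
{
    size_t total_sectors, offset, len;

    if (rd->sect_size == 0)
        return 0;
    total_sectors = rd->size / rd->sect_size;

    /* Make sure that the transfer fits within the device. */
    if (sector > total_sectors || nsect > total_sectors - sector)
        return 0;

    offset = (size_t)sector * rd->sect_size;
    len = (size_t)nsect * rd->sect_size;

    switch (cmd) {
    case SKETCH_READ:
        memcpy(buffer, rd->data + offset, len);  /* from disk to buffer */
        return 1;
    case SKETCH_WRITE:
        memcpy(rd->data + offset, buffer, len);  /* from buffer to disk */
        return 1;
    default:
        return 0;  /* unknown command */
    }
}
```

Writing one sector and reading it back through ramdisk_transfer returns the same bytes, while a request that extends past the last sector is rejected before any memory is touched.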
[48] Actually, not all blocks passed to a block driver need be in the buffer cache, but that’s a topic beyond the scope of this chapter.