In the next few sections, we’ll look at the various operations a driver can perform on the devices it manages. An open device is identified internally by a file structure, and the kernel uses the file_operations structure to access the driver’s functions. The structure, defined in <linux/fs.h>, is essentially a table of function pointers. Each file is associated with its own set of functions (by including a field called f_op that points to a file_operations structure). The operations are mostly in charge of implementing the system calls and are thus named open, read, and so on. We can consider the file to be an “object” and the functions operating on it to be its “methods,” using object-oriented programming terminology to denote actions declared by an object to act on itself. This is the first sign of object-oriented programming we see in the Linux kernel, and we’ll see more in later chapters.
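To make the “object and methods” analogy concrete, here is a minimal user-space sketch, not kernel code. The names mock_file, mock_fops, a_read, b_read, and generic_read are hypothetical stand-ins invented for this illustration; the real struct file and struct file_operations have many more fields. Two “drivers” supply different tables, and the generic caller reaches the right implementation only through f_op:

```c
#include <sys/types.h>  /* ssize_t, size_t */

struct mock_file;

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct mock_fops {
    ssize_t (*read)(struct mock_file *filp, char *buf, size_t count);
};

/* Hypothetical stand-in for struct file: every open file carries f_op. */
struct mock_file {
    const struct mock_fops *f_op;
};

/* "Driver" A: its read method always produces the byte 'A'. */
static ssize_t a_read(struct mock_file *filp, char *buf, size_t count)
{
    (void)filp;
    if (count == 0)
        return 0;
    buf[0] = 'A';
    return 1;
}

/* "Driver" B: same signature, different behavior. */
static ssize_t b_read(struct mock_file *filp, char *buf, size_t count)
{
    (void)filp;
    if (count == 0)
        return 0;
    buf[0] = 'B';
    return 1;
}

/* Each driver exposes its methods through its own table. */
static const struct mock_fops a_fops = { .read = a_read };
static const struct mock_fops b_fops = { .read = b_read };

/* Generic code knows nothing about A or B; it just calls the method. */
static ssize_t generic_read(struct mock_file *filp, char *buf, size_t count)
{
    return filp->f_op->read(filp, buf, count);
}
```

The same generic_read call produces different results depending solely on which table the file points to, which is exactly the role f_op plays for the kernel.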
Conventionally, a file_operations structure or a pointer to one is called fops (or some variation thereof); we’ve already seen one such pointer as an argument to the register_chrdev call. Each field in the structure must point to the function in the driver that implements a specific operation, or be left NULL for unsupported operations. The exact behavior of the kernel when a NULL pointer is specified is different for each function, as the list later in this section shows.
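The rule that a NULL entry means “not supported, apply the default” can be sketched in user space. Here mock_sys_open and mock_sys_read are hypothetical wrappers standing in for the kernel’s system-call code, and mock_fops is an invented two-entry table; the defaults they apply (open succeeds silently, read fails with -EINVAL) are the ones given for those two operations in the list later in this section:

```c
#include <errno.h>
#include <stddef.h>     /* NULL */
#include <sys/types.h>  /* ssize_t, size_t */

struct mock_file;

/* Hypothetical table holding just the two operations needed here. */
struct mock_fops {
    int     (*open)(struct mock_file *filp);
    ssize_t (*read)(struct mock_file *filp, char *buf, size_t count);
};

struct mock_file {
    const struct mock_fops *f_op;
};

/* A NULL open is not an error: opening succeeds, the driver is not told. */
static int mock_sys_open(struct mock_file *filp)
{
    if (filp->f_op->open == NULL)
        return 0;
    return filp->f_op->open(filp);
}

/* A NULL read, by contrast, makes the call fail with -EINVAL. */
static ssize_t mock_sys_read(struct mock_file *filp, char *buf, size_t count)
{
    if (filp->f_op->read == NULL)
        return -EINVAL;
    return filp->f_op->read(filp, buf, count);
}
```

Because each wrapper checks its own pointer, a driver can leave any subset of the table empty and still get well-defined, per-operation behavior.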
The file_operations structure has been slowly getting bigger as new functionality is added to the kernel. The addition of new operations can, of course, create portability problems for device drivers. Instantiations of the structure in each driver used to be declared using standard C positional syntax, and new operations were normally added to the end of the structure; a simple recompilation of the drivers would place a NULL value for that operation, thus selecting the default behavior, which was usually what you wanted.
Since then, kernel developers have switched to a “tagged” initialization format that allows initialization of structure fields by name, thus circumventing most problems with changed data structures. The field: form of tagged initialization used in this book, however, is not standard C but a (useful) extension specific to the GNU compiler; C99 defines a similar standard syntax, .field = value. We will look at an example of tagged structure initialization shortly.
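The difference matters when a structure changes layout. In this hypothetical sketch (ops_v1, ops_v2, my_open, and my_read are invented names), a member is inserted at the front of an operations table. A positional initializer written for the old layout silently shifts every pointer by one slot, while the tagged form keeps each function with its field and leaves the new member NULL. The sketch uses the C99 .field = value spelling; GCC also still accepts the older field: form, although its documentation describes that form as obsolete:

```c
#include <stddef.h>  /* NULL */

/* Hypothetical driver functions. */
static int my_open(void) { return 1; }
static int my_read(void) { return 2; }

/* Old layout of a hypothetical operations table... */
struct ops_v1 {
    int (*open)(void);
    int (*read)(void);
};

/* ...and the new layout, with a member inserted at the front. */
struct ops_v2 {
    int (*llseek)(void);
    int (*open)(void);
    int (*read)(void);
};

/* Positional: correct for v1... */
static struct ops_v1 positional_v1 = { my_open, my_read };

/* ...but the same list against v2 shifts every pointer by one slot:
 * my_open lands in llseek, my_read in open, and read is left NULL. */
static struct ops_v2 positional_v2 = { my_open, my_read };

/* Tagged: each function stays with its field; llseek is zero-filled (NULL). */
static struct ops_v2 tagged_v2 = { .open = my_open, .read = my_read };
```

Note that the compiler accepts positional_v2 without complaint, since the types match; only the tagged form protects against this kind of silent breakage.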
The following list introduces all the operations that an application can invoke on a device. We’ve tried to keep the list brief so it can be used as a reference, merely summarizing each operation and the default kernel behavior when a NULL pointer is used. You can skip over this list on your first reading and return to it later.
The rest of the chapter, after describing another important data structure (the file, which actually includes a pointer to its own file_operations), explains the role of the most important operations and offers hints, caveats, and real code examples. We defer discussion of the more complex operations to later chapters because we aren’t ready to dig into topics like memory management, blocking operations, and asynchronous notification quite yet.
The following list shows what operations appear in struct file_operations for the 2.4 series of kernels, in the order in which they appear. Although there are minor differences between 2.4 and earlier kernels, they will be dealt with later in this chapter, so we are just sticking to 2.4 for a while. The return value of each operation is 0 for success or a negative error code to signal an error, unless otherwise noted.
loff_t (*llseek) (struct file *, loff_t, int);
The llseek method is used to change the current read/write position in a file, and the new position is returned as a (non-negative) return value. The loff_t type is a “long offset” and is at least 64 bits wide even on 32-bit platforms. Errors are signaled by a negative return value. If the function is not specified for the driver, a seek relative to end-of-file fails, while other seeks succeed by modifying the position counter in the file structure (described in Section 3.4 later in this chapter).
ssize_t (*read) (struct file *, char *, size_t, loff_t *);
Used to retrieve data from the device. A null pointer in this position causes the read system call to fail with -EINVAL (“Invalid argument”). A non-negative return value represents the number of bytes successfully read (the return value is a “signed size” type, usually the native integer type for the target platform).
ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
Sends data to the device. If missing, -EINVAL is returned to the program calling the write system call. The return value, if non-negative, represents the number of bytes successfully written.
int (*readdir) (struct file *, void *, filldir_t);
This field should be NULL for device files; it is used for reading directories and is only useful to filesystems.
unsigned int (*poll) (struct file *, struct poll_table_struct *);
The poll method is the back end of two system calls, poll and select, both used to inquire if a device is readable or writable or in some special state. Either system call can block until a device becomes readable or writable. If a driver doesn’t define its poll method, the device is assumed to be both readable and writable, and in no special state. The return value is a bit mask describing the status of the device.
int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
The ioctl system call offers a way to issue device-specific commands (like formatting a track of a floppy disk, which is neither reading nor writing). Additionally, a few ioctl commands are recognized by the kernel without referring to the fops table. If the device doesn’t offer an ioctl entry point, the system call returns an error for any request that isn’t predefined (-ENOTTY, “No such ioctl for device”). If the device method returns a non-negative value, the same value is passed back to the calling program to indicate successful completion.
int (*mmap) (struct file *, struct vm_area_struct *);
mmap is used to request a mapping of device memory to a process’s address space. If the device doesn’t implement this method, the mmap system call returns -ENODEV.
int (*open) (struct inode *, struct file *);
Though this is always the first operation performed on the device file, the driver is not required to declare a corresponding method. If this entry is NULL, opening the device always succeeds, but your driver isn’t notified.
int (*flush) (struct file *);
The flush operation is invoked when a process closes its copy of a file descriptor for a device; it should execute (and wait for) any outstanding operations on the device. This must not be confused with the fsync operation requested by user programs. Currently, flush is used only in the network file system (NFS) code. If flush is NULL, it is simply not invoked.
int (*release) (struct inode *, struct file *);
This operation is invoked when the file structure is being released. Like open, release can be missing.[18]
int (*fsync) (struct inode *, struct dentry *, int);
This method is the back end of the fsync system call, which a user calls to flush any pending data. If not implemented in the driver, the system call returns -EINVAL.
int (*fasync) (int, struct file *, int);
This operation is used to notify the device of a change in its FASYNC flag. Asynchronous notification is an advanced topic and is described in Chapter 5. The field can be NULL if the driver doesn’t support asynchronous notification.
int (*lock) (struct file *, int, struct file_lock *);
The lock method is used to implement file locking; locking is an indispensable feature for regular files, but is almost never implemented by device drivers.
ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
These methods, added late in the 2.3 development cycle, implement scatter/gather read and write operations. Applications occasionally need to do a single read or write operation involving multiple memory areas; these system calls allow them to do so without forcing extra copy operations on the data.
struct module *owner;
This field isn’t a method like everything else in the file_operations structure. Instead, it is a pointer to the module that “owns” this structure; it is used by the kernel to maintain the module’s usage count.
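The llseek, read, and write entries above describe a contract rather than an implementation, so what follows is only a user-space model of that contract under stated assumptions: mock_file is a hypothetical stand-in that bundles the position counter with a fixed 16-byte storage area, memcpy() stands in for the kernel’s copy_to_user() and copy_from_user(), and returning -ENOSPC when the storage is full is our own choice, not something the list prescribes. Every method reports success with a non-negative value and failure with a negative error code:

```c
#include <errno.h>
#include <string.h>      /* memcpy */
#include <sys/types.h>   /* ssize_t, size_t */

#define MOCK_CAPACITY 16 /* hypothetical fixed storage size */

/* Hypothetical stand-in for struct file plus the driver's private data. */
struct mock_file {
    long long f_pos;             /* current position (loff_t in the kernel) */
    char data[MOCK_CAPACITY];    /* device storage */
    size_t size;                 /* bytes of valid data */
};

/* llseek: whence 0, 1, 2 mean SEEK_SET, SEEK_CUR, SEEK_END. Returns the
 * new position, or -EINVAL for a bad whence or a negative result. */
static long long mock_llseek(struct mock_file *filp, long long off, int whence)
{
    long long newpos;

    switch (whence) {
    case 0:  newpos = off;                         break;
    case 1:  newpos = filp->f_pos + off;           break;
    case 2:  newpos = (long long)filp->size + off; break;
    default: return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    filp->f_pos = newpos;
    return newpos;
}

/* read: copies at most count bytes from *f_pos, advances *f_pos, and
 * returns the number copied; 0 signals end of file. */
static ssize_t mock_read(struct mock_file *filp, char *buf, size_t count,
                         long long *f_pos)
{
    size_t pos;

    if (*f_pos < 0)
        return -EINVAL;
    pos = (size_t)*f_pos;
    if (pos >= filp->size)
        return 0;                          /* end of file */
    if (count > filp->size - pos)
        count = filp->size - pos;          /* short read */
    memcpy(buf, filp->data + pos, count);  /* copy_to_user() in a driver */
    *f_pos += (long long)count;
    return (ssize_t)count;
}

/* write: stores up to count bytes at *f_pos, extends size if needed, and
 * returns the number stored; -ENOSPC when full is our own choice. */
static ssize_t mock_write(struct mock_file *filp, const char *buf,
                          size_t count, long long *f_pos)
{
    size_t pos;

    if (*f_pos < 0)
        return -EINVAL;
    pos = (size_t)*f_pos;
    if (pos >= MOCK_CAPACITY)
        return -ENOSPC;
    if (count > MOCK_CAPACITY - pos)
        count = MOCK_CAPACITY - pos;       /* partial write */
    memcpy(filp->data + pos, buf, count);  /* copy_from_user() in a driver */
    *f_pos += (long long)count;
    if ((size_t)*f_pos > filp->size)
        filp->size = (size_t)*f_pos;
    return (ssize_t)count;
}
```

Passing &filp->f_pos as the position pointer mirrors what the kernel does for ordinary read and write calls; a short read or a partial write is not an error, just a smaller non-negative count.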
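The bit mask returned by the poll method, described in the list above, can be illustrated the same way. In this hypothetical sketch, mock_dev and mock_poll are invented names, and the POLLIN and POLLOUT constants come from the user-space <poll.h>. A real 2.4 poll method would also call poll_wait() on its wait queues before computing the mask, and drivers commonly OR in POLLRDNORM and POLLWRNORM as well:

```c
#include <poll.h>       /* POLLIN, POLLOUT */
#include <stddef.h>     /* size_t */

/* Hypothetical device state. */
struct mock_dev {
    size_t bytes_available;  /* data waiting for readers */
    size_t space_free;       /* room left for writers */
};

/* Builds the status mask the way a poll method does. A real method would
 * first register its wait queues with poll_wait(). */
static unsigned int mock_poll(const struct mock_dev *dev)
{
    unsigned int mask = 0;

    if (dev->bytes_available > 0)
        mask |= POLLIN;     /* a read would not block */
    if (dev->space_free > 0)
        mask |= POLLOUT;    /* a write would not block */
    return mask;
}
```

Each condition contributes its own bit, so the caller can test readability and writability independently from a single return value.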
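An ioctl method, as described in the list above, is usually a switch on the command number. The command values below are hypothetical (real drivers build them with the _IO family of macros), mock_dev and mock_ioctl are invented names, and answering unknown commands with -ENOTTY inside the method is our assumption, chosen to match what the system call returns when no ioctl method exists at all:

```c
#include <errno.h>

/* Hypothetical command numbers, for illustration only. */
#define MOCK_IOC_RESET   0x1001
#define MOCK_IOC_GETSIZE 0x1002

/* Hypothetical device state. */
struct mock_dev {
    long size;
};

/* Sketch of an ioctl method: a non-negative return value is passed back
 * to the caller unchanged; unknown commands get -ENOTTY. */
static int mock_ioctl(struct mock_dev *dev, unsigned int cmd, unsigned long arg)
{
    (void)arg;  /* unused by these two commands */
    switch (cmd) {
    case MOCK_IOC_RESET:
        dev->size = 0;
        return 0;
    case MOCK_IOC_GETSIZE:
        return (int)dev->size;
    default:
        return -ENOTTY;
    }
}
```

Returning the size directly from MOCK_IOC_GETSIZE relies on the rule that non-negative values reach the calling program as they are.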
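From the application side, the scatter/gather I/O served by readv and writev looks like this: a header and a body held in two separate buffers leave the process in a single writev() call, with no intermediate copy into a combined buffer. This is a user-space sketch; send_two_parts is a hypothetical helper name, while writev() and struct iovec are the standard interfaces declared in <sys/uio.h>:

```c
#include <string.h>     /* strlen */
#include <sys/types.h>  /* ssize_t */
#include <sys/uio.h>    /* struct iovec, writev() */

/* Gathers two separate buffers into one writev() call on fd. Returns what
 * writev() returns: the bytes written, or -1 with errno set. */
static ssize_t send_two_parts(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;   /* iov_base is a non-const void * */
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);
    return writev(fd, iov, 2);
}
```

Whatever file descriptor is passed in, the data arrives as one contiguous stream, in the order the iovec entries are listed.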
The scull device driver implements only the most important device methods, and uses the tagged format to declare its file_operations structure:
struct file_operations scull_fops = {
    llseek:  scull_llseek,
    read:    scull_read,
    write:   scull_write,
    ioctl:   scull_ioctl,
    open:    scull_open,
    release: scull_release,
};
This declaration uses the tagged structure initialization syntax, as we described earlier. This syntax is preferred because it makes drivers more portable across changes in the definitions of the structures, and arguably makes the code more compact and readable. Tagged initialization allows the reordering of structure members; in some cases, substantial performance improvements have been realized by placing frequently accessed members in the same hardware cache line.
It is also necessary to set the owner field of the file_operations structure. In kernel code, you will often see owner initialized with the rest of the structure, using the tagged syntax as follows:
owner: THIS_MODULE,
That approach works, but only on 2.4 kernels. A more portable approach is to use the SET_MODULE_OWNER macro, which is defined in <linux/module.h>. scull performs this initialization as follows:
SET_MODULE_OWNER(&scull_fops);
This macro works on any structure that has an owner
field; we will encounter this field again in other contexts later in
the book.
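One macro can serve every such structure because it relies only on the field name. The following user-space sketch assumes SET_MODULE_OWNER simply assigns THIS_MODULE to the owner field; MOCK_SET_MODULE_OWNER, MOCK_THIS_MODULE, mock_module, and the two structures are hypothetical names invented for the illustration:

```c
#include <stddef.h>  /* NULL */

/* Hypothetical stand-ins for struct module and THIS_MODULE. */
struct mock_module {
    int usecount;
};
static struct mock_module mock_this_module;
#define MOCK_THIS_MODULE (&mock_this_module)

/* Assumed shape of the macro: it touches only the field named owner. */
#define MOCK_SET_MODULE_OWNER(s) \
    do { (s)->owner = MOCK_THIS_MODULE; } while (0)

/* Two unrelated hypothetical structures that both have an owner field. */
struct mock_fops {
    struct mock_module *owner;
    int (*open)(void);
};

struct mock_other {
    int flags;
    struct mock_module *owner;
};
```

Since the macro expands to a plain member assignment, the position of owner within each structure is irrelevant; only its name matters.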
[18] Note that
release isn’t invoked every time a process calls
close. Whenever a file
structure is shared (for example, after a fork or
a dup), release won’t be
invoked until all copies are closed. If you need to flush pending data
when any copy is closed, you should implement the
flush method.
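The bookkeeping described in this note can be modeled with a reference count. In this hypothetical user-space sketch (mock_file, mock_dup, and mock_close are invented names), flush runs on every close while release runs only when the last reference disappears:

```c
/* Hypothetical model of a shared open file: each fork() or dup() adds a
 * reference, each close() drops one. */
struct mock_file {
    int refcount;
    int flush_calls;    /* how many times the driver's flush ran */
    int release_calls;  /* how many times the driver's release ran */
};

static void mock_dup(struct mock_file *filp)
{
    filp->refcount++;
}

/* close(): flush on every descriptor close, release on the last one. */
static void mock_close(struct mock_file *filp)
{
    filp->flush_calls++;
    if (--filp->refcount == 0)
        filp->release_calls++;
}
```

After one dup and two closes, flush has run twice but release only once, which is why pending data that must be handled on every close belongs in flush.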