As we’ve explained in the section “Mounting Filesystems” in Chapter 10, partitions on local hard disks are accessed by mounting them onto a directory in the Linux filesystem. To be able to read and write to a specific filesystem, the Linux kernel needs to have support for it.
Linux has filesystem drivers that can read and write files on the traditional FAT filesystem and the newer VFAT filesystem, which was introduced with Windows 95 and supports long filenames. It also can read and (with some caveats) write to the NTFS filesystem of Windows NT/2000/XP.
In “Building a New Kernel” in Chapter 18, you learned how to build your own kernel. To access DOS (used by MS-DOS and Windows 3.x) and VFAT (used by Windows 95/98/ME) partitions, you need to enable DOS FAT fs support in the File systems section during kernel configuration. After you say yes to that option, you can choose MSDOS fs support and VFAT (Windows-95) fs support. The first lets you mount FAT partitions, and the second lets you mount FAT32 partitions.
If you want to access files on a Windows NT partition that carries an NTFS filesystem, you need another driver. Activate the option NTFS filesystem support during the kernel configuration. This lets you mount NTFS partitions by specifying the filesystem type ntfs. Note, however, that the current NTFS driver supports just read-only access. There is a version of this driver available that supports writing as well, but at the time of this writing, it was still under development and not guaranteed to work reliably when writing to the NTFS partition. Read the documentation carefully before installing and using it!
While Linux is running, you can mount a Windows partition like any other type of partition. For example, if the third partition on your first IDE hard disk contains your Windows 98 installation, you can make the files in it accessible with the following command, which must be executed as root:
# mount -t vfat /dev/hda3 /mnt/windows98
The /dev/hda3 argument specifies the partition that holds the Windows 98 installation, and the /mnt/windows98 argument can be changed to any directory you’ve created for the purpose of accessing the files. But how do you know that you need (in this case) /dev/hda3? If you’re familiar with the naming conventions for Linux disk devices, you’ll know that hda3 is the third partition on the hard disk that is the master on the primary IDE port. You’ll find life easier if you write down the partitions while you are creating them with fdisk, but if you neglected to do that, you can run fdisk again to view the partition table.
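The IDE naming convention can be spelled out as a small shell function. This is only a sketch: describe_ide is a name we made up for illustration, and it handles just the four classic IDE names (hda through hdd), not SCSI names such as sda.

```shell
#!/bin/sh
# describe_ide is a hypothetical helper, not a standard command.
# It decodes classic IDE device names of the form hdXN.
describe_ide() {
    name=${1#/dev/}              # strip a leading /dev/ if present
    case $name in
        hda*) disk='primary master' ;;
        hdb*) disk='primary slave' ;;
        hdc*) disk='secondary master' ;;
        hdd*) disk='secondary slave' ;;
        *) echo "unrecognized IDE device name: $1" >&2; return 1 ;;
    esac
    part=${name#hd?}             # what remains after hdX is the partition number
    case $part in
        '') echo "whole disk, $disk" ;;
        *[!0-9]*) echo "unrecognized partition number: $part" >&2; return 1 ;;
        *) echo "partition $part, $disk" ;;
    esac
}
```

Calling describe_ide /dev/hda3 prints "partition 3, primary master", which matches the reading given above.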
The filesystem drivers support a number of options that can be specified with the -o option of the mount command. The mount(8) manual page documents the options that can be used, with sections that explain options specific to the fat and ntfs filesystem types. The section for fat applies to both the msdos and vfat filesystems, and there are two options listed there that are of special interest.
The check option determines whether the kernel should accept filenames that are not permissible on MS-DOS and what it should do with them. This applies only to creating and renaming files. You can specify three values for check. relaxed lets you do just about everything with the filename. If it doesn’t fit into the 8.3 convention of MS-DOS files, the filename will be truncated accordingly. normal, the default, also truncates the filenames as needed and removes special characters such as * and ? that are not allowed in MS-DOS filenames. Finally, strict forbids both long filenames and the special characters. To make Linux more restrictive with respect to filenames on the partition mounted in our example, the mount command could be used as follows:
# mount -o check=strict -t msdos /dev/sda5 /mnt/dos
This option is used with msdos filesystems only; the restrictions on filename length do not apply to vfat filesystems.
The conv option can be useful, but not as commonly as you might at first think. Windows and Unix systems have different conventions for how a line ending is marked in text files. Windows uses both a carriage return and a linefeed character, whereas Unix uses only a linefeed. Although this does not make the files completely illegible on the other system, it can still be a bother. To tell the kernel to perform the conversion between Windows and Unix text-file styles automatically, pass the mount command the option conv, which has three possible values: binary, the default, does not perform any conversion; text converts every file; and auto tries to guess whether the file in question is a text file or a binary file. auto does this by looking at the filename extension. If this extension is included in the list of “known binary extensions,” it is not converted; otherwise, it will be converted.
It is not generally advisable to use text, because this will invariably damage any binary files, including graphics files and files written by word processors, spreadsheets, and other programs. Likewise, auto can be dangerous, because the extension-based detection mechanism is not very sophisticated. So we suggest you don’t use the conv option unless you are sure the partition contains only text files. Stick with binary (the default) and convert your files manually on an as-needed basis. See “File Translation Utilities,” later in this chapter, for directions on how to do this.
As with other filesystem types, you can mount MS-DOS and NTFS filesystems automatically at system bootup by placing an entry in your /etc/fstab file. For example, the following line in /etc/fstab mounts a Windows 98 partition onto /win:
/dev/hda1 /win vfat defaults,umask=002,uid=500,gid=500 0 0
When accessing any of the msdos, vfat, or ntfs filesystems from Linux, the system must somehow assign Unix permissions and ownerships to the files. By default, ownerships and permissions are determined using the user ID, group ID, and umask of the calling process. This works acceptably well when using the mount command from the shell, but when run from the boot scripts, it will assign file ownerships to root, which may not be desired. In the previous example, we use the umask option to specify the file and directory creation mask the system will use when creating files and directories in the filesystem. The uid option specifies the owner (as a numeric user ID, rather than a text name), and the gid option specifies the group (as a numeric group ID). All files in the filesystem will appear on the Linux system as having this owner and group. Since dual-boot systems are generally used as workstations by a single user, you will probably want to set the uid and gid options to the user ID and group ID of that user’s account.
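Because uid and gid must be given as numbers, it can be handy to look them up with the standard id command rather than reading them out of /etc/passwd by hand. The following is a minimal sketch, not a finished tool: fstab_line is a hypothetical helper of our own, and it only prints a candidate line in the style of the example above without touching /etc/fstab.

```shell
#!/bin/sh
# fstab_line is a hypothetical helper: it prints (but does not install)
# an /etc/fstab entry for a VFAT partition owned by the given user.
fstab_line() {
    dev=$1      # partition, e.g. /dev/hda1
    mnt=$2      # mount point, e.g. /win
    user=$3     # login name whose IDs should own the files
    uid=$(id -u "$user") || return 1    # numeric user ID
    gid=$(id -g "$user") || return 1    # numeric primary group ID
    printf '%s %s vfat defaults,umask=002,uid=%s,gid=%s 0 0\n' \
        "$dev" "$mnt" "$uid" "$gid"
}
```

You would call it as, say, fstab_line /dev/hda1 /win yourname, check the output, and then add the line to /etc/fstab yourself as root.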
One of the most prominent problems when it comes to sharing files between Linux and Windows is that the two systems have different conventions for the line endings in text files. Luckily, there are a few ways to solve this problem:
If you access files on a mounted partition on the same machine, let the kernel convert the files automatically, as described in “Filesystems and Mounting” earlier in this chapter. Use this with care!
When creating or modifying files on Linux, common editors such as Emacs and vi can handle the conversion automatically for you.
There are a number of tools that convert files from one line-ending convention to the other. Some of these tools can also handle other conversion tasks as well.
Use your favorite programming language to write your own conversion utility.
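Before picking one of these approaches, it can help to check whether a file uses Windows line endings at all. Here is a minimal shell sketch; the function name has_crlf is our own invention, and it simply reports whether any line ends in a carriage return.

```shell
#!/bin/sh
# has_crlf FILE: succeed (exit status 0) if any line of FILE ends in a
# carriage return, which suggests DOS/Windows line endings.
# has_crlf is a hypothetical helper name, not a standard command.
has_crlf() {
    cr=$(printf '\r')          # a literal carriage return character
    grep -q "${cr}\$" "$1"     # match a CR at the end of a line
}
```

A typical use would be: if has_crlf notes.txt; then echo "DOS line endings"; fi.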
If all you are interested in is converting newline characters, writing programs to perform the conversions is surprisingly simple. To convert from DOS format to Unix format, replace every occurrence of <CR><LF> (\r\n) in the file with a newline (\n). To go the other way, convert every newline to a <CR><LF>. For example, we show you two Perl programs that do the job. The first, which we call d2u, converts from DOS format to Unix format:
#!/usr/bin/perl
while (<STDIN>) { s/\r$//; print }
And the following program (which we call u2d) converts from Unix format to DOS format:
#!/usr/bin/perl
while (<STDIN>) { s/$/\r/; print }
Both commands read the input file from the standard input, and write the output file to standard output. You can easily modify our examples to accept the input and output filenames on the command line. If you are too lazy to write the utilities yourself, you can see if your Linux installation contains the programs dos2unix and unix2dos, which work similarly to our simple d2u and u2d utilities, and also accept filenames on the command line. Another similar pair of utilities is fromdos and todos. If you cannot find any of these, then try the flip command, which is able to translate in both directions.
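If none of these programs is installed, the same two conversions can also be sketched in plain shell with sed, this time taking the input and output filenames as arguments. Treat this as an illustration under our own assumptions rather than a replacement for dos2unix: the function names simply reuse d2u and u2d, and the carriage return is generated with printf because not every sed understands \r.

```shell
#!/bin/sh
cr=$(printf '\r')    # literal carriage return, for seds that lack \r

# d2u INPUT OUTPUT: strip one trailing CR from each line (DOS to Unix).
d2u() {
    sed "s/${cr}\$//" "$1" > "$2"
}

# u2d INPUT OUTPUT: append a CR to each line (Unix to DOS).
# Like the Perl version, it does not check whether a CR is already there.
u2d() {
    sed "s/\$/${cr}/" "$1" > "$2"
}
```

INPUT and OUTPUT must be different files, because the shell truncates the output file before sed gets a chance to read it.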
If you find these simple utilities underpowered, you may want to try recode, a program that can convert just about any text-file standard to any other.
The simplest way to use recode is to specify both the old and the new character sets (encodings of text-file conventions) and the file to convert. recode will overwrite the old file with the converted one; it will have the same filename. For example, to convert a text file from Windows to Unix, you would enter:
recode ibmpc:latin1 textfile
textfile is then replaced by the converted version. You can probably guess that to convert the same file back to Windows conventions, you would use:
recode latin1:ibmpc textfile
In addition to ibmpc (as used on Windows) and latin1 (as used on Unix), there are other possibilities available, such as latex for the LaTeX style of encoding diacritics and texte for encoding French email messages. You can get the full list by issuing:
recode -l
If you do not like recode’s habit of overwriting your old file with the new one, you can make use of the fact that recode can also read from standard input and write to standard output. To convert dostextfile to unixtextfile without deleting dostextfile, you could use:
recode ibmpc:latin1 < dostextfile > unixtextfile
With the tools just described, you can handle text files quite comfortably, but this is only the beginning. For example, pixel graphics on Windows are usually saved as bmp files. Fortunately, there are a number of tools available that can convert bmp files to graphics file formats, such as png or xpm, that are more common on Unix. Among these is the GIMP, which is probably included with your distribution.
Things are less easy when it comes to other file formats, such as those saved by office productivity programs. Although the various incarnations of the .doc file format used by Microsoft Word have become a de facto lingua franca for word processor files on Windows, it was until recently almost impossible to read those files on Linux. Fortunately, a number of software packages have appeared that can read (and sometimes even write) .doc files. Among them are the office productivity suite KOffice, the freely available OpenOffice.org, and the commercial StarOffice 6.0, a close relative of OpenOffice.org. Be aware, though, that these conversions will never be perfect; it is very likely that you will have to manually edit the files afterward. Even on Windows, conversions can never be 100% correct; if you try importing a Microsoft Word file into WordPerfect (or vice versa), you will see what we mean.
In general, the more common a file format is on Windows, the more likely it is that Linux developers will provide a means to read or even write it. Another approach might be to switch to open file formats, such as Rich Text Format (RTF) or Extensible Markup Language (XML), when creating documents on Windows. In the age of the Internet, where information is supposed to float freely, closed, undocumented file formats are an anachronism.