Most non-trivial programs involve interacting with the host filesystem. In this chapter, you’ll learn how to open, close, delete, and rename files. The chapter also shows you how to perform file I/O using the puts (output) and gets (input) commands and how to use the format command to “pretty print” output. Finally, you’ll learn how to navigate the filesystem programmatically and work with file and directory names in a platform-neutral manner.
This chapter’s game, word_search.tcl in the code directory, is a simplified version of the classic bus and plane game. It shows you a grid of space-delimited letters. Each row of letters contains an embedded word that you have to find. Each row has one word oriented left to right; there are no words (at least deliberately) oriented on vertical or diagonal axes. As a hint, the words you have to find are commands used or introduced in this chapter. You start the game by executing the script. Review the game grid and when you find a word in one of the rows, type the row number, press Enter, and then type the word you found and press Enter. After the script evaluates your input, it shows you the result and asks if you want to play again. To keep the screen tidy, I use the hoary UNIX command tput clear; on Windows, you will probably have to use the old DOS command cls unless you are using a UNIX emulator like Cygwin. Here are a few iterations of word_search.tcl:
$ ./word_search.tcl
1 e o p e n u g r i v c
2 n l v n j c l o s e d
3 j b p u t s s z m h i
4 s q n i d g g e t s t
5 h e r r e a d e r s e
6 z o t z g v a n e r s
7 f o r m a t a l b m c
8 d h n p s e e k p g e
9 a m a j y r a t e l l
Select a line (1–9): 2
What word do you see: closed
player: 'closed' puzzle: 'closed'
Correct!
Play again (Y/n)? y
1 e o p e n u g r i v c
2 n l v n j c l o s e d
3 j b p u t s s z m h i
4 s q n i d g g e t s t
5 h e r r e a d e r s e
6 z o t z g v a n e r s
7 f o r m a t a l b m c
8 d h n p s e e k p g e
9 a m a j y r a t e l l
Select a line (1–9): 7
What word do you see: mat
Sorry.
Try again (Y/n)? y
1 e o p e n u g r i v c
2 n l v n j c l o s e d
3 j b p u t s s z m h i
4 s q n i d g g e t s t
5 h e r r e a d e r s e
6 z o t z g v a n e r s
7 f o r m a t a l b m c
8 d h n p s e e k p g e
9 a m a j y r a t e l l
Select a line (1–9): 1
What word do you see: open
Correct!
Play again (Y/n)? n
You’ve already seen and used many of the commands used in word_search.tcl. The file handling commands are new, though, as are some of the ways I’ve combined the commands. The balance of the chapter will fill in the gaps.
Before you can do much else with a file, you have to open it. When you’re done with a file, it is good practice, but not necessarily required, to close it. The syntax for opening a file is:
open name ?access? ?perms?
name
identifies the name of the file to open. If specified, access
defines the type of file access you want (see Table 8.1). Similarly, if perms
is specified, it defines the UNIX-style file permissions to set on newly created files.
open
returns a channel ID, a unique identifier or handle used to refer to the file in subsequent operations on it. Although you might not have realized it, you’ve already used channel IDs with the puts
and gets
commands. Recall that puts
writes to stdout by default (that is, puts "foo"
and puts stdout "foo"
are identical commands). stdout is a channel ID. Similarly, when you use gets
to read keyboard input, you have to write gets stdin
. stdin is another channel ID.
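Because stdout and stderr are ordinary channel IDs, a procedure can accept a channel as an argument and write to whichever one it is handed. The following is only a small sketch; report is a hypothetical helper name, not a command from this chapter’s scripts:

```tcl
# report is a hypothetical helper: it writes a message to the channel it is
# given and, like puts, falls back to stdout when no channel is named.
proc report {msg {chan stdout}} {
    puts $chan $msg
}

report "written to standard output"
report "written to standard error" stderr
```

The same procedure would work unchanged with a channel ID returned by open.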
The access
argument indicates whether you want to read a file, write to a file, read and write a file, or append to a file. If not specified, Tcl assumes you merely want to read the file. Table 8.1 lists the possible values for access
.
Table 8.1. File Access Modes
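Argument | Mode | Description |
---|---|---|
r | Read-only | Open for input; name must already exist (the default). |
r+ | Read/write | Open for input and output; name must already exist. |
w | Write-only | Open for output; if name exists, truncate it; otherwise, create it. |
w+ | Read/write | Open for input and output; if name exists, truncate it; otherwise, create it. |
a | Append | Open for output, appending data to name; if name does not exist, create it. |
a+ | Read/write | Open for input or output, appending data to name; if name does not exist, create it. |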
File permissions control who can do what to a file. I’m going to skip a tedious, detailed excursus on UNIX file permissions. Any good UNIX or Linux reference (and most bad ones, too) can get you up to speed on UNIX-style file permissions. What you most need to know is that unless otherwise specified, open
commands that result in creating files apply a default mode of 0666, which means that they are readable and writable by everyone. As a matter of habit and security, I prefer to create files with mode 0644, which means that I can read and write them, but everyone else can only read them. If you are extremely paranoid, you can use a mode of 0600, which means that you can read and write the file but no one else can.
To close a file, the syntax is quite simple:
close id
id
must be a channel ID returned by a previous open
(or socket
) command. One would think that closing a file is a simple operation, but the reality is slightly more complicated. When you issue the close
command, several tasks occur before the file is really, truly closed:
Any buffered output is flushed to disk.
Any buffered input is discarded.
The associated disk file or I/O device is closed.
The channel ID is invalidated and cannot be used for subsequent I/O operations.
Although not strictly necessary, I vigorously encourage you to close files explicitly. When your script exits, any files that you opened will be closed. In the nominal case, this is fine. However, long-running scripts or scripts that open lots of files might use up operating system resources (on UNIX and UNIX-like systems, for example, file descriptors are a finite resource), so get into the habit of closing your files.
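One way to make sure a file gets closed even when something goes wrong in the middle of processing is to run the work under catch and close the channel afterward. The sketch below assumes a hypothetical helper named countLines and a scratch file named close_demo.tmp; neither is part of this chapter’s code directory:

```tcl
# countLines is a hypothetical helper that always closes the file it opens.
proc countLines {name} {
    set fileId [open $name r]
    # Run the body under catch so that an error cannot skip the close.
    set failed [catch {
        set n 0
        while {[gets $fileId line] >= 0} {
            incr n
        }
        set n
    } result]
    close $fileId
    if {$failed} {
        error $result
    }
    return $result
}

# Build a small scratch file (hypothetical name) to try the helper on.
set name [file join [pwd] close_demo.tmp]
set out [open $name w]
puts $out "one"
puts $out "two"
puts $out "three"
close $out

set lineCount [countLines $name]
puts "counted $lineCount lines"
file delete $name
```

The close call sits outside the catch body, so it runs whether or not the loop raised an error.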
The following short script (open.tcl in this chapter’s code directory) illustrates opening and closing a text file. The text file, sonnet20.txt, is Shakespeare’s Sonnet XX and is also included in this chapter’s code directory:
set fileId [open sonnet20.txt r]
puts "opened 'sonnet20.txt' with channel ID '$fileId'"
close $fileId

if {[catch {set fileId [open sonnet21.txt r+]} err]} {
    puts "open failed: $err"
    return 1
} else {
    puts "opened 'sonnet21.txt' with channel ID '$fileId'"
    close $fileId
}
Here’s what the output should look like when you execute this script:
$ ./open.tcl
opened 'sonnet20.txt' with channel ID 'file5'
open failed: couldn't open "sonnet21.txt": no such file or directory
The first block of code opens the file sonnet20.txt in read-only mode, storing the returned ID in the $fileId
variable. After opening the file, I promptly close it.
The second block of code attempts to open sonnet21.txt in read-write mode. However, because this is a cooked-up example and I knew that sonnet21.txt didn’t exist, I embedded the open
command in a catch
statement to illustrate how to handle file access errors. If open
fails for some reason, it raises an error. In the absence of the catch
command, you’d see the standard, ugly Tcl stack trace followed by an abrupt, graceless exit. My error handler is only slightly more graceful and attractive, but the point I want to emphasize is that in real-world code, you need to code defensively and try to anticipate possible or common errors (such as files not existing).
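If you want to react differently to different failures, the global errorCode variable is worth a look. For operating system errors, Tcl sets it to a list whose first element is POSIX and whose second element is the symbolic error name. The following sketch assumes a hypothetical helper named describeOpenFailure and a file name chosen because it should not exist:

```tcl
# describeOpenFailure is a hypothetical helper that classifies an open failure.
proc describeOpenFailure {name} {
    if {[catch {open $name r} result]} {
        # For OS-level errors, errorCode looks like
        # {POSIX ENOENT {no such file or directory}}.
        set code $::errorCode
        if {[lindex $code 0] eq "POSIX" && [lindex $code 1] eq "ENOENT"} {
            return "missing"
        }
        return "failed: $result"
    }
    # On success, result holds the channel ID; close it right away.
    close $result
    return "ok"
}

set verdict [describeOpenFailure [file join [pwd] no_such_file_for_demo.txt]]
puts "open verdict: $verdict"
```

A script could use the "missing" case to create the file or prompt for another name, and treat everything else as a hard error.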
The code samples in this book set a bad example. For clarity and brevity, most of the scripts in this book don’t include error handling. Do as I say, not as I do!
If you review Table 8.1, you’ll see that most of open
’s access modes will create files if they don’t exist (w, w+, a
, and a+
). The key difference is that write operations (w
and w+
) will truncate a file that already exists (provided you have write permissions for the file), whereas append operations (a
and a+
) don’t truncate an existing file. Rather, when you append to an existing file, it is opened for writing, and the file pointer is positioned at the end of the file so that write operations don’t overwrite existing data. The following scripts, trunc.tcl and append.tcl in this chapter’s code directory, illustrate the difference. trunc.tcl opens an existing file, junk, for writing, and then closes it:
set fileId [open junk w]
puts "opened 'junk' with channel ID '$fileId'"
close $fileId
Before you run this script, make sure a file named junk exists in the directory from which you execute the script. On Linux, UNIX, and Mac OS X, you could execute the command touch junk
to create an empty, zero-length file named junk.
Using ls -l
before and after running the script, you can see what happens:
$ touch junk
$ ls -l junk
-rw-r--r-- 1 kwall kwall 110622 2007-08-03 01:39 junk
$ ./trunc.tcl
opened 'junk' with channel ID 'file5'
$ ls -l junk
-rw-r--r-- 1 kwall kwall 0 2007-08-03 01:42 junk
You’ll need a file named “junk” for this script to work and, naturally, the output of the ls
commands will be different.
append.tcl, on the other hand, opens junk in append mode, which preserves its contents:
set fileId [open junk a]
puts "opened 'junk' with channel ID '$fileId'"
puts $fileId [info vars]
close $fileId
As you can see from the following commands, opening a file for appending leaves its existing contents intact and adds new data to the end of the file:
$ date > junk
$ cat junk
Sat Sep 1 20:04:05 EDT 2007
$ ./append.tcl
opened 'junk' with channel ID 'file5'
$ cat junk
Sat Sep 1 20:04:05 EDT 2007
tcl_rcFileName tcl_version argv0 argv tcl_interactive fileId errorCode auto_path errorInfo env tcl_pkgPath tcl_patchLevel argc tcl_libPath tcl_library tcl_platform
First, I redirect the output of the date command to the file named junk and then cat junk’s contents. After I execute the append.tcl script, I cat junk a second time to show that the data I added was put at the end of the file and that its existing contents were untouched.
In an unfortunate bit of perversity, the access modes r+ and w+ open a file for both reading and writing. In the case of r+, the file must exist. If you specify the w+ mode, the file will be created if it doesn’t exist. The perversity to which I refer is not that there are two modes that do (almost) the same thing, but that the “r” in r+ mnemonically suggests reading, not reading and writing. Likewise, the “w” in w+ mnemonically suggests writing, not writing and reading.
The moral of this story is to be careful when opening files for writing. If you need to preserve the existing data, use the a
or a+
mode and append data. If you don’t care about the existing contents, use w
or w+
as the situation requires.
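To see the difference between the write and append modes without touching a file you care about, you can watch the size of a throwaway file change. This is a sketch only; mode_demo.tmp is a hypothetical scratch file, and the file size command is used here simply to report the length in bytes:

```tcl
# mode_demo.tmp is a hypothetical scratch file used only for this sketch.
set name [file join [pwd] mode_demo.tmp]

# w creates the file (or truncates it) and writes from the beginning.
set fileId [open $name w]
puts $fileId "first line"
close $fileId
set sizeAfterWrite [file size $name]

# a keeps the existing data and adds to the end.
set fileId [open $name a]
puts $fileId "second line"
close $fileId
set sizeAfterAppend [file size $name]

# Opening with w again discards the old contents, even with no writes.
set fileId [open $name w]
close $fileId
set sizeAfterTruncate [file size $name]

puts "after w: $sizeAfterWrite bytes"
puts "after a: $sizeAfterAppend bytes"
puts "after second w: $sizeAfterTruncate bytes"
file delete $name
```

The size grows after the append and drops to zero after the second open in w mode, even though that open never writes anything.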
I’m going to go out on a limb here and guess that you want to do more than just open and close files. Reading and writing them will probably be helpful. Fair enough. You have at least three options for reading a file for input: the gets
command, which you’ve already seen; the read
command; and the scan
command. Which one should you use? Here are three rules of thumb:
Use gets
to read and process one line of input at a time.
Use read
if you want to read blocks of input without regard to end-of-line markers.
Use scan
to read formatted input.
The following subsections cover the specifics of using each of these three input commands.
So far in this book, you used gets
to read input from stdin (the keyboard), using a command such as one of the following:
set line [gets stdin]
gets stdin line
The first gets
command reads input from stdin, discards the terminating newline, and returns the fetched line, which the set
command stores in the variable $line
. If a blank line had been read, gets
would have returned the empty string, which would have been stored in $line. To differentiate between a blank line and the end-of-file, you have to use the eof command on the I/O channel (stdin in this case). If eof returns 1, end-of-file has been reached; if eof returns 0, gets
has not reached end-of-file.
The second gets
command also reads input from stdin but, in this case, stores the input in the variable $line
itself after discarding the trailing newline. In this form, gets
returns the number of characters it read (not counting the newline). If it reads a blank line, then gets
returns 0
. If it encounters the end-of-file, gets
returns -1
. For file I/O, I think this form of gets
is easiest to use because it automatically detects end-of-file, saving you from having to check for end-of-file conditions with an eof call. This is the form of the command I’ll use for the rest of the book when dealing with input from a file. For keyboard input, I’ll continue to use the first form of the gets
command.
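The two forms can be compared side by side on a file that contains a blank line. The sketch below writes a hypothetical scratch file, gets_demo.tmp, and then reads it once with each form; the first loop needs eof to tell the blank line from end-of-file, while the second relies on the -1 return value:

```tcl
# gets_demo.tmp is a hypothetical scratch file with a blank line in the middle.
set name [file join [pwd] gets_demo.tmp]
set out [open $name w]
puts $out "alpha"
puts $out ""
puts $out "beta"
close $out

# Form 1: gets returns the line itself, so a blank line and end-of-file both
# look like the empty string; eof is what tells them apart.
set fileId [open $name r]
set form1 {}
while {1} {
    set line [gets $fileId]
    if {[eof $fileId] && $line eq ""} {
        break
    }
    lappend form1 $line
}
close $fileId

# Form 2: gets stores the line in a variable and returns a character count,
# or -1 at end-of-file, so no separate eof call is needed.
set fileId [open $name r]
set form2 {}
while {[gets $fileId line] >= 0} {
    lappend form2 $line
}
close $fileId
file delete $name

puts "form 1 read [llength $form1] lines"
puts "form 2 read [llength $form2] lines"
```

Both loops collect the same three lines, including the empty one in the middle.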
Applying what you learned in the previous section about I/O channels, if you replace stdin with a channel ID returned by the open
command, you can read from a file. The following script, gets.tcl in this chapter’s code directory, demonstrates opening and reading a file:
set fileId [open sonnet20.txt r]
set totalChars 0
set totalLines 0
while {[set cnt [gets $fileId line]] != -1} {
    puts "($cnt chars) $line"
    incr totalChars $cnt
    incr totalLines
}
puts "read $totalChars chars"
puts "read $totalLines lines"
close $fileId
The first command opens sonnet20.txt for reading. The next two commands set a couple of counter variables I use while reading the input file. The most complicated part of the script is the while
loop. In English, it simply means, “Read a line of input from the file, store the input text in the variable named line
and the number of characters read in cnt
. Keep doing this until you encounter end-of-file.” Inside the while
loop, for each line read, I display the number of characters read (not counting the terminating newline) and the text of the line; then I increment the number of characters read (incr totalChars $cnt
) and the number of lines read. When gets
hits the end-of-file, control drops out of the while
loop, at which point I display the total number of characters and lines read, close the input file, and exit the program.
If you execute gets.tcl, the output should look like the following:
$ ./gets.tcl
(2 chars) XX
(0 chars)
(46 chars) A woman's face with nature's own hand painted,
(45 chars) Hast thou, the master mistress of my passion;
(42 chars) A woman's gentle heart, but not acquainted
(50 chars) With shifting change, as is false women's fashion:
(54 chars) An eye more bright than theirs, less false in rolling,
(39 chars) Gilding the object whereupon it gazeth;
(43 chars) A man in hue all 'hues' in his controlling,
(50 chars) Which steals men's eyes and women's souls amazeth.
(40 chars) And for a woman wert thou first created;
(48 chars) Till Nature, as she wrought thee, fell a-doting,
(36 chars) And by addition me of thee defeated,
(42 chars) By adding one thing to my purpose nothing.
(54 chars) But since she prick'd thee out for women's pleasure,
(53 chars) Mine be thy love and thy love's use their treasure.
(0 chars)
read 644 chars
read 17 lines
If you don’t want or need to read and process an input file line-by-line, you can use the read
command, which reads a specific number of characters or the entire file. read
’s syntax is:
read ?-nonewline? id
read id numChars
id
is the file to read and numChars
, if present, indicates how many characters to read from id
. read
’s return value is the data read from id. In the first form of the command, read
reads the entire file and, if -nonewline
is specified, discards the last character of the file if it is a newline. In the second form of the command, read
reads exactly numChars
characters, unless it encounters EOF before reading the specified number of characters. In the latter case, read
returns the data it was able to read.
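A common use of the first form is to slurp a small file and split it into a list of lines. The -nonewline option matters here because, without it, the trailing newline would produce an extra empty element at the end of the list. This sketch assumes a hypothetical scratch file named read_demo.tmp:

```tcl
# read_demo.tmp is a hypothetical scratch file holding three short lines.
set name [file join [pwd] read_demo.tmp]
set out [open $name w]
puts $out "one"
puts $out "two"
puts $out "three"
close $out

# Read everything, keeping the final newline.
set fileId [open $name r]
set whole [read $fileId]
close $fileId

# Read everything again, dropping the final newline.
set fileId [open $name r]
set trimmed [read -nonewline $fileId]
close $fileId

# Splitting the trimmed text on newlines yields one element per line.
set lines [split $trimmed "\n"]
puts "[string length $whole] characters with the final newline"
puts "[string length $trimmed] characters without it"
puts "[llength $lines] lines: $lines"
file delete $name
```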
Before explaining why you might want to use read
instead of gets
, have a look at the following script, read.tcl in this chapter’s code directory. The source file, wssnt10.txt, is the complete text of Shakespeare’s sonnets, courtesy of Project Gutenberg (http://www.gutenberg.org/etext/1041), and is also included in the code directory:
# Read the entire file
set fileId [open wssnt10.txt r]
set input [read $fileId]
puts "Read [string length $input] characters"
close $fileId

# Read the file 1024 characters at a time
set fileId [open wssnt10.txt r]
while {![eof $fileId]} {
    set input [read $fileId 1024]
    puts "Read [string length $input] characters"
}
close $fileId
In the first block of code, I read the entire file and then closed it. The second block of code reopens the file, reads it in 1024-character blocks, and then closes it. First, I have to close the input file explicitly and then reopen it before trying to read it a second time. Why? After the first
read command completes, the file pointer is positioned at the end of the file. Accordingly, the next read
or gets
command has nothing to read. Closing and reopening the file resets the file pointer to the beginning of the file. As it happens, there’s a smarter way to move the file pointer, the seek
command, which you’ll meet in the section, “Moving the File Pointer: Random Access I/O,” later in this chapter.
Second, notice that the while
condition uses the eof command to test for an end-of-file condition on $fileId
. Unlike the gets
command, read
does not return a special value (referred to as a sentinel value, or just a sentinel) to indicate it’s at the end of the file. In fact, in the absence of the eof command, read
would happily continue to “read” the file; it just wouldn’t return anything, so the script would be stuck in an infinite loop.
When you execute this script, the output should look like the following. I’ll only show the first and last three lines of the output here to preserve space:
$ ./read.tcl
Read 107701 characters
Read 1024 characters
Read 1024 characters
...
Read 1024 characters
Read 1024 characters
Read 181 characters
Hardly riveting output, but the last line bears discussion. Although the read
command requested 1024 characters, there were only 181 left in the input file, so read
returned what was available.
Why use read
instead of gets
? Suitability is one reason, but the primary reason is efficiency. In this context, suitability just means that the task you are trying to perform might not require processing a file line-by-line or that the data itself isn’t appropriate for line-by-line input. For example, a binary file can contain embedded newline characters that aren’t used for line breaks per se. In such a case, read
is the right command to use.
Although reading and processing input line-by-line with gets
is convenient and easy, it is inefficient for large files because multiple small disk read operations are much slower than a single large read that takes advantage of the disk’s read-ahead functionality. How inefficient? Consider Table 8.2. It shows the time required to read a 1GB text file using gets
, using read
to slurp up the entire file in one large read, and using read
with various block sizes.
Table 8.2. I/O Times for gets and read on a 1GB File
Command | Read Size (chars) | Elapsed Time (secs) | MB/sec |
---|---|---|---|
gets | N/A | 65.9 | 15.5 |
read | N/A | 25.9 | 39.5 |
read | 64 | 68.4 | 15.0 |
read | 128 | 37.4 | 27.4 |
read | 256 | 22.4 | 45.7 |
read | 512 | 14.7 | 69.6 |
read | 1024 | 10.9 | 93.3 |
read | 2048 | 9.2 | 111.8 |
read | 4096 | 8.8 | 116.9 |
read | 8192 | 8.3 | 122.8 |
read | 16384 | 8.3 | 123.0 |
read | 32768 | 8.4 | 121.7 |
As you can see in Table 8.2, read
is much more efficient than gets
. If you want to try this experiment yourself, create a 1GB file named bigfile and execute the script readtest.tcl in the readtest subdirectory of this chapter’s code directory. Of course, the performance you see will be different on your system.
The I/O speeds reported by readtest.tcl are relative. The results are influenced by CPU speed, available memory, the other processes running on the system, the type and speed of your hard disk, the amount of on-disk cache, the filesystem type, the phase of the moon, and what you ate for lunch today. Use readtest.tcl to gain insight into the performance of gets
and read
, not to establish whether your computer is an I/O speed machine or a boat anchor.
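If you just want a feel for the comparison without creating a 1GB file, a much smaller experiment can be sketched as follows. This is not readtest.tcl; the procedure names, the scratch file timing_demo.tmp, and the 4096-character block size are all assumptions made for illustration, and on a file this small the timings will be noisy:

```tcl
# charsViaGets and charsViaRead are hypothetical helpers that count the
# characters in a file using line-oriented and block-oriented input.
proc charsViaGets {name} {
    set fileId [open $name r]
    set total 0
    while {[set cnt [gets $fileId line]] >= 0} {
        incr total $cnt
        incr total    ;# count the newline that gets discarded
    }
    close $fileId
    return $total
}

proc charsViaRead {name blockSize} {
    set fileId [open $name r]
    set total 0
    while {![eof $fileId]} {
        incr total [string length [read $fileId $blockSize]]
    }
    close $fileId
    return $total
}

# Build a scratch file of 20,000 newline-terminated lines.
set name [file join [pwd] timing_demo.tmp]
set out [open $name w]
for {set i 0} {$i < 20000} {incr i} {
    puts $out "line $i of scratch data"
}
close $out

set start [clock clicks -milliseconds]
set viaGets [charsViaGets $name]
set getsMs [expr {[clock clicks -milliseconds] - $start}]

set start [clock clicks -milliseconds]
set viaRead [charsViaRead $name 4096]
set readMs [expr {[clock clicks -milliseconds] - $start}]

puts "gets: $viaGets chars in $getsMs ms"
puts "read: $viaRead chars in $readMs ms"
file delete $name
```

Both procedures should report the same character count; only the elapsed times differ, and those depend entirely on your system.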
Now that you’ve seen how to get data into your program, I’ll show you how to get data out of it. The two workhorse Tcl commands for output are puts
, which you’ve already seen and used a good deal, and format
. puts
is great if you don’t care about how the output looks, don’t have any requirements for precisely formatted output, or if you are in a hurry. The format
command is the tool to use if you do care how the output looks, do have requirements for carefully formatted output, and can take a little bit longer to write your script (but only a little bit longer).
As explained and shown in previous chapters, puts
writes data to an output channel. So far, the output “channels” have been the screen, specifically, standard output and standard error (stdout and stderr, respectively) and, as you saw earlier in this chapter, disk files. In the general case, though, a channel is any stream capable of receiving output. So, in addition to stdout, stderr, and file IDs returned by the open
command, puts
can also write to a network socket created by the socket
command (I don’t discuss network I/O in this book) or an output medium created by a Tcl extension. For example, you can use puts
to send data to a printer or to a serial device (such as a mouse or a modem) if you have an output channel that has been set up for such a purpose.
To simplify the presentation, I’ve glossed over some of puts
’ subtleties because they are fine points that would obscure the point I am trying to make. For example, Tcl buffers output, so data you want to print using puts
won’t appear until the buffer is full or the buffer is specifically flushed (using the flush
command). Tcl’s channel layer manages this buffering itself, and you can modify its behavior using special-purpose Tcl commands such as fconfigure.
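Two places where buffering tends to show up in practice are prompts that lack a trailing newline and files that another program reads while your script is still running. The following sketch shows flush in both situations; flush_demo.tmp is a hypothetical scratch file, and the size reported before the flush depends on the buffer settings on your system:

```tcl
# A prompt without a trailing newline may sit in the buffer until it is
# flushed, so flush stdout before waiting for input.
puts -nonewline "Select a line (1-9): "
flush stdout
puts ""

# flush_demo.tmp is a hypothetical scratch file.
set name [file join [pwd] flush_demo.tmp]
set fileId [open $name w]
puts $fileId "buffered line"
set before [file size $name]   ;# usually 0: the data is still in the buffer
flush $fileId
set after [file size $name]    ;# the data has now been handed to the OS
close $fileId
file delete $name

puts "size before flush: $before bytes"
puts "size after flush: $after bytes"
```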
Another issue I haven’t addressed is how puts
handles newlines. For better or worse, each of the major operating systems uses a different end-of-line (EOL) sequence. Linux, UNIX, Macintosh OS X, and related systems use a linefeed character (\n) to indicate EOL; Macintosh systems before OS X use a carriage return (\r); and Microsoft Windows (and MS-DOS and OS/2) use a carriage return followed by a linefeed (\r\n). In large part, you don’t have to concern yourself with this because Tcl handles the EOL translations for you automatically, converting EOLs to the character sequence appropriate for the host operating system. However, you can modify this behavior using the fconfigure
command. Again, this is an advanced topic I won’t cover in this book.
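For the curious, here is a brief sketch of what that translation looks like on disk. It forces Windows-style line endings on output with fconfigure -translation crlf and then reads the file back with translation turned off so the raw bytes are visible. The scratch file eol_demo.tmp is a hypothetical name:

```tcl
# eol_demo.tmp is a hypothetical scratch file.
set name [file join [pwd] eol_demo.tmp]

# Write one line, asking Tcl to emit a carriage return plus linefeed.
set fileId [open $name w]
fconfigure $fileId -translation crlf
puts $fileId "a"
close $fileId

# Read the file back in binary mode so no EOL translation takes place.
set fileId [open $name r]
fconfigure $fileId -translation binary
set raw [read $fileId]
close $fileId
file delete $name

puts "[string length $raw] bytes on disk"
```

The script wrote a single character and a newline, yet three bytes land on disk: the letter, a carriage return, and a linefeed.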
The point to take away from this discussion is that Tcl and puts
by and large do the right thing with respect to output. If you find you need greater control, the capability is there. In the meantime, you can use puts
for output and be blissfully ignorant of its under-the-covers details.
If you have ever written C, chances are very good that you have used C’s printf()
function to print formatted output. Tcl’s format
command is much like printf()
. The biggest difference is that format
doesn’t print the string it formats, it just returns the formatted string. Printing the formatted string is handled with the puts
command. format
’s syntax is:
format spec ?val ...?
format
formats one or more values, specified by val
in the syntax diagram, according to the format specification defined by spec
. The format specification can consist of up to six parts:
A position specifier
Zero or more flags
A field width
A precision specifier
A word length specifier
A conversion character
I’m going to focus on the flags, field width, precision specifier, and conversion character. The position and word-length specifiers are less commonly used and apply in situations this book won’t cover. Each field of the format specification begins with a percent sign, %
, followed by zero or more modifiers, and ends with a conversion character.
Conversion characters indicate how to print, or convert, the corresponding argument in the value list. Although conversion characters appear last in the format specification, I cover them first so you’ll know what you’re trying to format. Table 8.3 lists the most frequently used conversion characters.
Table 8.3. Common Format Conversion Characters
Character | Description |
---|---|
c | Displays an integer as the ASCII character it represents. |
d | Signed integer. |
f | Floating point value in m.n format. |
s | String. |
u | Unsigned integer. |
X | Unsigned hex value in uppercase format. |
x | Unsigned hex value in lowercase format. |
A complete list of conversion characters is available in the format
man page (man 3tcl format). For example, to format a string, you would use the command format "%s" "string to format". The command format "%d:%x" int_val hex_val would format a signed integer, followed by a literal colon, followed by a lowercase hexadecimal value. Although not strictly necessary, I use double quotes around the format specifier as a matter of habit. If the format specifier or the value to format contains embedded spaces, the quotes would be necessary.
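A few of the conversion characters from Table 8.3 can be tried directly. The values below are arbitrary choices for illustration:

```tcl
# Each call returns the formatted string; puts does the printing.
puts [format "%s" "string to format"]   ;# a plain string
puts [format "%d:%x" 255 255]           ;# 255:ff
puts [format "%X" 255]                  ;# FF
puts [format "%c" 65]                   ;# A
puts [format "%f" 1.5]                  ;# 1.500000
```

Notice that a single specification can mix literal text (the colon) with several conversions, each consuming one value from the argument list in order.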
Flags are modifiers used to specify padding and justification of the formatted output. Table 8.4 lists the valid flags.
Table 8.4. Valid Format Flags
Flag | Description |
---|---|
- | Left-justify the field. |
+ | Always print a sign (+ or -) in front of a number. |
0 | Pad with zeros. |
# | Print hex numbers with a leading 0x, octal numbers with a leading 0. |
space | Precede a number with a space unless a sign is specified. |
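The effect of each flag is easiest to see with a field width attached. In this short sketch, the vertical bars are literal characters added only to make the padding visible:

```tcl
puts [format "|%6d|" 42]    ;# |    42|  right-justified by default
puts [format "|%-6d|" 42]   ;# |42    |  left-justified
puts [format "|%06d|" 42]   ;# |000042|  zero-padded
puts [format "|%+d|" 42]    ;# |+42|     explicit sign
puts [format "|% d|" 42]    ;# | 42|     space in place of a sign
puts [format "|%#x|" 255]   ;# |0xff|    hex with a leading 0x
```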
After the flags, you can specify a minimum field width and an optional precision value. For example, to format the floating point value 1.98, you could use any of the following commands (see format.tcl in this chapter’s code directory):
puts [format "%f" 1.98]
puts [format "%5f" 1.98]
puts [format "%5.2f" 1.98]
The first command uses the default floating point formatting (%f
). The second command uses a field width of 5 (%5f
). The third command uses the same field width and adds a precision specifier (%5.2f
). These commands correspond to the following output:
1.980000
1.980000
 1.98
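The second line of output is not indented, which is expected rather than a bug: the default precision of six decimal places makes the value eight characters wide, already more than the minimum field width of 5, so no padding is added. The third value is only four characters wide, so it is padded with one leading space. Most of the example scripts in this chapter use format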
commands, so you can refer to these scripts for more examples of using the format
command. The man page has complete details.
Earlier in this chapter, I noted that a read
operation advances the file pointer through an I/O channel. In an example, I closed and reopened the input file to reposition the file pointer at the beginning of the file. While this type of sequential I/O is a common operation, you often want or need to read from arbitrary file locations or need to be able to reposition the file pointer without closing and reopening the file. Tcl’s seek
and tell
commands provide this ability, which is referred to as random access I/O.
As an I/O operation proceeds, the file pointer’s current position in the file, known as the seek offset, can be determined by using the tell
command. tell
’s syntax is:
tell channelID
tell
returns an integer string that indicates the current seek offset. If the specified I/O channel does not support seeking (process pipelines, for example, do not support seeking), tell
returns -1.
To move the file pointer (change the seek offset), use the aptly-named seek
command. Its syntax is:
seek channelID offset ?origin?
This command moves the file pointer offset
bytes forward or backward relative to origin
in the file referred to by channelID
. origin must be one of start
, end
, or current
and defaults to start
if not specified. offset
can be negative or positive. It is an error to seek backward (using a negative offset) from the beginning of a file but not to seek forward from the end of a file.
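The three origins can be tried out on a tiny file whose contents make the offsets easy to check by eye. This is only a sketch; seek_demo.tmp is a hypothetical scratch file containing 20 characters and no newline:

```tcl
# seek_demo.tmp is a hypothetical scratch file: ten digits, then ten letters.
set name [file join [pwd] seek_demo.tmp]
set out [open $name w]
puts -nonewline $out "0123456789abcdefghij"
close $out

set fileId [open $name r]
set first [read $fileId 5]         ;# 01234
set posAfterRead [tell $fileId]    ;# 5

seek $fileId 0 start               ;# rewind without closing and reopening
set again [read $fileId 5]         ;# 01234 once more

seek $fileId -3 end                ;# three bytes before the end
set tail [read $fileId]            ;# hij

seek $fileId -10 current           ;# back from offset 20 to offset 10
set middle [read $fileId 2]        ;# ab
close $fileId
file delete $name

puts "first=$first again=$again tail=$tail middle=$middle"
```

Seeking to offset 0 from start is the rewind operation that makes closing and reopening a file unnecessary.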
The following script, randread.tcl in this chapter’s code directory, shows how you might use the seek
and tell
commands:
set fileId [open wssnt10.txt r]

seek $fileId 10 start
set input [read $fileId 10]
puts "Text between bytes 10 and 20: =>$input<="
puts "File pointer at byte: [tell $fileId]"

seek $fileId -25 end
set input [read $fileId 25]
puts "Last 25 characters: =>$input<="
puts "File pointer at byte: [tell $fileId]"

if {[catch {seek $fileId -5 start} err]} {
    puts "seek back from start: $err"
} else {
    puts "seek back from start: [tell $fileId]"
}

if {[catch {seek $fileId 5 end} err]} {
    puts "seek forward from end: $err"
} else {
    puts "seek forward from end: [tell $fileId]"
}

seek $fileId 0 end
puts "file size: [tell $fileId] bytes"
close $fileId
After opening the file, the first block of code moves the pointer 10 bytes into the file, reads the next 10 characters, and then displays the text it read between => and <= and the current position of the file pointer. The second code block positions the file pointer 25 bytes from the end of the file, reads 25 characters, and then displays the text it read and the current position of the file pointer.
The next two sections of code attempt to seek backward from the beginning of the file and forward from the end of the file. I use the catch
command so an error during either operation won’t abort the script. Notice in the output that seeking backward from the beginning of the file causes an error but that seeking forward from the end of the file moves the file pointer five bytes past the end, to offset 107706.
Positioning the file pointer past the end of the file works for several reasons. First, seek
simply sets the position of the file pointer, an operation independent of reading or writing. seek
has no idea whether you are going to read or write the underlying file. Secondly, while no filesystem of which I’m aware supports the notion of adding data to the front of a file, most (if not all) permit data to be appended to the end of a file. Accordingly, you have to be able to position the pointer past the end of the file to do so.
Finally, most filesystems allow you to create sparse files, or files that have holes in them. Such a file will have a length of N bytes, yet will contain fewer than N bytes of data. Byte ranges of files that contain no data are known as holes, and files that contain such holes are referred to as sparse files.
The last section of code shows you a trick for finding out a file’s size in bytes: seek
to the end of the file and then use tell
to get the location of the file pointer. Keep in mind that for a sparse file, this trick reports the file’s apparent length, holes included, not the amount of data the file actually stores.
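Both ideas, seeking past the end of a file and measuring its length with seek and tell, can be combined in one small experiment. The sketch below uses a hypothetical scratch file, sparse_demo.tmp, and compares the result with the file size command. Whether the gap is actually stored as a hole depends on the filesystem, which is an assumption this script cannot check:

```tcl
# sparse_demo.tmp is a hypothetical scratch file.
set name [file join [pwd] sparse_demo.tmp]
set fileId [open $name w]

# Write four bytes at the start, then jump well past them and write four more.
puts -nonewline $fileId "head"
seek $fileId 1000 start
puts -nonewline $fileId "tail"

# The length trick: seek to the end and ask where the pointer is.
seek $fileId 0 end
set viaTell [tell $fileId]
close $fileId

# file size reports the same apparent length, gap included.
set viaFileSize [file size $name]
file delete $name

puts "length via seek and tell: $viaTell bytes"
puts "length via file size: $viaFileSize bytes"
```

Only eight bytes of data were written, yet both measurements include the 996-byte gap between them.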
When executed, the script’s output should look like the following:
$ ./randread.tcl
Text between bytes 10 and 20: => Project G<=
File pointer at byte: 20
Last 25 characters: =>ented as Public Domain.
<=
File pointer at byte: 107701
seek back from start: error during seek on "file5": invalid argument
seek forward from end: 107706
file size: 107701 bytes
You can also use seek
and tell
with output operations, as demonstrated in the following script (see randwrite.tcl in this chapter’s code directory):
set fileId [open output.txt r+]
seek $fileId 0 end
set oldSize [tell $fileId]

seek $fileId 10 start
puts "Offset before puts: [tell $fileId]"
puts -nonewline $fileId [string repeat * 10]
puts "Offset after puts: [tell $fileId]"

seek $fileId [expr {$oldSize - 25}]
puts "Offset before puts: [tell $fileId]"
puts -nonewline $fileId [string repeat * 10]
puts "Offset after puts: [tell $fileId]"

seek $fileId [expr {$oldSize + 800}]
puts "Offset before puts: [tell $fileId]"
puts $fileId [string repeat * 10]
puts "Offset after puts: [tell $fileId]"

seek $fileId 0 end
puts "New file length: [tell $fileId] bytes"
close $fileId
Perhaps the first question you’ll ask when you look at this script is why I open the file I want to write in r+
mode (read/write), rather than for writing or appending. To overwrite existing data in place, I need a mode that neither truncates the file nor forces every write to the end of the file. If I open the file in w
or w+
mode, I’ll truncate the existing file. Similarly, if I open the file in a
or a+
mode, data written to the file will wind up appended to the end of the file, regardless of where I position the file pointer before starting the write
. The behavior in the append modes is somewhat counterintuitive, but if you think about it, it is called append mode. If it really bothers you, you could write a procedure that adds insert and overwrite modes to the open
command, but that would just result in all the other Tcl programmers teasing you.
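To see the difference between the modes for yourself, here is a minimal sketch. It assumes Tcl 8.5 or later (where append mode repositions to the end of the file before each write); the helper Slurp and the scratch file modes_demo.txt are names invented for this illustration.

```tcl
# Hypothetical helper: return the entire contents of a file.
proc Slurp {path} {
    set f [open $path r]
    set data [read $f]
    close $f
    return $data
}

set path modes_demo.txt         ;# assumed scratch file name
set f [open $path w]
puts -nonewline $f "abcdefghij"
close $f

# r+ mode: the write lands at the offset chosen with seek
set f [open $path r+]
seek $f 2 start
puts -nonewline $f "XX"
close $f
set afterRPlus [Slurp $path]

# a mode: the seek has no effect on where the data is written
set f [open $path a]
seek $f 0 start
puts -nonewline $f "YY"
close $f
set afterAppend [Slurp $path]

file delete $path
puts "After r+ write: $afterRPlus"
puts "After a write:  $afterAppend"
```

The r+ write replaces the two bytes at offset 2 and leaves the length alone, whereas the append-mode write grows the file by two bytes at the end.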
After opening the file, I seek to the end and then store its original size (actually, its byte length) in the variable $oldSize. I’ll explain why in a moment. Next, I seek 10 bytes into the file and write 10 asterisks starting at that offset.
The next code block scribbles 10 more asterisks 25 bytes from the end of the file. In this case, though, I use the expression $oldSize - 25 to calculate the offset. I do this because I want every offset measured from the original EOF. The puts commands in these first two blocks overwrite existing bytes in place, so they leave the file length at 661 bytes; the final write, however, extends the file and moves the EOF. Schlepping around the original EOF offset enables me to write in the correct location no matter what the earlier writes did.
The last write adds another 10 asterisks 800 bytes past the original EOF. Again, I use $oldSize as the reference point for the offset. After all the writing is done, I calculate and display the length of the modified file and close the file.
To execute this script and verify for yourself that it behaves as I’ve described, use the following sequence of commands:
$ cp sonnet20.txt output.txt
$ ls -l sonnet20.txt output.txt
-rw-r--r-- 1 kwall kwall 661 2007-08-08 03:18 output.txt
-rw-r--r-- 1 kwall kwall 661 2007-08-06 23:30 sonnet20.txt
$ ./randwrite.tcl
Offset before puts: 10
Offset after puts: 20
Offset before puts: 636
Offset after puts: 646
Offset before puts: 1461
Offset after puts: 1472
New file length: 1472 bytes
$ ls -l sonnet20.txt output.txt
-rw-r--r-- 1 kwall kwall 1472 2007-08-08 03:13 output.txt
-rw-r--r-- 1 kwall kwall 661 2007-08-06 23:30 sonnet20.txt
$ diff -a sonnet20.txt output.txt
3c3
< A woman's face with nature's own hand painted,
---
> A woma**********ith nature's own hand painted,
16c16
< Mine be thy love and thy love's use their treasure.
---
> Mine be thy love and thy lov**********eir treasure.
17a18
> **********
The cp command creates a copy of sonnet20.txt. The first ls command confirms that the copy is the same size as the original. After executing randwrite.tcl, the second ls command shows that the two files have different sizes. The diff command, finally, shows the actual differences between the original file and its modified copy.
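The gap between the original EOF and the final write is a hole of the kind described earlier. The following sketch, which uses an assumed scratch file named hole_demo.bin, writes past the end of a file and reads the gap back. Whether the filesystem actually stores the gap sparsely depends on the platform; what the script can show is that the gap reads back as zero bytes.

```tcl
set path hole_demo.bin          ;# assumed scratch file name
set f [open $path w+]
fconfigure $f -translation binary

puts -nonewline $f "head"       ;# bytes 0 through 3
seek $f 100 current             ;# skip 100 bytes past the current EOF
puts -nonewline $f "tail"       ;# bytes 104 through 107

seek $f 0 start
set data [read $f]
close $f
file delete $path

set length [string length $data]
set gap [string range $data 4 103]
set gapIsZero [expr {$gap eq [string repeat \0 100]}]

puts "File length: $length bytes"
puts "Gap is all zero bytes: $gapIsZero"
```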
The seek and tell commands calculate file positions in terms of bytes, or, rather, byte offsets. However, the read command operates in terms of character offsets. In most situations, this distinction doesn’t matter because in the ASCII character set, each character is one byte long. Thus, reading five characters grabs five bytes of data. The distinction becomes important when you work with multibyte character sets (such as Asian language character sets), which use multiple bytes to encode a single character. For the purposes of this book, one byte equals one character; just be aware that this is not always the case.
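A short sketch makes the byte and character distinction concrete. It assumes a UTF-8 encoded scratch file (utf8_demo.txt, a name chosen for this example) containing one accented letter, which UTF-8 stores in two bytes.

```tcl
set path utf8_demo.txt          ;# assumed scratch file name
set text "na\u00efve"           ;# five characters; the accented i needs two bytes in UTF-8

set f [open $path w]
fconfigure $f -encoding utf-8 -translation lf
puts -nonewline $f $text
close $f

set f [open $path r]
fconfigure $f -encoding utf-8
set chars [read $f 5]           ;# read counts characters
set offset [tell $f]            ;# tell counts bytes
close $f

set charCount [string length $chars]
set byteCount [file size $path]
file delete $path

puts "Characters read: $charCount"
puts "Bytes in file:   $byteCount"
puts "Offset reported by tell: $offset"
```

Reading five characters consumes the whole file even though the file is six bytes long, which is why a character count should not be used as a byte offset for seek when the data is not plain ASCII.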
Like any proper programming language, Tcl enables you to move around in the filesystem and create, delete, and rename directories. When a Tcl script begins executing, its working directory is the directory from which it was invoked. To change your working directory, use the cd command. If you want to find out the current working directory, use the pwd command. The syntax of these commands is:
cd ?dirName?
pwd
If you omit dirName, cd sets the script’s working directory to the directory specified by the $HOME environment variable. If $HOME is not set or the directory it references does not exist, cd raises an error, which aborts the script unless you catch it. After successful execution, cd returns the empty string.
pwd returns the absolute pathname of the current directory. The short script that follows illustrates cd and pwd (see dirs.tcl in this chapter’s code directory):
puts "Current directory: [pwd]"
cd /tmp
puts "Current directory: [pwd]"
cd
puts "Current directory: [pwd]"
The output from this script is what you would expect:
$ pwd
/home/kwall/tclbook/08
$ ./dirs.tcl
Current directory: /home/kwall/tclbook/08
Current directory: /tmp
Current directory: /home/kwall
$ pwd
/home/kwall/tclbook/08
As you can see, after the script terminates, the working directory of my shell is unchanged. This is because the Tcl script executes in its own child process, so when that process terminates, any changes it made to its execution environment (such as its working directory) are discarded and the parent shell never sees them.
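Because a failed cd raises an error, scripts that change directories often wrap the call in catch. The following is a minimal sketch of that pattern; the directory name no_such_dir_for_demo is a hypothetical path assumed not to exist.

```tcl
set startDir [pwd]

# Hypothetical directory that is assumed not to exist
set badDir [file join $startDir no_such_dir_for_demo]

# catch returns 1 if cd raised an error and stores the message
if {[catch {cd $badDir} message]} {
    puts "cd failed: $message"
    set failed 1
} else {
    set failed 0
}

# A failed cd leaves the working directory untouched
set stillHere [expr {[pwd] eq $startDir}]
puts "Still in starting directory: $stillHere"

# Move to the parent directory, then come back
cd [file dirname $startDir]
puts "Now in: [pwd]"
cd $startDir
puts "Back in: [pwd]"
```

Saving the result of pwd before changing directories, as startDir does here, gives the script a reliable way to return to where it began.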
As I noted at the beginning of the chapter, what’s new in word_search.tcl is the file handling and the way commands are combined to get the desired results.
#!/usr/bin/tclsh
# word_search.tcl
# Find words embedded in a string of letters stored in a text file

#
# Block 0
#
# Read the puzzle data from a file
proc ReadPuzzle {srcFile} {
    global starts stops lines

    # Open the puzzle file
    set fileId [open $srcFile r]

    # Read the source file
    while {[gets $fileId input] > -1} {
        lappend starts [lindex $input 0]
        lappend stops [lindex $input 1]
        lappend lines [lrange $input 2 end]
    }

    # Close the source file
    close $fileId
}

# Clear the screen and redraw the puzzle
proc DisplayPuzzle {} {
    global starts lines

    # Display the puzzle
    exec clear >@ stdout
    for {set i 0} {$i < [llength $starts]} {incr i} {
        puts [format "%-4d%s" [expr {$i + 1}] [lindex $lines $i]]
    }
}

# Get the line on which the player wants to work
proc GetPlayerLine {min max} {
    puts -nonewline " Select a line (1-9): "
    flush stdout
    set playerLine [gets stdin]

    # Did player choose a valid line number?
    if {$playerLine < $min || $playerLine > $max} {
        puts "Select a line number between $min and $max"
        exit 1
    }
    return $playerLine
}

# Get the word the player found
proc GetPlayerWord {} {
    puts -nonewline "What word do you see: "
    flush stdout
    set playerWord [gets stdin]
    return $playerWord
}

# Compare the player's guess to the correct answer
proc GuessCorrect {playerLine playerWord} {
    global starts stops lines

    # Did user guess correctly?
    set start [expr {[lindex $starts [expr {$playerLine - 1}]] - 1}]
    set stop [expr {[lindex $stops [expr {$playerLine - 1}]] - 1}]
    set line [lindex $lines [expr {$playerLine - 1}]]
    set puzzleWord [join [lrange $line $start $stop] ""]
    if {[string match -nocase $puzzleWord $playerWord]} {
        return true
    } else {
        return false
    }
}

#
# Block 1
#
# Main game loop
ReadPuzzle puzzle.txt
set continue "y"
while {$continue eq "y"} {
    DisplayPuzzle
    set playerLine [GetPlayerLine 1 9]
    set playerWord [GetPlayerWord]
    if {[GuessCorrect $playerLine $playerWord] == true} {
        puts "Correct!"
        puts -nonewline "Play again (Y/n)? "
    } else {
        puts "Sorry."
        puts -nonewline "Try again (Y/n)? "
    }
    flush stdout
    set continue [string tolower [gets stdin]]
}
Most of the code is in Block 0, the procedure definitions. The first procedure, ReadPuzzle, opens the puzzle data file passed as an argument and splits the data into three lists. To make sense of the data parsing, have a look at a sample data line from puzzle.txt:
2 5 e o p e n u g r i v c
Each row of data translates to a single row in the game grid. The data points are space-delimited. The first two values contain the starting and ending locations of the word in that row, and the rest of the data (11 letters) constitutes the row to display on the game grid. For example, in the record above, the word of interest begins in the second column and ends in the fifth column of the row. The columns are numbered from one, so the word in this row is “open.” The while loop reads the data file line-by-line and uses lappend to create three ordered lists of starting and stopping locations and the text lines (starts, stops, and lines, respectively).
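The parsing that ReadPuzzle performs on each record can be tried in isolation. This sketch applies the same lindex and lrange calls to the sample data line shown above, using a literal string in place of a line read from puzzle.txt.

```tcl
# The sample record from puzzle.txt, treated as a Tcl list
set input "2 5 e o p e n u g r i v c"

set start   [lindex $input 0]       ;# first element: starting column
set stop    [lindex $input 1]       ;# second element: ending column
set letters [lrange $input 2 end]   ;# everything else: the row to display

puts "Start column: $start"
puts "Stop column:  $stop"
puts "Letters:      $letters"
puts "Letter count: [llength $letters]"
```

This works because a string of space-separated words with no special characters is also a well-formed Tcl list, so the list commands can index it directly.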
DisplayPuzzle uses a UNIX-specific command, clear, to clear the screen before each round. Because clear is an external command, not a Tcl built-in, I use the Tcl exec command to execute it and redirect clear’s output to stdout. The balance of the procedure is a simple for loop that uses the format command to display a nicely formatted line, consisting of the row number and the letters. I use the length of one of the lists as the loop control value; each of the three lists has the same length, so I could have used any of them.
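To see what the format string in DisplayPuzzle does, consider this small sketch. The %-4d specifier left-justifies the row number in a field four characters wide, so the letters line up whether the number has one digit or two. The two-digit row number is included only to show the alignment; the game itself has nine rows.

```tcl
set letters "e o p e n u g r i v c"

set row1  [format "%-4d%s" 1 $letters]
set row10 [format "%-4d%s" 10 $letters]

puts $row1
puts $row10
```

In both lines the first letter starts in the fifth column, because the number and its padding always occupy exactly four characters.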
GetPlayerLine solicits the row number in which the player is interested. The min and max arguments set the minimum and maximum values for the row number. If the player inputs a number outside of that range, the script terminates after printing a short usage message. Otherwise, GetPlayerLine returns the line number the user entered. GetPlayerWord asks the player to type in the word and returns it to the calling procedure.
The GuessCorrect procedure is word_search.tcl’s workhorse. It accepts two arguments, the line number entered in GetPlayerLine and the word entered in GetPlayerWord, and then it compares the player’s guess to the target word embedded in the data line. It returns true if the player’s word matches the puzzle’s word and false otherwise. I use list manipulation to extract the target word from the puzzle data. Recall that lists are indexed from zero. The line number displayed to the player and the starting and ending points for each word in the data file, however, are indexed from one. Accordingly, to extract the correct data, I have to subtract 1 from $playerLine and from the start and stop values retrieved by the lindex commands. I use the join command to convert the list of discrete letters returned by lrange into a proper string. This step is necessary because lrange returns a list of elements that are separated by spaces, and I need a string to perform the comparison in the string match command.
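The index arithmetic and the join step are easier to follow with concrete values. This sketch repeats the extraction for the sample row, using one-based start and stop values of 2 and 5. The string equal comparison at the end is an alternative I am suggesting for illustration, not something word_search.tcl uses.

```tcl
set line  [list e o p e n u g r i v c]
set start 2     ;# one-based column where the word begins
set stop  5     ;# one-based column where the word ends

# Convert to zero-based indexes, pull out the letters, and glue them together
set letterList [lrange $line [expr {$start - 1}] [expr {$stop - 1}]]
set puzzleWord [join $letterList ""]

puts "List of letters: $letterList"
puts "Joined word:     $puzzleWord"

# string match treats its first argument as a glob pattern
set matched [string match -nocase $puzzleWord "OPEN"]

# string equal compares the two strings literally
set equal [string equal -nocase $puzzleWord "OPEN"]

puts "string match result: $matched"
puts "string equal result: $equal"
```

Because string match interprets characters such as * and ? in its pattern argument, string equal -nocase is the safer choice whenever the pattern might come from an untrusted source. In the game the pattern comes from the puzzle file, which contains only letters, so either command gives the same answer.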
Block 1, as you can see, is short and to the point. It invokes ReadPuzzle, sets the game loop control variable to y, and then enters the game play loop. The while loop displays the game grid, calls GetPlayerLine and GetPlayerWord to set up the comparison, and then calls GuessCorrect to evaluate the guess. It displays the result and then asks the player to play again. The way the enclosing while loop is written, gameplay terminates if the player enters anything but Y or y.
Here are some exercises you can try to practice what you learned in this chapter:
8.1 Modify the GetPlayerLine procedure to loop until the player enters a line number between min and max, inclusive, rather than terminating.
8.2 Modify the while loop in Block 1 so that only N or n will cause the game to exit.
8.3 Modify the code to support keeping score. The score should include how many words players guess correctly and incorrectly and how many total guesses the players made. Show a scoring percentage in addition to the raw scores for right and wrong guesses.
You won’t get very far in your Tcl programming before it will become desirable, if not downright necessary, to read and write files. Use open to create an I/O channel, the essential first step for performing file I/O, and close to release the channel when you are done with it. The gets and read commands can be used to read files, while the puts command works for writing files. If you prefer attractive, easy-to-read output, you’ll spend quality time with the format command. Sequential file I/O is often the appropriate way to access files, but there are many situations in which you know exactly where in a file you need to be. In other cases, you might want to update a particular piece of data in a file. In such situations, random file access, brought to you by seek and tell, is the ticket to file I/O happiness.
This chapter concludes your whirlwind introduction to Tcl programming. With the material in these first eight chapters, and plenty of practice, you have everything you need to get started writing GUI programs using Tcl’s graphical counterpart, Tk.