Chapter 12. Streams and Files

In this chapter, we cover the methods for handling files and directories as well as the methods for actually writing and reading data. This chapter also shows you the object serialization mechanism that lets you store objects as easily as you can store text or numeric data. Next, we turn to several improvements that were made in the “new I/O” package java.nio, introduced in JDK 1.4. We finish the chapter with a discussion of regular expressions, even though they are not actually related to streams and files. We couldn’t find a better place to handle that topic, and apparently neither could the Java team: the regular expression API specification was attached to the specification request for the “new I/O” features of JDK 1.4.

Streams

Input/output techniques are not particularly exciting, but without the ability to read and write data, your programs are severely limited. This chapter is about how to get input from any source of data that can send out a sequence of bytes and how to send output to any destination that can receive a sequence of bytes. These sources and destinations of byte sequences can be—and often are—files, but they can also be network connections and even blocks of memory. There is a nice payback to keeping this generality in mind: for example, information stored in files and information retrieved from a network connection are handled in essentially the same way. (See Volume 2 for more information about programming with networks.) Of course, while data are always ultimately stored as a sequence of bytes, it is often more convenient to think of data as having some higher-level structure such as being a sequence of characters or objects. For that reason, we dispense with low-level input/output quickly and focus on higher-level facilities for the majority of the chapter.

In the Java programming language, an object from which we can read a sequence of bytes is called an input stream. An object to which we can write a sequence of bytes is called an output stream. These are specified in the abstract classes InputStream and OutputStream. Because byte-oriented streams are inconvenient for processing information stored in Unicode (recall that Unicode uses two bytes per code unit), there is a separate hierarchy of classes for processing Unicode characters that inherit from the abstract Reader and Writer classes. These classes have read and write operations that are based on two-byte Unicode code units rather than on single-byte characters.

You saw abstract classes in Chapter 5. Recall that the point of an abstract class is to provide a mechanism for factoring out the common behavior of classes to a higher level. This leads to cleaner code and makes the inheritance tree easier to understand. The same game is at work with input and output in the Java programming language.

As you will soon see, Java derives from these four abstract classes a zoo of concrete classes. You can visit almost any conceivable input/output creature in this zoo.

Reading and Writing Bytes

The InputStream class has an abstract method:

abstract int read()

This method reads one byte and returns the byte that was read, or –1 if it encounters the end of the input source. The designer of a concrete input stream class overrides this method to provide useful functionality. For example, in the FileInputStream class, this method reads one byte from a file. System.in is a predefined object of a subclass of InputStream that allows you to read information from the keyboard.

The InputStream class also has nonabstract methods to read an array of bytes or to skip a number of bytes. These methods call the abstract read method, so subclasses need to override only one method.
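To make the "override only one method" point concrete, here is a minimal sketch. The class CountingInputStream is hypothetical (it is not part of the Java library); it simply yields the bytes 0, 1, 2, and so on. Only read() is overridden, yet the inherited read(byte[]) and skip methods work because they are built on it.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that yields the bytes 0, 1, ..., count - 1. */
class CountingInputStream extends InputStream {
    private final int count;
    private int next = 0;

    CountingInputStream(int count) {
        this.count = count;
    }

    // The only method a concrete subclass is required to supply.
    @Override
    public int read() {
        return next < count ? (next++ & 0xFF) : -1;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new CountingInputStream(5);
        byte[] buffer = new byte[3];
        int n = in.read(buffer);      // inherited; calls read() repeatedly
        long skipped = in.skip(1);    // inherited as well
        System.out.println(n + " bytes read, " + skipped + " skipped, next byte: " + in.read());
    }
}
```

A real subclass would normally also override read(byte[], int, int) for efficiency, since the inherited version fetches one byte at a time.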

Similarly, the OutputStream class defines the abstract method

abstract void write(int b)

which writes one byte to an output location.

Both the read and write methods can block a thread until the byte is actually read or written. This means that if the stream cannot immediately be read from or written to (usually because of a busy network connection), Java suspends the thread containing this call. This gives other threads the chance to do useful work while the method is waiting for the stream to again become available. (We discuss threads in depth in Volume 2.)

The available method lets you check the number of bytes that are currently available for reading. This means a fragment like the following is unlikely to ever block:

int bytesAvailable = in.available();
if (bytesAvailable > 0)
{
   byte[] data = new byte[bytesAvailable];
   in.read(data);
}

When you have finished reading or writing to a stream, close it by calling the close method. This call frees up operating system resources that are in limited supply. If an application opens too many streams without closing them, system resources may become depleted. Closing an output stream also flushes the buffer used for the output stream: any characters that were temporarily placed in a buffer so that they could be delivered as a larger packet are sent off. In particular, if you do not close a file, the last packet of bytes may never be delivered. You can also manually flush the output with the flush method.
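The effect of buffering on delivery can be observed with an in-memory destination. The following is only a sketch: the class name FlushDemo is ours, and it uses the try-with-resources statement, which requires Java 7 or later (with older JDKs, call close in a finally block instead). It writes three bytes through a BufferedOutputStream and reports how many bytes reached the destination before and after the stream was closed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class FlushDemo {
    /** Returns { bytes delivered before close, bytes delivered after close }. */
    static int[] deliveredCounts() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int before;
        try (OutputStream out = new BufferedOutputStream(sink)) {
            out.write(new byte[] { 1, 2, 3 });  // small write: stays in the buffer
            before = sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Leaving the try block closed the stream, which flushed the buffer.
        return new int[] { before, sink.size() };
    }

    public static void main(String[] args) {
        int[] counts = deliveredCounts();
        System.out.println("before close: " + counts[0] + ", after close: " + counts[1]);
    }
}
```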

Even if a stream class provides concrete methods to work with the raw read and write functions, Java programmers rarely use them because programs rarely need to read and write streams of bytes. The data that you are interested in probably contain numbers, strings, and objects.

Java gives you many stream classes derived from the basic InputStream and OutputStream classes that let you work with data in the forms that you usually use rather than at the low byte-level.

java.io.InputStream 1.0
  • abstract int read()

    reads a byte of data and returns the byte read. The read method returns a –1 at the end of the stream.

  • int read(byte[] b)

    reads into an array of bytes and returns the actual number of bytes read, or –1 at the end of the stream. The read method reads at most b.length bytes.

  • int read(byte[] b, int off, int len)

    reads into an array of bytes. The read method returns the actual number of bytes read, or –1 at the end of the stream.

    Parameters:
      b     The array into which the data is read
      off   The offset into b where the first bytes should be placed
      len   The maximum number of bytes to read

  • long skip(long n)

    skips n bytes in the input stream. Returns the actual number of bytes skipped (which may be less than n if the end of the stream was encountered).

  • int available()

    returns the number of bytes available without blocking. (Recall that blocking means that the current thread loses its turn.)

  • void close()

    closes the input stream.

  • void mark(int readlimit)

    puts a marker at the current position in the input stream. (Not all streams support this feature.) If more than readlimit bytes have been read from the input stream, then the stream is allowed to forget the marker.

  • void reset()

    returns to the last marker. Subsequent calls to read reread the bytes. If there is no current marker, then the stream is not reset.

  • boolean markSupported()

    returns true if the stream supports marking.

java.io.OutputStream 1.0
  • abstract void write(int n)

    writes a byte of data.

  • void write(byte[] b)

    writes all bytes in the array b.

  • void write(byte[] b, int off, int len)

    writes a range of bytes in the array b.

    Parameters:
      b     The array from which to write the data
      off   The offset into b to the first byte that will be written
      len   The number of bytes to write

  • void close()

    flushes and closes the output stream.

  • void flush()

    flushes the output stream, that is, sends any buffered data to its destination.

The Complete Stream Zoo

Unlike C, which gets by just fine with a single type FILE*, Java has a whole zoo of more than 60 (!) different stream types (see Figures 12–1 and 12–2). Library designers claim that there is a good reason to give users a wide choice of stream types: it is supposed to reduce programming errors. For example, in C, some people think it is a common mistake to send output to a file that was open only for reading. (Well, it is not actually that common.) Naturally, if you do this, the output is ignored at run time. In Java and C++, the compiler catches that kind of mistake because an InputStream (Java) or istream (C++) has no methods for output.


Figure 12–1. Input and output stream hierarchy


Figure 12–2. Reader and writer hierarchy

(We would argue that in C++, and even more so in Java, the main tool that the stream interface designers have against programming errors is intimidation. The sheer complexity of the stream libraries keeps programmers on their toes.)

C++ Note

ANSI C++ gives you more stream types than you want, such as istream, ostream, iostream, ifstream, ofstream, fstream, wistream, wifstream, istrstream, and so on (18 classes in all). But Java really goes overboard with streams and gives you separate classes for selecting buffering, lookahead, random access, text formatting, and binary data.

Let us divide the animals in the stream class zoo by how they are used. Four abstract classes are at the base of the zoo: InputStream, OutputStream, Reader, and Writer. You do not make objects of these types, but other methods can return them. For example, as you saw in Chapter 10, the URL class has the method openStream that returns an InputStream. You then use this InputStream object to read from the URL. As we said, the InputStream and OutputStream classes let you read and write only individual bytes and arrays of bytes; they have no methods to read and write strings and numbers. You need more capable child classes for this. For example, DataInputStream and DataOutputStream let you read and write all the basic Java types.

For Unicode text, on the other hand, as we said, you use classes that descend from Reader and Writer. The basic methods of the Reader and Writer classes are similar to the ones for InputStream and OutputStream.

abstract int read()
abstract void write(int b)

They work just as the comparable methods do in the InputStream and OutputStream classes except, of course, the read method returns either a Unicode code unit (as an integer between 0 and 65535) or –1 when you have reached the end of the file.
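As a small illustration of the Reader contract, the sketch below (the class name CodeUnitCounter is an assumption of ours) calls read until it returns –1 and counts the code units it saw. A character outside the Basic Multilingual Plane is stored as two code units, so it is counted twice.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CodeUnitCounter {
    /** Reads the reader to its end and returns the number of code units read. */
    static int countCodeUnits(Reader in) {
        try {
            int count = 0;
            while (in.read() != -1) {   // each call yields 0..65535, or -1 at the end
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countCodeUnits(new StringReader("Java")));
        // One supplementary character, written as a surrogate pair.
        System.out.println(countCodeUnits(new StringReader("\uD835\uDD46")));
    }
}
```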

Finally, there are streams that do useful stuff, for example, the ZipInputStream and ZipOutputStream that let you read and write files in the familiar ZIP compression format.

Moreover, JDK 5.0 introduces four new interfaces: Closeable, Flushable, Readable, and Appendable (see Figure 12–3). The first two interfaces are very simple, with methods

void close() throws IOException

and

void flush() throws IOException

respectively. The classes InputStream, OutputStream, Reader, and Writer all implement the Closeable interface. OutputStream and Writer implement the Flushable interface.

Figure 12–3. The Closeable, Flushable, Readable, and Appendable interfaces

The Readable interface has a single method

int read(CharBuffer cb)

The CharBuffer class has methods for sequential and random read/write access. It represents an in-memory buffer or a memory-mapped file (see page 694).

The Appendable interface has two methods, for appending single characters and character sequences:

Appendable append(char c)
Appendable append(CharSequence s)

The CharSequence type is yet another interface, describing minimal properties of a sequence of char values. It is implemented by String, CharBuffer, and StringBuilder/StringBuffer (see page 656).

Of the stream zoo classes, Writer and PrintStream implement Appendable.
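The benefit of programming against Appendable is that one method can target several kinds of destination. Here is a sketch under our own naming (AppendableDemo and appendPair are hypothetical): the same method fills a StringBuilder and a StringWriter, both of which implement Appendable. Note that the append methods are declared to throw IOException, even though these two destinations never do.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

class AppendableDemo {
    /** Appends "key=value;" to any Appendable and returns the destination. */
    static <A extends Appendable> A appendPair(A dest, CharSequence key, CharSequence value) {
        try {
            // Each append returns the Appendable, so the calls can be chained.
            dest.append(key).append('=').append(value).append(';');
            return dest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendPair(new StringBuilder(), "name", "Harry"));
        System.out.println(appendPair(new StringWriter(), "id", new StringBuilder("42")));
    }
}
```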

java.io.Closeable 5.0
  • void close()

    closes this Closeable. This method may throw an IOException.

java.io.Flushable 5.0
  • void flush()

    flushes this Flushable.

java.lang.Readable 5.0
  • int read(CharBuffer cb)

    attempts to read as many char values into cb as it can hold. Returns the number of values read, or -1 if no further values are available from this Readable.

java.lang.Appendable 5.0
  • Appendable append(char c)

    appends the code unit c to this Appendable; returns this.

  • Appendable append(CharSequence cs)

    appends all code units in cs to this Appendable; returns this.

java.lang.CharSequence 1.4
  • char charAt(int index)

    returns the code unit at the given index.

  • int length()

    returns the number of code units in this sequence.

  • CharSequence subSequence(int startIndex, int endIndex)

    returns a CharSequence consisting of the code units stored at index startIndex to endIndex - 1.

  • String toString()

    returns a string consisting of the code units of this sequence.

Layering Stream Filters

FileInputStream and FileOutputStream give you input and output streams attached to a disk file. You give the file name or full path name of the file in the constructor. For example,

FileInputStream fin = new FileInputStream("employee.dat");

looks in the current directory for a file named "employee.dat".

Caution

Because the backslash character is the escape character in Java strings, be sure to use \\ for Windows-style path names ("C:\\Windows\\win.ini"). In Windows, you can also use a single forward slash ("C:/Windows/win.ini") because most Windows file handling system calls will interpret forward slashes as file separators. However, this is not recommended: the behavior of the Windows system functions is subject to change, and on other operating systems, the file separator may yet be different. Instead, for portable programs, you should use the correct file separator character. It is stored in the constant string File.separator.

You can also use a File object (see page 684 for more on file objects):

File f = new File("employee.dat");
FileInputStream fin = new FileInputStream(f);

Like the abstract InputStream and OutputStream classes, these classes support only reading and writing on the byte level. That is, we can only read bytes and byte arrays from the object fin.

byte b = (byte) fin.read();

Tip

Because all the classes in java.io interpret relative path names as starting with the user’s current working directory, you may want to know this directory. You can get at this information by a call to System.getProperty("user.dir").

As you will see in the next section, if we just had a DataInputStream, then we could read numeric types:

DataInputStream din = . . .;
double s = din.readDouble();

But just as the FileInputStream has no methods to read numeric types, the DataInputStream has no method to get data from a file.

Java uses a clever mechanism to separate two kinds of responsibilities. Some streams (such as the FileInputStream and the input stream returned by the openStream method of the URL class) can retrieve bytes from files and other more exotic locations. Other streams (such as the DataInputStream and the PrintWriter) can assemble bytes into more useful data types. The Java programmer has to combine the two into what are often called filtered streams by feeding an existing stream to the constructor of another stream. For example, to be able to read numbers from a file, first create a FileInputStream and then pass it to the constructor of a DataInputStream.

FileInputStream fin = new FileInputStream("employee.dat");
DataInputStream din = new DataInputStream(fin);
double s = din.readDouble();

It is important to keep in mind that the data input stream that we created with the above code does not correspond to a new disk file. The newly created stream still accesses the data from the file attached to the file input stream, but the point is that it now has a more capable interface.

If you look at Figure 12–1 again, you can see the classes FilterInputStream and FilterOutputStream. You combine their subclasses into a new filtered stream to construct the streams you want. For example, by default, streams are not buffered. That is, every call to read contacts the operating system to ask it to dole out yet another byte. If you want buffering and the data input methods for a file named employee.dat in the current directory, you need to use the following rather monstrous sequence of constructors:

DataInputStream din = new DataInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));

Notice that we put the DataInputStream last in the chain of constructors because we want to use the DataInputStream methods, and we want them to use the buffered read method. Regardless of the ugliness of the above code, it is necessary: you must be prepared to continue layering stream constructors until you have access to the functionality you want.
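To experiment with layering without touching the disk, you can substitute in-memory byte array streams for the file streams. The sketch below is only an illustration (the class name LayeringDemo is ours, and it uses try-with-resources, which requires Java 7 or later): it writes a double through a data/buffered chain and reads it back through the matching input chain.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LayeringDemo {
    /** Writes value through layered output streams, then reads it back. */
    static double roundTrip(double value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // Outermost layer offers writeDouble; the middle layer buffers.
            try (DataOutputStream dout = new DataOutputStream(
                    new BufferedOutputStream(bytes))) {
                dout.writeDouble(value);
            }
            // Closing the outer stream flushed the buffer into bytes.
            try (DataInputStream din = new DataInputStream(
                    new BufferedInputStream(
                        new ByteArrayInputStream(bytes.toByteArray())))) {
                return din.readDouble();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(3.14));
    }
}
```

Replacing the byte array streams with FileOutputStream and FileInputStream gives the file-based version discussed in the text.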

Sometimes you’ll need to keep track of the intermediate streams when chaining them together. For example, when reading input, you often need to peek at the next byte to see if it is the value that you expect. Java provides the PushbackInputStream for this purpose.

PushbackInputStream pbin = new PushbackInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));

Now you can speculatively read the next byte

int b = pbin.read();

and throw it back if it isn’t what you wanted.

if (b != '<') pbin.unread(b);

But reading and unreading are the only methods that apply to the pushback input stream. If you want to look ahead and also read numbers, then you need both a pushback input stream and a data input stream reference.

DataInputStream din = new DataInputStream(
   pbin = new PushbackInputStream(
      new BufferedInputStream(
         new FileInputStream("employee.dat"))));
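Here is a sketch of how the two references cooperate. The data layout is invented for illustration: an optional '#' tag byte followed by a 4-byte integer, or else a plain single-byte value. The class and method names (PeekDemo, readValue) are hypothetical, and an in-memory array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

class PeekDemo {
    /**
     * Returns the int that follows a '#' tag, or the next unsigned byte
     * if there is no tag, or -1 if the input is empty.
     */
    static int readValue(byte[] data) {
        try {
            PushbackInputStream pbin =
                new PushbackInputStream(new ByteArrayInputStream(data));
            DataInputStream din = new DataInputStream(pbin);

            int b = pbin.read();              // speculative read
            if (b == -1) return -1;           // nothing to push back at end of input
            if (b == '#') return din.readInt();

            pbin.unread(b);                   // not a tag: put the byte back
            return din.readUnsignedByte();    // the data stream sees it again
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readValue(new byte[] { '#', 0, 0, 0, 42 }));
        System.out.println(readValue(new byte[] { 7 }));
    }
}
```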

Of course, in the stream libraries of other programming languages, niceties such as buffering and lookahead are automatically taken care of, so it is a bit of a hassle in Java that one has to resort to layering stream filters in these cases. But the ability to mix and match filter classes to construct truly useful sequences of streams does give you an immense amount of flexibility. For example, you can read numbers from a compressed ZIP file by using the following sequence of streams (see Figure 12–4).

ZipInputStream zin = new ZipInputStream(new FileInputStream("employee.zip"));
DataInputStream din = new DataInputStream(zin);

Figure 12–4. A sequence of filtered streams

(See the section on ZIP file streams starting on page 643 for more on Java’s ability to handle ZIP files.)

All in all, apart from the rather monstrous constructors that are needed to layer streams, the ability to mix and match streams is a very useful feature of Java!

java.io.FileInputStream 1.0
  • FileInputStream(String name)

    creates a new file input stream, using the file whose path name is specified by the name string.

  • FileInputStream(File f)

    creates a new file input stream, using the information encapsulated in the File object. (The File class is described at the end of this chapter.)

java.io.FileOutputStream 1.0
  • FileOutputStream(String name)

    creates a new file output stream specified by the name string. Path names that are not absolute are resolved relative to the current working directory. Caution: This method automatically deletes any existing file with the same name.

  • FileOutputStream(String name, boolean append)

    creates a new file output stream specified by the name string. Path names that are not absolute are resolved relative to the current working directory. If the append parameter is true, then data are added at the end of the file. An existing file with the same name will not be deleted.

  • FileOutputStream(File f)

    creates a new file output stream using the information encapsulated in the File object. (The File class is described at the end of this chapter.) Caution: This method automatically deletes any existing file with the same name as the name of f.

java.io.BufferedInputStream 1.0
  • BufferedInputStream(InputStream in)

    creates a new buffered stream with a default buffer size. A buffered input stream reads characters from a stream without causing a device access every time. When the buffer is empty, a new block of data is read into the buffer.

  • BufferedInputStream(InputStream in, int n)

    creates a new buffered stream with a user-defined buffer size.

java.io.BufferedOutputStream 1.0
  • BufferedOutputStream(OutputStream out)

    creates a new buffered stream with a default buffer size. A buffered output stream collects characters to be written without causing a device access every time. When the buffer fills up or when the stream is flushed, the data are written.

  • BufferedOutputStream(OutputStream out, int n)

    creates a new buffered stream with a user-defined buffer size.

java.io.PushbackInputStream 1.0
  • PushbackInputStream(InputStream in)

    constructs a stream with one-byte lookahead.

  • PushbackInputStream(InputStream in, int size)

    constructs a stream with a pushback buffer of specified size.

  • void unread(int b)

    pushes back a byte, which is retrieved again by the next call to read. You can push back only one byte at a time.

    Parameters:
      b     The byte to be read again

Data Streams

You often need to write the result of a computation or read one back. The data streams support methods for reading back all the basic Java types. To write a number, character, Boolean value, or string, use one of the following methods of the DataOutput interface:

writeChars
writeByte
writeInt
writeShort
writeLong
writeFloat
writeDouble
writeChar
writeBoolean
writeUTF

For example, writeInt always writes an integer as a 4-byte binary quantity regardless of the number of digits, and writeDouble always writes a double as an 8-byte binary quantity. The resulting output is not humanly readable, but the space needed will be the same for each value of a given type and reading it back in will be faster. (See the section on the PrintWriter class later in this chapter for how to output numbers as human-readable text.)

Note

There are two different methods of storing integers and floating-point numbers in memory, depending on the platform you are using. Suppose, for example, you are working with a 4-byte int, say the decimal number 1234, or 4D2 in hexadecimal (1234 = 4 x 256 + 13 x 16 + 2). This can be stored in such a way that the first of the 4 bytes in memory holds the most significant byte (MSB) of the value: 00 00 04 D2. This is the so-called big-endian method. Or we can start with the least significant byte (LSB) first: D2 04 00 00. This is called, naturally enough, the little-endian method. For example, the SPARC uses big-endian; the Pentium, little-endian. This can lead to problems. When a C or C++ file is saved, the data are saved exactly as the processor stores them. That makes it challenging to move even the simplest data files from one platform to another. In Java, all values are written in the big-endian fashion, regardless of the processor. That makes Java data files platform independent.
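You can check the byte order for yourself. The following sketch (the class name EndianDemo and its helper methods are our own) writes the int 1234 with a DataOutputStream and prints the bytes in hexadecimal. For contrast, it also produces the little-endian layout with a java.nio ByteBuffer whose byte order has been set explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    /** Formats bytes as two-digit hex values separated by spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    /** The bytes that DataOutputStream.writeInt produces (always big-endian). */
    static String dataStreamHex(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(bytes.toByteArray());
    }

    /** The same value laid out little-endian, for comparison only. */
    static String littleEndianHex(int value) {
        return toHex(ByteBuffer.allocate(4)
            .order(ByteOrder.LITTLE_ENDIAN).putInt(value).array());
    }

    public static void main(String[] args) {
        System.out.println(dataStreamHex(1234));    // prints 00 00 04 D2
        System.out.println(littleEndianHex(1234));  // prints D2 04 00 00
    }
}
```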

The writeUTF method writes string data by using a modified version of 8-bit Unicode Transformation Format. Instead of simply using the standard UTF-8 encoding (which is shown in Table 12–1), character strings are first represented in UTF-16 (see Table 12–2) and then the result is encoded using the UTF-8 rules. The modified encoding is different for characters with code higher than 0xFFFF. It is used for backwards compatibility with virtual machines that were built when Unicode had not yet grown beyond 16 bits.

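Table 12–1. UTF-8 Encoding

Character Range    Encoding
0...7F             0 a6 a5 a4 a3 a2 a1 a0
80...7FF           110 a10 a9 a8 a7 a6 | 10 a5 a4 a3 a2 a1 a0
800...FFFF         1110 a15 a14 a13 a12 | 10 a11 a10 a9 a8 a7 a6 | 10 a5 a4 a3 a2 a1 a0
10000...10FFFF     11110 a20 a19 a18 | 10 a17 a16 a15 a14 a13 a12 | 10 a11 a10 a9 a8 a7 a6 | 10 a5 a4 a3 a2 a1 a0

(Here ak denotes bit k of the character code; vertical bars separate bytes.)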

Table 12–2. UTF-16 Encoding

Character Range    Encoding
0...FFFF           a15 a14 a13 a12 a11 a10 a9 a8 | a7 a6 a5 a4 a3 a2 a1 a0
10000...10FFFF     110110 b19 b18 | b17 b16 a15 a14 a13 a12 a11 a10 | 110111 a9 a8 | a7 a6 a5 a4 a3 a2 a1 a0

where b19 b18 b17 b16 = a20 a19 a18 a17 a16 - 1

Because nobody else uses this modification of UTF-8, you should only use the writeUTF method to write strings that are intended for a Java virtual machine; for example, if you write a program that generates bytecodes. Use the writeChars method for other purposes.
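The difference from standard UTF-8 shows up in the byte counts. The sketch below (class and method names are ours) compares the number of bytes that writeUTF produces with the length of the standard UTF-8 encoding of the same string. Two effects are visible: writeUTF prepends a 2-byte length, and a character above 0xFFFF is written as two 3-byte sequences (one per UTF-16 code unit) rather than one 4-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {
    /** Number of bytes written by DataOutputStream.writeUTF, including the length prefix. */
    static int writeUtfLength(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Number of bytes in the standard UTF-8 encoding of s. */
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String supplementary = "\uD83D\uDE00";   // one character above 0xFFFF
        System.out.println(writeUtfLength("A") + " vs " + standardUtf8Length("A"));
        System.out.println(writeUtfLength(supplementary) + " vs "
            + standardUtf8Length(supplementary));
    }
}
```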

Note

See RFC 2279 (http://ietf.org/rfc/rfc2279.txt) and RFC 2781 (http://ietf.org/rfc/rfc2781.txt) for definitions of UTF-8 and UTF-16.

To read the data back in, use the following methods:

readInt
readDouble
readShort
readChar
readLong
readBoolean
readFloat
readUTF

Note

The binary data format is compact and platform independent. Except for the UTF strings, it is also suited to random access. The major drawback is that binary files are not readable by humans.

Note
java.io.DataInput 1.0
  • boolean readBoolean()

    reads in a Boolean value.

  • byte readByte()

    reads an 8-bit byte.

  • char readChar()

    reads a 16-bit Unicode character.

  • double readDouble()

    reads a 64-bit double.

  • float readFloat()

    reads a 32-bit float.

  • void readFully(byte[] b)

    reads bytes into the array b, blocking until all bytes are read.

    Parameters:
      b     The buffer into which the data is read

  • void readFully(byte[] b, int off, int len)

    reads bytes into the array b, blocking until all bytes are read.

    Parameters:
      b     The buffer into which the data is read
      off   The start offset of the data
      len   The number of bytes to read

  • int readInt()

    reads a 32-bit integer.

  • String readLine()

    reads in a line that has been terminated by a \n, \r, \r\n, or EOF. Returns a string containing all bytes in the line converted to Unicode characters.

  • long readLong()

    reads a 64-bit long integer.

  • short readShort()

    reads a 16-bit short integer.

  • String readUTF()

    reads a string of characters in “modified UTF-8” format.

  • int skipBytes(int n)

    skips n bytes, blocking until all bytes are skipped.

    Parameters:
      n     The number of bytes to be skipped

java.io.DataOutput 1.0
  • void writeBoolean(boolean b)

    writes a Boolean value.

  • void writeByte(int b)

    writes an 8-bit byte.

  • void writeChar(int c)

    writes a 16-bit Unicode character.

  • void writeChars(String s)

    writes all characters in the string.

  • void writeDouble(double d)

    writes a 64-bit double.

  • void writeFloat(float f)

    writes a 32-bit float.

  • void writeInt(int i)

    writes a 32-bit integer.

  • void writeLong(long l)

    writes a 64-bit long integer.

  • void writeShort(int s)

    writes a 16-bit short integer.

  • void writeUTF(String s)

    writes a string of characters in “modified UTF-8” format.

Random-Access File Streams

The RandomAccessFile stream class lets you find or write data anywhere in a file. It implements both the DataInput and DataOutput interfaces. Disk files are random access, but streams of data from a network are not. You open a random-access file either for reading only or for both reading and writing. You specify the option by using the string "r" (for read access) or "rw" (for read/write access) as the second argument in the constructor.

RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
RandomAccessFile inOut = new RandomAccessFile("employee.dat", "rw");

When you open an existing file as a RandomAccessFile, it does not get deleted.

A random-access file also has a file pointer. The file pointer always indicates the position of the next byte that will be read or written. The seek method sets the file pointer to an arbitrary byte position within the file. The argument to seek is a long integer between zero and the length of the file in bytes.

The getFilePointer method returns the current position of the file pointer.

To read from a random-access file, you use the same methods—such as readInt and readChar—as for DataInputStream objects. That is no accident. These methods are actually defined in the DataInput interface that both DataInputStream and RandomAccessFile implement.

Similarly, to write a random-access file, you use the same writeInt and writeChar methods as in the DataOutputStream class. These methods are defined in the DataOutput interface that is common to both classes.

The advantage of having the RandomAccessFile class implement both DataInput and DataOutput is that this lets you use or write methods whose argument types are the DataInput and DataOutput interfaces.

class Employee
{  . . .
   void read(DataInput in) throws IOException { . . . }
   void write(DataOutput out) throws IOException { . . . }
}

Note that the read method can handle either a DataInputStream or a RandomAccessFile object because both of these classes implement the DataInput interface. The same is true for the write method.
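The following sketch puts seek and the DataInput/DataOutput interfaces together. Everything specific in it is an assumption made for illustration: the "record" is just a 4-byte integer ID, the helper names are ours, and the data are kept in a temporary file that is deleted afterwards. Because every record has the same size, record number index starts at byte position index * RECORD_SIZE.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RandomAccessDemo {
    static final int RECORD_SIZE = 4;   // one int per record (illustrative layout)

    // These helpers accept any DataOutput/DataInput, not just a RandomAccessFile.
    static void writeRecord(DataOutput out, int id) throws IOException {
        out.writeInt(id);
    }

    static int readRecord(DataInput in) throws IOException {
        return in.readInt();
    }

    /** Writes all ids to a temporary file, then jumps to record index and reads it. */
    static int roundTrip(int[] ids, int index) {
        File file = null;
        try {
            file = File.createTempFile("records", ".dat");
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                for (int id : ids) writeRecord(raf, id);
                raf.seek((long) index * RECORD_SIZE);   // position of the wanted record
                return readRecord(raf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) file.delete();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new int[] { 10, 20, 30 }, 2));
    }
}
```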

java.io.RandomAccessFile 1.0
  • RandomAccessFile(String file, String mode)

  • RandomAccessFile(File file, String mode)

    Parameters:
      file  The file to be opened
      mode  "r" for read-only mode, "rw" for read/write mode, "rws" for read/write mode with synchronous disk writes of data and metadata for every update, and "rwd" for read/write mode with synchronous disk writes of data only.

  • long getFilePointer()

    returns the current location of the file pointer.

  • void seek(long pos)

    sets the file pointer to pos bytes from the beginning of the file.

  • long length()

    returns the length of the file in bytes.

Text Streams

In the last section, we discussed binary input and output. While binary I/O is fast and efficient, it is not easily readable by humans. In this section, we will focus on text I/O. For example, if the integer 1234 is saved in binary, it is written as the sequence of bytes 00 00 04 D2 (in hexadecimal notation). In text format, it is saved as the string "1234".

Unfortunately, doing this in Java requires a bit of work, because, as you know, Java uses Unicode characters. That is, the character encoding for the string "1234" really is 00 31 00 32 00 33 00 34 (in hex). However, at the present time most environments in which your Java programs will run use their own character encoding. This may be a single-byte, double-byte, or variable-byte scheme. For example, if you use Windows, the string would be written in ASCII, as 31 32 33 34, without the extra zero bytes. If the Unicode encoding were written into a text file, then it would be quite unlikely that the resulting file would be humanly readable with the tools of the host environment. To overcome this problem, Java has a set of stream filters that bridges the gap between Unicode-encoded strings and the character encoding used by the local operating system. All of these classes descend from the abstract Reader and Writer classes, and the names are reminiscent of the ones used for binary data. For example, the InputStreamReader class turns an input stream that contains bytes in a particular character encoding into a reader that emits Unicode characters. Similarly, the OutputStreamWriter class turns a stream of Unicode characters into a stream of bytes in a particular character encoding.

For example, here is how you make an input reader that reads keystrokes from the console and automatically converts them to Unicode.

InputStreamReader in = new InputStreamReader(System.in);

This input stream reader assumes the normal character encoding used by the host system. For example, under Windows, it uses the ISO 8859-1 encoding (also known as ISO Latin-1 or, among Windows programmers, as “ANSI code”). You can choose a different encoding by specifying it in the constructor for the InputStreamReader. This takes the form

InputStreamReader(InputStream, String)

where the string describes the encoding scheme that you want to use. For example,

InputStreamReader in = new InputStreamReader(
   new FileInputStream("kremlin.dat"), "ISO8859_5");

The next section has more information on character sets.
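To see why the choice of encoding matters, here is a minimal sketch that decodes the same two bytes with two different encodings. The class EncodingChoice and the sample bytes are invented for this example. An in-memory ByteArrayInputStream takes the place of a file so that the sketch is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class EncodingChoice
{
   /** Reads all characters from the given bytes, decoding with the named encoding. */
   public static String decode(byte[] data, String encoding)
   {
      try
      {
         Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding);
         StringBuilder result = new StringBuilder();
         int ch;
         while ((ch = in.read()) != -1)
            result.append((char) ch);
         in.close();
         return result.toString();
      }
      catch (IOException e) // also covers an unsupported encoding name
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      byte[] data = { (byte) 0xC3, (byte) 0xA9 };
      // UTF-8 treats the two bytes as one character (U+00E9)
      System.out.println(decode(data, "UTF-8").length());
      // ISO-8859-1 treats every byte as a separate character
      System.out.println(decode(data, "ISO-8859-1").length());
   }
}
```

If the reader is given the wrong encoding, no exception is thrown. The text simply comes out garbled, which is why it pays to name the encoding explicitly when you know it.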

Because it is so common to want to attach a reader or writer to a file, a pair of convenience classes, FileReader and FileWriter, is provided for this purpose. For example, the writer definition

FileWriter out = new FileWriter("output.txt");

is equivalent to

OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream("output.txt"));

Character Sets

In the past, international character sets have been handled rather unsystematically throughout the Java library. The java.nio package—introduced in JDK 1.4—unifies character set conversion with the introduction of the Charset class. (Note that the s is lower case.)

A character set maps between sequences of two-byte Unicode code units and byte sequences used in a local character encoding. One of the most popular character encodings is ISO-8859-1, a single-byte encoding of the first 256 Unicode characters. Gaining in importance is ISO-8859-15, which replaces some of the less useful characters of ISO-8859-1 with accented letters used in French and Finnish, and, more important, replaces the “international currency” character ¤ with the Euro symbol (€) in code point 0xA4. Other examples for character encodings are the variable-byte encodings commonly used for Japanese and Chinese.

The Charset class uses the character set names standardized in the IANA Character Set Registry (http://www.iana.org/assignments/character-sets). These names differ slightly from those used in previous versions. For example, the “official” name of ISO-8859-1 is now "ISO-8859-1" and no longer "ISO8859_1", which was the preferred name up to JDK 1.3. For compatibility with other naming conventions, each character set can have a number of aliases. For example, ISO-8859-1 has aliases

ISO8859-1
ISO_8859_1
ISO8859_1
ISO_8859-1
ISO_8859-1:1987
8859_1
latin1
l1
csISOLatin1
iso-ir-100
cp819
IBM819
IBM-819
819

Character set names are case insensitive.

You obtain a Charset by calling the static forName method with either the official name or one of its aliases:

Charset cset = Charset.forName("ISO-8859-1");

The aliases method returns a Set object of the aliases. A Set is a collection that we discuss in Volume 2; here is the code to iterate through the set elements:

Set<String> aliases = cset.aliases();
for (String alias : aliases)
   System.out.println(alias);
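As a small illustration of alias lookup, the following sketch resolves a few names to the official name of the character set. The class CharsetLookup and its method names are hypothetical. The calls forName, name, and isSupported are part of the Charset class.

```java
import java.nio.charset.Charset;

public class CharsetLookup
{
   /** Returns the official name of the character set with the given name or alias. */
   public static String officialName(String nameOrAlias)
   {
      return Charset.forName(nameOrAlias).name();
   }

   /** Returns true if the name or alias is known to this virtual machine. */
   public static boolean isKnown(String nameOrAlias)
   {
      return Charset.isSupported(nameOrAlias);
   }

   public static void main(String[] args)
   {
      System.out.println(officialName("latin1"));     // an alias
      System.out.println(officialName("iso-8859-1")); // names are case insensitive
      // forName throws an UnsupportedCharsetException for an unknown name,
      // so test with isSupported first if you are unsure
      System.out.println(isKnown("no-such-charset"));
   }
}
```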

Note

An excellent reference for the “ISO 8859 alphabet soup” is http://czyborra.com/charsets/iso8859.html.

International versions of Java support many more encodings. There is even a mechanism for adding additional character set providers—see the JDK documentation for details. To find out which character sets are available in a particular implementation, call the static availableCharsets method. It returns a SortedMap, another collection class. Use this code to find out the names of all available character sets:

Map<String, Charset> charsets = Charset.availableCharsets();
for (String name : charsets.keySet())
   System.out.println(name);
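The same map can be used to test for particular character sets. The sketch below checks for the six required encodings listed in Table 12–3. The class name RequiredCharsets and the REQUIRED array are assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

public class RequiredCharsets
{
   // the standard names from Table 12-3
   public static final String[] REQUIRED =
      { "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE" };

   /** Returns true if every required character set is reported as available. */
   public static boolean allAvailable()
   {
      SortedMap<String, Charset> charsets = Charset.availableCharsets();
      for (String name : REQUIRED)
         if (!charsets.containsKey(name)) return false;
      return true;
   }

   public static void main(String[] args)
   {
      System.out.println(allAvailable());
   }
}
```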

Table 12–3 lists the character encodings that every Java implementation is required to have. Table 12–4 lists the encoding schemes that the JDK installs by default. The character sets in Tables 12–5 and 12–6 are installed only on operating systems that use non-European languages. The encoding schemes in Table 12–6 are supplied for compatibility with previous versions of the JDK.

Table 12–3. Required Character Encodings

Charset Standard Name   Legacy Name             Description
US-ASCII                ASCII                   American Standard Code for Information Interchange
ISO-8859-1              ISO8859_1               ISO 8859-1, Latin alphabet No. 1
UTF-8                   UTF8                    Eight-bit Unicode Transformation Format
UTF-16                  UTF-16                  Sixteen-bit Unicode Transformation Format, byte order specified by an optional initial byte-order mark
UTF-16BE                UnicodeBigUnmarked      Sixteen-bit Unicode Transformation Format, big-endian byte order
UTF-16LE                UnicodeLittleUnmarked   Sixteen-bit Unicode Transformation Format, little-endian byte order

Table 12–4. Basic Character Encodings

Charset Standard Name   Legacy Name             Description
ISO8859-2               ISO8859_2               ISO 8859-2, Latin alphabet No. 2
ISO8859-4               ISO8859_4               ISO 8859-4, Latin alphabet No. 4
ISO8859-5               ISO8859_5               ISO 8859-5, Latin/Cyrillic alphabet
ISO8859-7               ISO8859_7               ISO 8859-7, Latin/Greek alphabet
ISO8859-9               ISO8859_9               ISO 8859-9, Latin alphabet No. 5
ISO8859-13              ISO8859_13              ISO 8859-13, Latin alphabet No. 7
ISO8859-15              ISO8859_15              ISO 8859-15, Latin alphabet No. 9
windows-1250            Cp1250                  Windows Eastern European
windows-1251            Cp1251                  Windows Cyrillic
windows-1252            Cp1252                  Windows Latin-1
windows-1253            Cp1253                  Windows Greek
windows-1254            Cp1254                  Windows Turkish
windows-1257            Cp1257                  Windows Baltic

Table 12–5. Extended Character Encodings

Charset Standard Name   Legacy Name             Description
Big5                    Big5                    Big5, Traditional Chinese
Big5-HKSCS              Big5_HKSCS              Big5 with Hong Kong extensions, Traditional Chinese
EUC-JP                  EUC_JP                  JIS X 0201, 0208, 0212, EUC encoding, Japanese
EUC-KR                  EUC_KR                  KS C 5601, EUC encoding, Korean
GB18030                 GB18030                 Simplified Chinese, PRC Standard
GBK                     GBK                     GBK, Simplified Chinese
ISCII91                 ISCII91                 ISCII91 encoding of Indic scripts
ISO-2022-JP             ISO2022JP               JIS X 0201, 0208 in ISO 2022 form, Japanese
ISO-2022-KR             ISO2022KR               ISO 2022 KR, Korean
ISO8859-3               ISO8859_3               ISO 8859-3, Latin alphabet No. 3
ISO8859-6               ISO8859_6               ISO 8859-6, Latin/Arabic alphabet
ISO8859-8               ISO8859_8               ISO 8859-8, Latin/Hebrew alphabet
Shift_JIS               SJIS                    Shift-JIS, Japanese
TIS-620                 TIS620                  TIS620, Thai
windows-1255            Cp1255                  Windows Hebrew
windows-1256            Cp1256                  Windows Arabic
windows-1258            Cp1258                  Windows Vietnamese
windows-31j             MS932                   Windows Japanese
x-EUC-CN                EUC_CN                  GB2312, EUC encoding, Simplified Chinese
x-EUC-JP-LINUX          EUC_JP_LINUX            JIS X 0201, 0208, EUC encoding, Japanese
x-EUC-TW                EUC_TW                  CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese
x-MS950-HKSCS           MS950_HKSCS             Windows Traditional Chinese with Hong Kong extensions
x-mswin-936             MS936                   Windows Simplified Chinese
x-windows-949           MS949                   Windows Korean
x-windows-950           MS950                   Windows Traditional Chinese

Table 12–6. Legacy Character Encodings

Legacy Name       Description
Cp037             USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
Cp273             IBM Austria, Germany
Cp277             IBM Denmark, Norway
Cp278             IBM Finland, Sweden
Cp280             IBM Italy
Cp284             IBM Catalan/Spain, Spanish Latin America
Cp285             IBM United Kingdom, Ireland
Cp297             IBM France
Cp420             IBM Arabic
Cp424             IBM Hebrew
Cp437             MS-DOS United States, Australia, New Zealand, South Africa
Cp500             EBCDIC 500V1
Cp737             PC Greek
Cp775             PC Baltic
Cp838             IBM Thailand extended SBCS
Cp850             MS-DOS Latin-1
Cp852             MS-DOS Latin-2
Cp855             IBM Cyrillic
Cp856             IBM Hebrew
Cp857             IBM Turkish
Cp858             Variant of Cp850 with Euro character
Cp860             MS-DOS Portuguese
Cp861             MS-DOS Icelandic
Cp862             PC Hebrew
Cp863             MS-DOS Canadian French
Cp864             PC Arabic
Cp865             MS-DOS Nordic
Cp866             MS-DOS Russian
Cp868             MS-DOS Pakistan
Cp869             IBM Modern Greek
Cp870             IBM Multilingual Latin-2
Cp871             IBM Iceland
Cp874             IBM Thai
Cp875             IBM Greek
Cp918             IBM Pakistan (Urdu)
Cp921             IBM Latvia, Lithuania (AIX, DOS)
Cp922             IBM Estonia (AIX, DOS)
Cp930             Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
Cp933             Korean Mixed with 1880 UDC, superset of 5029
Cp935             Simplified Chinese Host mixed with 1880 UDC, superset of 5031
Cp937             Traditional Chinese Host mixed with 6204 UDC, superset of 5033
Cp939             Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
Cp942             IBM OS/2 Japanese, superset of Cp932
Cp942C            Variant of Cp942
Cp943             IBM OS/2 Japanese, superset of Cp932 and Shift-JIS
Cp943C            Variant of Cp943
Cp948             OS/2 Chinese (Taiwan) superset of 938
Cp949             PC Korean
Cp949C            Variant of Cp949
Cp950             PC Chinese (Hong Kong, Taiwan)
Cp964             AIX Chinese (Taiwan)
Cp970             AIX Korean
Cp1006            IBM AIX Pakistan (Urdu)
Cp1025            IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
Cp1026            IBM Latin-5, Turkey
Cp1046            IBM Arabic - Windows
Cp1097            IBM Iran (Farsi)/Persian
Cp1098            IBM Iran (Farsi)/Persian (PC)
Cp1112            IBM Latvia, Lithuania
Cp1122            IBM Estonia
Cp1123            IBM Ukraine
Cp1124            IBM AIX Ukraine
Cp1140            Variant of Cp037 with Euro character
Cp1141            Variant of Cp273 with Euro character
Cp1142            Variant of Cp277 with Euro character
Cp1143            Variant of Cp278 with Euro character
Cp1144            Variant of Cp280 with Euro character
Cp1145            Variant of Cp284 with Euro character
Cp1146            Variant of Cp285 with Euro character
Cp1147            Variant of Cp297 with Euro character
Cp1148            Variant of Cp500 with Euro character
Cp1149            Variant of Cp871 with Euro character
Cp1381            IBM OS/2, DOS People’s Republic of China (PRC)
Cp1383            IBM AIX People’s Republic of China (PRC)
Cp33722           IBM-eucJP - Japanese (superset of 5050)
ISO2022CN         ISO 2022 CN, Chinese (conversion to Unicode only)
ISO2022CN_CNS     CNS 11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only)
ISO2022CN_GB      GB 2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only)
JIS0201           JIS X 0201, Japanese
JIS0208           JIS X 0208, Japanese
JIS0212           JIS X 0212, Japanese
JISAutoDetect     Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only)
Johab             Johab, Korean
MS874             Windows Thai
MacArabic         Macintosh Arabic
MacCentralEurope  Macintosh Latin-2
MacCroatian       Macintosh Croatian
MacCyrillic       Macintosh Cyrillic
MacDingbat        Macintosh Dingbat
MacGreek          Macintosh Greek
MacHebrew         Macintosh Hebrew
MacIceland        Macintosh Iceland
MacRoman          Macintosh Roman
MacRomania        Macintosh Romania
MacSymbol         Macintosh Symbol
MacThai           Macintosh Thai
MacTurkish        Macintosh Turkish
MacUkraine        Macintosh Ukraine

Local encoding schemes cannot represent all Unicode characters. If a character cannot be represented, it is transformed to a ?.
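For instance, the Euro symbol has no counterpart in ISO-8859-1. The following sketch, with the made-up class name UnmappableDemo, encodes it with the encode method of the Charset class and looks at the first byte of the result.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class UnmappableDemo
{
   /** Encodes s with the named character set and returns the first encoded byte. */
   public static byte firstByte(String s, String charsetName)
   {
      ByteBuffer buffer = Charset.forName(charsetName).encode(s);
      return buffer.get(0);
   }

   public static void main(String[] args)
   {
      // the Euro symbol cannot be represented in ISO-8859-1
      byte b = firstByte("\u20AC", "ISO-8859-1");
      System.out.println((char) b);
   }
}
```

Note that the substitution happens silently. Once the text has been written, the original character cannot be recovered from the ? in the file.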

Once you have a character set, you can use it to convert between Unicode strings and encoded byte sequences. Here is how you encode a Unicode string.

String str = . . .;
ByteBuffer buffer = cset.encode(str);
byte[] bytes = buffer.array();

Conversely, to decode a byte sequence, you need a byte buffer. Use the static wrap method of the ByteBuffer class to turn a byte array into a byte buffer. The result of the decode method is a CharBuffer. Call its toString method to get a string.

byte[] bytes = . . .;
ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);
CharBuffer cbuf = cset.decode(bbuf);
String str = cbuf.toString();
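
The two directions can be combined into a round trip. This is a minimal sketch with an invented class name, CharsetRoundTrip. One caution that it illustrates: the array that backs the encoded buffer can be longer than the encoded data, so the sketch copies only the bytes that remain in the buffer instead of calling array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class CharsetRoundTrip
{
   /** Encodes str and returns exactly the encoded bytes. */
   public static byte[] encode(String str, String charsetName)
   {
      ByteBuffer buffer = Charset.forName(charsetName).encode(str);
      // copy only the valid range; the backing array may have unused space
      byte[] bytes = new byte[buffer.remaining()];
      buffer.get(bytes);
      return bytes;
   }

   /** Decodes the bytes back into a string. */
   public static String decode(byte[] bytes, String charsetName)
   {
      return Charset.forName(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
   }

   public static void main(String[] args)
   {
      String greeting = "Gr\u00FC\u00DFe";
      // five bytes in a single-byte encoding, seven in UTF-8
      System.out.println(encode(greeting, "ISO-8859-1").length);
      System.out.println(encode(greeting, "UTF-8").length);
      System.out.println(decode(encode(greeting, "UTF-8"), "UTF-8").equals(greeting));
   }
}
```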
java.nio.charset.Charset 1.4
  • static SortedMap availableCharsets()

    gets all available character sets for this virtual machine. Returns a map whose keys are character set names and whose values are character sets.

  • static Charset forName(String name)

    gets a character set for the given name.

  • Set aliases()

    returns the set of alias names for this character set.

  • ByteBuffer encode(String str)

    encodes the given string into a sequence of bytes.

  • CharBuffer decode(ByteBuffer buffer)

    decodes the given byte sequence. Unrecognized inputs are converted to the Unicode “replacement character” ('\uFFFD').

java.nio.ByteBuffer 1.4
  • byte[] array()

    returns the array of bytes that this buffer manages.

  • static ByteBuffer wrap(byte[] bytes)

  • static ByteBuffer wrap(byte[] bytes, int offset, int length)

    return a byte buffer that manages the given array of bytes or the given range.

java.nio.CharBuffer
  • char[] array()

    returns the array of code units that this buffer manages.

  • char charAt(int index)

    returns the code unit at the given index.

  • String toString()

    returns a string consisting of the code units that this buffer manages.

How to Write Text Output

For text output, you want to use a PrintWriter. A print writer can print strings and numbers in text format. Just as a DataOutputStream has useful output methods but no destination, a PrintWriter must be combined with a destination writer.

PrintWriter out = new PrintWriter(new FileWriter("employee.txt"));

You can also combine a print writer with a destination (output) stream.

PrintWriter out = new PrintWriter(new FileOutputStream("employee.txt"));

The PrintWriter(OutputStream) constructor automatically adds an OutputStreamWriter to convert Unicode characters to bytes in the stream.

To write to a print writer, you use the same print and println methods that you used with System.out. You can use these methods to print numbers (int, short, long, float, double), characters, Boolean values, strings, and objects.

Note

Java veterans may wonder whatever happened to the PrintStream class and to System.out. In Java 1.0, the PrintStream class simply truncated all Unicode characters to ASCII characters by dropping the top byte. Conversely, the readLine method of the DataInputStream turned ASCII to Unicode by setting the top byte to 0. Clearly, that was not a clean or portable approach, and it was fixed with the introduction of readers and writers in Java 1.1. For compatibility with existing code, System.in, System.out, and System.err are still streams, not readers and writers. But now the PrintStream class internally converts Unicode characters to the default host encoding in the same way as the PrintWriter does. Objects of type PrintStream act exactly like print writers when you use the print and println methods, but unlike print writers, they allow you to send raw bytes to them with the write(int) and write(byte[]) methods.

For example, consider this code:

String name = "Harry Hacker";
double salary = 75000;
out.print(name);
out.print(' ');
out.println(salary);

This writes the characters

Harry Hacker 75000.0

to the stream out. The characters are then converted to bytes and end up in the file employee.txt.

The println method automatically adds the correct end-of-line character for the target system ("\r\n" on Windows, "\n" on UNIX, "\r" on Macs) to the line. This is the string obtained by the call System.getProperty("line.separator").

If the writer is set to autoflush mode, then all characters in the buffer are sent to their destination whenever println is called. (Print writers are always buffered.) By default, autoflushing is not enabled. You can enable or disable autoflushing by using the PrintWriter(Writer, boolean) constructor and passing the appropriate Boolean as the second argument.

PrintWriter out = new PrintWriter(new FileWriter("employee.txt"), true); // autoflush

The print methods don’t throw exceptions. You can call the checkError method to see if something went wrong with the stream.
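Here is a minimal sketch that puts these pieces together. It prints to a StringWriter so that the result can be inspected without touching a file. The class name PrintWriterDemo and the method format are invented for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PrintWriterDemo
{
   /** Prints a name and a salary on one line and returns the text that was written. */
   public static String format(String name, double salary)
   {
      StringWriter destination = new StringWriter();
      PrintWriter out = new PrintWriter(destination); // no autoflush
      out.print(name);
      out.print(' ');
      out.println(salary);
      // checkError flushes the writer and reports any earlier failure
      if (out.checkError())
         throw new IllegalStateException("output failed");
      return destination.toString();
   }

   public static void main(String[] args)
   {
      System.out.print(format("Harry Hacker", 75000));
   }
}
```

Because the print methods swallow exceptions, a call to checkError after a batch of output is the only way to learn that something went wrong.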

Note

You cannot write raw bytes to a PrintWriter. Print writers are designed for text output only.

java.io.PrintWriter 1.1
  • PrintWriter(Writer out)

    creates a new PrintWriter, without automatic line flushing.

    Parameters:

    out

    A character-output writer

  • PrintWriter(Writer out, boolean autoFlush)

    creates a new PrintWriter.

    Parameters:

    out

    A character-output writer

     

    autoFlush

    If true, the println methods will flush the output buffer

  • PrintWriter(OutputStream out)

    creates a new PrintWriter, without automatic line flushing, from an existing OutputStream by automatically creating the necessary intermediate OutputStreamWriter.

    Parameters:

    out

    An output stream

  • PrintWriter(OutputStream out, boolean autoFlush)

    creates a new PrintWriter from an existing OutputStream but allows you to determine whether the writer autoflushes or not.

    Parameters:

    out

    An output stream

     

    autoFlush

    If true, the println methods will flush the output buffer

  • void print(Object obj)

    prints an object by printing the string resulting from toString.

    Parameters:

    obj

    The object to be printed

  • void print(String s)

    prints a Unicode string.

  • void println(String s)

    prints a string followed by a line terminator. Flushes the stream if the stream is in autoflush mode.

  • void print(char[] s)

    prints an array of Unicode characters.

  • void print(char c)

    prints a Unicode character.

  • void print(int i)

    prints an integer in text format.

  • void print(long l)

    prints a long integer in text format.

  • void print(float f)

    prints a floating-point number in text format.

  • void print(double d)

    prints a double-precision floating-point number in text format.

  • void print(boolean b)

    prints a Boolean value in text format.

  • boolean checkError()

    returns true if a formatting or output error occurred. Once the stream has encountered an error, it is tainted and all calls to checkError return true.

How to Read Text Input

As you know:

  • To write data in binary format, you use a DataOutputStream.

  • To write in text format, you use a PrintWriter.

Therefore, you might expect that there is an analog to the DataInputStream that lets you read data in text format. The closest analog is the Scanner class that we have used extensively. However, before JDK 5.0, the only game in town for processing text input was the BufferedReader class, whose readLine method lets you read a line of text. You need to combine a buffered reader with an input source.

BufferedReader in = new BufferedReader(new FileReader("employee.txt"));

The readLine method returns null when no more input is available. A typical input loop, therefore, looks like this:

String line;
while ((line = in.readLine()) != null)
{
   // do something with line
}

The FileReader class already converts bytes to Unicode characters. For other input sources, you need to use the InputStreamReader—unlike the PrintWriter, the InputStreamReader has no automatic convenience method to bridge the gap between bytes and Unicode characters.

BufferedReader in2 = new BufferedReader(new InputStreamReader(System.in));
BufferedReader in3 = new BufferedReader(new InputStreamReader(url.openStream()));

To read numbers from text input, you need to read a string first and then convert it.

String s = in.readLine();
double x = Double.parseDouble(s);

That works if there is a single number on each line. Otherwise, you must work harder and break up the input string, for example, by using the StringTokenizer utility class. We see an example of this later in this chapter.
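As a quick preview, the following sketch adds up all numbers in a text, even when a line holds several of them. The class NumberLines and its methods are hypothetical. The input comes from a StringReader so that the example is self-contained, but any reader would do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

public class NumberLines
{
   /** Reads lines of whitespace-separated numbers and returns their total. */
   public static double total(Reader source)
   {
      try
      {
         BufferedReader in = new BufferedReader(source);
         double sum = 0;
         String line;
         while ((line = in.readLine()) != null)
         {
            // break the line into tokens, then convert each token
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens())
               sum += Double.parseDouble(tokens.nextToken());
         }
         return sum;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   /** Convenience variant that treats a string as the input source. */
   public static double totalOf(String text)
   {
      return total(new StringReader(text));
   }

   public static void main(String[] args)
   {
      System.out.println(totalOf("1.5 2.5\n4\n"));
   }
}
```

A token that is not a number makes parseDouble throw a NumberFormatException, so real input code would need to decide how to handle malformed lines.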

Tip

Java has StringReader and StringWriter classes that allow you to treat a string as if it were a data stream. This can be quite convenient if you want to use the same code to parse both strings and data from a stream.

ZIP File Streams

ZIP files are archives that store one or more files in (usually) compressed format. Java 1.1 can handle both GZIP and ZIP format. (See RFC 1950, RFC 1951, and RFC 1952, for example, at http://www.faqs.org/rfcs.) In this section we concentrate on the more familiar (but somewhat more complicated) ZIP format and leave the GZIP classes to you if you need them. (They work in much the same way.)

Note

The classes for handling ZIP files are in java.util.zip and not in java.io, so remember to add the necessary import statement. Although not part of java.io, the GZIP and ZIP classes subclass java.io.FilterInputStream and java.io.FilterOutputStream. The java.util.zip package also contains classes for computing cyclic redundancy check (CRC) checksums. (CRC is a method to generate a hashlike code that the receiver of a file can use to check the integrity of the data.)
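
To give an idea of the checksum classes, here is a minimal sketch that computes a CRC32 checksum of a byte array. The class name ChecksumDemo and the helper checksumOfAscii are invented for this example. The input "123456789" is the customary test string for CRC-32 implementations.

```java
import java.util.zip.CRC32;

public class ChecksumDemo
{
   /** Computes the CRC32 checksum of the given bytes. */
   public static long checksum(byte[] data)
   {
      CRC32 crc = new CRC32();
      crc.update(data, 0, data.length);
      return crc.getValue();
   }

   /** Computes the checksum of a string that contains only ASCII characters. */
   public static long checksumOfAscii(String text)
   {
      byte[] data = new byte[text.length()];
      for (int i = 0; i < data.length; i++)
         data[i] = (byte) text.charAt(i);
      return checksum(data);
   }

   public static void main(String[] args)
   {
      System.out.println(Long.toHexString(checksumOfAscii("123456789"))); // cbf43926
   }
}
```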

Each ZIP file has a header with information such as the name of the file and the compression method that was used. In Java, you use a ZipInputStream to read a ZIP file by layering the ZipInputStream constructor onto a FileInputStream. You then need to look at the individual entries in the archive. The getNextEntry method returns an object of type ZipEntry that describes the entry. The read method of the ZipInputStream is modified to return –1 at the end of the current entry (instead of just at the end of the ZIP file). You must then call closeEntry to read the next entry. Here is a typical code sequence to read through a ZIP file:

ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
ZipEntry entry;
while ((entry = zin.getNextEntry()) != null)
{
   // analyze entry
   // read the contents of zin
   zin.closeEntry();
}
zin.close();

To read the contents of a ZIP entry, you will probably not want to use the raw read method; usually, you will use the methods of a more competent stream filter. For example, to read a text file inside a ZIP file, you can use the following loop:

BufferedReader in = new BufferedReader(new InputStreamReader(zin));
String s;
while ((s = in.readLine()) != null)
{
   // do something with s
}

The program in Example 12–1 lets you open a ZIP file. It then displays the files stored in the ZIP archive in the combo box at the bottom of the screen. If you double-click on one of the files, the contents of the file are displayed in the text area, as shown in Figure 12–5.

Example 12–1. ZipTest.java

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.util.*;
import java.util.zip.*;
import javax.swing.*;
import javax.swing.filechooser.FileFilter;

public class ZipTest
{
   public static void main(String[] args)
   {
      ZipTestFrame frame = new ZipTestFrame();
      frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      frame.setVisible(true);
   }
}

/**
   A frame with a text area to show the contents of a file inside
   a zip archive, a combo box to select different files in the
   archive, and a menu to load a new archive.
*/
class ZipTestFrame extends JFrame
{
   public ZipTestFrame()
   {
      setTitle("ZipTest");
      setSize(DEFAULT_WIDTH, DEFAULT_HEIGHT);

      // add the menu and the Open and Exit menu items
      JMenuBar menuBar = new JMenuBar();
      JMenu menu = new JMenu("File");

      JMenuItem openItem = new JMenuItem("Open");
      menu.add(openItem);
      openItem.addActionListener(new OpenAction());

      JMenuItem exitItem = new JMenuItem("Exit");
      menu.add(exitItem);
      exitItem.addActionListener(new
         ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               System.exit(0);
            }
         });

      menuBar.add(menu);
      setJMenuBar(menuBar);

      // add the text area and combo box
      fileText = new JTextArea();
      fileCombo = new JComboBox();
      fileCombo.addActionListener(new
         ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               loadZipFile((String) fileCombo.getSelectedItem());
            }
         });

      add(fileCombo, BorderLayout.SOUTH);
      add(new JScrollPane(fileText), BorderLayout.CENTER);
   }

   /**
      This is the listener for the File->Open menu item.
   */
   private class OpenAction implements ActionListener
   {
      public void actionPerformed(ActionEvent event)
      {
         // prompt the user for a zip file
         JFileChooser chooser = new JFileChooser();
         chooser.setCurrentDirectory(new File("."));
         ExtensionFileFilter filter = new ExtensionFileFilter();
         filter.addExtension(".zip");
         filter.addExtension(".jar");
         filter.setDescription("ZIP archives");
         chooser.setFileFilter(filter);
         int r = chooser.showOpenDialog(ZipTestFrame.this);
         if (r == JFileChooser.APPROVE_OPTION)
         {
            zipname = chooser.getSelectedFile().getPath();
            scanZipFile();
         }
      }
   }

   /**
      Scans the contents of the zip archive and populates
      the combo box.
   */
   public void scanZipFile()
   {
      fileCombo.removeAllItems();
      try
      {
         ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
         ZipEntry entry;
         while ((entry = zin.getNextEntry()) != null)
         {
            fileCombo.addItem(entry.getName());
            zin.closeEntry();
         }
         zin.close();
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }

   /**
      Loads a file from the zip archive into the text area
      @param name the name of the file in the archive
   */
   public void loadZipFile(String name)
   {
      try
      {
         ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
         ZipEntry entry;
         fileText.setText("");

         // find entry with matching name in archive
         while ((entry = zin.getNextEntry()) != null)
         {
            if (entry.getName().equals(name))
            {
               // read entry into text area
               BufferedReader in = new BufferedReader(new InputStreamReader(zin));
               String line;
               while ((line = in.readLine()) != null)
               {
                  fileText.append(line);
                  fileText.append("\n");
               }
            }
            zin.closeEntry();
         }
         zin.close();
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }

   public static final int DEFAULT_WIDTH = 400;
   public static final int DEFAULT_HEIGHT = 300;

   private JComboBox fileCombo;
   private JTextArea fileText;
   private String zipname;
}

/**
   This file filter matches all files with a given set of
   extensions. From FileChooserTest in chapter 9
*/
class ExtensionFileFilter extends FileFilter
{
   /**
      Adds an extension that this file filter recognizes.
      @param extension a file extension (such as ".txt" or "txt")
   */
   public void addExtension(String extension)
   {
      if (!extension.startsWith("."))
         extension = "." + extension;
      extensions.add(extension.toLowerCase());
   }

   /**
      Sets a description for the file set that this file filter
      recognizes.
      @param aDescription a description for the file set
   */
   public void setDescription(String aDescription)
   {
      description = aDescription;
   }

   /**
      Returns a description for the file set that this file
      filter recognizes.
      @return a description for the file set
   */
   public String getDescription()
   {
      return description;
   }

   public boolean accept(File f)
   {
      if (f.isDirectory()) return true;
      String name = f.getName().toLowerCase();

      // check if the file name ends with any of the extensions
      for (String e : extensions)
         if (name.endsWith(e))
            return true;
      return false;
   }

   private String description = "";
   private ArrayList<String> extensions = new ArrayList<String>();
}

Figure 12–5. The ZipTest program

Note

The ZIP input stream throws a ZipException when there is an error in reading a ZIP file. Normally this error occurs when the ZIP file has been corrupted.

To write a ZIP file, you open a ZipOutputStream by layering it onto a FileOutputStream. For each entry that you want to place into the ZIP file, you create a ZipEntry object. You pass the file name to the ZipEntry constructor; it sets the other parameters such as file date and compression method automatically. You can override these settings if you like. Then, you call the putNextEntry method of the ZipOutputStream to begin writing a new file. Send the file data to the ZIP stream. When you are done, call closeEntry. Repeat for all the files you want to store. Here is a code skeleton:

FileOutputStream fout = new FileOutputStream("test.zip");
ZipOutputStream zout = new ZipOutputStream(fout);
for all files
{
   ZipEntry ze = new ZipEntry(filename);
   zout.putNextEntry(ze);
   send data to zout;
   zout.closeEntry();
}
zout.close();

Note

JAR files (which were discussed in Chapter 10) are simply ZIP files with another entry, the so-called manifest. You use the JarInputStream and JarOutputStream classes to read and write the manifest entry.

ZIP streams are a good example of the power of the stream abstraction. Both the source and the destination of the ZIP data are completely flexible. You layer the most convenient reader stream onto the ZIP file stream to read the data that are stored in compressed form, and that reader doesn’t even realize that the data are being decompressed as they are being requested. And the source of the bytes in ZIP formats need not be a file—the ZIP data can come from a network connection. In fact, the JAR files that we discussed in Chapter 10 are ZIP-formatted files. Whenever the class loader of an applet reads a JAR file, it reads and decompresses data from the network.
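To illustrate that neither end needs to be a file, here is a minimal sketch that writes a ZIP archive into a byte array and reads an entry back from it. The class ZipRoundTrip, its method names, and the choice of UTF-8 for the entry text are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipRoundTrip
{
   /** Creates an in-memory ZIP archive with a single text entry. */
   public static byte[] zip(String entryName, String text)
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         ZipOutputStream zout = new ZipOutputStream(bytes);
         zout.putNextEntry(new ZipEntry(entryName));
         zout.write(text.getBytes("UTF-8"));
         zout.closeEntry();
         zout.close();
         return bytes.toByteArray();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   /** Returns the text of the named entry, or null if there is no such entry. */
   public static String unzip(byte[] archive, String entryName)
   {
      try
      {
         ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive));
         try
         {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null)
            {
               if (entry.getName().equals(entryName))
               {
                  // the reader sees decompressed text and stops at the end of the entry
                  BufferedReader in = new BufferedReader(new InputStreamReader(zin, "UTF-8"));
                  StringBuilder result = new StringBuilder();
                  String line;
                  while ((line = in.readLine()) != null)
                     result.append(line).append('\n');
                  return result.toString();
               }
               zin.closeEntry();
            }
            return null;
         }
         finally
         {
            zin.close();
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      byte[] archive = zip("hello.txt", "first\nsecond\n");
      System.out.print(unzip(archive, "hello.txt"));
   }
}
```

The same two methods would work unchanged if the byte array streams were replaced by file streams or by the streams of a network connection.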

Note

The article at http://www.javaworld.com/javaworld/jw-10-2000/jw-1027-toolbox.html shows you how to modify a ZIP archive.

java.util.zip.ZipInputStream 1.1
  • ZipInputStream(InputStream in)

    This constructor creates a ZipInputStream that allows you to inflate data from the given InputStream.

    Parameters:

    in

    The underlying input stream

  • ZipEntry getNextEntry()

    returns a ZipEntry object for the next entry, or null if there are no more entries.

  • void closeEntry()

    closes the current open entry in the ZIP file. You can then read the next entry by using getNextEntry().

java.util.zip.ZipOutputStream 1.1
  • ZipOutputStream(OutputStream out)

    this constructor creates a ZipOutputStream that you use to write compressed data to the specified OutputStream.

    Parameters:

    out

    The underlying output stream

  • void putNextEntry(ZipEntry ze)

    writes the information in the given ZipEntry to the stream and positions the stream for the data. The data can then be written to the stream by write().

    Parameters:

    ze

    The new entry

  • void closeEntry()

    closes the currently open entry in the ZIP file. Use the putNextEntry method to start the next entry.

  • void setLevel(int level)

    sets the default compression level of subsequent DEFLATED entries. The default value is Deflater.DEFAULT_COMPRESSION. Throws an IllegalArgumentException if the level is not valid.

    Parameters:

    level

    A compression level, from 0 (NO_COMPRESSION) to 9 (BEST_COMPRESSION)

  • void setMethod(int method)

    sets the default compression method for this ZipOutputStream for any entries that do not specify a method.

    Parameters:

    method

    The compression method, either DEFLATED or STORED

java.util.zip.ZipEntry 1.1
  • ZipEntry(String name)

    Parameters:

    name

    The name of the entry

  • long getCrc()

    returns the CRC32 checksum value for this ZipEntry.

  • String getName()

    returns the name of this entry.

  • long getSize()

    returns the uncompressed size of this entry, or -1 if the uncompressed size is not known.

  • boolean isDirectory()

    returns a Boolean that indicates whether this entry is a directory.

  • void setMethod(int method)

    Parameters:

    method

    The compression method for the entry; must be either DEFLATED or STORED

  • void setSize(long size)

    sets the size of this entry. Only required if the compression method is STORED.

    Parameters:

    size

    The uncompressed size of this entry

  • void setCrc(long crc)

    sets the CRC32 checksum of this entry. Use the CRC32 class to compute this checksum. Only required if the compression method is STORED.

    Parameters:

    crc

    The checksum of this entry
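
The requirement that a STORED entry carry its size and checksum up front can be sketched as follows. This is a minimal illustration, not part of the chapter's example programs: the class name StoredEntryDemo and the method storeUncompressed are hypothetical, the archive is kept in memory for brevity, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StoredEntryDemo
{
   /** Writes a single uncompressed (STORED) entry and returns the archive bytes. */
   static byte[] storeUncompressed(String name, byte[] data)
   {
      // A STORED entry needs its size and CRC32 checksum before putNextEntry
      CRC32 crc = new CRC32();
      crc.update(data);

      ZipEntry entry = new ZipEntry(name);
      entry.setMethod(ZipEntry.STORED);
      entry.setSize(data.length);
      entry.setCrc(crc.getValue());

      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ZipOutputStream zout = new ZipOutputStream(bytes))
      {
         zout.putNextEntry(entry);
         zout.write(data);
         zout.closeEntry();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      return bytes.toByteArray();
   }

   public static void main(String[] args)
   {
      byte[] data = "hello hello hello".getBytes(StandardCharsets.ISO_8859_1);
      byte[] archive = storeUncompressed("greeting.txt", data);
      // The archive is larger than the data because of the ZIP headers
      System.out.println(data.length + " bytes stored in " + archive.length);
   }
}
```

Because the entry is not compressed, the original bytes appear verbatim inside the archive. With the default DEFLATED method, none of the three setter calls would be needed.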

java.util.zip.ZipFile 1.1
  • ZipFile(String name)

    this constructor creates a ZipFile for reading from the given string.

    Parameters:

    name

    A string that contains the path name of the file

  • ZipFile(File file)

    this constructor creates a ZipFile for reading from the given File object.

    Parameters:

    file

    The file to read; the File class is described at the end of this chapter

  • Enumeration entries()

    returns an Enumeration object that enumerates the ZipEntry objects that describe the entries of the ZipFile.

  • ZipEntry getEntry(String name)

    returns the entry corresponding to the given name, or null if there is no such entry.

    Parameters:

    name

    The entry name

  • InputStream getInputStream(ZipEntry ze)

    returns an InputStream for the given entry.

    Parameters:

    ze

    A ZipEntry in the ZIP file

  • String getName()

    returns the path of this ZIP file.
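
To illustrate that neither the source nor the destination of ZIP data has to be a file, here is a sketch that writes an archive into a byte array and reads it back. The class ZipInMemoryDemo and its zip and unzip helpers are hypothetical names chosen for this illustration, and the code assumes Java 7 or later for try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipInMemoryDemo
{
   /** Compresses each name/text pair into a ZIP archive held in a byte array. */
   static byte[] zip(Map<String, String> files)
   {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ZipOutputStream zout = new ZipOutputStream(bytes))
      {
         for (Map.Entry<String, String> f : files.entrySet())
         {
            zout.putNextEntry(new ZipEntry(f.getKey()));
            zout.write(f.getValue().getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      return bytes.toByteArray();
   }

   /** Reads every entry of the archive back into a name/text map. */
   static Map<String, String> unzip(byte[] archive)
   {
      Map<String, String> result = new LinkedHashMap<>();
      try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive)))
      {
         ZipEntry entry;
         while ((entry = zin.getNextEntry()) != null)
         {
            // read returns -1 at the end of the current entry, not of the archive
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = zin.read(buffer)) != -1) content.write(buffer, 0, n);
            result.put(entry.getName(),
               new String(content.toByteArray(), StandardCharsets.UTF_8));
            zin.closeEntry();
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      return result;
   }

   public static void main(String[] args)
   {
      Map<String, String> files = new LinkedHashMap<>();
      files.put("a.txt", "first entry");
      files.put("dir/b.txt", "second entry");
      System.out.println(unzip(zip(files)));
   }
}
```

Replacing the byte array streams with file or socket streams changes nothing in the ZIP-handling code, which is the point of the stream abstraction.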

Use of Streams

In the next four sections, we show you how to put some of the creatures in the stream zoo to good use. For these examples, we assume you are working with the Employee class and some of its subclasses, such as Manager. (See Chapters 4 and 5 for more on these example classes.) We consider four separate scenarios for saving an array of employee records to a file and then reading them back into memory:

  1. Saving data of the same type (Employee) in text format

  2. Saving data of the same type in binary format

  3. Saving and restoring polymorphic data (a mixture of Employee and Manager objects)

  4. Saving and restoring data containing embedded references (managers with pointers to other employees)

Writing Delimited Output

In this section, you learn how to store an array of Employee records in the time-honored delimited format. This means that each record is stored in a separate line. Instance fields are separated from each other by delimiters. We use a vertical bar (|) as our delimiter. (A colon (:) is another popular choice. Part of the fun is that everyone uses a different delimiter.) Naturally, we punt on the issue of what might happen if a | actually occurred in one of the strings we save.

Note

Especially on UNIX systems, an amazing number of files are stored in exactly this format. We have seen entire employee databases with thousands of records in this format, queried with nothing more than the UNIX awk, sort, and join utilities. (In the PC world, where desktop database programs are available at low cost, this kind of ad hoc storage is much less common.)

Here is a sample set of records:

Harry Hacker|35500|1989|10|1
Carl Cracker|75000|1987|12|15
Tony Tester|38000|1990|3|15

Writing records is simple. Because we write to a text file, we use the PrintWriter class. We simply write all fields, followed by either a | or, for the last field, a newline (\n). Finally, in keeping with the idea that we want the class to be responsible for responding to messages, we add a method, writeData, to our Employee class.

public void writeData(PrintWriter out) throws IOException
{
   GregorianCalendar calendar = new GregorianCalendar();
   calendar.setTime(hireDay);
   out.println(name + "|"
      + salary + "|"
      + calendar.get(Calendar.YEAR) + "|"
      + (calendar.get(Calendar.MONTH) + 1) + "|"
      + calendar.get(Calendar.DAY_OF_MONTH));
}

To read records, we read in a line at a time and separate the fields. This is the topic of the next section, in which we use a utility class supplied with Java to make our job easier.

String Tokenizers and Delimited Text

When reading a line of input, we get a single long string. We want to split it into individual strings. This means finding the | delimiters and then separating out the individual pieces, that is, the sequence of characters up to the next delimiter. (These are usually called tokens.) The StringTokenizer class in java.util is designed for exactly this purpose. It gives you an easy way to break up a large string that contains delimited text. The idea is that a string tokenizer object attaches to a string. When you construct the tokenizer object, you specify which characters are the delimiters. For example, we need to use

StringTokenizer tokenizer = new StringTokenizer(line, "|");

You can specify multiple delimiters in the string, for example:

StringTokenizer tokenizer = new StringTokenizer(line, "|,;");

This means that any of the characters in the string can serve as delimiters.

If you don’t specify a delimiter set, the default is " \t\n\r\f", that is, all whitespace characters (space, tab, newline, carriage return, and form feed).

Once you have constructed a string tokenizer, you can use its methods to quickly extract the tokens from the string. The nextToken method returns the next unread token. The hasMoreTokens method returns true if more tokens are available. The following loop processes all tokens:

while (tokenizer.hasMoreTokens())
{
   String token = tokenizer.nextToken();
   // process token
}
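
As a complete, runnable version of this loop, the following sketch collects the tokens of one record line into a list. The class name TokenizerDemo and the tokens helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo
{
   /** Returns all tokens of line, split at any character in delimiters. */
   static List<String> tokens(String line, String delimiters)
   {
      List<String> result = new ArrayList<>();
      StringTokenizer tokenizer = new StringTokenizer(line, delimiters);
      while (tokenizer.hasMoreTokens())
         result.add(tokenizer.nextToken());
      return result;
   }

   public static void main(String[] args)
   {
      System.out.println(tokens("Harry Hacker|35500|1989|10|1", "|"));
      // prints [Harry Hacker, 35500, 1989, 10, 1]
   }
}
```

Note that a StringTokenizer never returns an empty token: two adjacent delimiters, as in "a||b", are treated as a single separator.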

Note

An alternative to the StringTokenizer is the split method of the String class. The call line.split("[|,;]") returns a String[] array consisting of all tokens, using the delimiters inside the brackets. You can use any regular expression to describe delimiters. We discuss regular expressions at the end of this chapter.
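
Two details of split are easy to trip over, and the sketch below shows both. The class name SplitDemo and the fields helper are hypothetical. First, a vertical bar on its own is a regular expression metacharacter and must be escaped (inside brackets, as in "[|,;]", it is literal). Second, unlike a StringTokenizer, split reports an empty string for a field between two adjacent delimiters.

```java
import java.util.Arrays;

class SplitDemo
{
   /** Splits a record at each vertical bar; "\\|" escapes the regex metacharacter. */
   static String[] fields(String line)
   {
      return line.split("\\|");
   }

   public static void main(String[] args)
   {
      // The empty field between the two bars is kept as an empty string
      System.out.println(Arrays.toString(fields("Harry Hacker||1989")));
      // prints [Harry Hacker, , 1989]
   }
}
```

Whether keeping empty fields is desirable depends on the file format. For fixed-position records like ours it usually is, because the remaining fields keep their positions.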

java.util.StringTokenizer 1.0
  • StringTokenizer(String str, String delim)

    constructs a string tokenizer with the given delimiter set.

    Parameters:

    str

    The input string from which tokens are read

     

    delim

    A string containing delimiter characters (every character in this string is a delimiter)

  • StringTokenizer(String str)

    constructs a string tokenizer with the default delimiter set " \t\n\r\f".

  • boolean hasMoreTokens()

    returns true if more tokens exist.

  • String nextToken()

    returns the next token; throws a NoSuchElementException if there are no more tokens.

  • String nextToken(String delim)

    returns the next token after switching to the new delimiter set. The new delimiter set is subsequently used.

  • int countTokens()

    returns the number of tokens still in the string.

Reading Delimited Input

Reading in an Employee record is simple. We simply read in a line of input with the readLine method of the BufferedReader class. Here is the code needed to read one record into a string.

BufferedReader in = new BufferedReader(new FileReader("employee.dat"));
. . .
String line = in.readLine();

Next, we need to extract the individual tokens. When we do this, we end up with strings, so we need to convert them to numbers.

Just as with the writeData method, we add a readData method of the Employee class. When you call

e.readData(in);

this method overwrites the previous contents of e. Note that the method may throw an IOException if the readLine method throws that exception. This method can do nothing if an IOException occurs, so we just let it propagate up the call chain.

Here is the code for this method:

public void readData(BufferedReader in) throws IOException
{
   String s = in.readLine();
   StringTokenizer t = new StringTokenizer(s, "|");
   name = t.nextToken();
   salary = Double.parseDouble(t.nextToken());
   int y = Integer.parseInt(t.nextToken());
   int m = Integer.parseInt(t.nextToken());
   int d = Integer.parseInt(t.nextToken());
   GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      // GregorianCalendar uses 0 = January
   hireDay = calendar.getTime();
}
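
The readData method above assumes well-formed input. At the end of the file, readLine returns null, and passing null to the StringTokenizer constructor results in a NullPointerException. The following sketch shows one way to validate a line before using it. The RecordParser class, its Fields holder, and the choice of IllegalArgumentException are assumptions made for this illustration, not part of the chapter's Employee class.

```java
import java.util.StringTokenizer;

class RecordParser
{
   /** Hypothetical holder for the five fields of one record line. */
   static class Fields
   {
      String name;
      double salary;
      int year, month, day;
   }

   /**
      Parses one line of the form name|salary|year|month|day.
      @throws IllegalArgumentException if the line is null or malformed
   */
   static Fields parse(String line)
   {
      if (line == null)
         throw new IllegalArgumentException("no line to parse (end of input?)");
      StringTokenizer t = new StringTokenizer(line, "|");
      if (t.countTokens() != 5)
         throw new IllegalArgumentException("expected 5 fields: " + line);

      Fields f = new Fields();
      f.name = t.nextToken();
      // NumberFormatException is a subclass of IllegalArgumentException,
      // so a malformed number is reported the same way
      f.salary = Double.parseDouble(t.nextToken());
      f.year = Integer.parseInt(t.nextToken());
      f.month = Integer.parseInt(t.nextToken());
      f.day = Integer.parseInt(t.nextToken());
      return f;
   }

   public static void main(String[] args)
   {
      Fields f = parse("Harry Hacker|35500|1989|10|1");
      System.out.println(f.name + " hired " + f.year + "-" + f.month + "-" + f.day);
   }
}
```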

Finally, in the code for a program that tests these methods, the static method

void writeData(Employee[] e, PrintWriter out)

first writes the length of the array, then writes each record. The static method

Employee[] readData(BufferedReader in)

first reads in the length of the array, then reads in each record, as illustrated in Example 12–2.

Example 12–2. DataFileTest.java

import java.io.*;
import java.util.*;

public class DataFileTest
{
   public static void main(String[] args)
   {
      Employee[] staff = new Employee[3];

      staff[0] = new Employee("Carl Cracker", 75000, 1987, 12, 15);
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         PrintWriter out = new PrintWriter(new FileWriter("employee.dat"));
         writeData(staff, out);
         out.close();

         // retrieve all records into a new array
         BufferedReader in = new BufferedReader(new FileReader("employee.dat"));
         Employee[] newStaff = readData(in);
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch(IOException exception)
      {
         exception.printStackTrace();
      }
   }

   /**
      Writes all employees in an array to a print writer
      @param employees an array of employees
      @param out a print writer
   */
   static void writeData(Employee[] employees, PrintWriter out)
      throws IOException
   {
      // write number of employees
      out.println(employees.length);

      for (Employee e : employees)
         e.writeData(out);
   }

   /**
      Reads an array of employees from a buffered reader
      @param in the buffered reader
      @return the array of employees
   */
   static Employee[] readData(BufferedReader in)
      throws IOException
   {
      // retrieve the array size
      int n = Integer.parseInt(in.readLine());

      Employee[] employees = new Employee[n];
      for (int i = 0; i < n; i++)
      {
         employees[i] = new Employee();
         employees[i].readData(in);
      }
      return employees;
   }
}

class Employee
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   /**
      Writes employee data to a print writer
      @param out the print writer
   */
   public void writeData(PrintWriter out) throws IOException
   {
      GregorianCalendar calendar = new GregorianCalendar();
      calendar.setTime(hireDay);
      out.println(name + "|"
         + salary + "|"
         + calendar.get(Calendar.YEAR) + "|"
         + (calendar.get(Calendar.MONTH) + 1) + "|"
         + calendar.get(Calendar.DAY_OF_MONTH));
   }

   /**
      Reads employee data from a buffered reader
      @param in the buffered reader
   */
   public void readData(BufferedReader in) throws IOException
   {
      String s = in.readLine();
      StringTokenizer t = new StringTokenizer(s, "|");
      name = t.nextToken();
      salary = Double.parseDouble(t.nextToken());
      int y = Integer.parseInt(t.nextToken());
      int m = Integer.parseInt(t.nextToken());
      int d = Integer.parseInt(t.nextToken());
      GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      hireDay = calendar.getTime();
   }

   private String name;
   private double salary;
   private Date hireDay;
}

The StringBuilder Class

When you process input, you often need to construct strings from individual characters or Unicode code units. It would be inefficient to use string concatenation for this purpose. Every time you append characters to a string, the string object needs to find new memory to hold the larger string: this is time consuming. Appending even more characters means the string needs to be relocated again and again. Using the StringBuilder class avoids this problem.

A StringBuilder works much like an ArrayList. It manages a char[] array that can grow and shrink on demand. You can append, insert, or remove code units until the string builder holds the desired string. Then you use the toString method to convert the contents to an actual String object.

Note

The StringBuilder class was introduced in JDK 5.0. Its predecessor, StringBuffer, is slightly less efficient, but it allows multiple threads to add or remove characters. If all string editing happens in a single thread, you should use StringBuilder instead. The APIs of both classes are identical.

The following API notes contain the most important methods for the StringBuilder and StringBuffer classes.

java.lang.StringBuilder 5.0
java.lang.StringBuffer 1.0
  • StringBuilder/StringBuffer()

    constructs an empty string builder or string buffer.

  • StringBuilder/StringBuffer(int length)

    constructs an empty string builder or string buffer with the initial capacity length.

  • StringBuilder/StringBuffer(String str)

    constructs a string builder or string buffer with the initial contents str.

  • int length()

    returns the number of code units of the builder or buffer.

  • StringBuilder/StringBuffer append(String str)

    appends a string and returns this.

  • StringBuilder/StringBuffer append(char c)

    appends a code unit and returns this.

  • StringBuilder/StringBuffer appendCodePoint(int cp) 5.0

    appends a code point, converting it into one or two code units, and returns this.

  • void setCharAt(int i, char c)

    sets the ith code unit to c.

  • StringBuilder/StringBuffer insert(int offset, String str)

    inserts a string at position offset and returns this.

  • StringBuilder/StringBuffer insert(int offset, char c)

    inserts a code unit at position offset and returns this.

  • StringBuilder/StringBuffer delete(int startIndex, int endIndex)

    deletes the code units with offsets startIndex to endIndex - 1 and returns this.

  • String toString()

    returns a string with the same data as the builder or buffer contents.
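
The following sketch exercises several of these methods in sequence. The class name BuilderDemo and its helper methods are hypothetical. The second helper shows why the API notes speak of code units rather than characters: a supplementary code point occupies two char values in the builder.

```java
class BuilderDemo
{
   /** Builds and edits a string in place; the comments show the contents after each step. */
   static String edit()
   {
      StringBuilder b = new StringBuilder();
      for (char c = 'a'; c <= 'e'; c++) b.append(c);   // abcde
      b.insert(0, "<<");                               // <<abcde
      b.append(">>");                                  // <<abcde>>
      b.delete(2, 4);          // removes offsets 2 and 3: <<cde>>
      b.setCharAt(2, 'C');                             // <<Cde>>
      return b.toString();
   }

   /** Returns the number of code units that one code point occupies in a builder. */
   static int codeUnitsFor(int codePoint)
   {
      return new StringBuilder().appendCodePoint(codePoint).length();
   }

   public static void main(String[] args)
   {
      System.out.println(edit());                  // prints <<Cde>>
      System.out.println(codeUnitsFor('A'));       // prints 1
      System.out.println(codeUnitsFor(0x1D546));   // prints 2 (a surrogate pair)
   }
}
```

Each call to append and insert returns the builder itself, so the calls could also be chained.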

Working with Random-Access Streams

If you have a large number of employee records of variable length, the storage technique used in the preceding section suffers from one limitation: it is not possible to read a record in the middle of the file without first reading all records that come before it. In this section, we make all records the same length. This lets us implement a random-access method for reading back the information by using the RandomAccessFile class that you saw earlier—we can use this to get at any record in the same amount of time.

We will store the numbers in the instance fields in our classes in a binary format. We do that with the writeInt and writeDouble methods of the DataOutput interface. (As we mentioned earlier, this is the common interface of the DataOutputStream and the RandomAccessFile classes.)

However, because the size of each record must remain constant, we need to make all the strings the same size when we save them. The variable-size UTF format does not do this, and the rest of the Java library provides no convenient means of accomplishing this. We need to write a bit of code to implement two helper methods to make the strings the same size. We will call the methods writeFixedString and readFixedString. These methods read and write Unicode strings that always have the same length.

The writeFixedString method takes the parameter size. Then, it writes the specified number of code units, starting at the beginning of the string. (If there are too few code units, the method pads the string, using zero values.) Here is the code for the writeFixedString method:

static void writeFixedString(String s, int size, DataOutput out)
   throws IOException
{
   int i;
   for (i = 0; i < size; i++)
   {
      char ch = 0;
      if (i < s.length()) ch = s.charAt(i);
      out.writeChar(ch);
   }
}

The readFixedString method reads characters from the input stream until it has consumed size code units or until it encounters a character with a zero value. Then, it should skip past the remaining zero values in the input field. For added efficiency, this method uses the StringBuilder class to read in a string.

static String readFixedString(int size, DataInput in)
   throws IOException
{
   StringBuilder b = new StringBuilder(size);
   int i = 0;
   boolean more = true;
   while (more && i < size)
   {
      char ch = in.readChar();
      i++;
      if (ch == 0) more = false;
      else b.append(ch);
   }
   in.skipBytes(2 * (size - i));
   return b.toString();
}

Note

We placed the writeFixedString and readFixedString methods inside the DataIO helper class.
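
To see the two helper methods at work without touching a file, the sketch below runs a string through a byte array. The two fixed-string methods repeat the logic shown above; the class name FixedStringDemo and the encode and decode wrappers are hypothetical additions for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FixedStringDemo
{
   static void writeFixedString(String s, int size, DataOutput out) throws IOException
   {
      for (int i = 0; i < size; i++)
      {
         char ch = 0;                        // pad with zero values
         if (i < s.length()) ch = s.charAt(i);
         out.writeChar(ch);
      }
   }

   static String readFixedString(int size, DataInput in) throws IOException
   {
      StringBuilder b = new StringBuilder(size);
      int i = 0;
      boolean more = true;
      while (more && i < size)
      {
         char ch = in.readChar();
         i++;
         if (ch == 0) more = false;
         else b.append(ch);
      }
      in.skipBytes(2 * (size - i));          // skip the rest of the padding
      return b.toString();
   }

   /** Encodes s as a field of exactly size code units (2 * size bytes). */
   static byte[] encode(String s, int size)
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         writeFixedString(s, size, new DataOutputStream(bytes));
         return bytes.toByteArray();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   /** Decodes a field that was written with the given size. */
   static String decode(byte[] field, int size)
   {
      try
      {
         return readFixedString(size, new DataInputStream(new ByteArrayInputStream(field)));
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      byte[] field = encode("Harry", 8);
      System.out.println(field.length);                           // prints 16
      System.out.println(decode(field, 8));                       // prints Harry
      System.out.println(decode(encode("Harry Hacker", 5), 5));   // prints Harry
   }
}
```

The last line shows the price of a fixed field width: a string that is longer than the field is silently truncated.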

To write a fixed-size record, we simply write all fields in binary.

public void writeData(DataOutput out) throws IOException
{
   DataIO.writeFixedString(name, NAME_SIZE, out);
   out.writeDouble(salary);

   GregorianCalendar calendar = new GregorianCalendar();
   calendar.setTime(hireDay);
   out.writeInt(calendar.get(Calendar.YEAR));
   out.writeInt(calendar.get(Calendar.MONTH) + 1);
   out.writeInt(calendar.get(Calendar.DAY_OF_MONTH));
}

Reading the data back is just as simple.

public void readData(DataInput in) throws IOException
{
   name = DataIO.readFixedString(NAME_SIZE, in);
   salary = in.readDouble();
   int y = in.readInt();
   int m = in.readInt();
   int d = in.readInt();
   GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
   hireDay = calendar.getTime();
}

In our example, each employee record is 100 bytes long because we specified that the name field would always be written using 40 characters. This gives us a breakdown as indicated in the following:

  • 40 characters = 80 bytes for the name

  • 1 double = 8 bytes

  • 3 int = 12 bytes

As an example, suppose you want to position the file pointer to the third record. You can use the following version of the seek method:

long n = 3;
int RECORD_SIZE = 100;
in.seek((n - 1) * RECORD_SIZE);

Then you can read a record:

Employee e = new Employee();
e.readData(in);

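If you want to modify the record and then save it back into the same location, remember to set the file pointer back to the beginning of the record. The file must have been opened in read/write mode ("rw"). Because the RandomAccessFile class implements both the DataInput and DataOutput interfaces, the same object serves for reading and writing:

in.seek((n - 1) * RECORD_SIZE);
e.writeData(in);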

To determine the total number of bytes in a file, use the length method. The total number of records is the length divided by the size of each record.

long nbytes = in.length(); // length in bytes
int nrecords = (int) (nbytes / RECORD_SIZE);
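
The seek, read, and write-back steps can be tried out in isolation with the following sketch. To keep it short, it uses a hypothetical 12-byte record (one int id and one double salary) instead of the 100-byte Employee record, and it writes to a temporary file. The class name SeekDemo and the doubleSalary method are assumptions made for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class SeekDemo
{
   static final int RECORD_SIZE = 4 + 8;   // one int id plus one double salary

   /** Writes five records, doubles the salary in record n, and returns the stored value. */
   static double doubleSalary(long n)
   {
      try
      {
         File file = File.createTempFile("records", ".dat");
         file.deleteOnExit();
         // "rw" mode allows both reading and writing through the same object
         try (RandomAccessFile raf = new RandomAccessFile(file, "rw"))
         {
            for (int id = 1; id <= 5; id++)
            {
               raf.writeInt(id);
               raf.writeDouble(1000.0 * id);
            }

            long start = (n - 1) * RECORD_SIZE;   // records are numbered from 1
            raf.seek(start);
            int id = raf.readInt();
            double salary = raf.readDouble();

            raf.seek(start);       // reading moved the file pointer, so go back
            raf.writeInt(id);
            raf.writeDouble(2 * salary);

            raf.seek(start + 4);   // skip the id and reread the salary
            return raf.readDouble();
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(doubleSalary(3));   // prints 6000.0
   }
}
```

Forgetting the second seek call would overwrite record n + 1 instead of record n, because every read and write advances the file pointer.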

The test program shown in Example 12–3 writes three records into a data file and then reads them from the file in reverse order. To do this efficiently requires random access—we need to get at the third record first.

Example 12–3. RandomFileTest.java

import java.io.*;
import java.util.*;

public class RandomFileTest
{
   public static void main(String[] args)
   {
      Employee[] staff = new Employee[3];

      staff[0] = new Employee("Carl Cracker", 75000, 1987, 12, 15);
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         DataOutputStream out = new DataOutputStream(new FileOutputStream("employee.dat"));
         for (Employee e : staff)
            e.writeData(out);
         out.close();

         // retrieve all records into a new array
         RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
         // compute the array size
         int n = (int)(in.length() / Employee.RECORD_SIZE);
         Employee[] newStaff = new Employee[n];

         // read employees in reverse order
         for (int i = n - 1; i >= 0; i--)
         {
            newStaff[i] = new Employee();
            in.seek(i * Employee.RECORD_SIZE);
            newStaff[i].readData(in);
         }
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch(IOException e)
      {
         e.printStackTrace();
      }
   }
}

class Employee
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   /**
      Raises the salary of this employee
      @param byPercent the percentage of the raise
   */
   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   /**
      Writes employee data to a data output
      @param out the data output
   */
   public void writeData(DataOutput out) throws IOException
   {
      DataIO.writeFixedString(name, NAME_SIZE, out);
      out.writeDouble(salary);

      GregorianCalendar calendar = new GregorianCalendar();
      calendar.setTime(hireDay);
      out.writeInt(calendar.get(Calendar.YEAR));
      out.writeInt(calendar.get(Calendar.MONTH) + 1);
      out.writeInt(calendar.get(Calendar.DAY_OF_MONTH));
   }

   /**
      Reads employee data from a data input
      @param in the data input
   */
   public void readData(DataInput in) throws IOException
   {
      name = DataIO.readFixedString(NAME_SIZE, in);
      salary = in.readDouble();
      int y = in.readInt();
      int m = in.readInt();
      int d = in.readInt();
      GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      hireDay = calendar.getTime();
   }

   public static final int NAME_SIZE = 40;
   public static final int RECORD_SIZE = 2 * NAME_SIZE + 8 + 4 + 4 + 4;

   private String name;
   private double salary;
   private Date hireDay;
}

class DataIO
{
   public static String readFixedString(int size, DataInput in)
      throws IOException
   {
      StringBuilder b = new StringBuilder(size);
      int i = 0;
      boolean more = true;
      while (more && i < size)
      {
         char ch = in.readChar();
         i++;
         if (ch == 0) more = false;
         else b.append(ch);
      }
      in.skipBytes(2 * (size - i));
      return b.toString();
   }

   public static void writeFixedString(String s, int size, DataOutput out)
      throws IOException
   {
      int i;
      for (i = 0; i < size; i++)
      {
         char ch = 0;
         if (i < s.length()) ch = s.charAt(i);
         out.writeChar(ch);
      }
   }
}

Object Streams

Using a fixed-length record format is a good choice if you need to store data of the same type. However, objects that you create in an object-oriented program are rarely all of the same type. For example, you may have an array called staff that is nominally an array of Employee records but contains objects that are actually instances of a subclass such as Manager.

If we want to save files that contain this kind of information, we must first save the type of each object and then the data that define the current state of the object. When we read this information back from a file, we must

  • Read the object type;

  • Create a blank object of that type;

  • Fill it with the data that we stored in the file.

It is entirely possible (if very tedious) to do this by hand, and in the first edition of this book we did exactly this. However, Sun Microsystems developed a powerful mechanism that allows this to be done with much less effort. As you will soon see, this mechanism, called object serialization, almost completely automates what was previously a very tedious process. (You see later in this chapter where the term “serialization” comes from.)

Storing Objects of Variable Type

To save object data, you first need to open an ObjectOutputStream object:

ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));

Now, to save an object, you simply use the writeObject method of the ObjectOutputStream class as in the following fragment:

Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
out.writeObject(harry);
out.writeObject(boss);

To read the objects back in, first get an ObjectInputStream object:

ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));

Then, retrieve the objects in the same order in which they were written, using the readObject method.

Employee e1 = (Employee) in.readObject();
Employee e2 = (Employee) in.readObject();

When reading back objects, you must carefully keep track of the number of objects that were saved, their order, and their types. Each call to readObject reads in another object of the type Object. You therefore will need to cast it to its correct type.

If you don’t need the exact type or you don’t remember it, then you can cast it to any superclass or even leave it as type Object. For example, e2 is an Employee object variable even though it actually refers to a Manager object. If you need to dynamically query the type of the object, you can use the getClass method that we described in Chapter 5.

You can write and read only objects with the writeObject/readObject methods. For primitive type values, you use methods such as writeInt/readInt or writeDouble/readDouble. (The object stream classes implement the DataInput/DataOutput interfaces.) Of course, numbers inside objects (such as the salary field of an Employee object) are saved and restored automatically. Recall that, in Java, strings and arrays are objects and can, therefore, be processed with the writeObject/readObject methods.

There is, however, one change you need to make to any class that you want to save and restore in an object stream. The class must implement the Serializable interface:

class Employee implements Serializable { . . . }

The Serializable interface has no methods, so you don’t need to change your classes in any way. In this regard, it is similar to the Cloneable interface that we also discussed in Chapter 6. However, to make a class cloneable, you still had to override the clone method of the Object class. To make a class serializable, you do not need to do anything else.
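
The points made so far can be combined into one small sketch: a primitive count written with writeInt, objects of two different classes written with writeObject, a cast to the common superclass on the way back, and getClass to discover the actual type. The classes Worker and Lead are hypothetical, pared-down stand-ins for Employee and Manager, and the object stream is kept in memory rather than in employee.dat.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ObjectStreamDemo
{
   /** Hypothetical stand-in for Employee; implementing Serializable is all it takes. */
   static class Worker implements Serializable
   {
      final String name;
      Worker(String name) { this.name = name; }
   }

   /** Hypothetical stand-in for Manager; serializable because its superclass is. */
   static class Lead extends Worker
   {
      Lead(String name) { super(name); }
   }

   /** Saves a Worker and a Lead, restores them, and describes what came back. */
   static String[] roundTrip()
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes))
         {
            out.writeInt(2);                           // primitive value
            out.writeObject(new Worker("Harry Hacker"));
            out.writeObject(new Lead("Carl Cracker"));
         }

         try (ObjectInputStream in
            = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
         {
            int n = in.readInt();
            String[] descriptions = new String[n];
            for (int i = 0; i < n; i++)
            {
               Worker w = (Worker) in.readObject();    // cast to the common superclass
               descriptions[i] = w.getClass().getSimpleName() + ":" + w.name;
            }
            return descriptions;
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      for (String d : roundTrip()) System.out.println(d);
      // prints Worker:Harry Hacker and Lead:Carl Cracker
   }
}
```

Removing "implements Serializable" from Worker makes the first writeObject call fail with a NotSerializableException.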

Example 12–4 is a test program that writes an array containing two employees and one manager to disk and then restores it. Writing an array is done with a single operation:

Employee[] staff = new Employee[3];
. . .
out.writeObject(staff);

Similarly, reading in the result is done with a single operation. However, we must apply a cast to the return value of the readObject method:

Employee[] newStaff = (Employee[]) in.readObject();

Once the information is restored, we print each employee because you can easily distinguish employee and manager objects by their different toString results. This should convince you that we did restore the correct types.

Example 12–4. ObjectFileTest.java

import java.io.*;
import java.util.*;

class ObjectFileTest
{
   public static void main(String[] args)
   {
      Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
      boss.setBonus(5000);

      Employee[] staff = new Employee[3];

      staff[0] = boss;
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
         out.writeObject(staff);
         out.close();

         // retrieve all records into a new array
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
         Employee[] newStaff = (Employee[]) in.readObject();
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
   }
}

class Employee implements Serializable
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}

class Manager extends Employee
{
   /**
      @param n the employee's name
      @param s the salary
      @param year the hire year
      @param month the hire month
      @param day the hire day
   */
   public Manager(String n, double s, int year, int month, int day)
   {
      super(n, s, year, month, day);
      bonus = 0;
   }

   public double getSalary()
   {
      double baseSalary = super.getSalary();
      return baseSalary + bonus;
   }

   public void setBonus(double b)
   {
      bonus = b;
   }

   public String toString()
   {
      return super.toString()
         + "[bonus=" + bonus
         + "]";
   }

   private double bonus;
}
java.io.ObjectOutputStream 1.1
  • ObjectOutputStream(OutputStream out)

    creates an ObjectOutputStream so that you can write objects to the specified OutputStream.

  • void writeObject(Object obj)

    writes the specified object to the ObjectOutputStream. This method saves the class of the object, the signature of the class, and the values of any nonstatic, nontransient field of the class and its superclasses.

java.io.ObjectInputStream 1.1
  • ObjectInputStream(InputStream is)

    creates an ObjectInputStream to read back object information from the specified InputStream.

  • Object readObject()

    reads an object from the ObjectInputStream. In particular, this method reads back the class of the object, the signature of the class, and the values of the nontransient and nonstatic fields of the class and all its superclasses. It deserializes the data in a way that allows multiple references to the same object to be recovered.

Understanding the Object Serialization File Format

Object serialization saves object data in a particular file format. Of course, you can use the writeObject/readObject methods without having to know the exact sequence of bytes that represents objects in a file. Nonetheless, we found studying the data format to be extremely helpful for gaining insight into the object streaming process. We did this by looking at hex dumps of various saved object files. However, the details are somewhat technical, so feel free to skip this section if you are not interested in the implementation.

Every file begins with the two-byte “magic number”

AC ED

followed by the version number of the object serialization format, which is currently

00 05

(We use hexadecimal numbers throughout this section to denote bytes.) Then, it contains a sequence of objects, in the order that they were saved.

String objects are saved as

74   two-byte length   characters

For example, the string “Harry” is saved as

74 00 05 Harry

The Unicode characters of the string are saved in “modified UTF-8” format.
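To see the header and the string encoding for yourself, you can serialize a string into a byte array and print the bytes in hexadecimal. The following is a minimal sketch; the class name StreamHeaderDemo and its two helper methods are our own choices for illustration and are not part of the Java library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class StreamHeaderDemo
{
   /** Serializes one object into memory and returns the raw stream bytes. */
   static byte[] serialize(Object obj)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(obj);
         out.close(); // flushes any buffered data into the byte array
         return buffer.toByteArray();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   /** Formats bytes as two-digit hexadecimal values separated by spaces. */
   static String toHex(byte[] bytes)
   {
      StringBuilder result = new StringBuilder();
      for (byte b : bytes)
      {
         if (result.length() > 0) result.append(' ');
         result.append(String.format("%02X", b & 0xFF));
      }
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(toHex(serialize("Harry")));
   }
}
```

The program prints AC ED 00 05 74 00 05 48 61 72 72 79, that is, the magic number, the version, the string code 74, the two-byte length, and the five bytes of “Harry”.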

When an object is saved, the class of that object must be saved as well. The class description contains

  1. The name of the class;

  2. The serial version unique ID, which is a fingerprint of the data field types and method signatures;

  3. A set of flags describing the serialization method; and

  4. A description of the data fields.

Java gets the fingerprint by

  1. Ordering descriptions of the class, superclass, interfaces, field types, and method signatures in a canonical way;

  2. Then applying the so-called Secure Hash Algorithm (SHA) to that data.

SHA is a fast algorithm that gives a “fingerprint” to a larger block of information. This fingerprint is always a 20-byte data packet, regardless of the size of the original data. It is created by a clever sequence of bit operations on the data that makes it essentially 100 percent certain that the fingerprint will change if the information is altered in any way. SHA is a U.S. standard, recommended by the National Institute of Standards and Technology (NIST). (For more details on SHA, see, for example, Cryptography and Network Security: Principles and Practice, by William Stallings [Prentice Hall, 2002].) However, Java uses only the first 8 bytes of the SHA code as a class fingerprint. It is still very likely that the class fingerprint will change if the data fields or methods change in any way.

Java can then check the class fingerprint to protect us from the following scenario: An object is saved to a disk file. Later, the designer of the class makes a change, for example, by removing a data field. Then, the old disk file is read in again. Now the data layout on the disk no longer matches the data layout in memory. If the data were read back in its old form, it could corrupt memory. Java takes great care to make such memory corruption close to impossible. Hence, it checks, using the fingerprint, that the class definition has not changed when it restores an object. It does this by comparing the fingerprint on disk with the fingerprint of the current class.

Note

Technically, as long as the data layout of a class has not changed, it ought to be safe to read objects back in. But Java is conservative and checks that the methods have not changed either. (After all, the methods describe the meaning of the stored data.) Of course, in practice, classes do evolve, and it may be necessary for a program to read in older versions of objects. We discuss this later in the section entitled “Versioning.”

Here is how a class identifier is stored:

  • 72

  • 2-byte length of class name

  • class name

  • 8-byte fingerprint

  • 1-byte flag

  • 2-byte count of data field descriptors

  • data field descriptors

  • 78 (end marker)

  • superclass type (70 if none)

The flag byte is composed of three bit masks, defined in java.io.ObjectStreamConstants:

static final byte SC_WRITE_METHOD = 1;
   // class has writeObject method that writes additional data
static final byte SC_SERIALIZABLE = 2;
   // class implements Serializable interface
static final byte SC_EXTERNALIZABLE = 4;
   // class implements Externalizable interface

We discuss the Externalizable interface later in this chapter. Externalizable classes supply custom read and write methods that take over the output of their instance fields. The classes that we write implement the Serializable interface and will have a flag value of 02. The java.util.Date class defines its own readObject/writeObject methods and has a flag of 03.
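You can check the flag values by picking the flag byte out of a serialized object. The following sketch assumes the layout described above (header, 73, 72, class name, fingerprint, flag) and therefore only works for ordinary objects whose class descriptor appears in full. The classes Plain and Custom and the method flagOf are hypothetical names that we made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class FlagByteDemo
{
   /** Hypothetical class that relies on default serialization. */
   static class Plain implements Serializable
   {
      private int value = 1;
   }

   /** Hypothetical class that supplies its own writeObject method. */
   static class Custom implements Serializable
   {
      private int value = 1;

      private void writeObject(ObjectOutputStream out) throws IOException
      {
         out.defaultWriteObject();
      }
   }

   /** Serializes obj and returns the flag byte of its class descriptor. */
   static int flagOf(Serializable obj)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(obj);
         out.close();

         DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
         in.skipBytes(4);  // magic number AC ED and version 00 05
         in.readByte();    // 73: new object
         in.readByte();    // 72: new class descriptor
         in.readUTF();     // 2-byte length followed by the class name
         in.readLong();    // 8-byte fingerprint
         return in.readUnsignedByte(); // the flag byte
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(flagOf(new Plain()));  // SC_SERIALIZABLE
      System.out.println(flagOf(new Custom())); // SC_SERIALIZABLE | SC_WRITE_METHOD
   }
}
```

For these two classes the program prints 2 and 3, matching the flag values 02 and 03 mentioned above.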

Each data field descriptor has the format:

  • 1-byte type code

  • 2-byte length of field name

  • field name

  • class name (if field is an object)

The type code is one of the following:

B    byte
C    char
D    double
F    float
I    int
J    long
L    object
S    short
Z    boolean
[    array

When the type code is L, the field name is followed by the field type. Class and field name strings do not start with the string code 74, but field types do. Field types use a slightly different encoding of their names, namely, the format used by native methods. (See Volume 2 for native methods.)

For example, the salary field of the Employee class is encoded as:

D 00 06 salary

Here is the complete class descriptor of the Employee class:

72 00 08 Employee
   E6 D2 86 7D AE AC 18 1B 02       Fingerprint and flags
   00 03                            Number of instance fields
   D 00 06 salary                   Instance field type and name
   L 00 07 hireDay                  Instance field type and name
   74 00 10 Ljava/util/Date;        Instance field class name (Date)
   L 00 04 name                     Instance field type and name
   74 00 12 Ljava/lang/String;      Instance field class name (String)
   78                               End marker
   70                               No superclass

These descriptors are fairly long. If the same class descriptor is needed again in the file, then an abbreviated form is used:

71   4-byte serial number

The serial number refers to the previous explicit class descriptor. We discuss the numbering scheme later.

An object is stored as

73   class descriptor   object data

For example, here is how an Employee object is stored:

40 E8 6A 00 00 00 00 00                 salary field value (double)
73                                      hireDay field value (new object)
   71 00 7E 00 08                       Existing class java.util.Date
   77 08 00 00 00 91 1B 4E B1 80 78     External storage (details later)
74 00 0C Harry Hacker                   name field value (String)

As you can see, the data file contains enough information to restore the Employee object.

Arrays are saved in the following format:

75   class descriptor   4-byte number of entries   entries

The array class name in the class descriptor is in the same format as that used by native methods (which is slightly different from the format used for class names in other class descriptors). In this format, class names start with an L and end with a semicolon.

For example, an array of three Employee objects starts out like this:

75                                  Array
   72 00 0B [LEmployee;             New class, string length, class name Employee[]
      FC BF 36 11 C5 91 11 C7 02    Fingerprint and flags
      00 00                         Number of instance fields
      78                            End marker
      70                            No superclass
   00 00 00 03                      Number of array entries

Note that the fingerprint for an array of Employee objects is different from a fingerprint of the Employee class itself.

Of course, studying these codes can be about as exciting as reading the average phone book. But it is still instructive to know that the object stream contains a detailed description of all the objects that it contains, with sufficient detail to allow reconstruction of both objects and arrays of objects.

Solving the Problem of Saving Object References

We now know how to save objects that contain numbers, strings, or other simple objects. However, there is one important situation that we still need to consider. What happens when one object is shared by several objects as part of its state?

To illustrate the problem, let us make a slight modification to the Manager class. Let’s assume that each manager has a secretary, implemented as an instance variable secretary of type Employee. (It would make sense to derive a class Secretary from Employee for this purpose, but we don’t do that here.)

class Manager extends Employee
{
   . . .
   private Employee secretary;
}

Having done this, you must keep in mind that the Manager object now contains a reference to the Employee object that describes the secretary, not a separate copy of the object.

In particular, two managers can share the same secretary, as is the case in Figure 12–6 and the following code:

Employee harry = new Employee("Harry Hacker", . . .);
Manager carl = new Manager("Carl Cracker", . . .);
carl.setSecretary(harry);
Manager tony = new Manager("Tony Tester", . . .);
tony.setSecretary(harry);

Now, suppose we write the employee data to disk. What we don’t want is for the Manager to save its information according to the following logic:

  • Save employee data;

  • Save secretary data.

Then, the data for harry would be saved three times. When reloaded, the objects would have the configuration shown in Figure 12–7.

Figure 12–7. Here, Harry is saved three times

This is not what we want. Suppose the secretary gets a raise. We would not want to hunt for all other copies of that object and apply the raise as well. We want to save and restore only one copy of the secretary. To do this, we must copy and restore the original references to the objects. In other words, we want the object layout on disk to be exactly like the object layout in memory. This is called persistence in object-oriented circles.

Of course, we cannot save and restore the memory addresses for the secretary objects. When an object is reloaded, it will likely occupy a completely different memory address than it originally did.

Figure 12–6. Two managers can share a mutual employee

Instead, Java uses a serialization approach. Hence, the name object serialization for this mechanism. Here is the algorithm:

  • All objects that are saved to disk are given a serial number (1, 2, 3, and so on, as shown in Figure 12–8).

    Figure 12–8. An example of object serialization

  • When saving an object to disk, find out if the same object has already been stored.

  • If it has been stored previously, just write “same as previously saved object with serial number x.” If not, store all its data.

When reading back the objects, simply reverse the procedure. For each object that you load, note its sequence number and remember where you put it in memory. When you encounter the tag “same as previously saved object with serial number x,” you look up where you put the object with serial number x and set the object reference to that memory address.
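The bookkeeping on the writing side can be sketched in a few lines. This is only a toy model of the algorithm, not the way ObjectOutputStream is implemented: the class SerialNumberSketch and its write method are hypothetical, and the method merely logs what would be written for each reference. It numbers the objects 1, 2, 3 as in the figures.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class SerialNumberSketch
{
   /**
      Returns one log entry per reference: "save #n" for an object that is
      seen for the first time, "same as #n" for a repeated reference.
   */
   static List<String> write(Object... references)
   {
      // IdentityHashMap compares keys with ==, so two distinct objects
      // always receive different serial numbers, even if they are equal
      Map<Object, Integer> serialNumbers = new IdentityHashMap<Object, Integer>();
      List<String> log = new ArrayList<String>();
      for (Object ref : references)
      {
         Integer known = serialNumbers.get(ref);
         if (known == null)
         {
            int next = serialNumbers.size() + 1;
            serialNumbers.put(ref, next);
            log.add("save #" + next);
         }
         else
            log.add("same as #" + known);
      }
      return log;
   }

   public static void main(String[] args)
   {
      Object carl = new Object();
      Object harry = new Object();
      System.out.println(write(carl, harry, harry));
   }
}
```

The program prints [save #1, save #2, same as #2]: the second reference to harry is recorded as a back reference instead of a second copy.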

Note that the objects need not be saved in any particular order. Figure 12–9 shows what happens when a manager occurs first in the staff array.

Figure 12–9. Objects saved in random order

All of this sounds confusing, and it is. Fortunately, when object streams are used, the process is also completely automatic. Object streams assign the serial numbers and keep track of duplicate objects. The exact numbering scheme is slightly different from that used in the figures—see the next section.
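A quick way to convince yourself that shared references survive is to write a small object graph to an in-memory stream and compare the references after reading it back. The sketch below uses a simplified, hypothetical Person class with a public secretary field rather than the Employee and Manager classes of this chapter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SharedReferenceDemo
{
   /** Hypothetical, simplified stand-in for Employee and Manager. */
   static class Person implements Serializable
   {
      String name;
      Person secretary; // may be shared with other Person objects

      Person(String name) { this.name = name; }
   }

   /** Writes the array to an in-memory object stream and reads it back. */
   static Person[] roundTrip(Person[] staff)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(staff);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
         Person[] result = (Person[]) in.readObject();
         in.close();
         return result;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      catch (ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Person harry = new Person("Harry Hacker");
      Person carl = new Person("Carl Cracker");
      Person tony = new Person("Tony Tester");
      carl.secretary = harry;
      tony.secretary = harry;

      Person[] copy = roundTrip(new Person[] { carl, tony, harry });
      System.out.println(copy[0].secretary == copy[1].secretary); // one shared secretary
      System.out.println(copy[0].secretary == copy[2]);           // same object as the array entry
      System.out.println(copy[2] == harry);                       // but a new object, not the original
   }
}
```

The program prints true, true, and false. The reloaded managers share a single secretary object, and that object is a copy of the original rather than the original itself.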

Note

In this chapter, we use serialization to save a collection of objects to a disk file and retrieve it exactly as we stored it. Another very important application is the transmittal of a collection of objects across a network connection to another computer. Just as raw memory addresses are meaningless in a file, they are also meaningless when communicating with a different processor. Because serialization replaces memory addresses with serial numbers, it permits the transport of object collections from one machine to another. We study that use of serialization when discussing remote method invocation in Volume 2.

Example 12–5 is a program that saves and reloads a network of Employee and Manager objects (some of which share the same employee as a secretary). Note that the secretary object is unique after reloading—when newStaff[1] gets a raise, that is reflected in the secretary fields of the managers.

Example 12–5. ObjectRefTest.java

import java.io.*;
import java.util.*;

class ObjectRefTest
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
      boss.setSecretary(harry);

      Employee[] staff = new Employee[3];

      staff[0] = boss;
      staff[1] = harry;
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
         out.writeObject(staff);
         out.close();

         // retrieve all records into a new array
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
         Employee[] newStaff = (Employee[]) in.readObject();
         in.close();

         // raise secretary's salary
         newStaff[1].raiseSalary(10);

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
   }
}

class Employee implements Serializable
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}

class Manager extends Employee
{
   /**
      Constructs a Manager without a secretary
      @param n the employee's name
      @param s the salary
      @param year the hire year
      @param month the hire month
      @param day the hire day
   */
   public Manager(String n, double s, int year, int month, int day)
   {
      super(n, s, year, month, day);
      secretary = null;
   }

   /**
      Assigns a secretary to the manager.
      @param s the secretary
   */
   public void setSecretary(Employee s)
   {
      secretary = s;
   }

   public String toString()
   {
      return super.toString()
        + "[secretary=" + secretary
        + "]";
   }

   private Employee secretary;
}

Understanding the Output Format for Object References

This section continues the discussion of the output format of object streams. If you skipped the previous discussion, you should skip this section as well.

All objects (including arrays and strings) and all class descriptors are given serial numbers as they are saved in the output file. This process is referred to as serialization because every saved object is assigned a serial number. (The count starts at 00 7E 00 00.)

We already saw that a full class descriptor for any given class occurs only once. Subsequent descriptors refer to it. For example, in our previous example, a repeated reference to the Date class was coded as

71 00 7E 00 08

The same mechanism is used for objects. If a reference to a previously saved object is written, it is saved in exactly the same way, that is, 71 followed by the serial number. It is always clear from the context whether the particular serial reference denotes a class descriptor or an object.

Finally, a null reference is stored as

70
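The two short forms are easy to observe with a stream that contains nothing but a string, a second reference to the same string, and a null reference. The following is a minimal sketch; the class BackReferenceDemo and its dump method are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class BackReferenceDemo
{
   /** Writes a string, the same string again, and null; returns a hex dump. */
   static String dump()
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         String name = "Harry";
         out.writeObject(name); // new string, receives the first serial number
         out.writeObject(name); // same object: written as a reference
         out.writeObject(null); // null reference
         out.close();

         StringBuilder result = new StringBuilder();
         for (byte b : buffer.toByteArray())
         {
            if (result.length() > 0) result.append(' ');
            result.append(String.format("%02X", b & 0xFF));
         }
         return result.toString();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(dump());
   }
}
```

The program prints AC ED 00 05 74 00 05 48 61 72 72 79 71 00 7E 00 00 70. After the header and the string itself, the bytes 71 00 7E 00 00 refer back to the first serial number, and the final 70 is the null reference.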

Here is the commented output of the ObjectRefTest program of the preceding section. If you like, run the program, look at a hex dump of its data file employee.dat, and compare it with the commented listing. The important lines toward the end of the output show the reference to a previously saved object.

AC ED 00 05                                 File header
75                                          Array staff (serial #1)
   72 00 0B [LEmployee;                     New class, string length, class name Employee[] (serial #0)
      FC BF 36 11 C5 91 11 C7 02            Fingerprint and flags
      00 00                                 Number of instance fields
      78                                    End marker
      70                                    No superclass
   00 00 00 03                              Number of array entries
   73                                       staff[0]: new object (serial #7)
      72 00 07 Manager                      New class, string length, class name (serial #2)
         36 06 AE 13 63 8F 59 B7 02         Fingerprint and flags
         00 01                              Number of data fields
         L 00 09 secretary                  Instance field type and name
         74 00 0A LEmployee;                Instance field class name, a String (serial #3)
         78                                 End marker
         72 00 08 Employee                  Superclass: new class, string length, class name (serial #4)
            E6 D2 86 7D AE AC 18 1B 02      Fingerprint and flags
            00 03                           Number of instance fields
            D 00 06 salary                  Instance field type and name
            L 00 07 hireDay                 Instance field type and name
            74 00 10 Ljava/util/Date;       Instance field class name, a String (serial #5)
            L 00 04 name                    Instance field type and name
            74 00 12 Ljava/lang/String;     Instance field class name, a String (serial #6)
            78                              End marker
            70                              No superclass
      40 F3 88 00 00 00 00 00               salary field value (double)
      73                                    hireDay field value: new object (serial #9)
         72 00 0E java.util.Date            New class, string length, class name (serial #8)
            68 6A 81 01 4B 59 74 19 03      Fingerprint and flags
            00 00                           No instance variables
            78                              End marker
            70                              No superclass
         77 08                              External storage, number of bytes
         00 00 00 83 E9 39 E0 00            Date
         78                                 End marker
      74 00 0C Carl Cracker                 name field value: String (serial #10)
      73                                    secretary field value: new object (serial #11)
         71 00 7E 00 04                     Existing class (use serial #4)
         40 E8 6A 00 00 00 00 00            salary field value (double)
         73                                 hireDay field value: new object (serial #12)
            71 00 7E 00 08                  Existing class (use serial #8)
            77 08                           External storage, number of bytes
            00 00 00 91 1B 4E B1 80         Date
            78                              End marker
         74 00 0C Harry Hacker              name field value: String (serial #13)
   71 00 7E 00 0B                           staff[1]: existing object (use serial #11)
   73                                       staff[2]: new object (serial #14)
      71 00 7E 00 04                        Existing class (use serial #4)
      40 E3 88 00 00 00 00 00               salary field value (double)
      73                                    hireDay field value: new object (serial #15)
         71 00 7E 00 08                     Existing class (use serial #8)
         77 08                              External storage, number of bytes
         00 00 00 94 6D 3E EC 00            Date
         78                                 End marker
      74 00 0B Tony Tester                  name field value: String (serial #16)

It is usually not important to know the exact file format (unless you are trying to create an evil effect by modifying the data). What you should remember is this:

  • The object stream output contains the types and data fields of all objects.

  • Each object is assigned a serial number.

  • Repeated occurrences of the same object are stored as references to that serial number.

Modifying the Default Serialization Mechanism

Certain data fields should never be serialized, for example, integer values that store file handles or handles of windows that are only meaningful to native methods. Such information is guaranteed to be useless when you reload an object at a later time or transport it to a different machine. In fact, improper values for such fields can actually cause native methods to crash. Java has an easy mechanism to prevent such fields from ever being serialized. Mark them with the keyword transient. You also need to tag fields as transient if they belong to nonserializable classes. Transient fields are always skipped when objects are serialized.
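The effect of the transient keyword is easy to demonstrate. In the following sketch, LogFile is a hypothetical class that we invented for illustration; its handle field stands for a value that is only meaningful inside the running process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientDemo
{
   /** Hypothetical class that pairs an owner name with a native handle. */
   static class LogFile implements Serializable
   {
      String owner;
      transient int handle; // skipped when the object is serialized

      LogFile(String owner, int handle)
      {
         this.owner = owner;
         this.handle = handle;
      }
   }

   /** Writes the object to an in-memory object stream and reads it back. */
   static LogFile roundTrip(LogFile original)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(original);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
         LogFile result = (LogFile) in.readObject();
         in.close();
         return result;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      catch (ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      LogFile copy = roundTrip(new LogFile("carl", 42));
      System.out.println(copy.owner + " " + copy.handle);
   }
}
```

The program prints carl 0. The owner field is restored, whereas the transient handle field is left at its default value.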

The serialization mechanism provides a way for individual classes to add validation or any other desired action to the default read and write behavior. A serializable class can define methods with the signature

private void readObject(ObjectInputStream in)
   throws IOException, ClassNotFoundException;
private void writeObject(ObjectOutputStream out)
   throws IOException;

Then, the data fields are no longer automatically serialized, and these methods are called instead.

Here is a typical example. A number of classes in the java.awt.geom package, such as Point2D.Double, are not serializable. Now suppose you want to serialize a class LabeledPoint that stores a String and a Point2D.Double. First, you need to mark the Point2D.Double field as transient to avoid a NotSerializableException.

public class LabeledPoint implements Serializable
{
   . . .
   private String label;
   private transient Point2D.Double point;
}

In the writeObject method, we first write the object descriptor and the String field, label, by calling the defaultWriteObject method. This is a special method of the ObjectOutputStream class that can only be called from within a writeObject method of a serializable class. Then we write the point coordinates, using the standard DataOutput calls.

private void writeObject(ObjectOutputStream out)
   throws IOException
{
   out.defaultWriteObject();
   out.writeDouble(point.getX());
   out.writeDouble(point.getY());
}

In the readObject method, we reverse the process:

private void readObject(ObjectInputStream in)
   throws IOException, ClassNotFoundException
{
   in.defaultReadObject();
   double x = in.readDouble();
   double y = in.readDouble();
   point = new Point2D.Double(x, y);
}
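Putting the pieces together, here is a sketch of a complete class with a round trip through an in-memory stream. To keep the example independent of any library class, it uses a hypothetical nonserializable class Coordinates in place of Point2D.Double; the toString method and the roundTrip helper are likewise our own additions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class LabeledPointDemo
{
   /** Hypothetical stand-in for a library class that is not serializable. */
   static class Coordinates
   {
      final double x;
      final double y;

      Coordinates(double x, double y) { this.x = x; this.y = y; }
   }

   static class LabeledPoint implements Serializable
   {
      LabeledPoint(String label, double x, double y)
      {
         this.label = label;
         point = new Coordinates(x, y);
      }

      private void writeObject(ObjectOutputStream out) throws IOException
      {
         out.defaultWriteObject();  // writes the nontransient field label
         out.writeDouble(point.x);  // then the data that we handle ourselves
         out.writeDouble(point.y);
      }

      private void readObject(ObjectInputStream in)
         throws IOException, ClassNotFoundException
      {
         in.defaultReadObject();    // reads label
         double x = in.readDouble();
         double y = in.readDouble();
         point = new Coordinates(x, y);
      }

      public String toString()
      {
         return label + "(" + point.x + "," + point.y + ")";
      }

      private String label;
      private transient Coordinates point;
   }

   /** Writes the object to an in-memory object stream and reads it back. */
   static LabeledPoint roundTrip(LabeledPoint original)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(original);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
         LabeledPoint result = (LabeledPoint) in.readObject();
         in.close();
         return result;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      catch (ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(roundTrip(new LabeledPoint("corner", 1.5, 2.5)));
   }
}
```

The program prints corner(1.5,2.5), which shows that the transient point was rebuilt from the two coordinates that writeObject appended to the stream.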

Another example is the java.util.Date class that supplies its own readObject and writeObject methods. These methods write the date as a number of milliseconds from the epoch (January 1, 1970, midnight UTC). The Date class has a complex internal representation that stores both a Calendar object and a millisecond count, to optimize lookups. The state of the Calendar is redundant and does not have to be saved.

The readObject and writeObject methods only need to save and load their data fields. They should not concern themselves with superclass data or any other class information.

Rather than letting the serialization mechanism save and restore object data, a class can define its own mechanism. To do this, a class must implement the Externalizable interface. This in turn requires it to define two methods:

public void readExternal(ObjectInput in)
  throws IOException, ClassNotFoundException;
public void writeExternal(ObjectOutput out)
  throws IOException;

Unlike the readObject and writeObject methods that were described in the preceding section, these methods are fully responsible for saving and restoring the entire object, including the superclass data. The serialization mechanism merely records the class of the object in the stream. When reading an externalizable object, the object stream creates an object with the default constructor and then calls the readExternal method. Here is how you can implement these methods for the Employee class:

public void readExternal(ObjectInput s)
   throws IOException
{
   name = s.readUTF();
   salary = s.readDouble();
   hireDay = new Date(s.readLong());
}

public void writeExternal(ObjectOutput s)
   throws IOException
{
  s.writeUTF(name);
  s.writeDouble(salary);
  s.writeLong(hireDay.getTime());
}
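The following sketch embeds these two methods in a small externalizable class and sends an object through an in-memory stream. The nested class layout, the three-argument constructor, the toString method, and the roundTrip helper are assumptions that we added to make the example self-contained. Note the public no-argument constructor, which the object stream needs to create the object before it calls readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Date;

class ExternalizableDemo
{
   static class Employee implements Externalizable
   {
      public Employee() {} // called by the object stream when reading

      Employee(String n, double s, Date d)
      {
         name = n;
         salary = s;
         hireDay = d;
      }

      public void readExternal(ObjectInput s) throws IOException
      {
         name = s.readUTF();
         salary = s.readDouble();
         hireDay = new Date(s.readLong());
      }

      public void writeExternal(ObjectOutput s) throws IOException
      {
         s.writeUTF(name);
         s.writeDouble(salary);
         s.writeLong(hireDay.getTime());
      }

      public String toString()
      {
         return name + "," + salary + "," + hireDay.getTime();
      }

      private String name;
      private double salary;
      private Date hireDay;
   }

   /** Writes the object to an in-memory object stream and reads it back. */
   static Employee roundTrip(Employee original)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(original);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
         Employee result = (Employee) in.readObject();
         in.close();
         return result;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      catch (ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(roundTrip(new Employee("Harry Hacker", 50000, new Date(0L))));
   }
}
```

The program prints Harry Hacker,50000.0,0, so all three fields made it through the stream even though the serialization mechanism recorded only the class of the object.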

Tip

Serialization is somewhat slow because the virtual machine must discover the structure of each object. If you are concerned about performance and if you read and write a large number of objects of a particular class, you should investigate the use of the Externalizable interface. The tech tip http://developer.java.sun.com/developer/TechTips/txtarchive/Apr00_Stu.txt demonstrates that in the case of an employee class, using external reading and writing was about 35%–40% faster than the default serialization.

Caution

Unlike the readObject and writeObject methods, which are private and can only be called by the serialization mechanism, the readExternal and writeExternal methods are public. In particular, readExternal potentially permits modification of the state of an existing object.

Note

For even more exotic variations of serialization, see http://www.absolutejava.com/serialization.

Serializing Singletons and Typesafe Enumerations

You have to pay particular attention when serializing and deserializing objects that are assumed to be unique. This commonly happens when you are implementing singletons and typesafe enumerations.

If you use the enum construct of JDK 5.0, then you need not worry about serialization—it just works. However, suppose you maintain legacy code that contains an enumerated type such as

public class Orientation
{
   public static final Orientation HORIZONTAL = new Orientation(1);
   public static final Orientation VERTICAL  = new Orientation(2);
   private Orientation(int v) { value = v; }
   private int value;
}

This idiom was common before enumerations were added to the Java language. Note that the constructor is private. Thus, no objects can be created beyond Orientation.HORIZONTAL and Orientation.VERTICAL. In particular, you can use the == operator to test for object equality:

if (orientation == Orientation.HORIZONTAL) . . .

There is an important twist that you need to remember when a typesafe enumeration implements the Serializable interface. The default serialization mechanism is not appropriate. Suppose we write a value of type Orientation and read it in again:

Orientation original = Orientation.HORIZONTAL;
ObjectOutputStream out = . . .;
out.writeObject(original);
out.close();
ObjectInputStream in = . . .;
Orientation saved = (Orientation) in.readObject();

Now the test

if (saved == Orientation.HORIZONTAL) . . .

will fail. In fact, the saved value is a completely new object of the Orientation type and not equal to any of the predefined constants. Even though the constructor is private, the serialization mechanism can create new objects!

To solve this problem, you need to define another special serialization method, called readResolve. If the readResolve method is defined, it is called after the object is deserialized. It must return an object that then becomes the return value of the readObject method. In our case, the readResolve method will inspect the value field and return the appropriate enumerated constant:

protected Object readResolve() throws ObjectStreamException
{
   if (value == 1) return Orientation.HORIZONTAL;
   if (value == 2) return Orientation.VERTICAL;
   return null; // this shouldn't happen
}
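Here is a sketch of the complete class together with a test of the == comparison after a round trip. It makes two assumptions beyond the text: the class implements Serializable, and an unexpected value is reported with an InvalidObjectException instead of returning null. The ReadResolveDemo class and its roundTrip method are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

class Orientation implements Serializable
{
   public static final Orientation HORIZONTAL = new Orientation(1);
   public static final Orientation VERTICAL = new Orientation(2);

   private Orientation(int v) { value = v; }

   // called by the object stream after the fields have been restored
   protected Object readResolve() throws ObjectStreamException
   {
      if (value == 1) return Orientation.HORIZONTAL;
      if (value == 2) return Orientation.VERTICAL;
      throw new InvalidObjectException("unknown orientation value " + value);
   }

   private int value;
}

class ReadResolveDemo
{
   /** Writes the object to an in-memory object stream and reads it back. */
   static Object roundTrip(Object original)
   {
      try
      {
         ByteArrayOutputStream buffer = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(buffer);
         out.writeObject(original);
         out.close();

         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.toByteArray()));
         Object result = in.readObject();
         in.close();
         return result;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      catch (ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Object saved = roundTrip(Orientation.HORIZONTAL);
      System.out.println(saved == Orientation.HORIZONTAL);
   }
}
```

With the readResolve method in place, the program prints true. If you remove the method, the comparison yields false because the stream hands back a fresh Orientation object.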

Remember to add a readResolve method to all typesafe enumerations in your legacy code and to all classes that follow the singleton design pattern.

Versioning

In the previous sections, we showed you how to save relatively small collections of objects by means of an object stream. But those were just demonstration programs. With object streams, it helps to think big. Suppose you write a program that lets the user produce a document. This document contains paragraphs of text, tables, graphs, and so on. You can stream out the entire document object with a single call to writeObject:

out.writeObject(doc);

The paragraph, table, and graph objects are automatically streamed out as well. One user of your program can then give the output file to another user who also has a copy of your program, and that program loads the entire document with a single call to readObject:

doc = (Document) in.readObject();

This is very useful, but your program will inevitably change, and you will release a version 1.1. Can version 1.1 read the old files? Can the users who still use 1.0 read the files that the new version is now producing? Clearly, it would be desirable if object files could cope with the evolution of classes.

At first glance it seems that this would not be possible. When a class definition changes in any way, then its SHA fingerprint also changes, and you know that object streams will refuse to read in objects with different fingerprints. However, a class can indicate that it is compatible with an earlier version of itself. To do this, you must first obtain the fingerprint of the earlier version of the class. You use the stand-alone serialver program that is part of the JDK to obtain this number. For example, running

serialver Employee

prints

Employee: static final long serialVersionUID = -1814239825517340645L;

If you start the serialver program with the -show option, then the program brings up a graphical dialog box (see Figure 12–10).

Figure 12–10. The graphical version of the serialver program

All later versions of the class must define the serialVersionUID constant to the same fingerprint as the original.

class Employee implements Serializable // version 1.1
{
   . . .
   public static final long serialVersionUID = -1814239825517340645L;
}

When a class has a static data member named serialVersionUID, the serialization mechanism does not compute the fingerprint but uses that value instead.

Once that static data member has been placed inside a class, the serialization system is now willing to read in different versions of objects of that class.
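To check which fingerprint the serialization system will use for a class, you can ask the ObjectStreamClass class. The following is a minimal sketch, not part of the example programs: the class names VersionedItem and SerialVersionDemo are hypothetical, and the UID value is simply the one printed by serialver above, reused for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class, used only for this illustration.
class VersionedItem implements Serializable
{
   // Declared fingerprint; the serialization system uses it as is.
   private static final long serialVersionUID = -1814239825517340645L;

   private String label = "demo";
}

class SerialVersionDemo
{
   // Returns the fingerprint that object streams will use for VersionedItem.
   static long fingerprint()
   {
      return ObjectStreamClass.lookup(VersionedItem.class).getSerialVersionUID();
   }

   public static void main(String[] args)
   {
      System.out.println(fingerprint());
   }
}
```

Because the class declares the constant, the lookup reports the declared value rather than a computed one.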

If only the methods of the class change, then there is no problem with reading the new object data. However, if data fields change, then you may have problems. For example, the old file object may have more or fewer data fields than the one in the program, or the types of the data fields may be different. In that case, the object stream makes an effort to convert the stream object to the current version of the class.

The object stream compares the data fields of the current version of the class with the data fields of the version in the stream. Of course, the object stream considers only the nontransient and nonstatic data fields. If two fields have matching names but different types, then the object stream makes no effort to convert one type to the other—the objects are incompatible. If the object in the stream has data fields that are not present in the current version, then the object stream ignores the additional data. If the current version has data fields that are not present in the streamed object, the added fields are set to their default (null for objects, zero for numbers, and false for Boolean values).

Here is an example. Suppose we have saved a number of employee records on disk, using the original version (1.0) of the class. Now we change the Employee class to version 2.0 by adding a data field called department. Figure 12–11 shows what happens when a 1.0 object is read into a program that uses 2.0 objects. The department field is set to null. Figure 12–12 shows the opposite scenario: a program using 1.0 objects reads a 2.0 object. The additional department field is ignored.

Figure 12–11. Reading an object with fewer data fields

Figure 12–12. Reading an object with more data fields

Is this process safe? It depends. Dropping a data field seems harmless—the recipient still has all the data that it knew how to manipulate. Setting a data field to null may not be so safe. Many classes work hard to initialize all data fields in all constructors to non-null values, so that the methods don’t have to be prepared to handle null data. It is up to the class designer to implement additional code in the readObject method to fix version incompatibilities or to make sure the methods are robust enough to handle null data.
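One way to guard against a null field is a readObject method that repairs the object after the default fields have been read. The sketch below is only an illustration under stated assumptions: the class EmployeeV2, the placeholder value "Unassigned", and the serialVersionUID value are all hypothetical, and a null department passed to the constructor stands in for a field that a version 1.0 stream would lack.

```java
import java.io.*;

// Hypothetical version 2.0 of an employee record with an added department field.
class EmployeeV2 implements Serializable
{
   private static final long serialVersionUID = 1L; // illustrative value

   private String name;
   private String department; // absent from version 1.0 streams

   EmployeeV2(String name, String department)
   {
      this.name = name;
      this.department = department;
   }

   String getDepartment() { return department; }

   // Called by the object stream in place of the default reading mechanism.
   private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException
   {
      in.defaultReadObject();       // read the fields that the stream contains
      if (department == null)       // field missing, or null, in the stream
         department = "Unassigned"; // assumed placeholder value
   }
}

class ReadObjectFixDemo
{
   // Serializes an object to a byte array and reads it back.
   static EmployeeV2 roundTrip(EmployeeV2 e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(e);
         out.close();
         ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bout.toByteArray()));
         return (EmployeeV2) in.readObject();
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }

   public static void main(String[] args)
   {
      EmployeeV2 restored = roundTrip(new EmployeeV2("Harry Hacker", null));
      System.out.println(restored.getDepartment());
   }
}
```

The alternative is to leave the field null and make every method that uses it check for null; which approach is better depends on the class.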

Using Serialization for Cloning

There is an amusing (and, occasionally, very useful) use for the serialization mechanism: it gives you an easy way to clone an object provided the class is serializable. (Recall from Chapter 6 that you need to do a bit of work to allow an object to be cloned.)

To clone a serializable object, simply serialize it to an output stream and then read it back in. The result is a new object that is a deep copy of the existing object. You don’t have to write the object to a file—you can use a ByteArrayOutputStream to save the data into a byte array.

As Example 12–6 shows, to get clone for free, simply extend the SerialCloneable class, and you are done.

You should be aware that this method, although clever, will usually be much slower than a clone method that explicitly constructs a new object and copies or clones the data fields (as you saw in Chapter 6).

Example 12–6. SerialCloneTest.java

import java.io.*;
import java.util.*;

public class SerialCloneTest
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 35000, 1989, 10, 1);
      // clone harry
      Employee harry2 = (Employee) harry.clone();

      // mutate harry
      harry.raiseSalary(10);

      // now harry and the clone are different
      System.out.println(harry);
      System.out.println(harry2);
   }
}

/**
   A class whose clone method uses serialization.
*/
class SerialCloneable implements Cloneable, Serializable
{
   public Object clone()
   {
      try
      {
         // save the object to a byte array
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(this);
         out.close();

         // read a clone of the object from the byte array
         ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray());
         ObjectInputStream in = new ObjectInputStream(bin);
         Object ret = in.readObject();
         in.close();

         return ret;
      }
      catch (Exception e)
      {
         return null;
      }
   }
}

/**
   The familiar Employee class, redefined to extend the
   SerialCloneable class.
*/
class Employee extends SerialCloneable
{
   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}

File Management

You have learned how to read and write data from a file. However, there is more to file management than reading and writing. The File class encapsulates the functionality that you will need in order to work with the file system on the user’s machine. For example, you use the File class to find out when a file was last modified or to remove or rename the file. In other words, the stream classes are concerned with the contents of the file, whereas the File class is concerned with the storage of the file on a disk.

Note

As is so often the case in Java, the File class takes the least common denominator approach. For example, under Windows, you can find out (or set) the read-only flag for a file, but while you can find out if it is a hidden file, you can’t hide it without using a native method (see Volume 2).

The simplest constructor for a File object takes a (full) file name. If you don’t supply a path name, then Java uses the current directory. For example,

File f = new File("test.txt");

gives you a file object with this name in the current directory. (The “current directory” is the current directory of the process that executes the virtual machine. If you launched the virtual machine from the command line, it is the directory from which you started the java executable.)

A call to this constructor does not create a file with this name if it doesn’t exist. Actually, creating a file from a File object is done with one of the stream class constructors or the createNewFile method in the File class. The createNewFile method only creates a file if no file with that name exists, and it returns a boolean to tell you whether it was successful.
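The difference between constructing a File object and creating a file can be seen in a short experiment. This is a minimal sketch; the class name CreateFileDemo and the generated file name are hypothetical, and the code assumes that the directory named by the java.io.tmpdir system property is writable.

```java
import java.io.File;
import java.io.IOException;

class CreateFileDemo
{
   // Returns { exists before creation, first createNewFile result,
   //           second createNewFile result }.
   static boolean[] probe(File dir)
   {
      File f = new File(dir, "createfiledemo-" + System.nanoTime() + ".tmp");
      try
      {
         boolean existedBefore = f.exists(); // constructing a File creates nothing
         boolean first = f.createNewFile();  // creates the file
         boolean second = f.createNewFile(); // the file is already there
         f.delete();
         return new boolean[] { existedBefore, first, second };
      }
      catch (IOException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      boolean[] r = probe(new File(System.getProperty("java.io.tmpdir")));
      System.out.println(r[0] + " " + r[1] + " " + r[2]);
   }
}
```

Only the first call to createNewFile should report success; the second finds the file already present and returns false.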

On the other hand, once you have a File object, the exists method in the File class tells you whether a file exists with that name. For example, the following trial program would almost certainly print “false” on anyone’s machine, and yet it can print out a path name to this nonexistent file.

import java.io.*;


public class Test
{
   public static void main(String args[])
   {
      File f = new File("afilethatprobablydoesntexist");
      System.out.println(f.getAbsolutePath());
      System.out.println(f.exists());
   }
}

There are two other constructors for File objects:

File(String path, String name)

which creates a File object with the given name in the directory specified by the path parameter. (If the path parameter is null, this constructor creates a File object, using the current directory.)

Finally, you can use an existing File object in the constructor:

File(File dir, String name)

where the File object represents a directory and, as before, if dir is null, the constructor creates a File object in the current directory.

Somewhat confusingly, a File object can represent either a file or a directory (perhaps because the operating system that the Java designers were most familiar with happens to implement directories as files). You use the isDirectory and isFile methods to tell whether the file object represents a file or a directory. This is surprising—in an object-oriented system, you might have expected a separate Directory class, perhaps extending the File class.

To make an object representing a directory, you simply supply the directory name in the File constructor:

File tempDir = new File(File.separator + "temp");

If this directory does not yet exist, you can create it with the mkdir method:

tempDir.mkdir();

If a file object represents a directory, use list() to get an array of the file names in that directory. The program in Example 12–7 uses all these methods to print out the directory substructure of whatever path is entered on the command line. (It would be easy enough to change this program into a utility class that returns a vector of the subdirectories for further processing.)

Tip

Always use File objects, not strings, when manipulating file or directory names. For example, the equals method of the File class knows that some file systems are not case significant and that a trailing / in a directory name doesn’t matter.

Example 12–7. FindDirectories.java

import java.io.*;

public class FindDirectories
{
   public static void main(String[] args)
   {
      // if no arguments provided, start at the parent directory
      if (args.length == 0) args = new String[] { ".." };

      try
      {
         File pathName = new File(args[0]);
         String[] fileNames = pathName.list();

         // list returns null if the path is not a directory or cannot be read
         if (fileNames == null) return;

         // enumerate all files in the directory
         for (int i = 0; i < fileNames.length; i++)
         {
            File f = new File(pathName.getPath(), fileNames[i]);

            // if the file is again a directory, call the main method recursively
            if (f.isDirectory())
            {
               System.out.println(f.getCanonicalPath());
               main(new String [] { f.getPath() });
            }
         }
      }
      catch(IOException e)
      {
         e.printStackTrace();
      }
   }
}

Rather than listing all files in a directory, you can use a FilenameFilter object as a parameter to the list method to narrow down the list. These objects are simply instances of a class that implements the FilenameFilter interface.

All a class needs to do to implement the FilenameFilter interface is define a method called accept. Here is an example of a simple FilenameFilter class that allows only files with a specified extension:

public class ExtensionFilter implements FilenameFilter
{
   public ExtensionFilter(String ext)
   {
      extension = "." + ext;
   }

   public boolean accept(File dir, String name)
   {
      return name.endsWith(extension);
   }

   private String extension;
}
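To put such a filter to work, pass it to the list method. The following usage sketch repeats the ExtensionFilter logic (as a non-public class) so that it compiles on its own; the class ExtensionFilterDemo and the method namesWithExtension are hypothetical names chosen for this illustration.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;

// Same logic as the ExtensionFilter class above, repeated for self-containment.
class ExtensionFilter implements FilenameFilter
{
   public ExtensionFilter(String ext)
   {
      extension = "." + ext;
   }

   public boolean accept(File dir, String name)
   {
      return name.endsWith(extension);
   }

   private String extension;
}

class ExtensionFilterDemo
{
   // Returns the sorted names in dir that end with the given extension.
   static String[] namesWithExtension(File dir, String ext)
   {
      String[] names = dir.list(new ExtensionFilter(ext));
      if (names == null) return new String[0]; // not a directory, or an I/O error
      Arrays.sort(names); // list makes no promise about the order of names
      return names;
   }

   public static void main(String[] args)
   {
      File dir = new File(args.length > 0 ? args[0] : ".");
      String[] names = namesWithExtension(dir, "java");
      for (int i = 0; i < names.length; i++)
         System.out.println(names[i]);
   }
}
```

Note that the filter sees names of both files and directories, so a directory whose name ends in the extension is accepted as well.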

When writing portable programs, it is a challenge to specify file names with subdirectories. As we mentioned earlier, it turns out that you can use a forward slash (the UNIX separator) as the directory separator in Windows as well, but other operating systems might not permit this, so we don’t recommend using a forward slash.

Caution

If you do use forward slashes as directory separators in Windows when constructing a File object, the getAbsolutePath method returns a file name that contains forward slashes, which will look strange to Windows users. Instead, use the getCanonicalPath method—it replaces the forward slashes with backslashes.

It is much better to use the information about the current directory separator that the File class stores in a static field called separator. (In a Windows environment, this is a backslash (\); in a UNIX environment, it is a forward slash (/).) For example:

File foo = new File("Documents" + File.separator + "data.txt");

Of course, if you use the second alternate version of the File constructor

File foo = new File("Documents", "data.txt");

then the constructor will supply the correct separator.
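The three ways of building the same path can be compared directly. This is a small sketch with a hypothetical class name, SeparatorDemo; it only constructs File objects and does not touch the file system.

```java
import java.io.File;

class SeparatorDemo
{
   // Builds the same relative path in three different ways.
   static File[] build()
   {
      File viaString = new File("Documents" + File.separator + "data.txt");
      File viaParent = new File("Documents", "data.txt");
      File viaFile = new File(new File("Documents"), "data.txt");
      return new File[] { viaString, viaParent, viaFile };
   }

   public static void main(String[] args)
   {
      File[] files = build();
      System.out.println(files[1].getPath());
      System.out.println(files[0].equals(files[1]));
      System.out.println(files[1].equals(files[2]));
   }
}
```

All three objects should compare as equal, and getPath shows the separator of the platform on which the program runs.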

The API notes that follow give you what we think are the most important remaining methods of the File class; their use should be straightforward.

java.io.File 1.0
  • boolean canRead()

    indicates whether the file can be read by the current application.

  • boolean canWrite()

    indicates whether the file is writable or read-only.

  • static File createTempFile(String prefix, String suffix) 1.2

  • static File createTempFile(String prefix, String suffix, File directory) 1.2

    create a temporary file in the system’s default temp directory or the given directory, using the given prefix and suffix to generate the temporary name.

    Parameters:

    prefix

    A prefix string that is at least three characters long

     

    suffix

    An optional suffix. If null, .tmp is used

     

    directory

    The directory in which the file is created. If it is null, the file is created in the system’s default temp directory

  • boolean delete()

    tries to delete the file. Returns true if the file was deleted, false otherwise.

  • void deleteOnExit()

    requests that the file be deleted when the virtual machine shuts down.

  • boolean exists()

    true if the file or directory exists; false otherwise.

  • String getAbsolutePath()

    returns a string that contains the absolute path name. Tip: Use getCanonicalPath instead.

  • File getCanonicalFile() 1.2

    returns a File object that contains the canonical path name for the file. In particular, redundant “.” directories are removed, the correct directory separator is used, and the capitalization preferred by the underlying file system is obtained.

  • String getCanonicalPath() 1.1

    returns a string that contains the canonical path name. In particular, redundant “.” directories are removed, the correct directory separator is used, and the capitalization preferred by the underlying file system is obtained.

  • String getName()

    returns a string that contains the file name of the File object (does not include path information).

  • String getParent()

    returns a string that contains the name of the parent of this File object. If this File object is a file, then the parent is the directory containing it. If it is a directory, then the parent is the parent directory or null if there is no parent directory.

  • File getParentFile() 1.2

    returns a File object for the parent of this File directory. See getParent for a definition of “parent.”

  • String getPath()

    returns a string that contains the path name of the file.

  • boolean isDirectory()

    returns true if the File represents a directory; false otherwise.

  • boolean isFile()

    returns true if the File object represents a file as opposed to a directory or a device.

  • boolean isHidden() 1.2

    returns true if the File object represents a hidden file or directory.

  • long lastModified()

    returns the time the file was last modified (counted in milliseconds since Midnight January 1, 1970 GMT), or 0 if the file does not exist. Use the Date(long) constructor to convert this value to a date.

  • long length()

    returns the length of the file in bytes, or 0 if the file does not exist.

  • String[] list()

    returns an array of strings that contain the names of the files and directories contained by this File object, or null if this File was not representing a directory.

  • String[] list(FilenameFilter filter)

    returns an array of the names of the files and directories contained by this File that satisfy the filter, or null if this File does not represent a directory.

    Parameters:

    filter

    The FilenameFilter object to use

  • File[] listFiles() 1.2

    returns an array of File objects corresponding to the files and directories contained by this File object, or null if this File was not representing a directory.

  • File[] listFiles(FilenameFilter filter) 1.2

    returns an array of File objects for the files and directories contained by this File that satisfy the filter, or null if this File does not represent a directory.

    Parameters:

    filter

    The FilenameFilter object to use

  • static File[] listRoots() 1.2

    returns an array of File objects corresponding to all the available file roots. (For example, on a Windows system, you get the File objects representing the installed drives (both local drives and mapped network drives). On a UNIX system, you simply get “/”.)

  • boolean createNewFile() 1.2

    atomically makes a new file whose name is given by the File object if no file with that name exists. That is, the checking for the file name and the creation are not interrupted by other file system activity. Returns true if the method created the file.

  • boolean mkdir()

    makes a subdirectory whose name is given by the File object. Returns true if the directory was successfully created; false otherwise.

  • boolean mkdirs()

    unlike mkdir, creates the parent directories if necessary. Returns false if any of the necessary directories could not be created.

  • boolean renameTo(File dest)

    returns true if the name was changed; false otherwise.

    Parameters:

    dest

    A File object that specifies the new name

  • boolean setLastModified(long time) 1.2

    sets the last modified time of the file. Returns true if successful, false otherwise.

    Parameters:

    time

    A long integer representing the number of milliseconds since Midnight January 1, 1970, GMT. Use the getTime method of the Date class to calculate this value

  • boolean setReadOnly() 1.2

    sets the file to be read-only. Returns true if successful, false otherwise.

  • URL toURL() 1.2

    converts the File object to a file URL.

java.io.FilenameFilter 1.0
  • boolean accept(File dir, String name)

    should be defined to return true if the file matches the filter criterion.

    Parameters:

    dir

    A File object representing the directory that contains the file

     

    name

    The name of the file

New I/O

JDK 1.4 contains a number of features for improved input/output processing, collectively called the “new I/O,” in the java.nio package. (Of course, the “new” moniker is somewhat regrettable because, a few years down the road, the package won’t be new any longer.)

The package includes support for the following features:

  • Memory-mapped files

  • File locking

  • Character set encoders and decoders

  • Nonblocking I/O

We already introduced character sets on page 633. In this section, we discuss only the first two features. Nonblocking I/O requires the use of threads, which are covered in Volume 2.

Memory-Mapped Files

Most operating systems can take advantage of the virtual memory implementation to “map” a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which is much faster than the traditional file operations.

At the end of this section, you can find a program that computes the CRC32 checksum of a file, using traditional file input and a memory-mapped file. On one machine, we got the timing data shown in Table 12–7 when computing the checksum of the 37-Mbyte file rt.jar in the jre/lib directory of the JDK.

Table 12–7. Timing Data for File Operations

Method                   Time
Plain Input Stream       110 seconds
Buffered Input Stream    9.9 seconds
Random Access File       162 seconds
Memory Mapped File       7.2 seconds

As you can see, on this particular machine, memory mapping is a bit faster than using buffered sequential input and dramatically faster than using a RandomAccessFile.

Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain can be substantial if you need to use random access. For sequential reading of files of moderate size, on the other hand, there is no reason to use memory mapping.

The java.nio package makes memory mapping quite simple. Here is what you do.

First, get a channel from the file. A channel is an abstraction for disk files that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files. You get a channel by calling the getChannel method that has been added to the FileInputStream, FileOutputStream, and RandomAccessFile classes.

FileInputStream in = new FileInputStream(. . .);
FileChannel channel = in.getChannel();

Then you get a MappedByteBuffer from the channel by calling the map method of the FileChannel class. You specify the area of the file that you want to map and a mapping mode. Three modes are supported:

  • FileChannel.MapMode.READ_ONLY: The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException.

  • FileChannel.MapMode.READ_WRITE: The resulting buffer is writable, and the changes will be written back to the file at some time. Note that other programs that have mapped the same file may not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs is operating-system dependent.

  • FileChannel.MapMode.PRIVATE: The resulting buffer is writable, but any changes are private to this buffer and are not propagated to the file.

Once you have the buffer, you can read and write data, using the methods of the ByteBuffer class and the Buffer superclass.
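Put together, the steps for mapping a file look roughly like the following sketch. The class MapFileDemo and its methods are hypothetical; the demo writes a three-byte temporary file so that it can run without arguments, and it assumes that temporary files can be created on the machine.

```java
import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MapFileDemo
{
   // Maps the whole file in read-only mode and returns the sum of its bytes.
   static long sumBytes(File file) throws IOException
   {
      FileInputStream in = new FileInputStream(file);
      try
      {
         FileChannel channel = in.getChannel();
         MappedByteBuffer buffer
            = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
         long sum = 0;
         while (buffer.hasRemaining())
            sum += buffer.get() & 0xFF; // treat each byte as unsigned
         return sum;
      }
      finally
      {
         in.close();
      }
   }

   // Writes three bytes to a temporary file, then sums them through a mapping.
   static long demo()
   {
      try
      {
         File temp = File.createTempFile("mapdemo", ".bin");
         temp.deleteOnExit();
         FileOutputStream out = new FileOutputStream(temp);
         out.write(new byte[] { 1, 2, (byte) 200 });
         out.close();
         return sumBytes(temp);
      }
      catch (IOException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(demo());
   }
}
```

The mask with 0xFF is needed because get returns a signed byte; without it, the byte 200 would be read as a negative number.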

Buffers support both sequential and random data access. A buffer has a position that is advanced by get and put operations. For example, you can sequentially traverse all bytes in the buffer as

while (buffer.hasRemaining())
{
   byte b = buffer.get();
   . . .
}

Alternatively, you can use random access:

for (int i = 0; i < buffer.limit(); i++)
{
   byte b = buffer.get(i);
   . . .
}
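The difference between the two access styles is that the relative get method moves the position, whereas the absolute get(int) method leaves it alone. Here is a small sketch that shows this; the class name BufferAccessDemo is hypothetical, and ByteBuffer.wrap around a byte array stands in for a mapped buffer.

```java
import java.nio.ByteBuffer;

class BufferAccessDemo
{
   // Sums all bytes with relative get calls; the position ends at the limit.
   static int sumSequential(ByteBuffer buffer)
   {
      int sum = 0;
      while (buffer.hasRemaining())
         sum += buffer.get();
      return sum;
   }

   // Sums all bytes with absolute get calls; the position is left unchanged.
   static int sumRandom(ByteBuffer buffer)
   {
      int sum = 0;
      for (int i = 0; i < buffer.limit(); i++)
         sum += buffer.get(i);
      return sum;
   }

   public static void main(String[] args)
   {
      ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 10, 20, 30 });
      System.out.println(sumRandom(buffer) + " at position " + buffer.position());
      System.out.println(sumSequential(buffer) + " at position " + buffer.position());
   }
}
```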

You can also read and write arrays of bytes with the methods

get(byte[] bytes)
get(byte[], int offset, int length)

Finally, there are methods

getInt
getLong
getShort
getChar
getFloat
getDouble

to read primitive type values that are stored as binary values in the file. As we already mentioned, Java uses big-endian ordering for binary data. However, if you need to process a file containing binary numbers in little-endian order, simply call

buffer.order(ByteOrder.LITTLE_ENDIAN);

To find out the current byte order of a buffer, call

ByteOrder b = buffer.order();

Caution

This pair of methods does not use the set/get naming convention.
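To see the effect of the byte order, you can store a number in one order and read the same bytes back in the other. This is a minimal sketch with a hypothetical class name, ByteOrderDemo; it uses a buffer obtained from ByteBuffer.allocate rather than a mapped file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo
{
   // Stores value in big-endian order, then reads the same four bytes
   // as a little-endian number.
   static int reinterpret(int value)
   {
      ByteBuffer buffer = ByteBuffer.allocate(4); // big-endian by default
      buffer.putInt(0, value);
      buffer.order(ByteOrder.LITTLE_ENDIAN);
      return buffer.getInt(0);
   }

   public static void main(String[] args)
   {
      System.out.println(Integer.toHexString(reinterpret(0x11223344)));
   }
}
```

The four bytes come back in reverse order, so the hexadecimal digits of the result are 44332211.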

To write numbers to a buffer, use one of the methods

putInt
putLong
putShort
putChar
putFloat
putDouble

Example 12–8 computes the 32-bit cyclic redundancy checksum (CRC32) of a file. That quantity is a checksum that is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip package contains a class CRC32 that computes the checksum of a sequence of bytes, using the following loop:

CRC32 crc = new CRC32();
while (more bytes)
   crc.update(next byte)
long checksum = crc.getValue();
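For a byte array, the loop can be written out as in the following sketch. The class name Crc32Demo is hypothetical; the input is the ASCII string 123456789, which is commonly used as a check input for CRC implementations.

```java
import java.util.zip.CRC32;

class Crc32Demo
{
   // Feeds the bytes to a CRC32 object one at a time.
   static long checksum(byte[] data)
   {
      CRC32 crc = new CRC32();
      for (int i = 0; i < data.length; i++)
         crc.update(data[i]); // update(int) uses only the low eight bits
      return crc.getValue();
   }

   public static void main(String[] args)
   {
      byte[] data = { '1', '2', '3', '4', '5', '6', '7', '8', '9' };
      System.out.println(Long.toHexString(checksum(data)));
   }
}
```

For this input the program prints cbf43926, the standard CRC-32 check value.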

Note

For a nice explanation of the CRC algorithm, see http://www.relisoft.com/Science/CrcMath.html.

The details of the CRC computation are not important. We just use it as an example of a useful file operation.

Run the program as

java NIOTest filename

Example 12–8. NIOTest.java

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.zip.*;

/**
   This program computes the CRC checksum of a file.
   Usage: java NIOTest filename
*/
public class NIOTest
{
   public static long checksumInputStream(String filename)
      throws IOException
   {
      InputStream in = new FileInputStream(filename);
      CRC32 crc = new CRC32();

      int c;
      while((c = in.read()) != -1)
         crc.update(c);
      return crc.getValue();
   }

   public static long checksumBufferedInputStream(String filename)
      throws IOException
   {
      InputStream in = new BufferedInputStream(new FileInputStream(filename));
      CRC32 crc = new CRC32();

      int c;
      while((c = in.read()) != -1)
         crc.update(c);
      return crc.getValue();
   }

   public static long checksumRandomAccessFile(String filename)
      throws IOException
   {
      RandomAccessFile file = new RandomAccessFile(filename, "r");
      long length = file.length();
      CRC32 crc = new CRC32();

      for (long p = 0; p < length; p++)
      {
         file.seek(p);
         int c = file.readByte();
         crc.update(c);
      }
      return crc.getValue();
   }

   public static long checksumMappedFile(String filename)
      throws IOException
   {
      FileInputStream in = new FileInputStream(filename);
      FileChannel channel = in.getChannel();

      CRC32 crc = new CRC32();
      int length = (int) channel.size();
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);

      for (int p = 0; p < length; p++)
      {
         int c = buffer.get(p);
         crc.update(c);
      }
      return crc.getValue();
   }

   public static void main(String[] args)
      throws IOException
   {
      System.out.println("Input Stream:");
      long start = System.currentTimeMillis();
      long crcValue = checksumInputStream(args[0]);
      long end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Buffered Input Stream:");
      start = System.currentTimeMillis();
      crcValue = checksumBufferedInputStream(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Random Access File:");
      start = System.currentTimeMillis();
      crcValue = checksumRandomAccessFile(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Mapped File:");
      start = System.currentTimeMillis();
      crcValue = checksumMappedFile(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");
   }
}
java.io.FileInputStream 1.0
  • FileChannel getChannel() 1.4

    returns a channel for accessing this stream.

java.io.FileOutputStream 1.0
  • FileChannel getChannel() 1.4

    returns a channel for accessing this stream.

java.io.RandomAccessFile 1.0
  • FileChannel getChannel() 1.4

    returns a channel for accessing this file.

java.nio.channels.FileChannel 1.4
  • MappedByteBuffer map(FileChannel.MapMode mode, long position, long size)

    maps a region of the file to memory.

    Parameters:

    mode

    One of the constants READ_ONLY, READ_WRITE, or PRIVATE in the FileChannel.MapMode class

     

    position

    The start of the mapped region

     

    size

    The size of the mapped region

java.nio.Buffer 1.4
  • boolean hasRemaining()

    returns true if the current buffer position has not yet reached the buffer’s limit position.

  • int limit()

    returns the limit position of the buffer, that is, the first position at which no more values are available.

java.nio.ByteBuffer 1.4
  • byte get()

    gets a byte from the current position and advances the current position to the next byte.

  • byte get(int index)

    gets a byte from the specified index.

  • ByteBuffer put(byte b)

    puts a byte to the current position and advances the current position to the next byte. Returns a reference to this buffer.

  • ByteBuffer put(int index, byte b)

    puts a byte at the specified index. Returns a reference to this buffer.

  • ByteBuffer get(byte[] destination)

  • ByteBuffer get(byte[] destination, int offset, int length)

    fill a byte array, or a region of a byte array, with bytes from the buffer, and advance the current position by the number of bytes read. If not enough bytes remain in the buffer, then no bytes are read, and a BufferUnderflowException is thrown. Return a reference to this buffer.

    Parameters:

    destination

    The byte array to be filled

     

    offset

    The offset of the region to be filled

     

    length

    The length of the region to be filled

  • ByteBuffer put(byte[] source)

  • ByteBuffer put(byte[] source, int offset, int length)

    put all bytes from a byte array, or the bytes from a region of a byte array, into the buffer, and advance the current position by the number of bytes written. If not enough space remains in the buffer, then no bytes are written, and a BufferOverflowException is thrown. Return a reference to this buffer.

    Parameters:

    source

    The byte array to be written

     

    offset

    The offset of the region to be written

     

    length

    The length of the region to be written

  • Xxx getXxx()

  • Xxx getXxx(int index)

  • ByteBuffer putXxx(xxx value)

  • ByteBuffer putXxx(int index, xxx value)

    are used for relative and absolute reading and writing of binary numbers. Xxx is one of Int, Long, Short, Char, Float, or Double.

  • ByteBuffer order(ByteOrder order)

  • ByteOrder order()

    set or get the byte order. The value for order is one of the constants BIG_ENDIAN or LITTLE_ENDIAN of the ByteOrder class.

The Buffer Data Structure

When you use memory mapping, you make a single buffer that spans the entire file, or the area of the file in which you are interested. You can also use buffers to read and write more modest chunks of information.

In this section, we briefly describe the basic operations on Buffer objects. A buffer is an array of values of the same type. The Buffer class is an abstract class with concrete subclasses ByteBuffer, CharBuffer, DoubleBuffer, FloatBuffer, IntBuffer, LongBuffer, and ShortBuffer. In practice, you will most commonly use ByteBuffer and CharBuffer. As shown in Figure 12–13, a buffer has

  • a capacity that never changes

  • a position at which the next value is read or written

  • a limit beyond which reading and writing is meaningless

  • optionally, a mark for repeating a read or write operation

These values fulfill the condition

0 ≤ mark ≤ position ≤ limit ≤ capacity

Figure 12–13. A buffer

The principal purpose for a buffer is a “write, then read” cycle. At the outset, the buffer’s position is 0 and the limit is the capacity. Keep calling put to add values to the buffer. When you run out of data or you reach the capacity, it is time to switch to reading.

Call flip to set the limit to the current position and the position to 0. Now keep calling get while the remaining method (which returns limit - position) is positive. When you have read all values in the buffer, call clear to prepare the buffer for the next writing cycle. The clear method resets the position to 0 and the limit to the capacity.
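The cycle can be traced in a few lines. The following is a minimal sketch with a hypothetical class name, FlipDemo; the capacity of 16 is an arbitrary choice, so the text passed to the method must not be longer than 16 characters.

```java
import java.nio.CharBuffer;

class FlipDemo
{
   // Writes text into a buffer, flips the buffer, and reads the text back.
   // The text must fit into the capacity of 16 characters.
   static String writeThenRead(String text)
   {
      CharBuffer buffer = CharBuffer.allocate(16); // position 0, limit 16
      buffer.put(text);                            // position moves past the text
      buffer.flip();                               // limit = old position, position = 0

      StringBuilder result = new StringBuilder();
      while (buffer.remaining() > 0)
         result.append(buffer.get());

      buffer.clear();                              // position 0, limit 16 again
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(writeThenRead("hello"));
   }
}
```

Without the call to flip, the loop would read the unused remainder of the buffer instead of the characters that were just written.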

If you want to re-read the buffer, use rewind or mark/reset—see the API notes for details.
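
Here is a minimal sketch of the write, flip, read, clear cycle. It uses an IntBuffer for brevity; the class BufferCycleDemo and the method writeThenRead are hypothetical names chosen for this example.

```java
import java.nio.IntBuffer;

// Hypothetical demo class that walks through one buffer cycle.
class BufferCycleDemo
{
   // Writes the values into the buffer, flips it, and sums what it reads back.
   static int writeThenRead(IntBuffer buffer, int... values)
   {
      buffer.clear();                        // position = 0, limit = capacity
      for (int v : values)
      {
         if (!buffer.hasRemaining()) break;  // reached the capacity
         buffer.put(v);
      }
      buffer.flip();                         // limit = old position, position = 0
      int sum = 0;
      while (buffer.remaining() > 0) sum += buffer.get();
      return sum;
   }

   public static void main(String[] args)
   {
      IntBuffer buffer = IntBuffer.allocate(4);
      System.out.println(writeThenRead(buffer, 1, 2, 3));              // 6
      System.out.println(writeThenRead(buffer, 10, 20, 30, 40, 50));   // 100, the fifth value does not fit
      buffer.rewind();                       // position = 0, limit unchanged
      System.out.println(buffer.get());      // 10
   }
}
```

Note that clear does not erase the buffer contents. It only resets the position and limit so that the next writing cycle overwrites the old values.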

java.nio.Buffer 1.4
  • Buffer clear()

    prepares this buffer for writing by setting the position to zero and the limit to the capacity; returns this.

  • Buffer flip()

    prepares this buffer for reading by setting the limit to the position and the position to zero; returns this.

  • Buffer rewind()

    prepares this buffer for re-reading the same values by setting the position to zero and leaving the limit unchanged; returns this.

  • Buffer mark()

    sets the mark of this buffer to the position; returns this.

  • Buffer reset()

    sets the position of this buffer to the mark, thus allowing the marked portion to be read or written again; returns this.

  • int remaining()

    returns the remaining number of readable or writable values, that is, the difference between limit and position.

  • int position()

    returns the position of this buffer.

  • int capacity()

    returns the capacity of this buffer.

java.nio.CharBuffer 1.4
  • char get()

  • CharBuffer get(char[] destination)

  • CharBuffer get(char[] destination, int offset, int length)

    gets one char value, or a range of char values, starting at the buffer’s position and moving the position past the characters that were read. The last two methods return this.

  • CharBuffer put(char c)

  • CharBuffer put(char[] source)

  • CharBuffer put(char[] source, int offset, int length)

  • CharBuffer put(String source)

  • CharBuffer put(CharBuffer source)

    puts one char value, or a range of char values, starting at the buffer’s position and advancing the position past the characters that were written. When the source is a CharBuffer, all of its remaining characters are read. All methods return this.

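  • int read(CharBuffer destination)

    gets char values from this buffer and puts them into the destination until the destination’s limit is reached or this buffer has no more characters. Returns the number of characters transferred, or -1 if this buffer has no remaining characters.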
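
The next sketch combines several of these methods: chained put calls, a bulk get guarded by mark and reset, and a transfer into a second buffer with read. The class CharBufferDemo and its peek method are assumptions made for this example. In current JDKs the read method is declared to throw IOException because it comes from the Readable interface, so the sketch declares that exception.

```java
import java.io.IOException;
import java.nio.CharBuffer;

// Hypothetical demo class for the CharBuffer methods listed above.
class CharBufferDemo
{
   // Reads count characters at the buffer's position, then returns
   // to the starting position by using mark and reset.
   static String peek(CharBuffer buffer, int count)
   {
      char[] chars = new char[count];
      buffer.mark();       // remember the current position
      buffer.get(chars);   // bulk get: advances the position by count
      buffer.reset();      // go back to the mark
      return new String(chars);
   }

   public static void main(String[] args) throws IOException
   {
      CharBuffer buffer = CharBuffer.allocate(16);
      buffer.put("Hello").put(',').put(" NIO");   // the put methods return this
      buffer.flip();

      System.out.println(peek(buffer, 5));        // Hello
      System.out.println(buffer.position());      // 0, because of reset

      // Transfer the remaining characters into another buffer.
      CharBuffer copy = CharBuffer.allocate(16);
      int n = buffer.read(copy);
      copy.flip();
      System.out.println(n + ": " + copy);        // 10: Hello, NIO
   }
}
```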

File Locking

Consider a situation in which multiple simultaneously executing programs need to modify the same file. Clearly, the programs need to communicate in some way, or the file can easily become damaged.

File locks control access to a file or a range of bytes within a file. However, file locking varies greatly among operating systems, which explains why file locking capabilities were absent from versions of the JDK before 1.4.

Frankly, file locking is not all that common in application programs. Many applications use a database for data storage, and the database has mechanisms for resolving concurrent access problems. If you store information in flat files and are worried about concurrent access, you may well find it simpler to start using a database rather than designing complex file locking schemes.

Still, there are situations in which file locking is essential. Suppose your application saves a configuration file with user preferences. If a user invokes two instances of the application, it could happen that both of them want to write the configuration file at the same time. In that situation, the first instance should lock the file. When the second instance finds the file locked, it can decide to wait until the file is unlocked or simply skip the writing process.

To lock a file, call either the lock or tryLock method of the FileChannel class:

FileLock lock = channel.lock();

or

FileLock lock = channel.tryLock();

The first call blocks until the lock becomes available. The second call returns immediately, either with the lock or null if the lock is not available. The file remains locked until the channel is closed or the release method is invoked on the lock.
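
As an illustration of the configuration file scenario, the following sketch tries to lock a file and skips the writing process if the lock is not available. The class ConfigWriter and the method writeIfUnlocked are hypothetical, and the sketch assumes Java 7 or later for the try statement with resources.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the "two instances, one configuration file" case.
class ConfigWriter
{
   // Writes the text if the lock can be obtained at once.
   // Returns false if another program currently holds the lock.
   static boolean writeIfUnlocked(File file, String text) throws IOException
   {
      try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
           FileChannel channel = raf.getChannel())
      {
         FileLock lock = channel.tryLock();
         if (lock == null) return false;   // locked elsewhere: skip writing
         try
         {
            channel.truncate(0);
            ByteBuffer bytes = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (bytes.hasRemaining()) channel.write(bytes);
            return true;
         }
         finally
         {
            lock.release();
         }
      }
   }

   public static void main(String[] args) throws IOException
   {
      File file = File.createTempFile("preferences", ".properties");
      file.deleteOnExit();
      System.out.println(writeIfUnlocked(file, "theme=dark\n"));
   }
}
```

Replace tryLock with lock if the second instance should wait for the file instead of skipping the write.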

You can also lock a portion of the file with the call

FileLock lock(long start, long size, boolean shared)

or

FileLock tryLock(long start, long size, boolean shared)

The shared flag is false to lock the file for both reading and writing. It is true for a shared lock, which allows multiple processes to read from the file, while preventing any process from acquiring an exclusive lock. Not all operating systems support shared locks. You may get an exclusive lock even if you just asked for a shared one. Call the isShared method of the FileLock class to find out which kind you have.

Note

If you lock the tail portion of a file and the file subsequently grows beyond the locked portion, the additional area is not locked. To lock all bytes, use a size of Long.MAX_VALUE.
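
The following sketch requests a shared lock on the tail of a file and then asks which kind of lock was granted. The class SharedLockDemo and the method lockKind are hypothetical names. Because the API requires that the sum of the start position and the size not overflow, the sketch passes Long.MAX_VALUE - start as the size whenever the region does not begin at position 0.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical demo class for region locks.
class SharedLockDemo
{
   // Requests a shared lock on all bytes from start onward and reports
   // which kind of lock the operating system actually granted.
   static String lockKind(String fileName, long start) throws IOException
   {
      try (RandomAccessFile file = new RandomAccessFile(fileName, "rw");
           FileChannel channel = file.getChannel())
      {
         // third argument true = shared lock; the size covers future growth
         FileLock lock = channel.lock(start, Long.MAX_VALUE - start, true);
         try
         {
            return lock.isShared() ? "shared" : "exclusive";
         }
         finally
         {
            lock.release();
         }
      }
   }
}
```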

Keep in mind that file locking is system dependent. Here are some points to watch for:

  • On some systems, file locking is merely advisory. If an application fails to get a lock, it may still write to a file that another application has currently locked.

  • On some systems, you cannot simultaneously lock a file and map it into memory.

  • File locks are held by the entire Java virtual machine. If two programs are launched by the same virtual machine (such as an applet or application launcher), then they can’t each acquire a lock on the same file. The lock and tryLock methods will throw an OverlappingFileLockException if the virtual machine already holds another overlapping lock on the same file.

  • On some systems, closing a channel releases all locks on the underlying file held by the Java virtual machine. You should therefore avoid multiple channels on the same locked file.

  • Locking files on a networked file system is highly system dependent and should probably be avoided.
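
To see the virtual machine-wide nature of file locks, you can try to lock the same file twice from one program. The sketch below does that; the class OverlapDemo and the method secondLockRejected are made up for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical demo class for overlapping lock requests.
class OverlapDemo
{
   // Returns true if a second lock attempt from the same virtual machine
   // is rejected with an OverlappingFileLockException.
   static boolean secondLockRejected(String fileName) throws IOException
   {
      try (RandomAccessFile file = new RandomAccessFile(fileName, "rw");
           FileChannel channel = file.getChannel();
           FileLock first = channel.lock())
      {
         try
         {
            channel.tryLock();   // overlaps the lock that is already held
            return false;
         }
         catch (OverlappingFileLockException e)
         {
            return true;
         }
      }
   }
}
```

Note that the second attempt does not return null. A null result from tryLock means that another program holds the lock, whereas the exception signals a conflict inside your own virtual machine.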

java.nio.channels.FileChannel 1.4
  • FileLock lock()

    acquires an exclusive lock on the entire file. This method blocks until the lock is acquired.

  • FileLock tryLock()

    acquires an exclusive lock on the entire file, or returns null if the lock cannot be acquired.

  • FileLock lock(long position, long size, boolean shared)

  • FileLock tryLock(long position, long size, boolean shared)

    acquire a lock on a region of the file. The first method blocks until the lock is acquired, and the second method returns null if the lock cannot be acquired.

    Parameters:

    position

    The start of the region to be locked

     

    size

    The size of the region to be locked

     

    shared

    true for a shared lock, false for an exclusive lock

java.nio.channels.FileLock 1.4
  • void release()

    releases this lock.

Regular Expressions

Regular expressions are used to specify string patterns. You can use regular expressions whenever you need to locate strings that match a particular pattern. For example, one of our sample programs locates all hyperlinks in an HTML file by looking for strings of the pattern <a href="...">.

Of course, for specifying a pattern, the ... notation is not precise enough. You need to specify precisely what sequence of characters is a legal match. You need to use a special syntax whenever you describe a pattern.

Here is a simple example. The regular expression

[Jj]ava.+

matches any string of the following form:

  • The first letter is a J or j.

  • The next three letters are ava.

  • The remainder of the string consists of one or more arbitrary characters.

For example, the string "javanese" matches this regular expression, but the string "Core Java" does not.
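
You can check these claims with a few lines of code. In the sketch below, the class JavaPatternDemo and the method isMatch are hypothetical; the test uses the matches method of the Matcher class, which requires the entire string to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the [Jj]ava.+ example.
class JavaPatternDemo
{
   static final Pattern JAVA = Pattern.compile("[Jj]ava.+");

   // Returns true if the entire string matches the pattern.
   static boolean isMatch(String s)
   {
      return JAVA.matcher(s).matches();
   }

   public static void main(String[] args)
   {
      System.out.println(isMatch("javanese"));    // true
      System.out.println(isMatch("Core Java"));   // false: does not start with J or j
      System.out.println(isMatch("Java"));        // false: .+ needs at least one more character
   }
}
```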

As you can see, you need to know a bit of syntax to understand the meaning of a regular expression. Fortunately, for most purposes, a small number of straightforward constructs are sufficient.

  • A character class is a set of character alternatives, enclosed in brackets, such as [Jj], [0-9], [A-Za-z], or [^0-9]. Here the - denotes a range (all characters whose Unicode value falls between the two bounds), and ^ denotes the complement (all characters except the ones specified).

  • There are many predefined character classes such as \d (digits) or \p{Sc} (Unicode currency symbol). See Tables 12–8 and 12–9.

  • Most characters match themselves, such as the ava characters in the example above.

  • The . symbol matches any character (except possibly line terminators, depending on flag settings).

  • Use \ as an escape character. For example, \. matches a period and \\ matches a backslash.

  • ^ and $ match the beginning and end of a line respectively.

  • If X and Y are regular expressions, then XY means “any match for X followed by a match for Y”. X | Y means “any match for X or Y”.

  • You can apply quantifiers X+ (1 or more), X* (0 or more), and X? (0 or 1) to an expression X.

  • By default, a quantifier matches the largest possible repetition that makes the overall match succeed. You can modify that behavior with suffixes ? (reluctant or stingy match: match the smallest repetition count) and + (possessive match: match the largest count even if that makes the overall match fail).

    For example, the string cab matches [a-z]*ab but not [a-z]*+ab. In the first case, the expression [a-z]* only matches the character c, so that the characters ab match the remainder of the pattern. But the possessive version [a-z]*+ matches the characters cab, leaving the remainder of the pattern unmatched.

  • You can use groups to define subexpressions. Enclose the groups in ( ), for example ([+-]?)([0-9]+). You can then ask the pattern matcher to return the match of each group or to refer back to a group with \n, where n is the group number (starting with 1).
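
The following sketch tries out the three quantifier modes and a back reference. The class QuantifierDemo and its helper methods are hypothetical, and the HTML-like input strings are made up for the example. Remember that each backslash in a regular expression must be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for quantifier suffixes and back references.
class QuantifierDemo
{
   // Returns true if the entire input matches the regular expression.
   static boolean matches(String regex, String input)
   {
      return Pattern.matches(regex, input);
   }

   // Returns the first substring that matches, or null if there is none.
   static String firstMatch(String regex, String input)
   {
      Matcher m = Pattern.compile(regex).matcher(input);
      return m.find() ? m.group() : null;
   }

   public static void main(String[] args)
   {
      System.out.println(matches("[a-z]*ab", "cab"));          // true
      System.out.println(matches("[a-z]*+ab", "cab"));         // false: possessive, no backtracking

      System.out.println(firstMatch("<.+>", "<b>bold</b>"));   // <b>bold</b> (default)
      System.out.println(firstMatch("<.+?>", "<b>bold</b>"));  // <b> (reluctant)

      // \1 refers back to group 1: the same letter twice in a row
      System.out.println(matches("([a-z])\\1", "oo"));         // true
      System.out.println(matches("([a-z])\\1", "ox"));         // false
   }
}
```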

For example, here is a somewhat complex but potentially useful regular expression—it describes decimal or hexadecimal integers:

[+-]?[0-9]+|0[Xx][0-9A-Fa-f]+
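
A quick experiment shows which strings this expression accepts. The class IntegerPatternDemo and the method isInteger are hypothetical names. Note one consequence of operator precedence: the | operator binds more loosely than concatenation, so the optional sign belongs only to the decimal alternative.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the decimal-or-hexadecimal pattern.
class IntegerPatternDemo
{
   static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+|0[Xx][0-9A-Fa-f]+");

   // Returns true if the entire string is a decimal or hexadecimal integer.
   static boolean isInteger(String s)
   {
      return INTEGER.matcher(s).matches();
   }

   public static void main(String[] args)
   {
      System.out.println(isInteger("-42"));     // true
      System.out.println(isInteger("0x1F"));    // true
      System.out.println(isInteger("1F"));      // false: hex digits need the 0x prefix
      System.out.println(isInteger("+0x10"));   // false: the sign applies only to the decimal form
   }
}
```

If signed hexadecimal numbers should be accepted as well, one option is to group the alternatives, as in [+-]?([0-9]+|0[Xx][0-9A-Fa-f]+).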

Unfortunately, the expression syntax is not completely standardized between the various programs and libraries that use regular expressions. While there is consensus on the basic constructs, there are many maddening differences in the details. The Java regular expression classes use a syntax that is similar to, but not quite the same as, the one used in the Perl language. Table 12–8 shows all constructs of the Java syntax. For more information on the regular expression syntax, consult the API documentation for the Pattern class or the book Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly and Associates, 1997).

Table 12–8. Regular Expression Syntax

Syntax                            Explanation

Characters
c                                 The character c
\unnnn, \xnn, \0n, \0nn, \0nnn    The code unit with the given hex or octal value
\t, \n, \r, \f, \a, \e            The control characters tab, newline, return, form feed, alert, and escape
\cc                               The control character corresponding to the character c

Character Classes
[C1C2. . .]                       Any of the characters represented by C1, C2, . . . The Ci are characters, character ranges (c1-c2), or character classes
[^. . .]                          Complement of character class
[ . . . && . . .]                 Intersection of two character classes

Predefined Character Classes
.                                 Any character except line terminators (or any character if the DOTALL flag is set)
\d                                A digit [0-9]
\D                                A nondigit [^0-9]
\s                                A whitespace character [ \t\n\r\f\x0B]
\S                                A non-whitespace character
\w                                A word character [a-zA-Z0-9_]
\W                                A nonword character
\p{name}                          A named character class (see Table 12–9)
\P{name}                          The complement of a named character class

Boundary Matchers
^ $                               Beginning, end of input (or beginning, end of line in multiline mode)
\b                                A word boundary
\B                                A nonword boundary
\A                                Beginning of input
\z                                End of input
\Z                                End of input except final line terminator
\G                                End of previous match

Quantifiers
X?                                Optional X
X*                                X, 0 or more times
X+                                X, 1 or more times
X{n} X{n,} X{n,m}                 X n times, at least n times, between n and m times

Quantifier Suffixes
?                                 Turn default (greedy) match into reluctant match
+                                 Turn default (greedy) match into possessive match

Set Operations
XY                                Any string from X, followed by any string from Y
X|Y                               Any string from X or Y

Grouping
(X)                               Capture the string matching X as a group
\n                                The match of the nth group

Escapes
\c                                The character c (must not be an alphabetic character)
\Q . . . \E                       Quote . . . verbatim
(? . . . )                        Special construct (see API notes of Pattern class)

The simplest use for a regular expression is to test whether a particular string matches it. Here is how you program that test in Java. First construct a Pattern object from the string denoting the regular expression. Then get a Matcher object from the pattern, and call its matches method:

Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) . . .

Table 12–9. Predefined Character Class Names

Lower                     ASCII lower case [a-z]
Upper                     ASCII upper case [A-Z]
Alpha                     ASCII alphabetic [A-Za-z]
Digit                     ASCII digits [0-9]
Alnum                     ASCII alphabetic or digit [A-Za-z0-9]
Xdigit                    Hex digits [0-9A-Fa-f]
Print or Graph            Printable ASCII character [\x21-\x7E]
Punct                     ASCII non-alpha or digit [\p{Print}&&\P{Alnum}]
ASCII                     All ASCII [\x00-\x7F]
Cntrl                     ASCII Control character [\x00-\x1F]
Blank                     Space or tab [ \t]
Space                     Whitespace [ \t\n\r\f\x0B]
javaLowerCase             Lower case, as determined by Character.isLowerCase()
javaUpperCase             Upper case, as determined by Character.isUpperCase()
javaWhitespace            Whitespace, as determined by Character.isWhitespace()
javaMirrored              Mirrored, as determined by Character.isMirrored()
InBlock                   Block is the name of a Unicode character block, with spaces removed, such as BasicLatin or Mongolian. See http://www.unicode.org for a list of block names.
Category or IsCategory    Category is the name of a Unicode character category such as L (letter) or Sc (currency symbol). See http://www.unicode.org for a list of category names.

The input of the matcher is an object of any class that implements the CharSequence interface, such as a String, StringBuilder, or CharBuffer.

When compiling the pattern, you can set one or more flags, for example,

Pattern pattern = Pattern.compile(patternString,
   Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

The following six flags are supported:

  • CASE_INSENSITIVE: Match characters independently of the letter case. By default, this flag takes only US ASCII characters into account.

  • UNICODE_CASE: When used in combination with CASE_INSENSITIVE, use Unicode letter case for matching.

  • MULTILINE: ^ and $ match the beginning and end of a line, not the entire input.

  • UNIX_LINES: Only '\n' is recognized as a line terminator when matching ^ and $ in multiline mode.

  • DOTALL: When using this flag, the . symbol matches all characters, including line terminators.

  • CANON_EQ: Takes canonical equivalence of Unicode characters into account. For example, u followed by ¨ (diaeresis) matches ü.
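
The effect of some of these flags is easy to observe by counting matches with and without them. In this sketch, the class FlagDemo, the method countMatches, and the three-line sample text are assumptions made for the illustration. The flags are combined with the | operator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for pattern flags.
class FlagDemo
{
   // Counts the matches of the regular expression, compiled with the given flags.
   static int countMatches(String regex, int flags, String input)
   {
      Matcher m = Pattern.compile(regex, flags).matcher(input);
      int count = 0;
      while (m.find()) count++;
      return count;
   }

   public static void main(String[] args)
   {
      String text = "java\nJava\nJAVA";

      // Without MULTILINE, ^ and $ refer to the entire input.
      System.out.println(countMatches("^java$", 0, text));                   // 0
      System.out.println(countMatches("^java$", Pattern.MULTILINE, text));   // 1
      System.out.println(countMatches("^java$",
         Pattern.MULTILINE | Pattern.CASE_INSENSITIVE, text));               // 3

      // Without DOTALL, the . symbol does not match the line terminator.
      System.out.println(countMatches("java.Java", 0, text));                // 0
      System.out.println(countMatches("java.Java", Pattern.DOTALL, text));   // 1
   }
}
```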

If the regular expression contains groups, then the Matcher object can reveal the group boundaries. The methods

int start(int groupIndex)
int end(int groupIndex)

yield the starting index and the past-the-end index of a particular group.

You can simply extract the matched string by calling

String group(int groupIndex)

Group 0 is the entire match; the group index for the first actual group is 1. Call the groupCount method to get the total group count.

Nested groups are ordered by the opening parentheses. For example, given the pattern

((1?[0-9]):([0-5][0-9]))[ap]m

and the input

11:59am

the matcher reports the following groups

Group Index    Start    End    String
0              0        7      11:59am
1              0        5      11:59
2              0        2      11
3              3        5      59
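
You can reproduce this table with the start, end, and group methods. The class GroupDemo and the method describe in the following sketch are hypothetical; the pattern and input are the ones from the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that lists the groups of a match.
class GroupDemo
{
   // Returns one line per group: index, start, end, and matched string.
   static String describe(String regex, String input)
   {
      Matcher m = Pattern.compile(regex).matcher(input);
      if (!m.matches()) return "no match";
      StringBuilder result = new StringBuilder();
      for (int i = 0; i <= m.groupCount(); i++)
         result.append(i + " " + m.start(i) + " " + m.end(i) + " " + m.group(i) + "\n");
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.print(describe("((1?[0-9]):([0-5][0-9]))[ap]m", "11:59am"));
      // 0 0 7 11:59am
      // 1 0 5 11:59
      // 2 0 2 11
      // 3 3 5 59
   }
}
```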

Example 12–9 prompts for a pattern, then for strings to match. It prints out whether or not the input matches the pattern. If the input matches and the pattern contains groups, then the program prints the group boundaries as parentheses, such as

((11):(59))am

Example 12–9. RegExTest.java

import java.util.*;
import java.util.regex.*;

/**
   This program tests regular expression matching.
   Enter a pattern and strings to match, or enter an
   empty line to exit. If the pattern contains groups,
   the group boundaries are displayed in the match.
*/
public class RegExTest
{
   public static void main(String[] args)
   {
      Scanner in = new Scanner(System.in);
      System.out.println("Enter pattern: ");
      String patternString = in.nextLine();

      Pattern pattern = null;
      try
      {
         pattern = Pattern.compile(patternString);
      }
      catch (PatternSyntaxException e)
      {
         System.out.println("Pattern syntax error");
         System.exit(1);
      }

      while (true)
      {
         System.out.println("Enter string to match: ");
         // stop at the end of the input or at an empty line
         if (!in.hasNextLine()) return;
         String input = in.nextLine();
         if (input.equals("")) return;
         Matcher matcher = pattern.matcher(input);
         if (matcher.matches())
         {
            System.out.println("Match");
            int g = matcher.groupCount();
            if (g > 0)
            {
               for (int i = 0; i < input.length(); i++)
               {
                  // print ( for each group that starts at this character
                  for (int j = 1; j <= g; j++)
                     if (i == matcher.start(j))
                        System.out.print('(');
                  System.out.print(input.charAt(i));
                  // print ) for each group that ends after this character
                  for (int j = 1; j <= g; j++)
                     if (i + 1 == matcher.end(j))
                        System.out.print(')');
               }
               System.out.println();
            }
         }
         else
            System.out.println("No match");
      }
   }
}

Usually, you don’t want to match the entire input against a regular expression, but you want to find one or more matching substrings in the input. Use the find method of the Matcher class to find the next match. If it returns true, use the start and end methods to find the extent of the match.

while (matcher.find())
{
   int start = matcher.start();
   int end = matcher.end();
   String match = input.substring(start, end);
   . . .
}
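
Here is a self-contained version of this loop that you can run without any input files. The class FindDemo, the method findAll, and the sample sentence are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the find loop.
class FindDemo
{
   // Collects every match together with its start and past-the-end index.
   static List<String> findAll(String regex, String input)
   {
      List<String> result = new ArrayList<>();
      Matcher matcher = Pattern.compile(regex).matcher(input);
      while (matcher.find())
      {
         int start = matcher.start();
         int end = matcher.end();
         result.add(start + "-" + end + ":" + input.substring(start, end));
      }
      return result;
   }

   public static void main(String[] args)
   {
      System.out.println(findAll("[0-9]+", "Order 66 shipped 12 boxes on day 7"));
      // [6-8:66, 17-19:12, 33-34:7]
   }
}
```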

Example 12–10 puts this mechanism to work. It locates all hypertext references in a web page and prints them. To run the program, supply a URL on the command line, such as

java HrefMatch http://www.horstmann.com

Example 12–10. HrefMatch.java

import java.io.*;
import java.net.*;
import java.util.regex.*;

/**
   This program displays all URLs in a web page by
   matching a regular expression that describes the
   <a href=...> HTML tag. Start the program as
   java HrefMatch URL
*/
public class HrefMatch
{
   public static void main(String[] args)
   {
      try
      {
         // get URL string from command line or use default
         String urlString;
         if (args.length > 0) urlString = args[0];
         else urlString = "http://java.sun.com";

         // open reader for URL
         InputStreamReader in = new InputStreamReader(new URL(urlString).openStream());

         // read contents into string builder
         StringBuilder input = new StringBuilder();
         int ch;
         while ((ch = in.read()) != -1) input.append((char) ch);
         in.close();

         // search for all occurrences of pattern; the attribute value is either
         // a quoted string or a run of characters that are not whitespace or >
         String patternString = "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]+)\\s*>";
         Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
         Matcher matcher = pattern.matcher(input);

         while (matcher.find())
         {
            int start = matcher.start();
            int end = matcher.end();
            String match = input.substring(start, end);
            System.out.println(match);
         }
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
      catch (PatternSyntaxException e)
      {
         e.printStackTrace();
      }
   }
}

The replaceAll method of the Matcher class replaces all occurrences of a regular expression with a replacement string. For example, the following instructions replace all sequences of digits with a # character.

Pattern pattern = Pattern.compile("[0-9]+");
Matcher matcher = pattern.matcher(input);
String output = matcher.replaceAll("#");

The replacement string can contain references to groups in the pattern: $n is replaced with the nth group. Use \$ to include a $ character in the replacement text.

The replaceFirst method replaces only the first occurrence of the pattern.
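
The following sketch shows replaceAll, replaceFirst, group references, and an escaped $ side by side. The class ReplaceDemo, its methods, and the sample strings are hypothetical. Inside a Java string literal, the escape \$ is written as \\$.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the replacement methods.
class ReplaceDemo
{
   // Turns "Last, First" into "First Last"; $2 and $1 refer to the groups.
   static String swapNames(String input)
   {
      return Pattern.compile("([A-Za-z]+), ([A-Za-z]+)").matcher(input).replaceAll("$2 $1");
   }

   // Replaces only the first sequence of digits with a # character.
   static String maskFirst(String input)
   {
      return Pattern.compile("[0-9]+").matcher(input).replaceFirst("#");
   }

   // Turns "25 USD" into "$25"; \$ yields a literal $ in the output.
   static String toDollars(String input)
   {
      return Pattern.compile("([0-9]+) USD").matcher(input).replaceAll("\\$$1");
   }

   public static void main(String[] args)
   {
      System.out.println(swapNames("Gosling, James; Joy, Bill"));   // James Gosling; Bill Joy
      System.out.println(maskFirst("a1b22c333"));                   // a#b22c333
      System.out.println(toDollars("Price: 25 USD"));               // Price: $25
   }
}
```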

Finally, the Pattern class has a split method that works like a string tokenizer on steroids. It splits an input into an array of strings, using the regular expression matches as boundaries. For example, the following instructions split the input into tokens, where the delimiters are punctuation marks surrounded by optional whitespace.

Pattern pattern = Pattern.compile("\\s*\\p{Punct}\\s*");
String[] tokens = pattern.split(input);
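
To see what the tokens look like, consider the following sketch. The class SplitDemo, the method tokens, and the sample inputs are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for splitting at punctuation marks.
class SplitDemo
{
   static final Pattern DELIMITER = Pattern.compile("\\s*\\p{Punct}\\s*");

   // Splits the input at punctuation marks with optional surrounding whitespace.
   static String[] tokens(String input)
   {
      return DELIMITER.split(input);
   }

   public static void main(String[] args)
   {
      System.out.println(Arrays.toString(tokens("one, two ;three . four")));   // [one, two, three, four]
      System.out.println(Arrays.toString(tokens("a,b,,")));                    // [a, b]
      System.out.println(Arrays.toString(tokens("hello world!")));             // [hello world]
   }
}
```

The last call shows that whitespace alone is not a delimiter for this pattern, because every delimiter must contain a punctuation mark.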
java.util.regex.Pattern 1.4
  • static Pattern compile(String expression)

  • static Pattern compile(String expression, int flags)

    compile the regular expression string into a pattern object for fast processing of matches.

    Parameters:

    expression

    The regular expression

     

    flags

    One or more of the flags CASE_INSENSITIVE, UNICODE_CASE, MULTILINE, UNIX_LINES, DOTALL, and CANON_EQ

  • Matcher matcher(CharSequence input)

    returns a matcher object that you can use to locate the matches of the pattern in the input.

  • String[] split(CharSequence input)

  • String[] split(CharSequence input, int limit)

    split the input string into tokens, where the pattern specifies the form of the delimiters. Return an array of tokens. The delimiters are not part of the tokens.

    Parameters:

    input

    The string to be split into tokens

     

    limit

    The maximum number of strings to produce. If limit - 1 matching delimiters have been found, then the last entry of the returned array contains the remaining unsplit input. If limit is ≤ 0, then the entire input is split. If limit is 0, then trailing empty strings are not placed in the returned array.
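
The three cases of the limit parameter are easiest to understand side by side. In this sketch, the class SplitLimitDemo, the method split, and the colon-separated sample input are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for the limit parameter of split.
class SplitLimitDemo
{
   // Splits the input at colons with the given limit.
   static String[] split(String input, int limit)
   {
      return Pattern.compile(":").split(input, limit);
   }

   public static void main(String[] args)
   {
      String input = "a:b:c::";
      System.out.println(Arrays.toString(split(input, 2)));    // [a, b:c::]
      System.out.println(Arrays.toString(split(input, 0)));    // [a, b, c]
      System.out.println(Arrays.toString(split(input, -1)));   // [a, b, c, , ]
   }
}
```

With a negative limit, the two trailing empty strings are retained, which is useful when the number of fields matters.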

java.util.regex.Matcher 1.4
  • boolean matches()

    returns true if the input matches the pattern.

  • boolean lookingAt()

    returns true if the beginning of the input matches the pattern.

  • boolean find()

  • boolean find(int start)

    attempt to find the next match and return true if another match is found.

    Parameters:

    start

    The index at which to start searching

  • int start()

  • int end()

    return the start and past-the-end position of the current match.

  • String group()

    returns the current match.

  • int groupCount()

    returns the number of groups in the input pattern.

  • int start(int groupIndex)

  • int end(int groupIndex)

    return the start and past-the-end position of a given group in the current match.

    Parameters:

    groupIndex

    The group index (starting with 1), or 0 to indicate the entire match

  • String group(int groupIndex)

    returns the string matching a given group.

    Parameters:

    groupIndex

    The group index (starting with 1), or 0 to indicate the entire match

  • String replaceAll(String replacement)

  • String replaceFirst(String replacement)

    return a string obtained from the matcher input by replacing all matches, or the first match, with the replacement string.

    Parameters:

    replacement

    The replacement string. It can contain references to a pattern group as $n. Use \$ to include a $ symbol.

  • Matcher reset()

  • Matcher reset(CharSequence input)

    reset the matcher state. The second method makes the matcher work on a different input. Both methods return this.
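
The following sketch contrasts matches, lookingAt, and find, and reuses a single Matcher object for several inputs by calling reset. The class MatcherStateDemo and the method classify are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that reuses one matcher for several inputs.
class MatcherStateDemo
{
   // Reports the strongest kind of match that the input admits.
   static String classify(Matcher matcher, String input)
   {
      matcher.reset(input);                          // work on a different input
      if (matcher.matches()) return "matches";       // the entire input matches
      if (matcher.lookingAt()) return "lookingAt";   // the beginning matches
      matcher.reset();                               // restart the search at index 0
      if (matcher.find()) return "find";             // some substring matches
      return "none";
   }

   public static void main(String[] args)
   {
      Matcher matcher = Pattern.compile("[0-9]+").matcher("");
      System.out.println(classify(matcher, "12345"));         // matches
      System.out.println(classify(matcher, "123 main st"));   // lookingAt
      System.out.println(classify(matcher, "room 101"));      // find
      System.out.println(classify(matcher, "none here"));     // none
   }
}
```

Reusing a matcher in this way avoids creating a new object for each input, which can matter when you process many strings with the same pattern.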
