In this chapter, we cover the methods for handling files and directories as well as the methods for actually writing and reading data. This chapter also shows you the object serialization mechanism that lets you store objects as easily as you can store text or numeric data. Next, we turn to several improvements that were made in the “new I/O” package java.nio, introduced in JDK 1.4. We finish the chapter with a discussion of regular expressions, even though they are not actually related to streams and files. We couldn’t find a better place to handle that topic, and apparently neither could the Java team: the regular expression API specification was attached to the specification request for the “new I/O” features of JDK 1.4.
Input/output techniques are not particularly exciting, but without the ability to read and write data, your programs are severely limited. This chapter is about how to get input from any source of data that can send out a sequence of bytes and how to send output to any destination that can receive a sequence of bytes. These sources and destinations of byte sequences can be—and often are—files, but they can also be network connections and even blocks of memory. There is a nice payback to keeping this generality in mind: for example, information stored in files and information retrieved from a network connection are handled in essentially the same way. (See Volume 2 for more information about programming with networks.) Of course, while data are always ultimately stored as a sequence of bytes, it is often more convenient to think of data as having some higher-level structure such as being a sequence of characters or objects. For that reason, we dispense with low-level input/output quickly and focus on higher-level facilities for the majority of the chapter.
In the Java programming language, an object from which we can read a sequence of bytes is called an input stream. An object to which we can write a sequence of bytes is called an output stream. These are specified in the abstract classes InputStream and OutputStream. Because byte-oriented streams are inconvenient for processing information stored in Unicode (recall that Unicode uses two bytes per code unit), there is a separate hierarchy of classes for processing Unicode characters that inherit from the abstract Reader and Writer classes. These classes have read and write operations that are based on two-byte Unicode code units rather than on single-byte characters.
You saw abstract classes in Chapter 5. Recall that the point of an abstract class is to provide a mechanism for factoring out the common behavior of classes to a higher level. This leads to cleaner code and makes the inheritance tree easier to understand. The same game is at work with input and output in the Java programming language.
As you will soon see, Java derives from these four abstract classes a zoo of concrete classes. You can visit almost any conceivable input/output creature in this zoo.
The InputStream class has an abstract method:
abstract int read()
This method reads one byte and returns the byte that was read, or -1 if it encounters the end of the input source. The designer of a concrete input stream class overrides this method to provide useful functionality. For example, in the FileInputStream class, this method reads one byte from a file. System.in is a predefined object of a subclass of InputStream that allows you to read information from the keyboard.
The InputStream class also has nonabstract methods to read an array of bytes or to skip a number of bytes. These methods call the abstract read method, so subclasses need to override only one method.
Similarly, the OutputStream class defines the abstract method
abstract void write(int b)
which writes one byte to an output location.
Both the read and write methods can block a thread until the byte is actually read or written. This means that if the stream cannot immediately be read from or written to (usually because of a busy network connection), Java suspends the thread containing this call. This gives other threads the chance to do useful work while the method is waiting for the stream to again become available. (We discuss threads in depth in Volume 2.)
The available method lets you check the number of bytes that are currently available for reading. This means a fragment like the following is unlikely to ever block:
int bytesAvailable = in.available();
if (bytesAvailable > 0)
{
   byte[] data = new byte[bytesAvailable];
   in.read(data);
}
When you have finished reading or writing to a stream, close it by calling the close method. This call frees up operating system resources that are in limited supply. If an application opens too many streams without closing them, system resources may become depleted. Closing an output stream also flushes the buffer used for the output stream: any bytes that were temporarily placed in a buffer so that they could be delivered as a larger packet are sent off. In particular, if you do not close a file, the last packet of bytes may never be delivered. You can also manually flush the output with the flush method.
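As a minimal sketch of the read, write, flush, and close calls described above, the following helper copies one stream to another in blocks. The class name StreamCopy and the 4096-byte buffer size are illustrative choices, not part of the library, and the try-with-resources statement assumes Java 7 or later (with an older JDK, call close in a finally block instead).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    // Copies all bytes from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];   // illustrative buffer size
        long total = 0;
        int n;
        // read returns the number of bytes actually read, or -1 at the end of the stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();                      // send any buffered bytes to the destination
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = { 10, 20, 30, 40, 50 };
        // try-with-resources closes both streams even if copy throws an exception
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println(copied + " bytes copied");
        }
    }
}
```

The loop does not rely on available; it simply reads until read reports the end of the stream, which works for files, network connections, and in-memory streams alike.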
Even if a stream class provides concrete methods to work with the raw read and write functions, Java programmers rarely use them because programs rarely need to read and write streams of bytes. The data that you are interested in probably contain numbers, strings, and objects.
Java gives you many stream classes derived from the basic InputStream and OutputStream classes that let you work with data in the forms that you usually use rather than at the low byte level.
java.io.InputStream 1.0
abstract int read()
reads a byte of data and returns the byte read. The read method returns -1 at the end of the stream.
int read(byte[] b)
reads into an array of bytes and returns the actual number of bytes read, or -1 at the end of the stream. The read method reads at most b.length bytes.
int read(byte[] b, int off, int len)
reads into an array of bytes. The read method returns the actual number of bytes read, or -1 at the end of the stream.
Parameters:
b | The array into which the data is read |
off | The offset into b where the first bytes should be placed |
len | The maximum number of bytes to read |
long skip(long n)
skips n bytes in the input stream. Returns the actual number of bytes skipped (which may be less than n if the end of the stream was encountered).
int available()
returns the number of bytes available without blocking. (Recall that blocking means that the current thread loses its turn.)
void close()
closes the input stream.
void mark(int readlimit)
puts a marker at the current position in the input stream. (Not all streams support this feature.) If more than readlimit bytes have been read from the input stream, then the stream is allowed to forget the marker.
void reset()
returns to the last marker. Subsequent calls to read reread the bytes. If there is no current marker, then the stream is not reset.
boolean markSupported()
returns true if the stream supports marking.
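The mark, reset, and markSupported methods listed above can be tried out with an in-memory stream. This is only a sketch: the class name MarkResetDemo, the method peekTwice, and the read limit of 16 are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetDemo {
    // Reads two bytes, returns to the marker, and reads the first byte again.
    static int[] peekTwice(InputStream in) throws IOException {
        if (!in.markSupported()) {
            in = new BufferedInputStream(in);   // BufferedInputStream supports marking
        }
        in.mark(16);              // the marker must survive at least 16 bytes of reading
        int first = in.read();
        int second = in.read();
        in.reset();               // go back to the marked position
        int again = in.read();    // rereads the first byte
        return new int[] { first, second, again };
    }

    public static void main(String[] args) throws IOException {
        int[] result = peekTwice(new ByteArrayInputStream(new byte[] { 7, 8, 9 }));
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```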
java.io.OutputStream 1.0
abstract void write(int n)
writes a byte of data.
void write(byte[] b)
writes all bytes in the array b.
void write(byte[] b, int off, int len)
writes a range of bytes in the array b.
Parameters:
b | The array from which to write the data |
off | The offset into b to the first byte that will be written |
len | The number of bytes to write |
void close()
flushes and closes the output stream.
void flush()
flushes the output stream, that is, sends any buffered data to its destination.
Unlike C, which gets by just fine with a single type FILE*, Java has a whole zoo of more than 60 (!) different stream types (see Figures 12–1 and 12–2). Library designers claim that there is a good reason to give users a wide choice of stream types: it is supposed to reduce programming errors. For example, in C, some people think it is a common mistake to send output to a file that was open only for reading. (Well, it is not actually that common.) Naturally, if you do this, the output is ignored at run time. In Java and C++, the compiler catches that kind of mistake because an InputStream (Java) or istream (C++) has no methods for output.
(We would argue that in C++, and even more so in Java, the main tool that the stream interface designers have against programming errors is intimidation. The sheer complexity of the stream libraries keeps programmers on their toes.)
ANSI C++ gives you more stream types than you want, such as istream, ostream, iostream, ifstream, ofstream, fstream, wistream, wifstream, istrstream, and so on (18 classes in all). But Java really goes overboard with streams and gives you separate classes for selecting buffering, lookahead, random access, text formatting, and binary data.
Let us divide the animals in the stream class zoo by how they are used. Four abstract classes are at the base of the zoo: InputStream, OutputStream, Reader, and Writer. You do not make objects of these types, but other methods can return them. For example, as you saw in Chapter 10, the URL class has the method openStream that returns an InputStream. You then use this InputStream object to read from the URL. As we said, the InputStream and OutputStream classes let you read and write only individual bytes and arrays of bytes; they have no methods to read and write strings and numbers. You need more capable child classes for this. For example, DataInputStream and DataOutputStream let you read and write all the basic Java types.
For Unicode text, on the other hand, as we said, you use classes that descend from Reader and Writer. The basic methods of the Reader and Writer classes are similar to the ones for InputStream and OutputStream.
int read()
void write(int c)
They work just as the comparable methods do in the InputStream and OutputStream classes except, of course, that the read method returns either a Unicode code unit (as an integer between 0 and 65535) or -1 when you have reached the end of the file.
Finally, there are streams that do useful stuff, for example, the ZipInputStream and ZipOutputStream that let you read and write files in the familiar ZIP compression format.
Moreover, JDK 5.0 introduces four new interfaces: Closeable, Flushable, Readable, and Appendable (see Figure 12–3). The first two interfaces are very simple, with methods
void close() throws IOException
and
void flush()
respectively. The classes InputStream, OutputStream, Reader, and Writer all implement the Closeable interface. OutputStream and Writer implement the Flushable interface.
The Readable interface has a single method
int read(CharBuffer cb)
The CharBuffer class has methods for sequential and random read/write access. It represents an in-memory buffer or a memory-mapped file (see page 694).
The Appendable interface has two methods, for appending single characters and character sequences:
Appendable append(char c)
Appendable append(CharSequence s)
The CharSequence type is yet another interface, describing minimal properties of a sequence of char values. It is implemented by String, CharBuffer, and StringBuilder/StringBuffer (see page 656).
Of the stream zoo classes, Writer and PrintStream implement Appendable.
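The benefit of these small interfaces is that a method can be written against Appendable and CharSequence alone. The sketch below, with the invented names AppendableDemo and appendGreeting, sends the same output to a StringBuilder and to a StringWriter.

```java
import java.io.IOException;
import java.io.StringWriter;

class AppendableDemo {
    // Accepts any Appendable (for example a Writer or a StringBuilder)
    // and any CharSequence (for example a String or a StringBuilder).
    static void appendGreeting(Appendable target, CharSequence name) throws IOException {
        target.append("Hello, ").append(name).append('!');
    }

    public static void main(String[] args) throws IOException {
        StringBuilder builder = new StringBuilder();
        appendGreeting(builder, "Unicode");

        StringWriter writer = new StringWriter();
        appendGreeting(writer, new StringBuilder("streams"));

        System.out.println(builder);
        System.out.println(writer);
    }
}
```

Because each append call returns the Appendable itself, the calls can be chained as shown.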
java.io.Closeable 5.0
void close()
closes this Closeable. This method may throw an IOException.
java.io.Flushable 5.0
void flush()
flushes this Flushable.
java.lang.Readable 5.0
int read(CharBuffer cb)
attempts to read as many char values into cb as it can hold. Returns the number of values read, or -1 if no further values are available from this Readable.
java.lang.Appendable 5.0
Appendable append(char c)
appends the code unit c to this Appendable; returns this.
Appendable append(CharSequence cs)
appends all code units in cs to this Appendable; returns this.
java.lang.CharSequence 1.4
char charAt(int index)
returns the code unit at the given index.
int length()
returns the number of code units in this sequence.
CharSequence subSequence(int startIndex, int endIndex)
returns a CharSequence consisting of the code units stored at index startIndex to endIndex - 1.
String toString()
returns a string consisting of the code units of this sequence.
FileInputStream and FileOutputStream give you input and output streams attached to a disk file. You give the file name or full path name of the file in the constructor. For example,
FileInputStream fin = new FileInputStream("employee.dat");
looks in the current directory for a file named "employee.dat".
Because the backslash character is the escape character in Java strings, be sure to use \\ for Windows-style path names ("C:\\Windows\\win.ini"). In Windows, you can also use a single forward slash ("C:/Windows/win.ini") because most Windows file handling system calls will interpret forward slashes as file separators. However, this is not recommended: the behavior of the Windows system functions is subject to change, and on other operating systems, the file separator may yet be different. Instead, for portable programs, you should use the correct file separator character. It is stored in the constant string File.separator.
You can also use a File object (see page 684 for more on file objects):
File f = new File("employee.dat");
FileInputStream fin = new FileInputStream(f);
Like the abstract InputStream and OutputStream classes, these classes support only reading and writing on the byte level. That is, we can only read bytes and byte arrays from the object fin.
byte b = (byte) fin.read();
Because all the classes in java.io interpret relative path names as starting with the user’s current working directory, you may want to know this directory. You can get at this information by a call to System.getProperty("user.dir").
As you will see in the next section, if we just had a DataInputStream, then we could read numeric types:
DataInputStream din = . . .;
double s = din.readDouble();
But just as the FileInputStream has no methods to read numeric types, the DataInputStream has no method to get data from a file.
Java uses a clever mechanism to separate two kinds of responsibilities. Some streams (such as the FileInputStream and the input stream returned by the openStream method of the URL class) can retrieve bytes from files and other more exotic locations. Other streams (such as the DataInputStream and the PrintWriter) can assemble bytes into more useful data types. The Java programmer has to combine the two into what are often called filtered streams by feeding an existing stream to the constructor of another stream. For example, to be able to read numbers from a file, first create a FileInputStream and then pass it to the constructor of a DataInputStream.
FileInputStream fin = new FileInputStream("employee.dat");
DataInputStream din = new DataInputStream(fin);
double s = din.readDouble();
It is important to keep in mind that the data input stream that we created with the above code does not correspond to a new disk file. The newly created stream still accesses the data from the file attached to the file input stream, but the point is that it now has a more capable interface.
If you look at Figure 12–1 again, you can see the classes FilterInputStream and FilterOutputStream. You combine their subclasses into a new filtered stream to construct the streams you want. For example, by default, streams are not buffered. That is, every call to read contacts the operating system to ask it to dole out yet another byte. If you want buffering and the data input methods for a file named employee.dat in the current directory, you need to use the following rather monstrous sequence of constructors:
DataInputStream din = new DataInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));
Notice that we put the DataInputStream last in the chain of constructors because we want to use the DataInputStream methods, and we want them to use the buffered read method. Regardless of the ugliness of the above code, it is necessary: you must be prepared to continue layering stream constructors until you have access to the functionality you want.
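To make the layering concrete, here is a small round trip that writes a double through data, buffer, and file layers and reads it back the same way. It is a sketch under assumptions: the class LayeredStreams, its method names, and the temporary file are invented for the example, and try-with-resources requires Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class LayeredStreams {
    // Writes one double through the data -> buffer -> file layers.
    static void writeSalary(File file, double salary) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeDouble(salary);
        }   // closing the outermost stream flushes the buffer and closes the file
    }

    // Reads the double back through the matching input layers.
    static double readSalary(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readDouble();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("employee", ".dat");   // scratch file for the demo
        file.deleteOnExit();
        writeSalary(file, 75000.0);
        System.out.println(readSalary(file));
    }
}
```

Only the outermost stream needs to be closed; the close call propagates down the chain to the file stream.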
Sometimes you’ll need to keep track of the intermediate streams when chaining them together. For example, when reading input, you often need to peek at the next byte to see if it is the value that you expect. Java provides the PushbackInputStream for this purpose.
PushbackInputStream pbin = new PushbackInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));
Now you can speculatively read the next byte
int b = pbin.read();
and throw it back if it isn’t what you wanted.
if (b != '<') pbin.unread(b);
But reading and unreading are the only methods that apply to the pushback input stream. If you want to look ahead and also read numbers, then you need both a pushback input stream and a data input stream reference.
DataInputStream din = new DataInputStream(
   pbin = new PushbackInputStream(
      new BufferedInputStream(
         new FileInputStream("employee.dat"))));
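The following sketch shows why both references are useful: the pushback stream is used to look ahead, and the data stream layered on top of it reads the number. The class PushbackDemo, the '<' marker convention, and the in-memory sample data are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class PushbackDemo {
    // Consumes the next byte if it is '<'; otherwise pushes it back.
    static boolean skipMarker(PushbackInputStream pbin) throws IOException {
        int b = pbin.read();
        if (b == '<') return true;
        if (b != -1) pbin.unread(b);   // never push back the end-of-stream value
        return false;
    }

    // Builds sample data: an optional '<' marker followed by a 4-byte int.
    static byte[] sample(boolean withMarker, int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        if (withMarker) out.write('<');
        out.writeInt(value);
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the int whether or not the marker precedes it.
    static int readValue(byte[] data) throws IOException {
        PushbackInputStream pbin = new PushbackInputStream(new ByteArrayInputStream(data));
        DataInputStream din = new DataInputStream(pbin);   // same bytes, richer interface
        skipMarker(pbin);
        return din.readInt();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readValue(sample(true, 1234)));
        System.out.println(readValue(sample(false, 1234)));
    }
}
```

The demo values are chosen so that the first byte of the integer is not itself '<'; real data formats need an unambiguous marker.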
Of course, in the stream libraries of other programming languages, niceties such as buffering and lookahead are automatically taken care of, so it is a bit of a hassle in Java that one has to resort to layering stream filters in these cases. But the ability to mix and match filter classes to construct truly useful sequences of streams does give you an immense amount of flexibility. For example, you can read numbers from a compressed ZIP file by using the following sequence of streams (see Figure 12–4).
ZipInputStream zin = new ZipInputStream(new FileInputStream("employee.zip")); DataInputStream din = new DataInputStream(zin);
(See the section on ZIP file streams starting on page 643 for more on Java’s ability to handle ZIP files.)
All in all, apart from the rather monstrous constructors that are needed to layer streams, the ability to mix and match streams is a very useful feature of Java!
java.io.FileInputStream 1.0
FileInputStream(String name)
creates a new file input stream, using the file whose path name is specified by the name string.
FileInputStream(File f)
creates a new file input stream, using the information encapsulated in the File object. (The File class is described at the end of this chapter.)
java.io.FileOutputStream 1.0
FileOutputStream(String name)
creates a new file output stream specified by the name string. Path names that are not absolute are resolved relative to the current working directory. Caution: This method automatically deletes any existing file with the same name.
FileOutputStream(String name, boolean append)
creates a new file output stream specified by the name string. Path names that are not absolute are resolved relative to the current working directory. If the append parameter is true, then data are added at the end of the file. An existing file with the same name will not be deleted.
FileOutputStream(File f)
creates a new file output stream using the information encapsulated in the File object. (The File class is described at the end of this chapter.) Caution: This method automatically deletes any existing file with the same name as the name of f.
java.io.BufferedInputStream 1.0
BufferedInputStream(InputStream in)
creates a new buffered stream with a default buffer size. A buffered input stream reads bytes from a stream without causing a device access every time. When the buffer is empty, a new block of data is read into the buffer.
BufferedInputStream(InputStream in, int n)
creates a new buffered stream with a user-defined buffer size.
java.io.BufferedOutputStream 1.0
BufferedOutputStream(OutputStream out)
creates a new buffered stream with a default buffer size. A buffered output stream collects bytes to be written without causing a device access every time. When the buffer fills up or when the stream is flushed, the data are written.
BufferedOutputStream(OutputStream out, int n)
creates a new buffered stream with a user-defined buffer size.
java.io.PushbackInputStream 1.0
PushbackInputStream(InputStream in)
constructs a stream with one-byte lookahead.
PushbackInputStream(InputStream in, int size)
constructs a stream with a pushback buffer of specified size.
void unread(int b)
pushes back a byte, which is retrieved again by the next call to read. You can push back only one byte at a time.
Parameters:
b | The byte to be read again |
You often need to write the result of a computation or read one back. The data streams support methods for reading back all the basic Java types. To write a number, character, Boolean value, or string, use one of the following methods of the DataOutput interface:
writeChars writeByte writeInt writeShort writeLong writeFloat writeDouble writeChar writeBoolean writeUTF
For example, writeInt always writes an integer as a 4-byte binary quantity regardless of the number of digits, and writeDouble always writes a double as an 8-byte binary quantity. The resulting output is not humanly readable, but the space needed will be the same for each value of a given type and reading it back in will be faster. (See the section on the PrintWriter class later in this chapter for how to output numbers as human-readable text.)
There are two different methods of storing integers and floating-point numbers in memory, depending on the platform you are using. Suppose, for example, you are working with a 4-byte int, say the decimal number 1234, or 4D2 in hexadecimal (1234 = 4 x 256 + 13 x 16 + 2). This can be stored in such a way that the first of the 4 bytes in memory holds the most significant byte (MSB) of the value: 00 00 04 D2. This is the so-called big-endian method. Or we can start with the least significant byte (LSB) first: D2 04 00 00. This is called, naturally enough, the little-endian method. For example, the SPARC uses big-endian; the Pentium, little-endian. This can lead to problems. When a C or C++ file is saved, the data are saved exactly as the processor stores them. That makes it challenging to move even the simplest data files from one platform to another. In Java, all values are written in the big-endian fashion, regardless of the processor. That makes Java data files platform independent.
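You can check the byte order for yourself by writing an int into a block of memory and printing the bytes. The helper below is an illustrative sketch; the class name BigEndianDemo and the method hexOfInt are not library names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BigEndianDemo {
    // Returns the bytes that writeInt produces, as space-separated hex pairs.
    static String hexOfInt(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(value);
        out.flush();

        StringBuilder hex = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b & 0xFF));   // mask to avoid sign extension
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hexOfInt(1234));   // prints 00 00 04 D2 on every platform
    }
}
```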
The writeUTF method writes string data by using a modified version of 8-bit Unicode Transformation Format. Instead of simply using the standard UTF-8 encoding (which is shown in Table 12–1), character strings are first represented in UTF-16 (see Table 12–2) and then the result is encoded using the UTF-8 rules. The modified encoding is different for characters with code higher than 0xFFFF. It is used for backwards compatibility with virtual machines that were built when Unicode had not yet grown beyond 16 bits.
Table 12–2. UTF-16 Encoding
Character Range | Encoding |
---|---|
0...FFFF | a15a14a13a12a11a10a9a8 a7a6a5a4a3a2a1a0 |
10000...10FFFF | 110110b19b18 b17b16a15a14a13a12a11a10 110111a9a8 a7a6a5a4a3a2a1a0, where b19b18b17b16 = a20a19a18a17a16 - 1 |
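The second row of the table can be turned into a few lines of bit arithmetic. The sketch below, with the invented names Utf16Demo and toSurrogates, computes the two code units for a code point in the range 10000...10FFFF; the library method Character.toChars should give the same result.

```java
import java.util.Arrays;

class Utf16Demo {
    // Encodes a code point in the range 0x10000...0x10FFFF as two UTF-16 code units.
    static char[] toSurrogates(int codePoint) {
        int b = (codePoint >> 16) - 1;                       // b19..b16 = a20..a16 - 1
        int high = 0xD800 | (b << 6) | ((codePoint >> 10) & 0x3F);   // 110110 b19..b16 a15..a10
        int low = 0xDC00 | (codePoint & 0x3FF);              // 110111 a9..a0
        return new char[] { (char) high, (char) low };
    }

    public static void main(String[] args) {
        int codePoint = 0x1D546;
        char[] units = toSurrogates(codePoint);
        System.out.println(Integer.toHexString(units[0]) + " " + Integer.toHexString(units[1]));
        // compare with the library's own conversion
        System.out.println(Arrays.equals(units, Character.toChars(codePoint)));
    }
}
```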
Because nobody else uses this modification of UTF-8, you should only use the writeUTF method to write strings that are intended for a Java virtual machine; for example, if you write a program that generates bytecodes. Use the writeChars method for other purposes.
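The difference between the two encodings shows up as soon as a string contains a character above 0xFFFF. The following sketch compares the number of bytes that writeUTF produces with the length of the standard UTF-8 encoding; the class and method names are made up for the example, and the two-byte length prefix that writeUTF adds is subtracted before comparing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ModifiedUtf8Demo {
    // Number of encoded bytes written by writeUTF, not counting its 2-byte length prefix.
    static int modifiedLength(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(s);
        out.flush();
        return bytes.size() - 2;
    }

    // Number of bytes in the standard UTF-8 encoding of the same string.
    static int standardLength(String s) throws IOException {
        return s.getBytes("UTF-8").length;
    }

    public static void main(String[] args) throws IOException {
        String supplementary = "\uD835\uDD46";   // one character, U+1D546, as a surrogate pair
        System.out.println(modifiedLength("abc") + " " + standardLength("abc"));
        System.out.println(modifiedLength(supplementary) + " " + standardLength(supplementary));
    }
}
```

For plain ASCII text both lengths agree. For the supplementary character, each of the two UTF-16 code units is encoded separately in the modified format, giving six bytes instead of the four of standard UTF-8.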
See RFC 2279 (http://ietf.org/rfc/rfc2279.txt) and RFC 2781 (http://ietf.org/rfc/rfc2781.txt) for definitions of UTF-8 and UTF-16.
To read the data back in, use the following methods:
readInt readShort readLong readFloat readDouble readChar readBoolean readUTF
The binary data format is compact and platform independent. Except for the UTF strings, it is also suited to random access. The major drawback is that binary files are not readable by humans.
boolean readBoolean()
reads in a Boolean value.
byte readByte()
reads an 8-bit byte.
char readChar()
reads a 16-bit Unicode character.
double readDouble()
reads a 64-bit double.
float readFloat()
reads a 32-bit float.
void readFully(byte[] b)
reads bytes into the array b, blocking until all bytes are read.
Parameters:
b | The buffer into which the data is read |
void readFully(byte[] b, int off, int len)
reads bytes into the array b, blocking until all bytes are read.
Parameters:
b | The buffer into which the data is read |
off | The start offset of the data |
len | The maximum number of bytes to read |
int readInt()
reads a 32-bit integer.
String readLine()
reads in a line that has been terminated by a \n, \r, \r\n, or EOF. Returns a string containing all bytes in the line converted to Unicode characters.
long readLong()
reads a 64-bit long integer.
short readShort()
reads a 16-bit short integer.
String readUTF()
reads a string of characters in “modified UTF-8” format.
int skipBytes(int n)
skips n bytes, blocking until all bytes are skipped.
Parameters:
n | The number of bytes to be skipped |
void writeBoolean(boolean b)
writes a Boolean value.
void writeByte(int b)
writes an 8-bit byte.
void writeChar(int c)
writes a 16-bit Unicode character.
void writeChars(String s)
writes all characters in the string.
void writeDouble(double d)
writes a 64-bit double.
void writeFloat(float f)
writes a 32-bit float.
void writeInt(int i)
writes a 32-bit integer.
void writeLong(long l)
writes a 64-bit long integer.
void writeShort(int s)
writes a 16-bit short integer.
void writeUTF(String s)
writes a string of characters in “modified UTF-8” format.
The RandomAccessFile stream class lets you find or write data anywhere in a file. It implements both the DataInput and DataOutput interfaces. Disk files are random access, but streams of data from a network are not. You open a random-access file either for reading only or for both reading and writing. You specify the option by using the string "r" (for read access) or "rw" (for read/write access) as the second argument in the constructor.
RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
RandomAccessFile inOut = new RandomAccessFile("employee.dat", "rw");
When you open an existing file as a RandomAccessFile, it does not get deleted.
A random-access file also has a file pointer setting that comes with it. The file pointer always indicates the position of the next byte that will be read or written. The seek method sets the file pointer to an arbitrary byte position within the file. The argument to seek is a long integer between zero and the length of the file in bytes.
The getFilePointer method returns the current position of the file pointer.
To read from a random-access file, you use the same methods, such as readInt and readChar, as for DataInputStream objects. That is no accident. These methods are actually defined in the DataInput interface that both DataInputStream and RandomAccessFile implement.
Similarly, to write a random-access file, you use the same writeInt and writeChar methods as in the DataOutputStream class. These methods are defined in the DataOutput interface that is common to both classes.
The advantage of having the RandomAccessFile class implement both DataInput and DataOutput is that this lets you use or write methods whose argument types are the DataInput and DataOutput interfaces.
class Employee
{
   . . .
   read(DataInput in) { . . . }
   write(DataOutput out) { . . . }
}
Note that the read method can handle either a DataInputStream or a RandomAccessFile object because both of these classes implement the DataInput interface. The same is true for the write method.
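Random access is most useful when every record has the same size, because the position of record number n is then simply n times the record size. The sketch below assumes a hypothetical record layout of an int id followed by a double salary; the class RecordFile and its methods are invented for illustration, and try-with-resources requires Java 7 or later.

```java
import java.io.DataOutput;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RecordFile {
    // Hypothetical fixed-size record: int id (4 bytes) + double salary (8 bytes)
    static final int RECORD_SIZE = 4 + 8;

    // Works with a RandomAccessFile or a DataOutputStream, since both implement DataOutput.
    static void writeRecord(DataOutput out, int id, double salary) throws IOException {
        out.writeInt(id);
        out.writeDouble(salary);
    }

    // Jumps straight to the salary field of the record with the given index (0-based).
    static double readSalary(RandomAccessFile file, int index) throws IOException {
        file.seek((long) index * RECORD_SIZE + 4);   // skip the 4-byte id
        return file.readDouble();
    }

    // Writes three records to a temporary file and reads the second salary back.
    static double roundTrip() throws IOException {
        File temp = File.createTempFile("employee", ".dat");
        temp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(temp, "rw")) {
            writeRecord(file, 1, 50000.0);
            writeRecord(file, 2, 75000.0);
            writeRecord(file, 3, 40000.0);
            return readSalary(file, 1);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```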
java.io.RandomAccessFile 1.0
RandomAccessFile(String file, String mode)
RandomAccessFile(File file, String mode)
Parameters:
file | The file to be opened |
mode | "r" for read-only mode, "rw" for read/write mode, "rws" for read/write mode with synchronous disk writes of data and metadata for every update, and "rwd" for read/write mode with synchronous disk writes of data only. |
long getFilePointer()
returns the current location of the file pointer.
void seek(long pos)
sets the file pointer to pos bytes from the beginning of the file.
long length()
returns the length of the file in bytes.
In the last section, we discussed binary input and output. While binary I/O is fast and efficient, it is not easily readable by humans. In this section, we will focus on text I/O. For example, if the integer 1234 is saved in binary, it is written as the sequence of bytes 00 00 04 D2 (in hexadecimal notation). In text format, it is saved as the string "1234".
Unfortunately, doing this in Java requires a bit of work, because, as you know, Java uses Unicode characters. That is, the character encoding for the string "1234" really is 00 31 00 32 00 33 00 34 (in hex). However, at the present time most environments in which your Java programs will run use their own character encoding. This may be a single-byte, double-byte, or variable-byte scheme. For example, if you use Windows, the string would be written in ASCII, as 31 32 33 34, without the extra zero bytes. If the Unicode encoding were written into a text file, then it would be quite unlikely that the resulting file would be humanly readable with the tools of the host environment. To overcome this problem, Java has a set of stream filters that bridges the gap between Unicode-encoded strings and the character encoding used by the local operating system. All of these classes descend from the abstract Reader and Writer classes, and the names are reminiscent of the ones used for binary data. For example, the InputStreamReader class turns an input stream that contains bytes in a particular character encoding into a reader that emits Unicode characters. Similarly, the OutputStreamWriter class turns a stream of Unicode characters into a stream of bytes in a particular character encoding.
For example, here is how you make an input reader that reads keystrokes from the console and automatically converts them to Unicode.
InputStreamReader in = new InputStreamReader(System.in);
This input stream reader assumes the normal character encoding used by the host system. For example, under Windows, it uses the ISO 8859-1 encoding (also known as ISO Latin-1 or, among Windows programmers, as “ANSI code”). You can choose a different encoding by specifying it in the constructor for the InputStreamReader. This takes the form
InputStreamReader(InputStream, String)
where the string describes the encoding scheme that you want to use. For example,
InputStreamReader in = new InputStreamReader( new FileInputStream("kremlin.dat"), "ISO8859_5");
The next section has more information on character sets.
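To see the byte sequences mentioned above, you can attach an OutputStreamWriter with an explicit encoding to a block of memory and inspect the result. This is a sketch with invented names (EncodingDemo, encode, decode); it uses only encodings that every Java implementation must support.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class EncodingDemo {
    // Converts Unicode text to bytes in the named character encoding.
    static byte[] encode(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, encoding);
        out.write(text);
        out.close();                 // flushes the encoder into the byte array
        return bytes.toByteArray();
    }

    // Converts bytes in the named character encoding back to Unicode text.
    static String decode(byte[] data, String encoding) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding);
        StringBuilder result = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) result.append((char) c);
        in.close();
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encode("1234", "US-ASCII").length);    // one byte per character
        System.out.println(encode("1234", "UTF-16BE").length);    // two bytes per character
        System.out.println(decode(new byte[] { 0x31, 0x32, 0x33, 0x34 }, "US-ASCII"));
    }
}
```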
Because it is so common to want to attach a reader or writer to a file, a pair of convenience classes, FileReader and FileWriter, is provided for this purpose. For example, the writer definition
FileWriter out = new FileWriter("output.txt");
is equivalent to
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream("output.txt"));
In the past, international character sets have been handled rather unsystematically throughout the Java library. The java.nio package, introduced in JDK 1.4, unifies character set conversion with the introduction of the Charset class. (Note that the s is lower case.)
A character set maps between sequences of two-byte Unicode code units and byte sequences used in a local character encoding. One of the most popular character encodings is ISO-8859-1, a single-byte encoding of the first 256 Unicode characters. Gaining in importance is ISO-8859-15, which replaces some of the less useful characters of ISO-8859-1 with accented letters used in French and Finnish, and, more important, replaces the “international currency” character with the Euro symbol (€) in code point 0xA4. Other examples for character encodings are the variable-byte encodings commonly used for Japanese and Chinese.
The Charset class uses the character set names standardized in the IANA Character Set Registry (http://www.iana.org/assignments/character-sets). These names differ slightly from those used in previous versions. For example, the “official” name of ISO-8859-1 is now "ISO-8859-1" and no longer "ISO8859_1", which was the preferred name up to JDK 1.3. For compatibility with other naming conventions, each character set can have a number of aliases. For example, ISO-8859-1 has aliases
ISO8859-1 ISO_8859_1 ISO8859_1 ISO_8859-1 ISO_8859-1:1987 8859_1 latin1 l1 csISOLatin1 iso-ir-100 cp819 IBM819 IBM-819 819
Character set names are case insensitive.
You obtain a Charset
by calling the static forName
method with either the official name or one of its aliases:
Charset cset = Charset.forName("ISO-8859-1");
The aliases
method returns a Set
object of the aliases. A Set
is a collection that we discuss in Volume 2; here is the code to iterate through the set elements:
Set<String> aliases = cset.aliases();
for (String alias : aliases)
   System.out.println(alias);
An excellent reference for the “ISO 8859 alphabet soup” is http://czyborra.com/charsets/iso8859.html.
International versions of Java support many more encodings. There is even a mechanism for adding additional character set providers—see the JDK documentation for details. To find out which character sets are available in a particular implementation, call the static availableCharsets
method. It returns a SortedMap
, another collection class. Use this code to find out the names of all available character sets:
SortedMap<String, Charset> charsets = Charset.availableCharsets();
for (String name : charsets.keySet())
   System.out.println(name);
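Because forName throws an exception for a character set that the virtual machine does not support, it can be useful to test a name first. The sketch below resolves an alias to the official name; the class and method names are hypothetical, and it relies only on the Charset methods isSupported, forName, name, and availableCharsets.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

class CharsetLookup // hypothetical helper for illustration
{
   /** Returns the official name for a name or alias, or null if it is not supported. */
   static String canonicalName(String nameOrAlias)
   {
      if (!Charset.isSupported(nameOrAlias)) return null;
      return Charset.forName(nameOrAlias).name();
   }

   public static void main(String[] args)
   {
      // an alias and a differently cased name both resolve to the official name
      System.out.println(canonicalName("latin1"));
      System.out.println(canonicalName("iso-8859-1"));

      SortedMap<String, Charset> charsets = Charset.availableCharsets();
      System.out.println(charsets.size() + " character sets available");
   }
}
```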
Table 12–3 lists the character encodings that every Java implementation is required to have. Table 12–4 lists the encoding schemes that the JDK installs by default. The character sets in Tables 12–5 and 12–6 are installed only on operating systems that use non-European languages. The encoding schemes in Table 12–6 are supplied for compatibility with previous versions of the JDK.
Table 12–3. Required Character Encodings
Charset Standard Name | Legacy Name | Description |
---|---|---|
US-ASCII | ASCII | American Standard Code for Information Interchange |
ISO-8859-1 | ISO8859_1 | ISO 8859-1, Latin alphabet No. 1 |
UTF-8 | UTF8 | Eight-bit Unicode Transformation Format |
UTF-16 | UTF-16 | Sixteen-bit Unicode Transformation Format, byte order specified by an optional initial byte-order mark |
UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode Transformation Format, big-endian byte order |
UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode Transformation Format, little-endian byte order |
Table 12–4. Basic Character Encodings
Charset Standard Name | Legacy Name | Description |
---|---|---|
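ISO-8859-2 | ISO8859_2 | ISO 8859-2, Latin alphabet No. 2 |
ISO-8859-4 | ISO8859_4 | ISO 8859-4, Latin alphabet No. 4 |
ISO-8859-5 | ISO8859_5 | ISO 8859-5, Latin/Cyrillic alphabet |
ISO-8859-7 | ISO8859_7 | ISO 8859-7, Latin/Greek alphabet |
ISO-8859-9 | ISO8859_9 | ISO 8859-9, Latin alphabet No. 5 |
ISO-8859-13 | ISO8859_13 | ISO 8859-13, Latin alphabet No. 7 |
ISO-8859-15 | ISO8859_15 | ISO 8859-15, Latin alphabet No. 9 |
windows-1250 | Cp1250 | Windows Eastern European |
windows-1251 | Cp1251 | Windows Cyrillic |
windows-1252 | Cp1252 | |
windows-1253 | Cp1253 | Windows Greek |
windows-1254 | Cp1254 | |
windows-1257 | Cp1257 | Windows Baltic |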
Table 12–5. Extended Character Encodings
Charset Standard Name | Legacy Name | Description |
---|---|---|
|
| Big5, Traditional Chinese |
|
| Big5 with Hong Kong extensions, Traditional Chinese |
|
| JIS X 0201, 0208, 0212, EUC encoding, Japanese |
|
| |
|
| |
|
| GBK, Simplified Chinese |
|
| ISCII91 encoding of Indic scripts |
|
| JIS X 0201, 0208 in ISO 2022 form, Japanese |
|
| ISO 2022 KR, Korean |
|
| ISO 8859-3, Latin alphabet No. 3 |
|
| ISO 8859-6, Latin/Arabic alphabet |
|
| ISO 8859-8, Latin/Hebrew alphabet |
|
| |
|
| TIS620, Thai |
|
| |
|
| Windows Arabic |
|
| Windows Vietnamese |
|
| Windows Japanese |
|
| GB2312, EUC encoding, Simplified Chinese |
|
| JIS X 0201, 0208, EUC encoding, Japanese |
|
| CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese |
|
| Windows Traditional Chinese with Hong Kong extensions |
|
| Windows Simplified Chinese |
|
| Windows Korean |
|
| Windows Traditional Chinese |
Table 12–6. Legacy Character Encodings
Legacy Name | Description |
| USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia |
| |
| IBM Denmark, Norway |
| IBM Finland, Sweden |
| IBM Italy |
| IBM Catalan/Spain, Spanish Latin America |
| IBM United Kingdom, Ireland |
| IBM France |
| IBM Arabic |
| IBM Hebrew |
| MS-DOS United States, Australia, New Zealand, South Africa |
| EBCDIC 500V1 |
| PC Greek |
| PC Baltic |
| IBM Thailand extended SBCS |
| MS-DOS Latin-1 |
| MS-DOS Latin-2 |
| IBM Cyrillic |
| IBM Hebrew |
| IBM Turkish |
| Variant of Cp850 with Euro character |
| MS-DOS Portuguese |
| MS-DOS Icelandic |
| PC Hebrew |
| MS-DOS Canadian French |
| PC Arabic |
| MS-DOS Nordic |
| MS-DOS Russian |
| MS-DOS Pakistan |
| IBM Modern Greek |
| IBM Multilingual Latin-2 |
| IBM Iceland |
| IBM Thai |
| IBM Greek |
| IBM Pakistan (Urdu) |
| IBM Latvia, Lithuania (AIX, DOS) |
| IBM Estonia (AIX, DOS) |
| Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026 |
| Korean Mixed with 1880 UDC, superset of 5029 |
| Simplified Chinese Host mixed with 1880 UDC, superset of 5031 |
| Traditional Chinese Host mixed with 6204 UDC, superset of 5033 |
| Japanese Latin Kanji mixed with 4370 UDC, superset of 5035 |
| IBM OS/2 Japanese, superset of |
| Variant of |
| IBM OS/2 Japanese, superset of |
| Variant of |
| OS/2 Chinese (Taiwan) superset of 938 |
| PC Korean |
| Variant of |
| PC Chinese (Hong Kong, Taiwan) |
| AIX Chinese (Taiwan) |
| AIX Korean |
| IBM AIX Pakistan (Urdu) |
| IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR) |
| IBM Latin-5, Turkey |
| IBM Arabic - Windows |
| IBM Iran (Farsi)/Persian |
| IBM Iran (Farsi)/Persian (PC) |
| IBM Latvia, Lithuania |
| IBM Estonia |
| IBM Ukraine |
| IBM AIX Ukraine |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| Variant of |
| IBM OS/2, DOS People’s Republic of China (PRC) |
| IBM AIX People’s Republic of China (PRC) |
| IBM-eucJP - Japanese (superset of 5050) |
| ISO 2022 CN, Chinese (conversion to Unicode only) |
| CNS 11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only) |
| GB 2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only) |
| JIS X 0201, Japanese |
| JIS X 0208, Japanese |
| JIS X 0212, Japanese |
| Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only) |
| Johab, Korean |
| Windows Thai |
| Macintosh Arabic |
| Macintosh Latin-2 |
| Macintosh Croatian |
| Macintosh Cyrillic |
| Macintosh Dingbat |
| Macintosh Greek |
| Macintosh Hebrew |
| Macintosh Iceland |
| Macintosh Roman |
| Macintosh Romania |
| Macintosh Symbol |
| Macintosh Thai |
| Macintosh Turkish |
| Macintosh Ukraine |
Local encoding schemes cannot represent all Unicode characters. If a character cannot be represented, it is transformed to a ?
.
Once you have a character set, you can use it to convert between Unicode strings and encoded byte sequences. Here is how you encode a Unicode string.
String str = . . .;
ByteBuffer buffer = cset.encode(str);
byte[] bytes = buffer.array();
Conversely, to decode a byte sequence, you need a byte buffer. Use the static wrap
method of the ByteBuffer
class to turn a byte array into a byte buffer. The result of the decode
method is a CharBuffer
. Call its toString
method to get a string.
byte[] bytes = . . .;
ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);
CharBuffer cbuf = cset.decode(bbuf);
String str = cbuf.toString();
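The encode and decode steps can be combined into a round trip. The following sketch makes two assumptions worth stating: the class and method names are invented for illustration, and the bytes are copied out of the buffer with get rather than taken from array(), because the backing array of the returned buffer can be longer than the encoded data. The sample string also contains a Euro sign, which ISO-8859-1 cannot represent.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class CharsetRoundTrip // hypothetical helper for illustration
{
   /** Encodes a string and returns exactly the encoded bytes. */
   static byte[] encode(Charset cset, String str)
   {
      ByteBuffer buffer = cset.encode(str);
      // copy only the bytes between the buffer's position and limit
      byte[] bytes = new byte[buffer.remaining()];
      buffer.get(bytes);
      return bytes;
   }

   /** Decodes a byte array back into a string. */
   static String decode(Charset cset, byte[] bytes)
   {
      CharBuffer cbuf = cset.decode(ByteBuffer.wrap(bytes));
      return cbuf.toString();
   }

   public static void main(String[] args)
   {
      Charset latin1 = Charset.forName("ISO-8859-1");
      // the Euro sign (\u20AC) has no ISO-8859-1 code, so it is encoded as ?
      byte[] bytes = encode(latin1, "na\u00EFve \u20AC");
      System.out.println(bytes.length + " bytes: " + decode(latin1, bytes));
   }
}
```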
static SortedMap availableCharsets()
gets all available character sets for this virtual machine. Returns a map whose keys are character set names and whose values are character sets.
static Charset forName(String name)
gets a character set for the given name.
Set aliases()
returns the set of alias names for this character set.
ByteBuffer encode(String str)
encodes the given string into a sequence of bytes.
CharBuffer decode(ByteBuffer buffer)
decodes the given byte sequence. Unrecognized inputs are converted to the Unicode “replacement character” ('\uFFFD')
.
byte[] array()
returns the array of bytes that this buffer manages.
static ByteBuffer wrap(byte[] bytes)
static ByteBuffer wrap(byte[] bytes, int offset, int length)
return a byte buffer that manages the given array of bytes or the given range.
char[] array()
returns the array of code units that this buffer manages.
char charAt(int index)
returns the code unit at the given index.
String toString()
returns a string consisting of the code units that this buffer manages.
For text output, you want to use a PrintWriter
. A print writer can print strings and numbers in text format. Just as a DataOutputStream
has useful output methods but no destination, a PrintWriter
must be combined with a destination writer.
PrintWriter out = new PrintWriter(new FileWriter("employee.txt"));
You can also combine a print writer with a destination (output) stream.
PrintWriter out = new PrintWriter(new FileOutputStream("employee.txt"));
The PrintWriter(OutputStream)
constructor automatically adds an OutputStreamWriter
to convert Unicode characters to bytes in the stream.
To write to a print writer, you use the same print
and println
methods that you used with System.out
. You can use these methods to print numbers (int
, short
, long
, float
, double
), characters, Boolean values, strings, and objects.
Java veterans may wonder whatever happened to the PrintStream
class and to System.out
. In Java 1.0, the PrintStream
class simply truncated all Unicode characters to ASCII characters by dropping the top byte. Conversely, the readLine
method of the DataInputStream
turned ASCII to Unicode by setting the top byte to 0. Clearly, that was not a clean or portable approach, and it was fixed with the introduction of readers and writers in Java 1.1. For compatibility with existing code, System.in
, System.out
, and System.err
are still streams, not readers and writers. But now the PrintStream
class internally converts Unicode characters to the default host encoding in the same way as the PrintWriter
does. Objects of type PrintStream
act exactly like print writers when you use the print
and println
methods, but unlike print writers, they allow you to send raw bytes to them with the write(int)
and write(byte[])
methods.
For example, consider this code:
String name = "Harry Hacker";
double salary = 75000;
out.print(name);
out.print(' ');
out.println(salary);
This writes the characters
Harry Hacker 75000.0
to the stream out.
The characters are then converted to bytes and end up in the file employee.txt
.
The println
method automatically adds the correct end-of-line character for the target system ("\r\n" on Windows, "\n" on UNIX, "\r" on Macs) to the line. This is the string obtained by the call System.getProperty("line.separator")
.
If the writer is set to autoflush mode, then all characters in the buffer are sent to their destination whenever println
is called. (Print writers are always buffered.) By default, autoflushing is not enabled. You can enable or disable autoflushing by using the PrintWriter(Writer, boolean)
constructor and passing the appropriate Boolean as the second argument.
PrintWriter out = new PrintWriter(new FileWriter("employee.txt"), true); // autoflush
The print
methods don’t throw exceptions. You can call the checkError
method to see if something went wrong with the stream.
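To see these pieces working together without touching the file system, a print writer can be attached to a StringWriter. This is only a sketch: the class name PrintWriterDemo and the choice of an in-memory destination are assumptions made so that the result is easy to inspect.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo // hypothetical helper for illustration
{
   /** Prints a name and a salary on one line and returns the resulting text. */
   static String format(String name, double salary)
   {
      StringWriter buffer = new StringWriter(); // in-memory destination
      PrintWriter out = new PrintWriter(buffer, true); // true = autoflush on println
      out.print(name);
      out.print(' ');
      out.println(salary); // appends the platform line separator
      // the print methods never throw, so errors must be checked explicitly
      if (out.checkError())
         throw new IllegalStateException("write failed");
      return buffer.toString();
   }

   public static void main(String[] args)
   {
      System.out.print(format("Harry Hacker", 75000));
   }
}
```

Note that a double is printed with a fractional part, so the salary appears as 75000.0.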
PrintWriter(Writer out)
creates a new PrintWriter
, without automatic line flushing.
Parameters: |
out | A character-output writer |
PrintWriter(Writer out, boolean autoFlush)
creates a new PrintWriter
.
Parameters: |
out | A character-output writer |
autoFlush | If true, the println methods flush the output buffer |
PrintWriter(OutputStream out)
creates a new PrintWriter
, without automatic line flushing, from an existing OutputStream
by automatically creating the necessary intermediate OutputStreamWriter
.
Parameters: |
out | An output stream |
PrintWriter(OutputStream out, boolean autoFlush)
creates a new PrintWriter
from an existing OutputStream
but allows you to determine whether the writer autoflushes or not.
Parameters: |
out | An output stream |
autoFlush | If true, the println methods flush the output buffer |
void print(Object obj)
prints an object by printing the string resulting from toString
.
Parameters: |
obj | The object to be printed |
void print(String s)
prints a Unicode string.
void println(String s)
prints a string followed by a line terminator. Flushes the stream if the stream is in autoflush mode.
void print(char[] s)
prints an array of Unicode characters.
void print(char c)
prints a Unicode character.
void print(int i)
prints an integer in text format.
void print(long l)
prints a long integer in text format.
void print(float f)
prints a floating-point number in text format.
void print(double d)
prints a double-precision floating-point number in text format.
void print(boolean b)
prints a Boolean value in text format.
boolean checkError()
returns true
if a formatting or output error occurred. Once the stream has encountered an error, it is tainted and all calls to checkError
return true
.
As you know:
To write data in binary format, you use a DataOutputStream
.
To write in text format, you use a PrintWriter
.
Therefore, you might expect that there is an analog to the DataInputStream
that lets you read data in text format. The closest analog is the Scanner
class that we have used extensively. However, before JDK 5.0, the only game in town for processing text input was the BufferedReader
class, which has a method, readLine
, that lets you read a line of text. You need to combine a buffered reader with an input source.
BufferedReader in = new BufferedReader(new FileReader("employee.txt"));
The readLine
method returns null
when no more input is available. A typical input loop, therefore, looks like this:
String line;
while ((line = in.readLine()) != null)
{
   do something with line
}
The FileReader
class already converts bytes to Unicode characters. For other input sources, you need to use the InputStreamReader
—unlike the PrintWriter
, the InputStreamReader
has no automatic convenience method to bridge the gap between bytes and Unicode characters.
BufferedReader in2 = new BufferedReader(new InputStreamReader(System.in));
BufferedReader in3 = new BufferedReader(new InputStreamReader(url.openStream()));
To read numbers from text input, you need to read a string first and then convert it.
String s = in.readLine();
double x = Double.parseDouble(s);
That works if there is a single number on each line. Otherwise, you must work harder and break up the input string, for example, by using the StringTokenizer
utility class. We see an example of this later in this chapter.
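The reading loop and the string-to-number conversion can be put together as follows. This is a sketch under stated assumptions: the class name NumberLineReader is invented, the input holds one number per line, blank lines are skipped, and the bytes are taken to be ISO-8859-1 text. An in-memory byte array stands in for a file or network stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class NumberLineReader // hypothetical helper for illustration
{
   /** Sums the numbers found one per line in a stream of ISO-8859-1 text. */
   static double sum(InputStream stream) throws IOException
   {
      BufferedReader in = new BufferedReader(
         new InputStreamReader(stream, StandardCharsets.ISO_8859_1));
      double total = 0;
      String line;
      while ((line = in.readLine()) != null) // readLine returns null at end of input
      {
         line = line.trim();
         if (line.length() == 0) continue; // skip blank lines
         total += Double.parseDouble(line); // NumberFormatException on bad input
      }
      return total;
   }

   public static void main(String[] args) throws IOException
   {
      byte[] data = "75000\n50000.5\n\n40000\n".getBytes(StandardCharsets.ISO_8859_1);
      System.out.println(sum(new ByteArrayInputStream(data)));
   }
}
```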
ZIP files are archives that store one or more files in (usually) compressed format. Java 1.1 can handle both GZIP and ZIP format. (See RFC 1950, RFC 1951, and RFC 1952, for example, at http://www.faqs.org/rfcs.) In this section we concentrate on the more familiar (but somewhat more complicated) ZIP format and leave the GZIP classes to you if you need them. (They work in much the same way.)
The classes for handling ZIP files are in java.util.zip
and not in java.io
, so remember to add the necessary import
statement. Although not part of java.io
, the GZIP
and ZIP
classes subclass java.io.FilterInputStream
and java.io.FilterOutputStream
. The java.util.zip
package also contains classes for computing cyclic redundancy check (CRC) checksums. (CRC is a method to generate a hashlike code that the receiver of a file can use to check the integrity of the data.)
Each ZIP file has a header with information such as the name of the file and the compression method that was used. In Java, you use a ZipInputStream
to read a ZIP file by layering the ZipInputStream
constructor onto a FileInputStream
. You then need to look at the individual entries in the archive. The getNextEntry
method returns an object of type ZipEntry
that describes the entry. The read
method of the ZipInputStream
is modified to return –1 at the end of the current entry (instead of just at the end of the ZIP file). You must then call closeEntry
to read the next entry. Here is a typical code sequence to read through a ZIP file:
ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
ZipEntry entry;
while ((entry = zin.getNextEntry()) != null)
{
   analyze entry;
   read the contents of zin;
   zin.closeEntry();
}
zin.close();
To read the contents of a ZIP entry, you will probably not want to use the raw read
method; usually, you will use the methods of a more competent stream filter. For example, to read a text file inside a ZIP file, you can use the following loop:
BufferedReader in = new BufferedReader(new InputStreamReader(zin));
String s;
while ((s = in.readLine()) != null)
   do something with s;
The program in Example 12–1 lets you open a ZIP file. It then displays the files stored in the ZIP archive in the combo box at the bottom of the screen. If you double-click on one of the files, the contents of the file are displayed in the text area, as shown in Figure 12–5.
Example 12–1. ZipTest.java
import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.util.*;
import java.util.zip.*;
import javax.swing.*;
import javax.swing.filechooser.FileFilter;

public class ZipTest
{
   public static void main(String[] args)
   {
      ZipTestFrame frame = new ZipTestFrame();
      frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      frame.setVisible(true);
   }
}

/**
   A frame with a text area to show the contents of a file inside
   a zip archive, a combo box to select different files in the
   archive, and a menu to load a new archive.
*/
class ZipTestFrame extends JFrame
{
   public ZipTestFrame()
   {
      setTitle("ZipTest");
      setSize(DEFAULT_WIDTH, DEFAULT_HEIGHT);

      // add the menu and the Open and Exit menu items
      JMenuBar menuBar = new JMenuBar();
      JMenu menu = new JMenu("File");

      JMenuItem openItem = new JMenuItem("Open");
      menu.add(openItem);
      openItem.addActionListener(new OpenAction());

      JMenuItem exitItem = new JMenuItem("Exit");
      menu.add(exitItem);
      exitItem.addActionListener(new
         ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               System.exit(0);
            }
         });

      menuBar.add(menu);
      setJMenuBar(menuBar);

      // add the text area and combo box
      fileText = new JTextArea();
      fileCombo = new JComboBox();
      fileCombo.addActionListener(new
         ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               loadZipFile((String) fileCombo.getSelectedItem());
            }
         });

      add(fileCombo, BorderLayout.SOUTH);
      add(new JScrollPane(fileText), BorderLayout.CENTER);
   }

   /**
      This is the listener for the File->Open menu item.
   */
   private class OpenAction implements ActionListener
   {
      public void actionPerformed(ActionEvent event)
      {
         // prompt the user for a zip file
         JFileChooser chooser = new JFileChooser();
         chooser.setCurrentDirectory(new File("."));
         ExtensionFileFilter filter = new ExtensionFileFilter();
         filter.addExtension(".zip");
         filter.addExtension(".jar");
         filter.setDescription("ZIP archives");
         chooser.setFileFilter(filter);
         int r = chooser.showOpenDialog(ZipTestFrame.this);
         if (r == JFileChooser.APPROVE_OPTION)
         {
            zipname = chooser.getSelectedFile().getPath();
            scanZipFile();
         }
      }
   }

   /**
      Scans the contents of the zip archive and populates
      the combo box.
   */
   public void scanZipFile()
   {
      fileCombo.removeAllItems();
      try
      {
         ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
         ZipEntry entry;
         while ((entry = zin.getNextEntry()) != null)
         {
            fileCombo.addItem(entry.getName());
            zin.closeEntry();
         }
         zin.close();
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }

   /**
      Loads a file from the zip archive into the text area
      @param name the name of the file in the archive
   */
   public void loadZipFile(String name)
   {
      try
      {
         ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
         ZipEntry entry;
         fileText.setText("");

         // find entry with matching name in archive
         while ((entry = zin.getNextEntry()) != null)
         {
            if (entry.getName().equals(name))
            {
               // read entry into text area
               BufferedReader in = new BufferedReader(new InputStreamReader(zin));
               String line;
               while ((line = in.readLine()) != null)
               {
                  fileText.append(line);
                  fileText.append("\n");
               }
            }
            zin.closeEntry();
         }
         zin.close();
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }

   public static final int DEFAULT_WIDTH = 400;
   public static final int DEFAULT_HEIGHT = 300;

   private JComboBox fileCombo;
   private JTextArea fileText;
   private String zipname;
}

/**
   This file filter matches all files with a given set of
   extensions. From FileChooserTest in chapter 9
*/
class ExtensionFileFilter extends FileFilter
{
   /**
      Adds an extension that this file filter recognizes.
      @param extension a file extension (such as ".txt" or "txt")
   */
   public void addExtension(String extension)
   {
      if (!extension.startsWith("."))
         extension = "." + extension;
      extensions.add(extension.toLowerCase());
   }

   /**
      Sets a description for the file set that this file filter
      recognizes.
      @param aDescription a description for the file set
   */
   public void setDescription(String aDescription)
   {
      description = aDescription;
   }

   /**
      Returns a description for the file set that this file
      filter recognizes.
      @return a description for the file set
   */
   public String getDescription()
   {
      return description;
   }

   public boolean accept(File f)
   {
      if (f.isDirectory()) return true;
      String name = f.getName().toLowerCase();

      // check if the file name ends with any of the extensions
      for (String e : extensions)
         if (name.endsWith(e))
            return true;
      return false;
   }

   private String description = "";
   private ArrayList<String> extensions = new ArrayList<String>();
}
The ZIP input stream throws a ZipException
when there is an error in reading a ZIP file. Normally this error occurs when the ZIP file has been corrupted.
To write a ZIP file, you open a ZipOutputStream
by layering it onto a FileOutputStream
. For each entry that you want to place into the ZIP file, you create a ZipEntry
object. You pass the file name to the ZipEntry
constructor; it sets the other parameters such as file date and compression method automatically. You can override these settings if you like. Then, you call the putNextEntry
method of the ZipOutputStream
to begin writing a new file. Send the file data to the ZIP stream. When you are done, call closeEntry
. Repeat for all the files you want to store. Here is a code skeleton:
FileOutputStream fout = new FileOutputStream("test.zip");
ZipOutputStream zout = new ZipOutputStream(fout);
for all files
{
   ZipEntry ze = new ZipEntry(filename);
   zout.putNextEntry(ze);
   send data to zout;
   zout.closeEntry();
}
zout.close();
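The writing skeleton and the earlier reading loop can be exercised together in memory. The following is a sketch, not a definitive implementation: the class name ZipRoundTrip and the map-based interface are assumptions, the entries are treated as UTF-8 text, and byte array streams replace the file streams so that no file is needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipRoundTrip // hypothetical helper for illustration
{
   /** Writes each map entry (entry name -> text) as one entry of a ZIP archive. */
   static byte[] zip(Map<String, String> files) throws IOException
   {
      ByteArrayOutputStream bout = new ByteArrayOutputStream();
      try (ZipOutputStream zout = new ZipOutputStream(bout))
      {
         for (Map.Entry<String, String> file : files.entrySet())
         {
            zout.putNextEntry(new ZipEntry(file.getKey()));
            zout.write(file.getValue().getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
         }
      }
      return bout.toByteArray();
   }

   /** Reads every entry of a ZIP archive back as text, line by line. */
   static Map<String, String> unzip(byte[] archive) throws IOException
   {
      Map<String, String> files = new LinkedHashMap<String, String>();
      try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive)))
      {
         ZipEntry entry;
         while ((entry = zin.getNextEntry()) != null)
         {
            // do not close this reader: closing it would also close zin
            BufferedReader in = new BufferedReader(
               new InputStreamReader(zin, StandardCharsets.UTF_8));
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) // null marks the end of the entry
               text.append(line).append('\n');
            files.put(entry.getName(), text.toString());
            zin.closeEntry();
         }
      }
      return files;
   }

   public static void main(String[] args) throws IOException
   {
      Map<String, String> files = new LinkedHashMap<String, String>();
      files.put("a.txt", "first\n");
      files.put("b.txt", "second\nthird\n");
      System.out.println(unzip(zip(files)).equals(files));
   }
}
```

Because unzip rebuilds the text line by line, an entry that does not end with a newline gains one on the way back; the sample data avoids that case.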
JAR files (which were discussed in Chapter 10) are simply ZIP files with another entry, the so-called manifest. You use the JarInputStream
and JarOutputStream
classes to read and write the manifest entry.
ZIP streams are a good example of the power of the stream abstraction. Both the source and the destination of the ZIP data are completely flexible. You layer the most convenient reader stream onto the ZIP file stream to read the data that are stored in compressed form, and that reader doesn’t even realize that the data are being decompressed as they are being requested. And the source of the bytes in ZIP formats need not be a file—the ZIP data can come from a network connection. In fact, the JAR files that we discussed in Chapter 10 are ZIP-formatted files. Whenever the class loader of an applet reads a JAR file, it reads and decompresses data from the network.
The article at http://www.javaworld.com/javaworld/jw-10-2000/jw-1027-toolbox.html shows you how to modify a ZIP archive.
ZipInputStream(InputStream in)
This constructor creates a ZipInputStream
that allows you to inflate data from the given InputStream
.
Parameters: |
in | The underlying input stream |
ZipEntry getNextEntry()
returns a ZipEntry
object for the next entry, or null
if there are no more entries.
void closeEntry()
closes the current open entry in the ZIP file. You can then read the next entry by using getNextEntry()
.
ZipOutputStream(OutputStream out)
This constructor creates a ZipOutputStream
that you use to write compressed data to the specified OutputStream
.
Parameters: |
out | The underlying output stream |
void putNextEntry(ZipEntry ze)
writes the information in the given ZipEntry
to the stream and positions the stream for the data. The data can then be written to the stream by write()
.
Parameters: |
ze | The new entry |
void closeEntry()
closes the currently open entry in the ZIP file. Use the putNextEntry
method to start the next entry.
void setLevel(int level)
sets the default compression level of subsequent DEFLATED
entries. The default value is Deflater.DEFAULT_COMPRESSION
. Throws an IllegalArgumentException
if the level is not valid.
Parameters: |
level | A compression level, from 0 (NO_COMPRESSION) to 9 (BEST_COMPRESSION) |
void setMethod(int method)
sets the default compression method for this ZipOutputStream
for any entries that do not specify a method.
Parameters: |
method | The compression method, either DEFLATED or STORED |
ZipEntry(String name)
Parameters: |
name | The name of the entry |
long getCrc()
returns the CRC32 checksum value for this ZipEntry
.
String getName()
returns the name of this entry.
long getSize()
returns the uncompressed size of this entry, or –1 if the uncompressed size is not known.
boolean isDirectory()
returns a Boolean that indicates whether this entry is a directory.
void setMethod(int method)
Parameters: |
method | The compression method for the entry; must be either DEFLATED or STORED |
void setSize(long size)
sets the size of this entry. Only required if the compression method is STORED
.
Parameters: |
size | The uncompressed size of this entry |
void setCrc(long crc)
sets the CRC32 checksum of this entry. Use the CRC32
class to compute this checksum. Only required if the compression method is STORED
.
Parameters: |
crc | The checksum of this entry |
ZipFile(String name)
This constructor creates a ZipFile
for reading from the given string.
Parameters: |
name | A string that contains the path name of the file |
ZipFile(File file)
This constructor creates a ZipFile
for reading from the given File
object.
Parameters: |
file | The file to read; the File class is described at the end of this chapter |
Enumeration entries()
returns an Enumeration
object that enumerates the ZipEntry
objects that describe the entries of the ZipFile
.
ZipEntry getEntry(String name)
returns the entry corresponding to the given name, or null
if there is no such entry.
Parameters: |
name | The entry name |
InputStream getInputStream(ZipEntry ze)
returns an InputStream
for the given entry.
Parameters: |
ze | A ZipEntry in the ZIP file |
String getName()
returns the path of this ZIP file.
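The setSize and setCrc methods matter when an entry is written without compression, and the ZipFile class offers direct access to a single entry by name. The sketch below illustrates both under stated assumptions: the class name StoredEntryDemo, the entry name, and the temporary archive file are invented for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class StoredEntryDemo // hypothetical helper for illustration
{
   /** Writes one uncompressed (STORED) entry; size and CRC are set before writing. */
   static void writeStored(File archive, String entryName, byte[] data) throws IOException
   {
      CRC32 crc = new CRC32();
      crc.update(data);

      ZipEntry entry = new ZipEntry(entryName);
      entry.setMethod(ZipEntry.STORED);
      entry.setSize(data.length);
      entry.setCrc(crc.getValue());

      try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(archive)))
      {
         zout.putNextEntry(entry);
         zout.write(data);
         zout.closeEntry();
      }
   }

   /** Looks up one entry by name and returns its bytes, or null if it is absent. */
   static byte[] readEntry(File archive, String entryName) throws IOException
   {
      try (ZipFile zip = new ZipFile(archive))
      {
         ZipEntry entry = zip.getEntry(entryName);
         if (entry == null) return null;
         byte[] data = new byte[(int) entry.getSize()];
         try (InputStream in = zip.getInputStream(entry))
         {
            int offset = 0;
            while (offset < data.length)
            {
               int n = in.read(data, offset, data.length - offset);
               if (n < 0) break; // premature end of entry
               offset += n;
            }
         }
         return data;
      }
   }

   public static void main(String[] args) throws IOException
   {
      File archive = File.createTempFile("stored", ".zip"); // scratch archive
      archive.deleteOnExit();
      writeStored(archive, "hello.txt", "Hello, ZIP".getBytes(StandardCharsets.UTF_8));
      byte[] data = readEntry(archive, "hello.txt");
      System.out.println(new String(data, StandardCharsets.UTF_8));
   }
}
```

Unlike a ZipInputStream, which visits entries in order, a ZipFile reads the archive's directory and can jump straight to the requested entry.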
In the next four sections, we show you how to put some of the creatures in the stream zoo to good use. For these examples, we assume you are working with the Employee
class and some of its subclasses, such as Manager
. (See Chapters 4 and 5 for more on these example classes.) We consider four separate scenarios for saving an array of employee records to a file and then reading them back into memory:
Saving data of the same type (Employee
) in text format
Saving data of the same type in binary format
Saving and restoring polymorphic data (a mixture of Employee
and Manager
objects)
Saving and restoring data containing embedded references (managers with pointers to other employees)
In this section, you learn how to store an array of Employee
records in the time-honored delimited format. This means that each record is stored in a separate line. Instance fields are separated from each other by delimiters. We use a vertical bar (|
) as our delimiter. (A colon (:
) is another popular choice. Part of the fun is that everyone uses a different delimiter.) Naturally, we punt on the issue of what might happen if a |
actually occurred in one of the strings we save.
Especially on UNIX systems, an amazing number of files are stored in exactly this format. We have seen entire employee databases with thousands of records in this format, queried with nothing more than the UNIX awk
, sort
, and join
utilities. (In the PC world, where desktop database programs are available at low cost, this kind of ad hoc storage is much less common.)
Here is a sample set of records:
Harry Hacker|35500|1989|10|1
Carl Cracker|75000|1987|12|15
Tony Tester|38000|1990|3|15
Writing records is simple. Because we write to a text file, we use the PrintWriter
class. We simply write all fields, followed by either a |
or, for the last field, a \n
. Finally, in keeping with the idea that we want the class to be responsible for responding to messages, we add a method, writeData
, to our Employee
class.
public void writeData(PrintWriter out) throws IOException
{
   GregorianCalendar calendar = new GregorianCalendar();
   calendar.setTime(hireDay);
   out.println(name + "|"
      + salary + "|"
      + calendar.get(Calendar.YEAR) + "|"
      + (calendar.get(Calendar.MONTH) + 1) + "|"
      + calendar.get(Calendar.DAY_OF_MONTH));
}
To read records, we read in a line at a time and separate the fields. This is the topic of the next section, in which we use a utility class supplied with Java to make our job easier.
When reading a line of input, we get a single long string. We want to split it into individual strings. This means finding the |
delimiters and then separating out the individual pieces, that is, the sequence of characters up to the next delimiter. (These are usually called tokens
.) The StringTokenizer
class in java.util
is designed for exactly this purpose. It gives you an easy way to break up a large string that contains delimited text. The idea is that a string tokenizer object attaches to a string. When you construct the tokenizer object, you specify which characters are the delimiters. For example, we need to use
StringTokenizer tokenizer = new StringTokenizer(line, "|");
You can specify multiple delimiters in the string, for example:
StringTokenizer tokenizer = new StringTokenizer(line, "|,;");
This means that any of the characters in the string can serve as delimiters.
If you don’t specify a delimiter set, the default is " \t\n\r\f",
that is, all whitespace characters (space, tab, newline, carriage return, and form feed).
Once you have constructed a string tokenizer, you can use its methods to quickly extract the tokens from the string. The nextToken
method returns the next unread token. The hasMoreTokens
method returns true
if more tokens are available. The following loop processes all tokens:
while (tokenizer.hasMoreTokens())
{
   String token = tokenizer.nextToken();
   process token
}
An alternative to the StringTokenizer
is the split
method of the String
class. The call line.split("[|,;]")
returns a String[]
array consisting of all tokens, using the delimiters inside the brackets. You can use any regular expression to describe delimiters; we discuss regular expressions later in this chapter.
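The two approaches differ in how they treat empty fields, which matters for delimited records. A string tokenizer skips over consecutive delimiters, whereas split reports an empty string between them. The sketch below shows the difference; the class and method names are hypothetical, and the sample line deliberately leaves the salary field empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo // hypothetical helper for illustration
{
   /** Collects the tokens that a StringTokenizer yields for the delimiter "|". */
   static List<String> withTokenizer(String line)
   {
      List<String> tokens = new ArrayList<String>();
      StringTokenizer tokenizer = new StringTokenizer(line, "|");
      while (tokenizer.hasMoreTokens())
         tokens.add(tokenizer.nextToken());
      return tokens;
   }

   /** Splits on "|"; the bar is escaped because split takes a regular expression. */
   static String[] withSplit(String line)
   {
      return line.split("\\|");
   }

   public static void main(String[] args)
   {
      String line = "Harry Hacker||1989|10|1"; // empty salary field
      System.out.println(withTokenizer(line)); // four tokens, the empty field is lost
      System.out.println(Arrays.toString(withSplit(line))); // five strings, one empty
   }
}
```

If every record must yield the same number of fields, split (or an explicit check of countTokens) is the safer choice.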
StringTokenizer(String str, String delim)
constructs a string tokenizer with the given delimiter set.
Parameters: |
str | The input string from which tokens are read |
delim | A string containing delimiter characters (every character in this string is a delimiter) |
StringTokenizer(String str)
constructs a string tokenizer with the default delimiter set " \t\n\r\f"
.
boolean hasMoreTokens()
returns true
if more tokens exist.
String nextToken()
returns the next token; throws a NoSuchElementException
if there are no more tokens.
String nextToken(String delim)
returns the next token after switching to the new delimiter set. The new delimiter set is subsequently used.
int countTokens()
returns the number of tokens still in the string.
Reading in an Employee
record is simple. We simply read in a line of input with the readLine
method of the BufferedReader
class. Here is the code needed to read one record into a string.
BufferedReader in = new BufferedReader(new FileReader("employee.dat"));
. . .
String line = in.readLine();
Next, we need to extract the individual tokens. When we do this, we end up with strings, so we need to convert them to numbers.
Just as with the writeData
method, we add a readData
method of the Employee
class. When you call
e.readData(in);
this method overwrites the previous contents of e
. Note that the method may throw an IOException
if the readLine
method throws that exception. This method can do nothing if an IOException
occurs, so we just let it propagate up the call chain.
Here is the code for this method:
public void readData(BufferedReader in) throws IOException
{
   String s = in.readLine();
   StringTokenizer t = new StringTokenizer(s, "|");
   name = t.nextToken();
   salary = Double.parseDouble(t.nextToken());
   int y = Integer.parseInt(t.nextToken());
   int m = Integer.parseInt(t.nextToken());
   int d = Integer.parseInt(t.nextToken());
   GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      // GregorianCalendar uses 0 = January
   hireDay = calendar.getTime();
}
Finally, in the code for a program that tests these methods, the static method
void writeData(Employee[] e, PrintWriter out)
first writes the length of the array, then writes each record. The static method
Employee[] readData(BufferedReader in)
first reads in the length of the array, then reads in each record, as illustrated in Example 12–2.
Example 12–2. DataFileTest.java
import java.io.*;
import java.util.*;

public class DataFileTest
{
   public static void main(String[] args)
   {
      Employee[] staff = new Employee[3];

      staff[0] = new Employee("Carl Cracker", 75000, 1987, 12, 15);
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         PrintWriter out = new PrintWriter(new FileWriter("employee.dat"));
         writeData(staff, out);
         out.close();

         // retrieve all records into a new array
         BufferedReader in = new BufferedReader(new FileReader("employee.dat"));
         Employee[] newStaff = readData(in);
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch(IOException exception)
      {
         exception.printStackTrace();
      }
   }

   /**
      Writes all employees in an array to a print writer
      @param employees an array of employees
      @param out a print writer
   */
   static void writeData(Employee[] employees, PrintWriter out)
      throws IOException
   {
      // write number of employees
      out.println(employees.length);

      for (Employee e : employees)
         e.writeData(out);
   }

   /**
      Reads an array of employees from a buffered reader
      @param in the buffered reader
      @return the array of employees
   */
   static Employee[] readData(BufferedReader in)
      throws IOException
   {
      // retrieve the array size
      int n = Integer.parseInt(in.readLine());

      Employee[] employees = new Employee[n];
      for (int i = 0; i < n; i++)
      {
         employees[i] = new Employee();
         employees[i].readData(in);
      }
      return employees;
   }
}

class Employee
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   /**
      Writes employee data to a print writer
      @param out the print writer
   */
   public void writeData(PrintWriter out) throws IOException
   {
      GregorianCalendar calendar = new GregorianCalendar();
      calendar.setTime(hireDay);
      out.println(name + "|"
         + salary + "|"
         + calendar.get(Calendar.YEAR) + "|"
         + (calendar.get(Calendar.MONTH) + 1) + "|"
         + calendar.get(Calendar.DAY_OF_MONTH));
   }

   /**
      Reads employee data from a buffered reader
      @param in the buffered reader
   */
   public void readData(BufferedReader in) throws IOException
   {
      String s = in.readLine();
      StringTokenizer t = new StringTokenizer(s, "|");
      name = t.nextToken();
      salary = Double.parseDouble(t.nextToken());
      int y = Integer.parseInt(t.nextToken());
      int m = Integer.parseInt(t.nextToken());
      int d = Integer.parseInt(t.nextToken());
      GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      hireDay = calendar.getTime();
   }

   private String name;
   private double salary;
   private Date hireDay;
}
When you process input, you often need to construct strings from individual characters or Unicode code units. It would be inefficient to use string concatenation for this purpose. Every time you append characters to a string, the string object needs to find new memory to hold the larger string: this is time consuming. Appending even more characters means the string needs to be relocated again and again. Using the StringBuilder
class avoids this problem.
In contrast, a StringBuilder
works much like an ArrayList
. It manages a char[]
array that can grow and shrink on demand. You can append, insert, or remove code units until the string builder holds the desired string. Then you use the toString
method to convert the contents to an actual String
object.
The StringBuilder
class was introduced in JDK 5.0. Its predecessor, StringBuffer
, is slightly less efficient, but it allows multiple threads to add or remove characters. If all string editing happens in a single thread, you should use StringBuilder
instead. The APIs of both classes are identical.
The following API notes contain the most important methods for the StringBuilder
and StringBuffer
classes.
java.lang.StringBuilder 5.0
java.lang.StringBuffer 1.0
StringBuilder/StringBuffer()
constructs an empty string builder or string buffer.
StringBuilder/StringBuffer(int length)
constructs an empty string builder or string buffer with the initial capacity length
.
StringBuilder/StringBuffer(String str)
constructs a string builder or string buffer with the initial contents str
.
int length()
returns the number of code units of the builder or buffer.
StringBuilder/StringBuffer append(String str)
appends a string and returns this
.
StringBuilder/StringBuffer append(char c)
appends a code unit and returns this
.
StringBuilder/StringBuffer appendCodePoint(int cp)
5.0
appends a code point, converting it into one or two code units, and returns this
.
void setCharAt(int i, char c)
sets the i
th code unit to c
.
StringBuilder/StringBuffer insert(int offset, String str)
inserts a string at position offset
and returns this
.
StringBuilder/StringBuffer insert(int offset, char c)
inserts a code unit at position offset
and returns this
.
StringBuilder/StringBuffer delete(int startIndex, int endIndex)
deletes the code units with offsets startIndex
to endIndex - 1
and returns this
.
String toString()
returns a string with the same data as the builder or buffer contents.
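To make the API notes concrete, here is a small sketch that exercises several of the methods listed above on a single builder. The class name BuilderDemo and the sample text are assumptions chosen for illustration. Because append returns this, calls can be chained.

```java
class BuilderDemo
{
   // Builds "Hello, World" step by step from strings, code units, and a code point.
   static String build()
   {
      StringBuilder b = new StringBuilder();
      b.append("Hllo");                 // contents: Hllo
      b.insert(1, 'e');                 // contents: Hello
      b.append(',').append(" world");   // chained appends: Hello, world
      b.setCharAt(7, 'W');              // contents: Hello, World
      b.appendCodePoint(0x1D546);       // a supplementary character adds two code units
      b.delete(12, 14);                 // removes the code units at offsets 12 and 13
      return b.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(build());
   }
}
```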
If you have a large number of employee records of variable length, the storage technique used in the preceding section suffers from one limitation: it is not possible to read a record in the middle of the file without first reading all records that come before it. In this section, we make all records the same length. This lets us implement a random-access method for reading back the information by using the RandomAccessFile
class that you saw earlier—we can use this to get at any record in the same amount of time.
We will store the numbers in the instance fields in our classes in a binary format. We do that with the writeInt
and writeDouble
methods of the DataOutput
interface. (As we mentioned earlier, this is the common interface of the DataOutputStream
and the RandomAccessFile
classes.)
However, because the size of each record must remain constant, we need to make all the strings the same size when we save them. The variable-size UTF format does not do this, and the rest of the Java library provides no convenient means of accomplishing this. We need to write a bit of code to implement two helper methods to make the strings the same size. We will call the methods writeFixedString
and readFixedString
. These methods read and write Unicode strings that always have the same length.
The writeFixedString
method takes the parameter size
. Then, it writes the specified number of code units, starting at the beginning of the string. (If there are too few code units, the method pads the string, using zero values.) Here is the code for the writeFixedString
method:
static void writeFixedString(String s, int size, DataOutput out)
   throws IOException
{
   int i;
   for (i = 0; i < size; i++)
   {
      char ch = 0;
      if (i < s.length()) ch = s.charAt(i);
      out.writeChar(ch);
   }
}
The readFixedString
method reads characters from the input stream until it has consumed size
code units or until it encounters a character with a zero value. Then, it should skip past the remaining zero values in the input field. For added efficiency, this method uses the StringBuilder
class to read in a string.
static String readFixedString(int size, DataInput in)
   throws IOException
{
   StringBuilder b = new StringBuilder(size);
   int i = 0;
   boolean more = true;
   while (more && i < size)
   {
      char ch = in.readChar();
      i++;
      if (ch == 0) more = false;
      else b.append(ch);
   }
   in.skipBytes(2 * (size - i));
   return b.toString();
}
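A quick way to convince yourself that the two helpers are inverses is to round-trip a string through an in-memory byte array. The sketch below repeats the two helper methods so that it is self-contained; the class name FixedStringDemo and the encode and decode wrappers are hypothetical additions, not part of the example program.

```java
import java.io.*;

class FixedStringDemo
{
   static void writeFixedString(String s, int size, DataOutput out) throws IOException
   {
      for (int i = 0; i < size; i++)
      {
         char ch = 0;                            // pad with zero values
         if (i < s.length()) ch = s.charAt(i);
         out.writeChar(ch);                      // always two bytes per code unit
      }
   }

   static String readFixedString(int size, DataInput in) throws IOException
   {
      StringBuilder b = new StringBuilder(size);
      int i = 0;
      boolean more = true;
      while (more && i < size)
      {
         char ch = in.readChar();
         i++;
         if (ch == 0) more = false;
         else b.append(ch);
      }
      in.skipBytes(2 * (size - i));              // skip the remaining padding
      return b.toString();
   }

   // Hypothetical wrapper: writes s as a field of the given size into memory.
   static byte[] encode(String s, int size)
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         DataOutputStream out = new DataOutputStream(bytes);
         writeFixedString(s, size, out);
         out.flush();
         return bytes.toByteArray();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   // Hypothetical wrapper: reads a field of the given size back from memory.
   static String decode(byte[] data, int size)
   {
      try
      {
         return readFixedString(size, new DataInputStream(new ByteArrayInputStream(data)));
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }
}
```

Note that a string longer than the field is truncated without warning, so the field size should be chosen generously.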
To write a fixed-size record, we simply write all fields in binary.
public void writeData(DataOutput out) throws IOException
{
   DataIO.writeFixedString(name, NAME_SIZE, out);
   out.writeDouble(salary);

   GregorianCalendar calendar = new GregorianCalendar();
   calendar.setTime(hireDay);
   out.writeInt(calendar.get(Calendar.YEAR));
   out.writeInt(calendar.get(Calendar.MONTH) + 1);
   out.writeInt(calendar.get(Calendar.DAY_OF_MONTH));
}
Reading the data back is just as simple.
public void readData(DataInput in) throws IOException
{
   name = DataIO.readFixedString(NAME_SIZE, in);
   salary = in.readDouble();
   int y = in.readInt();
   int m = in.readInt();
   int d = in.readInt();
   GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
   hireDay = calendar.getTime();
}
In our example, each employee record is 100 bytes long because we specified that the name field would always be written using 40 characters. This gives us a breakdown as indicated in the following:
40 characters = 80 bytes for the name
1 double = 8 bytes
3 int = 12 bytes
As an example, suppose you want to position the file pointer to the third record. You can use the following version of the seek
method:
long n = 3;
int RECORD_SIZE = 100;
in.seek((n - 1) * RECORD_SIZE);
Then you can read a record:
Employee e = new Employee();
e.readData(in);
If you want to modify the record and then save it back into the same location, remember to set the file pointer back to the beginning of the record:
in.seek((n - 1) * RECORD_SIZE);
e.writeData(in); // in must be a RandomAccessFile opened in "rw" mode
To determine the total number of bytes in a file, use the length
method. The total number of records is the length divided by the size of each record.
long nbytes = in.length(); // length in bytes
int nrecords = (int) (nbytes / RECORD_SIZE);
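The seek arithmetic and the record count can be tried out in isolation. The following sketch uses a deliberately simplified, hypothetical record layout (one double and one int, 12 bytes) rather than the 100-byte Employee record, and writes to a temporary file. The class and method names are made up for illustration.

```java
import java.io.*;

class SeekDemo
{
   static final int RECORD_SIZE = 8 + 4; // hypothetical layout: one double, one int

   // Writes count records; record i holds the salary 1000.0 * i and the number i.
   static File createFile(int count)
   {
      try
      {
         File file = File.createTempFile("records", ".dat");
         file.deleteOnExit();
         try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file)))
         {
            for (int i = 1; i <= count; i++)
            {
               out.writeDouble(1000.0 * i);
               out.writeInt(i);
            }
         }
         return file;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   // The number of records is the file length divided by the record size.
   static int recordCount(File file)
   {
      try (RandomAccessFile in = new RandomAccessFile(file, "r"))
      {
         return (int) (in.length() / RECORD_SIZE);
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   // Jumps directly to record n (counting from 1) and reads its salary.
   static double salaryOfRecord(File file, long n)
   {
      try (RandomAccessFile in = new RandomAccessFile(file, "r"))
      {
         in.seek((n - 1) * RECORD_SIZE);
         return in.readDouble();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }
}
```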
The test program shown in Example 12–3 writes three records into a data file and then reads them from the file in reverse order. To do this efficiently requires random access—we need to get at the third record first.
Example 12–3. RandomFileTest.java
import java.io.*;
import java.util.*;

public class RandomFileTest
{
   public static void main(String[] args)
   {
      Employee[] staff = new Employee[3];

      staff[0] = new Employee("Carl Cracker", 75000, 1987, 12, 15);
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         DataOutputStream out = new DataOutputStream(new FileOutputStream("employee.dat"));
         for (Employee e : staff)
            e.writeData(out);
         out.close();

         // retrieve all records into a new array
         RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
         // compute the array size
         int n = (int)(in.length() / Employee.RECORD_SIZE);
         Employee[] newStaff = new Employee[n];

         // read employees in reverse order
         for (int i = n - 1; i >= 0; i--)
         {
            newStaff[i] = new Employee();
            in.seek(i * Employee.RECORD_SIZE);
            newStaff[i].readData(in);
         }
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch(IOException e)
      {
         e.printStackTrace();
      }
   }
}

class Employee
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   /**
      Raises the salary of this employee
      @param byPercent the percentage of the raise
   */
   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   /**
      Writes employee data to a data output
      @param out the data output
   */
   public void writeData(DataOutput out) throws IOException
   {
      DataIO.writeFixedString(name, NAME_SIZE, out);
      out.writeDouble(salary);

      GregorianCalendar calendar = new GregorianCalendar();
      calendar.setTime(hireDay);
      out.writeInt(calendar.get(Calendar.YEAR));
      out.writeInt(calendar.get(Calendar.MONTH) + 1);
      out.writeInt(calendar.get(Calendar.DAY_OF_MONTH));
   }

   /**
      Reads employee data from a data input
      @param in the data input
   */
   public void readData(DataInput in) throws IOException
   {
      name = DataIO.readFixedString(NAME_SIZE, in);
      salary = in.readDouble();
      int y = in.readInt();
      int m = in.readInt();
      int d = in.readInt();
      GregorianCalendar calendar = new GregorianCalendar(y, m - 1, d);
      hireDay = calendar.getTime();
   }

   public static final int NAME_SIZE = 40;
   public static final int RECORD_SIZE = 2 * NAME_SIZE + 8 + 4 + 4 + 4;

   private String name;
   private double salary;
   private Date hireDay;
}

class DataIO
{
   public static String readFixedString(int size, DataInput in)
      throws IOException
   {
      StringBuilder b = new StringBuilder(size);
      int i = 0;
      boolean more = true;
      while (more && i < size)
      {
         char ch = in.readChar();
         i++;
         if (ch == 0) more = false;
         else b.append(ch);
      }
      in.skipBytes(2 * (size - i));
      return b.toString();
   }

   public static void writeFixedString(String s, int size, DataOutput out)
      throws IOException
   {
      int i;
      for (i = 0; i < size; i++)
      {
         char ch = 0;
         if (i < s.length()) ch = s.charAt(i);
         out.writeChar(ch);
      }
   }
}
Using a fixed-length record format is a good choice if you need to store data of the same type. However, objects that you create in an object-oriented program are rarely all of the same type. For example, you may have an array called staff
that is nominally an array of Employee
records but contains objects that are actually instances of a subclass such as Manager
.
If we want to save files that contain this kind of information, we must first save the type of each object and then the data that define the current state of the object. When we read this information back from a file, we must
Read the object type;
Create a blank object of that type;
Fill it with the data that we stored in the file.
It is entirely possible (if very tedious) to do this by hand, and in the first edition of this book we did exactly this. However, Sun Microsystems developed a powerful mechanism that allows this to be done with much less effort. As you will soon see, this mechanism, called object serialization, almost completely automates what was previously a very tedious process. (You will see later in this chapter where the term “serialization” comes from.)
To save object data, you first need to open an ObjectOutputStream
object:
ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
Now, to save an object, you simply use the writeObject
method of the ObjectOutputStream
class as in the following fragment:
Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
out.writeObject(harry);
out.writeObject(boss);
To read the objects back in, first get an ObjectInputStream
object:
ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
Then, retrieve the objects in the same order in which they were written, using the readObject
method.
Employee e1 = (Employee) in.readObject();
Employee e2 = (Employee) in.readObject();
When reading back objects, you must carefully keep track of the number of objects that were saved, their order, and their types. Each call to readObject
reads in another object of the type Object
. You therefore will need to cast it to its correct type.
If you don’t need the exact type or you don’t remember it, then you can cast it to any superclass or even leave it as type Object
. For example, e2
is an Employee
object variable even though it actually refers to a Manager
object. If you need to dynamically query the type of the object, you can use the getClass
method that we described in Chapter 5.
You can write and read only objects with the writeObject/readObject
methods. For primitive type values, you use methods such as writeInt/readInt
or writeDouble/readDouble.
(The object stream classes implement the DataInput/DataOutput
interfaces.) Of course, numbers inside objects (such as the salary
field of an Employee
object) are saved and restored automatically. Recall that, in Java, strings and arrays are objects and can, therefore, be processed with the writeObject/readObject
methods.
There is, however, one change you need to make to any class that you want to save and restore in an object stream. The class must implement the Serializable
interface:
class Employee implements Serializable { . . . }
The Serializable
interface has no methods, so you don’t need to change your classes in any way. In this regard, it is similar to the Cloneable
interface that we also discussed in Chapter 6. However, to make a class cloneable, you still had to override the clone
method of the Object
class. To make a class serializable, you do not need to do anything else.
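The whole cycle of writing and reading objects fits in a few lines. The sketch below serializes to a byte array in memory instead of a file so that it runs without touching the disk. The classes Person and Boss and the roundTrip helper are hypothetical stand-ins for Employee and Manager. The restored object keeps its actual class even though it is read back as a plain Object.

```java
import java.io.*;

class SerialDemo
{
   // Implementing Serializable is all that is required; there are no methods to write.
   static class Person implements Serializable
   {
      Person(String name) { this.name = name; }

      public String toString()
      {
         return getClass().getSimpleName() + "[" + name + "]";
      }

      private String name;
   }

   // A subclass of a serializable class is serializable as well.
   static class Boss extends Person
   {
      Boss(String name) { super(name); }
   }

   // Writes obj to an in-memory object stream and reads it back as a new object.
   static Object roundTrip(Object obj)
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes))
         {
            out.writeObject(obj);
         }
         try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(bytes.toByteArray())))
         {
            return in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Person p = (Person) roundTrip(new Boss("Carl Cracker"));
      System.out.println(p); // the type Boss survives the round trip
   }
}
```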
Example 12–4 is a test program that writes an array containing two employees and one manager to disk and then restores it. Writing an array is done with a single operation:
Employee[] staff = new Employee[3];
. . .
out.writeObject(staff);
Similarly, reading in the result is done with a single operation. However, we must apply a cast to the return value of the readObject
method:
Employee[] newStaff = (Employee[]) in.readObject();
Once the information is restored, we print each employee because you can easily distinguish employee and manager objects by their different toString
results. This should convince you that we did restore the correct types.
Example 12–4. ObjectFileTest.java
import java.io.*;
import java.util.*;

class ObjectFileTest
{
   public static void main(String[] args)
   {
      Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
      boss.setBonus(5000);

      Employee[] staff = new Employee[3];

      staff[0] = boss;
      staff[1] = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
         out.writeObject(staff);
         out.close();

         // retrieve all records into a new array
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
         Employee[] newStaff = (Employee[]) in.readObject();
         in.close();

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
   }
}

class Employee implements Serializable
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}

class Manager extends Employee
{
   /**
      @param n the employee's name
      @param s the salary
      @param year the hire year
      @param month the hire month
      @param day the hire day
   */
   public Manager(String n, double s, int year, int month, int day)
   {
      super(n, s, year, month, day);
      bonus = 0;
   }

   public double getSalary()
   {
      double baseSalary = super.getSalary();
      return baseSalary + bonus;
   }

   public void setBonus(double b)
   {
      bonus = b;
   }

   public String toString()
   {
      return super.toString()
         + "[bonus=" + bonus
         + "]";
   }

   private double bonus;
}
ObjectOutputStream(OutputStream out)
creates an ObjectOutputStream
so that you can write objects to the specified OutputStream
.
void writeObject(Object obj)
writes the specified object to the ObjectOutputStream
. This method saves the class of the object, the signature of the class, and the values of any nonstatic, nontransient field of the class and its superclasses.
ObjectInputStream(InputStream is)
creates an ObjectInputStream
to read back object information from the specified InputStream
.
Object readObject()
reads an object from the ObjectInputStream
. In particular, this method reads back the class of the object, the signature of the class, and the values of the nontransient and nonstatic fields of the class and all its superclasses. It deserializes in a way that allows multiple references to the same object to be recovered.
Object serialization saves object data in a particular file format. Of course, you can use the writeObject/readObject
methods without having to know the exact sequence of bytes that represents objects in a file. Nonetheless, we found studying the data format to be extremely helpful for gaining insight into the object streaming process. We did this by looking at hex dumps of various saved object files. However, the details are somewhat technical, so feel free to skip this section if you are not interested in the implementation.
Every file begins with the two-byte “magic number”
AC ED
followed by the version number of the object serialization format, which is currently
00 05
(We use hexadecimal numbers throughout this section to denote bytes.) Then, it contains a sequence of objects, in the order that they were saved.
String objects are saved as
| 74 | two-byte length | characters |
For example, the string “Harry” is saved as
74 00 05 Harry
The Unicode characters of the string are saved in “modified UTF-8” format.
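You can reproduce these bytes without a hex editor. The following sketch serializes a single string into memory and prints the leading bytes in hexadecimal. The class name StreamHeaderDemo and its helper methods are made up for illustration.

```java
import java.io.*;

class StreamHeaderDemo
{
   // Serializes one object into a byte array.
   static byte[] serialize(Object obj)
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes))
         {
            out.writeObject(obj);
         }
         return bytes.toByteArray();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   // Formats data[from] through data[to - 1] as two-digit hexadecimal numbers.
   static String hex(byte[] data, int from, int to)
   {
      StringBuilder b = new StringBuilder();
      for (int i = from; i < to; i++)
      {
         if (i > from) b.append(' ');
         b.append(String.format("%02X", data[i] & 0xFF));
      }
      return b.toString();
   }

   public static void main(String[] args)
   {
      byte[] data = serialize("Harry");
      // magic number, version, string code, two-byte length
      System.out.println(hex(data, 0, 7));
      // the five characters of the string
      System.out.println(hex(data, 7, data.length));
   }
}
```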
When an object is saved, the class of that object must be saved as well. The class description contains
The name of the class;
The serial version unique ID, which is a fingerprint of the data field types and method signatures;
A set of flags describing the serialization method; and
A description of the data fields.
Java gets the fingerprint by
Ordering descriptions of the class, superclass, interfaces, field types, and method signatures in a canonical way;
Then applying the so-called Secure Hash Algorithm (SHA) to that data.
SHA is a fast algorithm that gives a “fingerprint” to a larger block of information. This fingerprint is always a 20-byte data packet, regardless of the size of the original data. It is created by a clever sequence of bit operations on the data that makes it essentially 100 percent certain that the fingerprint will change if the information is altered in any way. SHA is a U.S. standard, recommended by the National Institute of Standards and Technology (NIST). (For more details on SHA, see, for example, Cryptography and Network Security: Principles and Practice, by William Stallings [Prentice Hall, 2002].) However, Java uses only the first 8 bytes of the SHA code as a class fingerprint. It is still very likely that the class fingerprint will change if the data fields or methods change in any way.
Java can then check the class fingerprint to protect us from the following scenario: An object is saved to a disk file. Later, the designer of the class makes a change, for example, by removing a data field. Then, the old disk file is read in again. Now the data layout on the disk no longer matches the data layout in memory. If the data were read back in its old form, it could corrupt memory. Java takes great care to make such memory corruption close to impossible. Hence, it checks, using the fingerprint, that the class definition has not changed when it restores an object. It does this by comparing the fingerprint on disk with the fingerprint of the current class.
Technically, as long as the data layout of a class has not changed, it ought to be safe to read objects back in. But Java is conservative and checks that the methods have not changed either. (After all, the methods describe the meaning of the stored data.) Of course, in practice, classes do evolve, and it may be necessary for a program to read in older versions of objects. We discuss this later in the section entitled “Versioning”.
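Both ingredients of the fingerprint can be observed from a program. The sketch below is only an illustration and does not reproduce the canonical ordering step. It asks java.io.ObjectStreamClass for the 8-byte fingerprint of a class, and it separately computes an SHA digest of some arbitrary bytes to show that the full digest is 20 bytes long. The class Point and the method names are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FingerprintDemo
{
   static class Point implements Serializable
   {
      int x;
      int y;
   }

   // Returns the 8-byte class fingerprint that is stored in the class descriptor.
   static long fingerprint(Class<?> cl)
   {
      return ObjectStreamClass.lookup(cl).getSerialVersionUID();
   }

   // Computes the full SHA digest of a block of data.
   static byte[] sha(byte[] data)
   {
      try
      {
         return MessageDigest.getInstance("SHA-1").digest(data);
      }
      catch (NoSuchAlgorithmException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(Long.toHexString(fingerprint(Point.class)));
      System.out.println(sha("any data".getBytes()).length + " bytes");
   }
}
```

Adding or removing a field of Point and running the program again should print a different fingerprint, which is exactly the change that the check described above detects.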
Here is how a class identifier is stored:
72
2-byte length of class name
class name
8-byte fingerprint
1-byte flag
2-byte count of data field descriptors
data field descriptors
78 (end marker)
superclass type (70 if none)
The flag byte is composed of three bit masks, defined in java.io.ObjectStreamConstants
:
static final byte SC_WRITE_METHOD = 1;
   // class has writeObject method that writes additional data
static final byte SC_SERIALIZABLE = 2;
   // class implements Serializable interface
static final byte SC_EXTERNALIZABLE = 4;
   // class implements Externalizable interface
We discuss the Externalizable
interface later in this chapter. Externalizable classes supply custom read and write methods that take over the output of their instance fields. The classes that we write implement the Serializable
interface and will have a flag value of 02
. The java.util.Date
class defines its own readObject
/writeObject
methods and has a flag of 03
.
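The layout of a class identifier can be checked by reading a serialized object back with a plain DataInputStream. The following sketch assumes that the object is the first item in the stream and that its class has not been described before, so the stream starts with the header, the new-object code 73, and the new-class code 72. The class Point and the describe method are made up for illustration.

```java
import java.io.*;

class ClassDescDemo
{
   static class Point implements Serializable
   {
      int x = 1;
      int y = 2;
   }

   // Serializes obj and decodes the start of its class descriptor by hand.
   static String describe(Serializable obj)
   {
      try
      {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes))
         {
            out.writeObject(obj);
         }

         DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray()));
         in.readShort();                    // magic number AC ED
         in.readShort();                    // version 00 05
         int objectCode = in.readByte();    // 73: new object
         int classCode = in.readByte();     // 72: new class descriptor
         if (objectCode != 0x73 || classCode != 0x72)
            throw new IllegalStateException("unexpected stream layout");
         String name = in.readUTF();        // 2-byte length, then the class name
         in.readLong();                     // 8-byte fingerprint
         int flags = in.readByte();         // 1-byte flag
         int fieldCount = in.readShort();   // 2-byte count of data field descriptors
         return String.format("%s flag=%02X fields=%d", name, flags, fieldCount);
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(describe(new Point()));
      System.out.println(describe(new java.util.Date()));
   }
}
```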
Each data field descriptor has the format:
1-byte type code
2-byte length of field name
field name
class name (if field is an object)
The type code is one of the following:
| B | byte |
| C | char |
| D | double |
| F | float |
| I | int |
| J | long |
| L | object |
| S | short |
| Z | boolean |
| [ | array |
When the type code is L
, the field name is followed by the field type. Class and field name strings do not start with the string code 74
, but field types do. Field types use a slightly different encoding of their names, namely, the format used by native methods. (See Volume 2 for native methods.)
For example, the salary field of the Employee
class is encoded as:
D 00 06 salary
Here is the complete class descriptor of the Employee
class:
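| 72 00 08 Employee | |
| 8-byte fingerprint, then 02 | Fingerprint and flags |
| 00 03 | Number of instance fields |
| D 00 06 salary | Instance field type and name |
| L 00 07 hireDay | Instance field type and name |
| 74 00 10 Ljava/util/Date; | Instance field class name: Date |
| L 00 04 name | Instance field type and name |
| 74 00 12 Ljava/lang/String; | Instance field class name: String |
| 78 | End marker |
| 70 | No superclass |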
These descriptors are fairly long. If the same class descriptor is needed again in the file, then an abbreviated form is used:
| 71 | 4-byte serial number |
The serial number refers to the previous explicit class descriptor. We discuss the numbering scheme later.
An object is stored as
| 73 | class descriptor | object data |
For example, here is how an Employee
object is stored:
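| 8-byte double | salary field value: double |
| 73 | hireDay field value: new object |
| 71, then 4-byte serial number | Existing class |
| 77, block data, 78 | External storage, details later |
| 74, 2-byte length, characters | name field value: String |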
As you can see, the data file contains enough information to restore the Employee
object.
Arrays are saved in the following format:
| 75 | class descriptor | 4-byte number of entries | entries |
The array class name in the class descriptor is in the same format as that used by native methods (which is slightly different from the format used for class names in other class descriptors). In this format, class names start with an L
and end with a semicolon.
For example, an array of three Employee
objects starts out like this:
| 75 | Array |
| 72 00 0B [LEmployee; | New class, string length, class name |
| 8-byte fingerprint, then 02 | Fingerprint and flags |
| 00 00 | Number of instance fields |
| 78 | End marker |
| 70 | No superclass |
| 00 00 00 03 | Number of array entries |
Note that the fingerprint for an array of Employee
objects is different from a fingerprint of the Employee
class itself.
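The array class names are easy to inspect, because the getName method of the Class class reports them in this bracketed form. The sketch below is a small illustration with a hypothetical helper arrayClassName. Note that getName separates package names with periods, whereas the field type strings described earlier use slashes.

```java
import java.lang.reflect.Array;

class ArrayNameDemo
{
   // Returns the name of the array class whose elements have the given type.
   static String arrayClassName(Class<?> componentType)
   {
      return Array.newInstance(componentType, 0).getClass().getName();
   }

   public static void main(String[] args)
   {
      System.out.println(arrayClassName(String.class));   // object arrays: [L...;
      System.out.println(arrayClassName(int.class));      // primitive arrays use the type code
      System.out.println(arrayClassName(double[].class)); // one bracket per dimension
   }
}
```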
Of course, studying these codes can be about as exciting as reading the average phone book. But it is still instructive to know that the object stream contains a detailed description of all the objects that it contains, with sufficient detail to allow reconstruction of both objects and arrays of objects.
We now know how to save objects that contain numbers, strings, or other simple objects. However, there is one important situation that we still need to consider. What happens when one object is shared by several objects as part of its state?
To illustrate the problem, let us make a slight modification to the Manager
class. Let’s assume that each manager has a secretary, implemented as an instance variable secretary
of type Employee
. (It would make sense to derive a class Secretary
from Employee
for this purpose, but we don’t do that here.)
class Manager extends Employee
{
   . . .
   private Employee secretary;
}
Having done this, you must keep in mind that the Manager
object now contains a reference to the Employee
object that describes the secretary, not a separate copy of the object.
In particular, two managers can share the same secretary, as is the case in Figure 12–6 and the following code:
Employee harry = new Employee("Harry Hacker", . . .);
Manager carl = new Manager("Carl Cracker", . . .);
carl.setSecretary(harry);
Manager tony = new Manager("Tony Tester", . . .);
tony.setSecretary(harry);
Now, suppose we write the employee data to disk. What we don’t want is for the Manager
to save its information according to the following logic:
Save employee data;
Save secretary data.
Then, the data for harry
would be saved three times. When reloaded, the objects would have the configuration shown in Figure 12–7.
This is not what we want. Suppose the secretary gets a raise. We would not want to hunt for all other copies of that object and apply the raise as well. We want to save and restore only one copy of the secretary. To do this, we must copy and restore the original references to the objects. In other words, we want the object layout on disk to be exactly like the object layout in memory. This is called persistence in object-oriented circles.
Of course, we cannot save and restore the memory addresses for the secretary objects. When an object is reloaded, it will likely occupy a completely different memory address than it originally did.
Instead, Java uses a serialization approach. Hence, the name object serialization for this mechanism. Here is the algorithm:
All objects that are saved to disk are given a serial number (1, 2, 3, and so on, as shown in Figure 12–8).
When saving an object to disk, find out if the same object has already been stored.
If it has been stored previously, just write “same as previously saved object with serial number x.” If not, store all its data.
When reading back the objects, simply reverse the procedure. For each object that you load, note its sequence number and remember where you put it in memory. When you encounter the tag “same as previously saved object with serial number x,” you look up where you put the object with serial number x and set the object reference to that memory address.
Note that the objects need not be saved in any particular order. Figure 12–9 shows what happens when a manager occurs first in the staff array.
All of this sounds confusing, and it is. Fortunately, when object streams are used, the process is also completely automatic. Object streams assign the serial numbers and keep track of duplicate objects. The exact numbering scheme is slightly different from that used in the figures—see the next section.
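The saving half of the algorithm can be sketched in a few lines. This is only an illustration of the idea, not the code that object streams actually run: the class name SerialNumberSketch and its log format are made up for this example, and an IdentityHashMap stands in for whatever bookkeeping the real stream uses.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class SerialNumberSketch
{
   // Maps each object that was already "saved" to its serial number.
   // Identity (==), not equals, decides whether two references are the same object.
   private final Map<Object, Integer> serials = new IdentityHashMap<>();
   private final List<String> log = new ArrayList<>();

   /** Records either the object's data or a back reference to its serial number. */
   public void save(Object obj)
   {
      Integer serial = serials.get(obj);
      if (serial != null)
         log.add("same as object #" + serial);
      else
      {
         serials.put(obj, serials.size() + 1);
         log.add("object #" + serials.size() + ": " + obj);
      }
   }

   public List<String> log()
   {
      return log;
   }

   /** Saves a shared object twice, with a different object in between. */
   public static List<String> demo()
   {
      SerialNumberSketch sketch = new SerialNumberSketch();
      StringBuilder harry = new StringBuilder("Harry Hacker");
      sketch.save(harry);
      sketch.save(new StringBuilder("Tony Tester"));
      sketch.save(harry);
      return sketch.log();
   }

   public static void main(String[] args)
   {
      for (String line : demo())
         System.out.println(line);
   }
}
```

The second attempt to save harry produces only a back reference. Reading reverses the lookup: a list indexed by serial number tells the reader which object a back reference denotes.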
In this chapter, we use serialization to save a collection of objects to a disk file and retrieve it exactly as we stored it. Another very important application is the transmittal of a collection of objects across a network connection to another computer. Just as raw memory addresses are meaningless in a file, they are also meaningless when communicating with a different processor. Because serialization replaces memory addresses with serial numbers, it permits the transport of object collections from one machine to another. We study that use of serialization when discussing remote method invocation in Volume 2.
Example 12–5 is a program that saves and reloads a network of Employee
and Manager
objects (some of which share the same employee as a secretary). Note that the secretary object is unique after reloading—when newStaff[1]
gets a raise, that is reflected in the secretary
fields of the managers.
Example 12–5. ObjectRefTest.java
import java.io.*;
import java.util.*;

class ObjectRefTest
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 50000, 1989, 10, 1);
      Manager boss = new Manager("Carl Cracker", 80000, 1987, 12, 15);
      boss.setSecretary(harry);

      Employee[] staff = new Employee[3];

      staff[0] = boss;
      staff[1] = harry;
      staff[2] = new Employee("Tony Tester", 40000, 1990, 3, 15);

      try
      {
         // save all employee records to the file employee.dat
         ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("employee.dat"));
         out.writeObject(staff);
         out.close();

         // retrieve all records into a new array
         ObjectInputStream in = new ObjectInputStream(new FileInputStream("employee.dat"));
         Employee[] newStaff = (Employee[]) in.readObject();
         in.close();

         // raise secretary's salary
         newStaff[1].raiseSalary(10);

         // print the newly read employee records
         for (Employee e : newStaff)
            System.out.println(e);
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
   }
}

class Employee implements Serializable
{
   public Employee() {}

   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}

class Manager extends Employee
{
   /**
      Constructs a Manager without a secretary
      @param n the employee's name
      @param s the salary
      @param year the hire year
      @param month the hire month
      @param day the hire day
   */
   public Manager(String n, double s, int year, int month, int day)
   {
      super(n, s, year, month, day);
      secretary = null;
   }

   /**
      Assigns a secretary to the manager.
      @param s the secretary
   */
   public void setSecretary(Employee s)
   {
      secretary = s;
   }

   public String toString()
   {
      return super.toString()
         + "[secretary=" + secretary
         + "]";
   }

   private Employee secretary;
}
This section continues the discussion of the output format of object streams. If you skipped the previous discussion, you should skip this section as well.
All objects (including arrays and strings) and all class descriptors are given serial numbers as they are saved in the output file. This process is referred to as serialization because every saved object is assigned a serial number. (The count starts at 00 7E 00 00
.)
We already saw that a full class descriptor for any given class occurs only once. Subsequent descriptors refer to it. For example, in our previous example, a repeated reference to the Date
class was coded as
71 00 7E 00 08
The same mechanism is used for objects. If a reference to a previously saved object is written, it is saved in exactly the same way, that is, 71
followed by the serial number. It is always clear from the context whether the particular serial reference denotes a class descriptor or an object.
Finally, a null reference is stored as
70
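If you would rather see these two markers without reading a long dump, a small experiment along the following lines may help. It is a sketch under our own assumptions: the class name BackReferenceDump is invented, and the array contents were chosen so that the repeated string and the null entry come last in the stream. The constants TC_REFERENCE and TC_NULL are defined in java.io.ObjectStreamConstants.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

class BackReferenceDump
{
   /** Serializes an array that holds the same string twice, followed by null. */
   public static byte[] serialize()
   {
      String shared = "Harry";
      Object[] data = { shared, shared, null };
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(data);
         }
         return bout.toByteArray();
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      byte[] bytes = serialize();

      // the last six bytes hold the second array entry and the third (null) entry
      for (int i = bytes.length - 6; i < bytes.length; i++)
         System.out.printf("%02X ", bytes[i]);
      System.out.println();

      // the marker values, for comparison with the dump
      System.out.printf("TC_REFERENCE=%02X TC_NULL=%02X%n",
         ObjectStreamConstants.TC_REFERENCE, ObjectStreamConstants.TC_NULL);
   }
}
```

The tail of the dump should show the reference marker, a four-byte serial number, and then the null marker.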
Here is the commented output of the ObjectRefTest
program of the preceding section. If you like, run the program, look at a hex dump of its data file employee.dat
, and compare it with the commented listing. The important lines toward the end of the output show the reference to a previously saved object.
| File header |
| Array |
| New class, string length, class name |
| Fingerprint and flags |
| Number of instance fields |
| End marker |
| No superclass |
| Number of array entries |
| New class, string length, class name (serial #2) |
| Fingerprint and flags |
| Number of data fields |
| Instance field type and name |
| Instance field class name |
| End marker |
| Superclass: new class, string length, class name (serial #4) |
| Fingerprint and flags |
| Number of instance fields |
| Instance field type and name |
| Instance field type and name |
| Instance field class name |
| Instance field type and name |
| Instance field class name: String (serial #6) |
| End marker |
| No superclass |
| New class, string length, class name (serial #8) |
| Fingerprint and flags |
| No instance variables |
| End marker |
| No superclass |
| External storage, number of bytes |
| Date |
| End marker |
| Existing class (use serial #8) |
| External storage, number of bytes |
| Date |
| End marker |
| name field value |
| Existing class (use serial #4) |
| salary field value: double |
| hireDay field value: new object (serial #15) |
| Existing class (use serial #8) |
| External storage, number of bytes |
| Date |
| End marker |
| name field value |
It is usually not important to know the exact file format (unless you are trying to create an evil effect by modifying the data). What you should remember is this:
The object stream output contains the types and data fields of all objects.
Each object is assigned a serial number.
Repeated occurrences of the same object are stored as references to that serial number.
Certain data fields should never be serialized, for example, integer values that store file handles or handles of windows that are only meaningful to native methods. Such information is guaranteed to be useless when you reload an object at a later time or transport it to a different machine. In fact, improper values for such fields can actually cause native methods to crash. Java has an easy mechanism to prevent such fields from ever being serialized. Mark them with the keyword transient
. You also need to tag fields as transient
if they belong to nonserializable classes. Transient fields are always skipped when objects are serialized.
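The effect is easy to observe. In the following sketch, the Window class and its nativeHandle field are hypothetical stand-ins for a class that wraps a native resource. After a round trip through an object stream, the transient field comes back with its default value.

```java
import java.io.*;

class TransientDemo
{
   /** Hypothetical class that pairs ordinary state with a native handle. */
   public static class Window implements Serializable
   {
      private String title;
      private transient int nativeHandle; // meaningless after a reload, so never saved

      public Window(String title, int nativeHandle)
      {
         this.title = title;
         this.nativeHandle = nativeHandle;
      }

      public String getTitle() { return title; }
      public int getNativeHandle() { return nativeHandle; }
   }

   /** Writes the window to a byte array and reads it back. */
   public static Window roundTrip(Window w)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(w);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (Window) in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Window copy = roundTrip(new Window("Main", 42));
      System.out.println(copy.getTitle() + " " + copy.getNativeHandle());
   }
}
```

The title survives, but the handle is reset to 0, the default for int fields. A class that needs a valid handle after reloading must obtain a fresh one itself.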
The serialization mechanism provides a way for individual classes to add validation or any other desired action to the default read and write behavior. A serializable class can define methods with the signature
private void readObject(ObjectInputStream in)
   throws IOException, ClassNotFoundException;
private void writeObject(ObjectOutputStream out)
   throws IOException;
Then, the data fields are no longer automatically serialized, and these methods are called instead.
Here is a typical example. A number of classes in the java.awt.geom
package, such as Point2D.Double
, are not serializable. Now suppose you want to serialize a class LabeledPoint
that stores a String
and a Point2D.Double
. First, you need to mark the Point2D.Double
field as transient
to avoid a NotSerializableException
.
public class LabeledPoint implements Serializable
{
   . . .
   private String label;
   private transient Point2D.Double point;
}
In the writeObject
method, we first write the object descriptor and the String
field, label, by calling the defaultWriteObject
method. This is a special method of the ObjectOutputStream
class that can only be called from within a writeObject
method of a serializable class. Then we write the point coordinates, using the standard DataOutput
calls.
private void writeObject(ObjectOutputStream out)
   throws IOException
{
   out.defaultWriteObject();
   out.writeDouble(point.getX());
   out.writeDouble(point.getY());
}
In the readObject
method, we reverse the process:
private void readObject(ObjectInputStream in)
   throws IOException, ClassNotFoundException
{
   in.defaultReadObject();
   double x = in.readDouble();
   double y = in.readDouble();
   point = new Point2D.Double(x, y);
}
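Here is how the pieces might fit together in a complete program. To keep the sketch independent of the JDK version and of the AWT classes, it uses a hypothetical nonserializable Coordinates class in place of Point2D.Double. The roundTrip helper and the accessor names are likewise our own.

```java
import java.io.*;

class LabeledPointDemo
{
   /** Hypothetical stand-in for a nonserializable class such as Point2D.Double. */
   public static class Coordinates
   {
      public final double x;
      public final double y;

      public Coordinates(double x, double y)
      {
         this.x = x;
         this.y = y;
      }
   }

   public static class LabeledPoint implements Serializable
   {
      private String label;
      private transient Coordinates point; // skipped by default serialization

      public LabeledPoint(String label, double x, double y)
      {
         this.label = label;
         this.point = new Coordinates(x, y);
      }

      public String getLabel() { return label; }
      public double getX() { return point.x; }
      public double getY() { return point.y; }

      private void writeObject(ObjectOutputStream out) throws IOException
      {
         out.defaultWriteObject(); // writes the label field
         out.writeDouble(point.x);
         out.writeDouble(point.y);
      }

      private void readObject(ObjectInputStream in)
         throws IOException, ClassNotFoundException
      {
         in.defaultReadObject(); // reads the label field
         double x = in.readDouble();
         double y = in.readDouble();
         point = new Coordinates(x, y);
      }
   }

   /** Writes the point to a byte array and reads it back. */
   public static LabeledPoint roundTrip(LabeledPoint p)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(p);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (LabeledPoint) in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      LabeledPoint copy = roundTrip(new LabeledPoint("origin", 1.5, -2.0));
      System.out.println(copy.getLabel() + " " + copy.getX() + " " + copy.getY());
   }
}
```

Without the two private methods, the reloaded object would have a null point field, because transient fields are skipped.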
Another example is the java.util.Date
class that supplies its own readObject
and writeObject
methods. These methods write the date as a number of milliseconds from the epoch (January 1, 1970, midnight UTC). The Date
class has a complex internal representation that stores both a Calendar
object and a millisecond count, to optimize lookups. The state of the Calendar
is redundant and does not have to be saved.
The readObject
and writeObject
methods only need to save and load their data fields. They should not concern themselves with superclass data or any other class information.
Rather than letting the serialization mechanism save and restore object data, a class can define its own mechanism. To do this, a class must implement the Externalizable
interface. This in turn requires it to define two methods:
public void readExternal(ObjectInput in)
   throws IOException, ClassNotFoundException;
public void writeExternal(ObjectOutput out)
   throws IOException;
Unlike the readObject
and writeObject
methods that were described in the preceding section, these methods are fully responsible for saving and restoring the entire object, including the superclass data. The serialization mechanism merely records the class of the object in the stream. When reading an externalizable object, the object stream creates an object with the default constructor and then calls the readExternal
method. Here is how you can implement these methods for the Employee
class:
public void readExternal(ObjectInput s)
   throws IOException
{
   name = s.readUTF();
   salary = s.readDouble();
   hireDay = new Date(s.readLong());
}

public void writeExternal(ObjectOutput s)
   throws IOException
{
   s.writeUTF(name);
   s.writeDouble(salary);
   s.writeLong(hireDay.getTime());
}
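The following self-contained sketch places these two methods in a small Employee class and sends an object through a byte array. The enclosing ExternalizableDemo class, the three-argument constructor, and the roundTrip helper are assumptions made for the demonstration. Note the public no-argument constructor, which the object stream needs in order to create the object before it calls readExternal.

```java
import java.io.*;
import java.util.Date;

class ExternalizableDemo
{
   public static class Employee implements Externalizable
   {
      private String name;
      private double salary;
      private Date hireDay;

      /** Required: the object stream calls this before readExternal. */
      public Employee() {}

      public Employee(String name, double salary, Date hireDay)
      {
         this.name = name;
         this.salary = salary;
         this.hireDay = hireDay;
      }

      public String getName() { return name; }
      public double getSalary() { return salary; }
      public Date getHireDay() { return hireDay; }

      public void writeExternal(ObjectOutput s) throws IOException
      {
         s.writeUTF(name);
         s.writeDouble(salary);
         s.writeLong(hireDay.getTime());
      }

      public void readExternal(ObjectInput s) throws IOException
      {
         // fields must be read in the order in which they were written
         name = s.readUTF();
         salary = s.readDouble();
         hireDay = new Date(s.readLong());
      }
   }

   /** Writes the employee to a byte array and reads it back. */
   public static Employee roundTrip(Employee e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(e);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (Employee) in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }

   public static void main(String[] args)
   {
      Employee copy = roundTrip(new Employee("Harry Hacker", 50000, new Date(0L)));
      System.out.println(copy.getName() + " " + copy.getSalary() + " " + copy.getHireDay().getTime());
   }
}
```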
Serialization is somewhat slow because the virtual machine must discover the structure of each object. If you are concerned about performance and if you read and write a large number of objects of a particular class, you should investigate the use of the Externalizable
interface. The tech tip http://developer.java.sun.com/developer/TechTips/txtarchive/Apr00_Stu.txt demonstrates that in the case of an employee class, using external reading and writing was about 35%–40% faster than the default serialization.
Unlike the readObject
and writeObject
methods, which are private and can only be called by the serialization mechanism, the readExternal
and writeExternal
methods are public. In particular, readExternal
potentially permits modification of the state of an existing object.
For even more exotic variations of serialization, see http://www.absolutejava.com/serialization.
You have to pay particular attention when serializing and deserializing objects that are assumed to be unique. This commonly happens when you are implementing singletons and typesafe enumerations.
If you use the enum
construct of JDK 5.0, then you need not worry about serialization—it just works. However, suppose you maintain legacy code that contains an enumerated type such as
public class Orientation
{
   public static final Orientation HORIZONTAL = new Orientation(1);
   public static final Orientation VERTICAL = new Orientation(2);
   private Orientation(int v) { value = v; }
   private int value;
}
This idiom was common before enumerations were added to the Java language. Note that the constructor is private. Thus, no objects can be created beyond Orientation.HORIZONTAL
and Orientation.VERTICAL
. In particular, you can use the ==
operator to test for object equality:
if (orientation == Orientation.HORIZONTAL) . . .
There is an important twist that you need to remember when a typesafe enumeration implements the Serializable
interface. The default serialization mechanism is not appropriate. Suppose we write a value of type Orientation
and read it in again:
Orientation original = Orientation.HORIZONTAL;
ObjectOutputStream out = . . .;
out.writeObject(original);
out.close();
ObjectInputStream in = . . .;
Orientation saved = (Orientation) in.readObject();
Now the test
if (saved == Orientation.HORIZONTAL) . . .
will fail. In fact, the saved
value is a completely new object of the Orientation
type and not equal to any of the predefined constants. Even though the constructor is private, the serialization mechanism can create new objects!
To solve this problem, you need to define another special serialization method, called readResolve
. If the readResolve
method is defined, it is called after the object is deserialized. It must return an object that then becomes the return value of the readObject
method. In our case, the readResolve
method will inspect the value
field and return the appropriate enumerated constant:
protected Object readResolve() throws ObjectStreamException
{
   if (value == 1) return Orientation.HORIZONTAL;
   if (value == 2) return Orientation.VERTICAL;
   return null; // this shouldn't happen
}
Remember to add a readResolve
method to all typesafe enumerations in your legacy code and to all classes that follow the singleton design pattern.
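A complete round trip might look like the following sketch. The enclosing ReadResolveDemo class and its roundTrip helper are our own additions. As one possible variation, this version throws an InvalidObjectException for an unknown value instead of returning null.

```java
import java.io.*;

class ReadResolveDemo
{
   public static class Orientation implements Serializable
   {
      public static final Orientation HORIZONTAL = new Orientation(1);
      public static final Orientation VERTICAL = new Orientation(2);

      private final int value;

      private Orientation(int v) { value = v; }

      /** Replaces the freshly deserialized object with the matching constant. */
      protected Object readResolve() throws ObjectStreamException
      {
         if (value == 1) return HORIZONTAL;
         if (value == 2) return VERTICAL;
         throw new InvalidObjectException("Unknown orientation value: " + value);
      }
   }

   /** Writes an object to a byte array and reads it back. */
   public static Object roundTrip(Object obj)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(obj);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Object saved = roundTrip(Orientation.HORIZONTAL);
      System.out.println(saved == Orientation.HORIZONTAL);
   }
}
```

With readResolve in place, the == test succeeds again. If you remove the method, the same program reports that the reloaded object is a different instance.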
In the previous sections, we showed you how to save relatively small collections of objects by means of an object stream. But those were just demonstration programs. With object streams, it helps to think big. Suppose you write a program that lets the user produce a document. This document contains paragraphs of text, tables, graphs, and so on. You can stream out the entire document object with a single call to writeObject
:
out.writeObject(doc);
The paragraph, table, and graph objects are automatically streamed out as well. One user of your program can then give the output file to another user who also has a copy of your program, and that program loads the entire document with a single call to readObject
:
doc = (Document) in.readObject();
This is very useful, but your program will inevitably change, and you will release a version 1.1. Can version 1.1 read the old files? Can the users who still use 1.0 read the files that the new version is now producing? Clearly, it would be desirable if object files could cope with the evolution of classes.
At first glance it seems that this would not be possible. When a class definition changes in any way, then its SHA fingerprint also changes, and you know that object streams will refuse to read in objects with different fingerprints. However, a class can indicate that it is compatible with an earlier version of itself. To do this, you must first obtain the fingerprint of the earlier version of the class. You use the stand-alone serialver
program that is part of the JDK to obtain this number. For example, running
serialver Employee
prints
Employee: static final long serialVersionUID = -1814239825517340645L;
If you start the serialver
program with the -show
option, then the program brings up a graphical dialog box (see Figure 12–10).
All later versions of the class must define the serialVersionUID
constant to the same fingerprint as the original.
class Employee implements Serializable // version 1.1
{
   . . .
   public static final long serialVersionUID = -1814239825517340645L;
}
When a class has a static data member named serialVersionUID
, the serialization mechanism does not compute the fingerprint itself but uses that value instead.
Once that static data member has been placed inside a class, the serialization system is now willing to read in different versions of objects of that class.
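You can check which fingerprint the serialization mechanism will use for a class by asking its ObjectStreamClass descriptor. The sketch below assumes a stripped-down Employee class with just two fields. The lookup and getSerialVersionUID methods belong to java.io.ObjectStreamClass.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo
{
   public static class Employee implements Serializable
   {
      // the fingerprint of the original version, as printed by serialver
      public static final long serialVersionUID = -1814239825517340645L;

      private String name;
      private double salary;
   }

   /** Returns the fingerprint that object streams use for Employee. */
   public static long fingerprint()
   {
      return ObjectStreamClass.lookup(Employee.class).getSerialVersionUID();
   }

   public static void main(String[] args)
   {
      System.out.println(fingerprint());
   }
}
```

The descriptor reports the declared constant, no matter how the fields and methods of the class have changed since the constant was obtained.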
If only the methods of the class change, then there is no problem with reading the new object data. However, if data fields change, then you may have problems. For example, the old file object may have more or fewer data fields than the one in the program, or the types of the data fields may be different. In that case, the object stream makes an effort to convert the stream object to the current version of the class.
The object stream compares the data fields of the current version of the class with the data fields of the version in the stream. Of course, the object stream considers only the nontransient and nonstatic data fields. If two fields have matching names but different types, then the object stream makes no effort to convert one type to the other—the objects are incompatible. If the object in the stream has data fields that are not present in the current version, then the object stream ignores the additional data. If the current version has data fields that are not present in the streamed object, the added fields are set to their default (null
for objects, zero for numbers, and false
for Boolean values).
Here is an example. Suppose we have saved a number of employee records on disk, using the original version (1.0) of the class. Now we change the Employee
class to version 2.0 by adding a data field called department
. Figure 12–11 shows what happens when a 1.0 object is read into a program that uses 2.0 objects. The department field is set to null
. Figure 12–12 shows the opposite scenario: a program using 1.0 objects reads a 2.0 object. The additional department
field is ignored.
Is this process safe? It depends. Dropping a data field seems harmless—the recipient still has all the data that it knew how to manipulate. Setting a data field to null
may not be so safe. Many classes work hard to initialize all data fields in all constructors to non-null
values, so that the methods don’t have to be prepared to handle null
data. It is up to the class designer to implement additional code in the readObject
method to fix version incompatibilities or to make sure the methods are robust enough to handle null
data.
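One way to guard against the null problem is sketched below. It cannot replay a genuine 1.0 file inside a single program, so it simulates the situation by saving an object whose department field is null. The class layout, the placeholder "Unassigned", and the serialVersionUID value are all hypothetical.

```java
import java.io.*;

class VersionFixDemo
{
   public static class Employee implements Serializable
   {
      private static final long serialVersionUID = 1L; // hypothetical fingerprint

      private String name;
      private String department; // added in the newer version of the class

      public Employee(String name, String department)
      {
         this.name = name;
         this.department = department;
      }

      public String getDepartment() { return department; }

      private void readObject(ObjectInputStream in)
         throws IOException, ClassNotFoundException
      {
         in.defaultReadObject();
         // a stream written by the older version leaves this field null
         if (department == null) department = "Unassigned";
      }
   }

   /** Writes the employee to a byte array and reads it back. */
   public static Employee roundTrip(Employee e)
   {
      try
      {
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bout))
         {
            out.writeObject(e);
         }
         try (ObjectInputStream in = new ObjectInputStream(
               new ByteArrayInputStream(bout.toByteArray())))
         {
            return (Employee) in.readObject();
         }
      }
      catch (IOException | ClassNotFoundException ex)
      {
         throw new IllegalStateException(ex);
      }
   }

   public static void main(String[] args)
   {
      // null department stands in for data saved by the older class version
      System.out.println(roundTrip(new Employee("Harry Hacker", null)).getDepartment());
   }
}
```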
There is an amusing (and, occasionally, very useful) use for the serialization mechanism: it gives you an easy way to clone an object provided the class is serializable. (Recall from Chapter 6 that you need to do a bit of work to allow an object to be cloned.)
To clone a serializable object, simply serialize it to an output stream and then read it back in. The result is a new object that is a deep copy of the existing object. You don’t have to write the object to a file—you can use a ByteArrayOutputStream
to save the data into a byte array.
As Example 12–6 shows, to get clone
for free, simply extend the SerialCloneable
class, and you are done.
You should be aware that this method, although clever, will usually be much slower than a clone method that explicitly constructs a new object and copies or clones the data fields (as you saw in Chapter 6).
Example 12–6. SerialCloneTest.java
import java.io.*;
import java.util.*;

public class SerialCloneTest
{
   public static void main(String[] args)
   {
      Employee harry = new Employee("Harry Hacker", 35000, 1989, 10, 1);
      // clone harry
      Employee harry2 = (Employee) harry.clone();

      // mutate harry
      harry.raiseSalary(10);

      // now harry and the clone are different
      System.out.println(harry);
      System.out.println(harry2);
   }
}

/**
   A class whose clone method uses serialization.
*/
class SerialCloneable implements Cloneable, Serializable
{
   public Object clone()
   {
      try
      {
         // save the object to a byte array
         ByteArrayOutputStream bout = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bout);
         out.writeObject(this);
         out.close();

         // read a clone of the object from the byte array
         ByteArrayInputStream bin = new ByteArrayInputStream(bout.toByteArray());
         ObjectInputStream in = new ObjectInputStream(bin);
         Object ret = in.readObject();
         in.close();

         return ret;
      }
      catch (Exception e)
      {
         return null;
      }
   }
}

/**
   The familiar Employee class, redefined to extend the
   SerialCloneable class.
*/
class Employee extends SerialCloneable
{
   public Employee(String n, double s, int year, int month, int day)
   {
      name = n;
      salary = s;
      GregorianCalendar calendar = new GregorianCalendar(year, month - 1, day);
      hireDay = calendar.getTime();
   }

   public String getName()
   {
      return name;
   }

   public double getSalary()
   {
      return salary;
   }

   public Date getHireDay()
   {
      return hireDay;
   }

   public void raiseSalary(double byPercent)
   {
      double raise = salary * byPercent / 100;
      salary += raise;
   }

   public String toString()
   {
      return getClass().getName()
         + "[name=" + name
         + ",salary=" + salary
         + ",hireDay=" + hireDay
         + "]";
   }

   private String name;
   private double salary;
   private Date hireDay;
}
You have learned how to read and write data from a file. However, there is more to file management than reading and writing. The File
class encapsulates the functionality that you will need in order to work with the file system on the user’s machine. For example, you use the File
class to find out when a file was last modified or to remove or rename the file. In other words, the stream classes are concerned with the contents of the file, whereas the File
class is concerned with the storage of the file on a disk.
As is so often the case in Java, the File
class takes the least common denominator approach. For example, under Windows, you can find out (or set) the read-only flag for a file, but while you can find out if it is a hidden file, you can’t hide it without using a native method (see Volume 2).
The simplest constructor for a File
object takes a (full) file name. If you don’t supply a path name, then Java uses the current directory. For example,
File f = new File("test.txt");
gives you a file object with this name in the current directory. (The “current directory” is the current directory of the process that executes the virtual machine. If you launched the virtual machine from the command line, it is the directory from which you started the java
executable.)
A call to this constructor does not create a file with this name if it doesn’t exist. Actually, creating a file from a File
object is done with one of the stream class constructors or the createNewFile
method in the File
class. The createNewFile
method only creates a file if no file with that name exists, and it returns a boolean
to tell you whether it was successful.
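A short experiment shows the return value in action. The sketch below is only a demonstration: the file name is made up, and it places the file in the directory named by the java.io.tmpdir system property so that it does not clutter the current directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class CreateNewFileDemo
{
   /** Returns the results of two successive createNewFile calls on a fresh name. */
   public static boolean[] createTwice()
   {
      File dir = new File(System.getProperty("java.io.tmpdir"));
      File f = new File(dir, "createnewfile-demo-" + System.nanoTime() + ".txt");
      try
      {
         boolean first = f.createNewFile();  // no such file yet, so it is created
         boolean second = f.createNewFile(); // the file now exists, nothing happens
         return new boolean[] { first, second };
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      finally
      {
         f.delete(); // clean up the demonstration file
      }
   }

   public static void main(String[] args)
   {
      boolean[] results = createTwice();
      System.out.println(results[0] + " " + results[1]);
   }
}
```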
On the other hand, once you have a File
object, the exists
method in the File
class tells you whether a file exists with that name. For example, the following trial program would almost certainly print “false” on anyone’s machine, and yet it can print out a path name to this nonexistent file.
import java.io.*;

public class Test
{
   public static void main(String args[])
   {
      File f = new File("afilethatprobablydoesntexist");
      System.out.println(f.getAbsolutePath());
      System.out.println(f.exists());
   }
}
There are two other constructors for File
objects:
File(String path, String name)
which creates a File
object with the given name in the directory specified by the path
parameter. (If the path
parameter is null
, this constructor creates a File
object, using the current directory.)
Finally, you can use an existing File
object in the constructor:
File(File dir, String name)
where the File
object represents a directory and, as before, if dir
is null
, the constructor creates a File
object in the current directory.
Somewhat confusingly, a File
object can represent either a file or a directory (perhaps because the operating system that the Java designers were most familiar with happens to implement directories as files). You use the isDirectory
and isFile
methods to tell whether the file object represents a file or a directory. This is surprising—in an object-oriented system, you might have expected a separate Directory
class, perhaps extending the File
class.
To make an object representing a directory, you simply supply the directory name in the File
constructor:
File tempDir = new File(File.separator + "temp");
If this directory does not yet exist, you can create it with the mkdir
method:
tempDir.mkdir();
If a file object represents a directory, use list()
to get an array of the file names in that directory. The program in Example 12–7 uses all these methods to print out the directory substructure of whatever path is entered on the command line. (It would be easy enough to change this program into a utility class that returns a vector of the subdirectories for further processing.)
Always use File
objects, not strings, when manipulating file or directory names. For example, the equals
method of the File
class knows that some file systems are not case significant and that a trailing /
in a directory name doesn’t matter.
Example 12–7. FindDirectories.java
import java.io.*;

public class FindDirectories
{
   public static void main(String[] args)
   {
      // if no arguments provided, start at the parent directory
      if (args.length == 0) args = new String[] { ".." };

      try
      {
         File pathName = new File(args[0]);
         String[] fileNames = pathName.list();

         // list returns null if the path is not a readable directory
         if (fileNames == null) return;

         // enumerate all files in the directory
         for (int i = 0; i < fileNames.length; i++)
         {
            File f = new File(pathName.getPath(), fileNames[i]);

            // if the file is again a directory, call the main method recursively
            if (f.isDirectory())
            {
               System.out.println(f.getCanonicalPath());
               main(new String [] { f.getPath() });
            }
         }
      }
      catch(IOException e)
      {
         e.printStackTrace();
      }
   }
}
Rather than listing all files in a directory, you can use a FilenameFilter
object as a parameter to the list
method to narrow down the list. These objects are simply instances of a class that satisfies the FilenameFilter
interface.
All a class needs to do to implement the FilenameFilter
interface is define a method called accept
. Here is an example of a simple FilenameFilter
class that allows only files with a specified extension:
import java.io.*;

public class ExtensionFilter implements FilenameFilter
{
   public ExtensionFilter(String ext)
   {
      extension = "." + ext;
   }

   // accept only names that end with the given extension
   public boolean accept(File dir, String name)
   {
      return name.endsWith(extension);
   }

   private String extension;
}
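To see the filter at work, you could pass it to the list method as in the following sketch. The demonstration directory and its three file names are invented for this example. The filter class is repeated as a nested class so that the program compiles on its own.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ExtensionFilterDemo
{
   public static class ExtensionFilter implements FilenameFilter
   {
      private final String extension;

      public ExtensionFilter(String ext)
      {
         extension = "." + ext;
      }

      public boolean accept(File dir, String name)
      {
         return name.endsWith(extension);
      }
   }

   /** Creates a scratch directory with three files and lists only the .java files. */
   public static String[] listJavaFiles()
   {
      File dir = new File(System.getProperty("java.io.tmpdir"),
         "filter-demo-" + System.nanoTime());
      if (!dir.mkdir()) throw new IllegalStateException("Cannot create " + dir);
      File[] files =
      {
         new File(dir, "A.java"), new File(dir, "B.java"), new File(dir, "notes.txt")
      };
      try
      {
         for (File f : files)
            f.createNewFile();
         String[] names = dir.list(new ExtensionFilter("java"));
         Arrays.sort(names); // list makes no promise about the order
         return names;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
      finally
      {
         // remove the scratch files, then the directory itself
         for (File f : files)
            f.delete();
         dir.delete();
      }
   }

   public static void main(String[] args)
   {
      System.out.println(Arrays.toString(listJavaFiles()));
   }
}
```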
When writing portable programs, it is a challenge to specify file names with subdirectories. As we mentioned earlier, it turns out that you can use a forward slash (the UNIX separator) as the directory separator in Windows as well, but other operating systems might not permit this, so we don’t recommend using a forward slash.
If you do use forward slashes as directory separators in Windows when constructing a File
object, the getAbsolutePath
method returns a file name that contains forward slashes, which will look strange to Windows users. Instead, use the getCanonicalPath
method—it replaces the forward slashes with backslashes.
It is much better to use the information about the current directory separator that the File
class stores in a static field called separator
. (In a Windows environment, this is a backslash (\); in a UNIX environment, it is a forward slash (/).) For example:
File foo = new File("Documents" + File.separator + "data.txt");
Of course, if you use the second alternate version of the File
constructor
File foo = new File("Documents", "data.txt");
then the constructor will supply the correct separator.
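The two approaches lead to the same path, as this small check suggests. The class name SeparatorDemo is our own, and the directory and file names are the ones used above.

```java
import java.io.File;

class SeparatorDemo
{
   /** Compares a path joined with File.separator to one built by the two-argument constructor. */
   public static boolean samePath()
   {
      File joined = new File("Documents" + File.separator + "data.txt");
      File twoArgs = new File("Documents", "data.txt");
      return joined.equals(twoArgs);
   }

   public static void main(String[] args)
   {
      System.out.println(new File("Documents", "data.txt").getPath());
      System.out.println(samePath());
   }
}
```

The first line of output shows the separator that the constructor chose for the current platform.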
The API notes that follow give you what we think are the most important remaining methods of the File
class; their use should be straightforward.
boolean canRead()
indicates whether the file can be read by the current application.
boolean canWrite()
indicates whether the file is writable or read-only.
static File createTempFile(String prefix, String suffix)
1.2
static File createTempFile(String prefix, String suffix, File directory)
1.2
create a temporary file in the system’s default temp directory or the given directory, using the given prefix and suffix to generate the temporary name.
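Parameters:
| prefix | A prefix string that is at least three characters long |
| suffix | An optional suffix. If null, .tmp is used |
| directory | The directory in which the file is created. If it is null, the file is created in the system's default temp directory |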
boolean delete()
tries to delete the file. Returns true
if the file was deleted, false
otherwise.
void deleteOnExit()
requests that the file be deleted when the virtual machine shuts down.
boolean exists()
true
if the file or directory exists; false
otherwise.
String getAbsolutePath()
returns a string that contains the absolute path name. Tip: Use getCanonicalPath
instead.
File getCanonicalFile()
1.2
returns a File
object that contains the canonical path name for the file. In particular, redundant “.
” directories are removed, the correct directory separator is used, and the capitalization preferred by the underlying file system is obtained.
String getCanonicalPath()
1.1
returns a string that contains the canonical path name. In particular, redundant “.
” directories are removed, the correct directory separator is used, and the capitalization preferred by the underlying file system is obtained.
String getName()
returns a string that contains the file name of the File
object (does not include path information).
String getParent()
returns a string that contains the name of the parent of this File
object. If this File
object is a file, then the parent is the directory containing it. If it is a directory, then the parent is the parent directory or null
if there is no parent directory.
File getParentFile()
1.2
returns a File
object for the parent of this File
directory. See getParent
for a definition of “parent.”
String getPath()
returns a string that contains the path name of the file.
boolean isDirectory()
returns true
if the File
represents a directory; false
otherwise.
boolean isFile()
returns true
if the File
object represents a file as opposed to a directory or a device.
boolean isHidden()
1.2
returns true
if the File
object represents a hidden file or directory.
long lastModified()
returns the time the file was last modified (counted in milliseconds since Midnight January 1, 1970 GMT), or 0 if the file does not exist. Use the Date(long)
constructor to convert this value to a date.
long length()
returns the length of the file in bytes, or 0 if the file does not exist.
String[] list()
returns an array of strings that contain the names of the files and directories contained by this File
object, or null
if this File
was not representing a directory.
String[] list(FilenameFilter filter)
returns an array of the names of the files and directories contained by this File
that satisfy the filter, or null
if none exist.
Parameters: |
filter | The FilenameFilter object to use |
File[] listFiles()
1.2
returns an array of File
objects corresponding to the files and directories contained by this File
object, or null
if this File
was not representing a directory.
File[] listFiles(FilenameFilter filter)
1.2
returns an array of File
objects for the files and directories contained by this File
that satisfy the filter, or null
if none exist.
Parameters: |
filter | The FilenameFilter object to use |
static File[] listRoots()
1.2
returns an array of File
objects corresponding to all the available file roots. (For example, on a Windows system, you get the File
objects representing the installed drives (both local drives and mapped network drives). On a UNIX system, you simply get “/
”.)
boolean createNewFile()
1.2
atomically makes a new file whose name is given by the File
object if no file with that name exists. That is, the checking for the file name and the creation are not interrupted by other file system activity. Returns true
if the method created the file.
boolean mkdir()
makes a subdirectory whose name is given by the File
object. Returns true
if the directory was successfully created; false
otherwise.
boolean mkdirs()
unlike mkdir
, creates the parent directories if necessary. Returns false
if any of the necessary directories could not be created.
boolean renameTo(File dest)
returns true
if the name was changed; false
otherwise.
Parameters: |
dest | A File object that specifies the new name |
boolean setLastModified(long time)
1.2
sets the last modified time of the file. Returns true
if successful, false
otherwise.
Parameters: |
time | A long integer representing the number of milliseconds since Midnight January 1, 1970, GMT. Use the getTime method of the Date class to calculate this value |
boolean setReadOnly()
1.2
sets the file to be read-only. Returns true
if successful, false
otherwise.
URL toURL()
1.2
converts the File
object to a file URL
.
boolean accept(File dir, String name)
should be defined to return true
if the file matches the filter criterion.
Parameters: |
dir | A File object representing the directory that contains the file |
name | The name of the file |
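To see several of these File methods working together, here is a minimal sketch. The class name FileListingDemo and the file name prefixes are made up for illustration, and the sketch assumes that the default temp directory is writable. It creates a scratch directory, puts two temporary files into it, and lists only the names that a FilenameFilter accepts.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not one of the chapter's example programs.
class FileListingDemo
{
   /** Creates a scratch directory with two files and returns the names ending in ".txt". */
   static String[] listTextFiles()
   {
      try
      {
         // createTempFile returns the File that it created
         File dir = File.createTempFile("demo", ".dir");
         // turn the temporary file into a directory (fine for a demo, not race-free)
         if (!dir.delete() || !dir.mkdir()) throw new IOException("cannot create " + dir);

         File text = File.createTempFile("notes", ".txt", dir);
         File image = File.createTempFile("image", ".png", dir);

         String[] names = dir.list(new FilenameFilter()
            {
               public boolean accept(File d, String name) { return name.endsWith(".txt"); }
            });

         // clean up; delete returns false if a file could not be removed
         text.delete();
         image.delete();
         dir.delete();
         return names;
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      for (String name : listTextFiles()) System.out.println(name);
   }
}
```

The generated names contain a random part, so the program prints something like notes followed by digits and the .txt suffix.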
JDK 1.4 contains a number of features for improved input/output processing, collectively called the “new I/O,” in the java.nio
package. (Of course, the “new” moniker is somewhat regrettable because, a few years down the road, the package won’t be new any longer.)
The package includes support for the following features:
Memory-mapped files
File locking
Character set encoders and decoders
Nonblocking I/O
We already introduced character sets on page 633. In this section, we discuss only the first two features. Nonblocking I/O requires the use of threads, which are covered in Volume 2.
Most operating systems can take advantage of the virtual memory implementation to “map” a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which is much faster than the traditional file operations.
At the end of this section, you can find a program that computes the CRC32 checksum of a file, using traditional file input and a memory-mapped file. On one machine, we got the timing data shown in Table 12–7 when computing the checksum of the 37-Mbyte file rt.jar
in the jre/lib
directory of the JDK.
As you can see, on this particular machine, memory mapping is a bit faster than using buffered sequential input and dramatically faster than using a RandomAccessFile
.
Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain can be substantial if you need to use random access. For sequential reading of files of moderate size, on the other hand, there is no reason to use memory mapping.
The java.nio
package makes memory mapping quite simple. Here is what you do.
First, get a channel from the file. A channel is an abstraction for disk files that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files. You get a channel by calling the getChannel
method that has been added to the FileInputStream
, FileOutputStream
, and RandomAccessFile
classes.
FileInputStream in = new FileInputStream(. . .);
FileChannel channel = in.getChannel();
Then you get a MappedByteBuffer
from the channel by calling the map
method of the FileChannel
class. You specify the area of the file that you want to map and a mapping mode. Three modes are supported:
FileChannel.MapMode.READ_ONLY
: The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException.
FileChannel.MapMode.READ_WRITE
: The resulting buffer is writable, and the changes will be written back to the file at some time. Note that other programs that have mapped the same file may not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs is operating-system dependent.
FileChannel.MapMode.PRIVATE
: The resulting buffer is writable, but any changes are private to this buffer and are not propagated to the file.
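The two steps, getting a channel and mapping it, can be sketched as follows. This is only an illustration under stated assumptions: the class name MapDemo and the helper readMapped are hypothetical, and the sketch writes its own small temporary file so that it can run anywhere.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical demo class for mapping a file in read-only mode.
class MapDemo
{
   /** Writes the bytes to a temporary file, maps the file read-only, and returns the byte at index. */
   static byte readMapped(byte[] contents, int index)
   {
      try
      {
         File file = File.createTempFile("map", ".bin");
         file.deleteOnExit();
         try (FileOutputStream out = new FileOutputStream(file))
         {
            out.write(contents);
         }

         try (RandomAccessFile raf = new RandomAccessFile(file, "r");
              FileChannel channel = raf.getChannel())
         {
            // map the whole file; position 0, size = length of the file
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return buffer.get(index); // absolute get, like an array access
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(readMapped(new byte[] { 10, 20, 30 }, 1));
   }
}
```

Calling put on this buffer would throw a ReadOnlyBufferException, because the mapping mode is READ_ONLY.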
Once you have the buffer, you can read and write data, using the methods of the ByteBuffer
class and the Buffer
superclass.
Buffers support both sequential and random data access. A buffer has a position that is advanced by get
and put
operations. For example, you can sequentially traverse all bytes in the buffer as
while (buffer.hasRemaining())
{
   byte b = buffer.get();
   . . .
}
Alternatively, you can use random access:
for (int i = 0; i < buffer.limit(); i++)
{
   byte b = buffer.get(i);
   . . .
}
You can also read and write arrays of bytes with the methods
get(byte[] bytes)
get(byte[] bytes, int offset, int length)
Finally, there are methods
getInt getLong getShort getChar getFloat getDouble
to read primitive type values that are stored as binary values in the file. As we already mentioned, Java uses big-endian ordering for binary data. However, if you need to process a file containing binary numbers in little-endian order, simply call
buffer.order(ByteOrder.LITTLE_ENDIAN);
To find out the current byte order of a buffer, call
ByteOrder b = buffer.order();
To write numbers to a buffer, use one of the methods
putInt putLong putShort putChar putFloat putDouble
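The effect of the byte order is easiest to see with a four-byte buffer. The following sketch uses an in-memory ByteBuffer instead of a mapped file, but the getInt, putInt, and order calls behave the same way on a MappedByteBuffer. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class showing how the byte order changes getInt and putInt.
class ByteOrderDemo
{
   /** Interprets the first four bytes as an int, using the given byte order. */
   static int readInt(byte[] bytes, ByteOrder order)
   {
      ByteBuffer buffer = ByteBuffer.wrap(bytes);
      buffer.order(order);
      return buffer.getInt();
   }

   /** Encodes an int as four bytes, using the given byte order. */
   static byte[] writeInt(int value, ByteOrder order)
   {
      ByteBuffer buffer = ByteBuffer.allocate(4);
      buffer.order(order);
      buffer.putInt(value);
      return buffer.array();
   }

   public static void main(String[] args)
   {
      byte[] bytes = { 1, 2, 3, 4 };
      // big-endian: the most significant byte comes first
      System.out.println(Integer.toHexString(readInt(bytes, ByteOrder.BIG_ENDIAN)));
      // little-endian: the least significant byte comes first
      System.out.println(Integer.toHexString(readInt(bytes, ByteOrder.LITTLE_ENDIAN)));
   }
}
```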
Example 12–8 computes the 32-bit cyclic redundancy checksum (CRC32) of a file. That quantity is a checksum that is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip
package contains a class CRC32
that computes the checksum of a sequence of bytes, using the following loop:
CRC32 crc = new CRC32();
while (more bytes)
   crc.update(next byte);
long checksum = crc.getValue();
For a nice explanation of the CRC algorithm, see http://www.relisoft.com/Science/CrcMath.html.
The details of the CRC computation are not important. We just use it as an example of a useful file operation.
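If you want to try the update loop without a file, the following sketch applies it to a byte array. The class name Crc32Demo is hypothetical. The test string "123456789" is the conventional check input for CRC-32, for which the expected checksum is cbf43926 in hexadecimal.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class that runs the CRC32 update loop over a byte array.
class Crc32Demo
{
   static long checksum(byte[] data)
   {
      CRC32 crc = new CRC32();
      for (byte b : data)
         crc.update(b); // update(int) only uses the low eight bits of its argument
      return crc.getValue();
   }

   public static void main(String[] args)
   {
      byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
      System.out.println(Long.toHexString(checksum(data)));
   }
}
```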
Run the program as
java NIOTest
filename
Example 12–8. NIOTest.java
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.zip.*;

/**
   This program computes the CRC checksum of a file.
   Usage: java NIOTest filename
*/
public class NIOTest
{
   // reads one byte at a time from an unbuffered stream
   public static long checksumInputStream(String filename)
      throws IOException
   {
      try (InputStream in = new FileInputStream(filename))
      {
         CRC32 crc = new CRC32();

         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
         return crc.getValue();
      }
   }

   // same loop, but the stream is buffered
   public static long checksumBufferedInputStream(String filename)
      throws IOException
   {
      try (InputStream in = new BufferedInputStream(new FileInputStream(filename)))
      {
         CRC32 crc = new CRC32();

         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
         return crc.getValue();
      }
   }

   // seeks to each position before reading a byte
   public static long checksumRandomAccessFile(String filename)
      throws IOException
   {
      try (RandomAccessFile file = new RandomAccessFile(filename, "r"))
      {
         long length = file.length();
         CRC32 crc = new CRC32();

         for (long p = 0; p < length; p++)
         {
            file.seek(p);
            int c = file.readByte();
            crc.update(c);
         }
         return crc.getValue();
      }
   }

   // maps the entire file into memory and reads it like an array
   public static long checksumMappedFile(String filename)
      throws IOException
   {
      try (FileInputStream in = new FileInputStream(filename);
           FileChannel channel = in.getChannel())
      {
         CRC32 crc = new CRC32();
         // the cast limits this method to files smaller than 2 GB
         int length = (int) channel.size();
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, length);

         for (int p = 0; p < length; p++)
         {
            int c = buffer.get(p);
            crc.update(c);
         }
         return crc.getValue();
      }
   }

   public static void main(String[] args)
      throws IOException
   {
      System.out.println("Input Stream:");
      long start = System.currentTimeMillis();
      long crcValue = checksumInputStream(args[0]);
      long end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Buffered Input Stream:");
      start = System.currentTimeMillis();
      crcValue = checksumBufferedInputStream(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Random Access File:");
      start = System.currentTimeMillis();
      crcValue = checksumRandomAccessFile(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");

      System.out.println("Mapped File:");
      start = System.currentTimeMillis();
      crcValue = checksumMappedFile(args[0]);
      end = System.currentTimeMillis();
      System.out.println(Long.toHexString(crcValue));
      System.out.println((end - start) + " milliseconds");
   }
}
FileChannel getChannel()
1.4
returns a channel for accessing this stream.
java.io.FileOutputStream 1.0
FileChannel getChannel()
1.4
returns a channel for accessing this stream.
FileChannel getChannel()
1.4
returns a channel for accessing this file.
MappedByteBuffer map(FileChannel.MapMode mode, long position, long size)
maps a region of the file to memory.
Parameters: |
mode | One of the constants READ_ONLY, READ_WRITE, or PRIVATE in the FileChannel.MapMode class |
position | The start of the mapped region |
size | The size of the mapped region |
boolean hasRemaining()
returns true
if the current buffer position has not yet reached the buffer’s limit position.
int limit()
returns the limit position of the buffer, that is, the first position at which no more values are available.
byte get()
gets a byte from the current position and advances the current position to the next byte.
byte get(int index)
gets a byte from the specified index.
ByteBuffer put(byte b)
puts a byte to the current position and advances the current position to the next byte. Returns a reference to this buffer.
ByteBuffer put(int index, byte b)
puts a byte at the specified index. Returns a reference to this buffer.
ByteBuffer get(byte[] destination)
ByteBuffer get(byte[] destination, int offset, int length)
fill a byte array, or a region of a byte array, with bytes from the buffer, and advance the current position by the number of bytes read. If not enough bytes remain in the buffer, then no bytes are read, and a BufferUnderflowException
is thrown. Return a reference to this buffer.
Parameters: |
destination | The byte array to be filled |
offset | The offset of the region to be filled |
length | The length of the region to be filled |
ByteBuffer put(byte[] source)
ByteBuffer put(byte[] source, int offset, int length)
put all bytes from a byte array, or the bytes from a region of a byte array, into the buffer, and advance the current position by the number of bytes written. If not enough space remains in the buffer, then no bytes are written, and a BufferOverflowException
is thrown. Returns a reference to this buffer.
Parameters: |
source | The byte array to be written |
offset | The offset of the region to be written |
length | The length of the region to be written |
Xxx getXxx()
Xxx getXxx(int index)
ByteBuffer putXxx(xxx value)
ByteBuffer putXxx(int index, xxx value)
are used for relative and absolute reading and writing of binary numbers. Xxx is one of Int, Long, Short, Char, Float, or Double.
ByteBuffer order(ByteOrder order)
ByteOrder order()
set or get the byte order. The value for order
is one of the constants BIG_ENDIAN
or LITTLE_ENDIAN
of the ByteOrder
class.
When you use memory mapping, you make a single buffer that spans the entire file, or the area of the file in which you are interested. You can also use buffers to read and write more modest chunks of information.
In this section, we briefly describe the basic operations on Buffer
objects. A buffer is an array of values of the same type. The Buffer
class is an abstract class with concrete subclasses ByteBuffer
, CharBuffer
, DoubleBuffer
, FloatBuffer
, IntBuffer
, LongBuffer
, and ShortBuffer
. In practice, you will most commonly use ByteBuffer
and CharBuffer
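. As shown in Figure 12–13, a buffer has
a capacity that never changes
a position at which the next value is read or written
a limit beyond which reading and writing is meaningless
optionally, a mark for repeating a read or write operation
These values fulfill the condition
0 ≤ mark ≤ position ≤ limit ≤ capacity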
The principal purpose for a buffer is a “write, then read” cycle. At the outset, the buffer’s position is 0 and the limit is the capacity. Keep calling put
to add values to the buffer. When you run out of data or you reach the capacity, it is time to switch to reading.
Call flip
to set the limit to the current position and the position to 0. Now keep calling get
while the remaining
method (which returns limit - position) is positive. When you have read all values in the buffer, call clear
to prepare the buffer for the next writing cycle. The clear
method resets the position to 0 and the limit to the capacity.
If you want to re-read the buffer, use rewind
or mark/reset
—see the API notes for details.
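The write, flip, read, clear cycle can be sketched with a small CharBuffer. The class name FlipDemo, the method roundTrip, and the capacity of 16 are assumptions chosen for this illustration.

```java
import java.nio.CharBuffer;

// Hypothetical demo class for the "write, then read" cycle of a buffer.
class FlipDemo
{
   /** Writes as much of the text as fits into a 16-char buffer, then reads it back. */
   static String roundTrip(String text)
   {
      CharBuffer buffer = CharBuffer.allocate(16); // position 0, limit = capacity = 16

      // writing phase: put values until the data or the capacity runs out
      for (int i = 0; i < text.length() && buffer.hasRemaining(); i++)
         buffer.put(text.charAt(i));

      // switch to reading: limit = old position, position = 0
      buffer.flip();

      StringBuilder result = new StringBuilder();
      while (buffer.remaining() > 0)
         result.append(buffer.get());

      // ready for the next writing cycle: position 0, limit = capacity
      buffer.clear();
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(roundTrip("hello"));
   }
}
```

A text that is longer than the capacity is cut off after 16 characters, because the writing loop stops when hasRemaining returns false.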
Buffer clear()
prepares this buffer for writing by setting the position to zero and the limit to the capacity; returns this
.
Buffer flip()
prepares this buffer for reading by setting the limit to the position and the position to zero; returns this
.
Buffer rewind()
prepares this buffer for re-reading the same values by setting the position to zero and leaving the limit unchanged; returns this
.
Buffer mark()
sets the mark of this buffer to the position; returns this
.
Buffer reset()
sets the position of this buffer to the mark, thus allowing the marked portion to be read or written again; returns this
.
int remaining()
returns the remaining number of readable or writable values, that is, the difference between limit and position.
int position()
returns the position of this buffer.
int capacity()
returns the capacity of this buffer.
char get()
CharBuffer get(char[] destination)
CharBuffer get(char[] destination, int offset, int length)
gets one char
value, or a range of char
values, starting at the buffer’s position and moving the position past the characters that were read. The last two methods return this
.
CharBuffer put(char c)
CharBuffer put(char[] source)
CharBuffer put(char[] source, int offset, int length)
CharBuffer put(String source)
CharBuffer put(CharBuffer source)
puts one char
value, or a range of char
values, starting at the buffer’s position and advancing the position past the characters that were written. When reading from a CharBuffer
, all remaining characters are read. All methods return this
.
int read(CharBuffer destination)
gets char values from this buffer and puts them into the destination until the destination’s limit is reached. Returns the number of values that were put into the destination, or -1 if this buffer has no remaining values.
Consider a situation in which multiple simultaneously executing programs need to modify the same file. Clearly, the programs need to communicate in some way, or the file can easily become damaged.
File locks control access to a file or a range of bytes within a file. However, file locking varies greatly among operating systems, which explains why file locking capabilities were absent from prior versions of the JDK.
Frankly, file locking is not all that common in application programs. Many applications use a database for data storage, and the database has mechanisms for resolving concurrent access problems. If you store information in flat files and are worried about concurrent access, you may well find it simpler to start using a database rather than designing complex file locking schemes.
Still, there are situations in which file locking is essential. Suppose your application saves a configuration file with user preferences. If a user invokes two instances of the application, it could happen that both of them want to write the configuration file at the same time. In that situation, the first instance should lock the file. When the second instance finds the file locked, it can decide to wait until the file is unlocked or simply skip the writing process.
To lock a file, call either the lock
or tryLock
method of the FileChannel
class:
FileLock lock = channel.lock();
or
FileLock lock = channel.tryLock();
The first call blocks until the lock becomes available. The second call returns immediately, either with the lock or null
if the lock is not available. The file remains locked until the channel is closed or the release
method is invoked on the lock.
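For the configuration file scenario, a nonblocking attempt might look like the following sketch. The class LockDemo and the method runWithLock are hypothetical names, and the sketch assumes that skipping the action is acceptable when another process holds the lock.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical demo class: run an action only while holding an exclusive file lock.
class LockDemo
{
   /** Returns true if the lock was obtained and the action ran, false if the file was locked. */
   static boolean runWithLock(File file, Runnable action)
   {
      // "rw" mode creates the file if necessary and allows an exclusive lock
      try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
           FileChannel channel = raf.getChannel())
      {
         FileLock lock = channel.tryLock(); // returns immediately
         if (lock == null) return false;    // another process holds the lock
         try
         {
            action.run();
            return true;
         }
         finally
         {
            lock.release();
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      File file = new File(System.getProperty("java.io.tmpdir"), "lockdemo.cfg");
      boolean ran = runWithLock(file, () -> System.out.println("writing configuration"));
      System.out.println(ran ? "done" : "file is locked, skipped");
   }
}
```

Replace tryLock with lock if the second instance should wait instead of skipping the write.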
You can also lock a portion of the file with the call
FileLock lock(long start, long size, boolean shared)
or
FileLock tryLock(long start, long size, boolean shared)
The shared flag is false to lock the file for both reading and writing. It is true for a shared lock, which allows multiple processes to read from the file, while preventing any process from acquiring an exclusive lock. Not all operating systems support shared locks. You may get an exclusive lock even if you just asked for a shared one. Call the isShared method of the FileLock class to find out which kind you have.
If you lock the tail portion of a file and the file subsequently grows beyond the locked portion, the additional area is not locked. To lock all bytes, use a size of Long.MAX_VALUE
.
Keep in mind that file locking is system dependent. Here are some points to watch for:
On some systems, file locking is merely advisory. If an application fails to get a lock, it may still write to a file that another application has currently locked.
On some systems, you cannot simultaneously lock a file and map it into memory.
File locks are held by the entire Java virtual machine. If two programs are launched by the same virtual machine (such as an applet or application launcher), then they can’t each acquire a lock on the same file. The lock
and tryLock
methods will throw an OverlappingFileLockException
if the virtual machine already holds another overlapping lock on the same file.
On some systems, closing a channel releases all locks on the underlying file held by the Java virtual machine. You should therefore avoid multiple channels on the same locked file.
Locking files on a networked file system is highly system dependent and should probably be avoided.
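The point about locks being held by the entire virtual machine can be checked with a short experiment. In this sketch, which uses the hypothetical names OverlapDemo and secondLockOverlaps, the same program asks for a second lock while it still holds the first one.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical demo class: a second lock request from the same virtual machine.
class OverlapDemo
{
   /** Returns true if the second lock attempt throws OverlappingFileLockException. */
   static boolean secondLockOverlaps(File file)
   {
      try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
           FileChannel channel = raf.getChannel())
      {
         FileLock first = channel.lock();
         try
         {
            channel.tryLock(); // this virtual machine already holds an overlapping lock
            return false;
         }
         catch (OverlappingFileLockException e)
         {
            return true;
         }
         finally
         {
            first.release();
         }
      }
      catch (IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      File file = new File(System.getProperty("java.io.tmpdir"), "overlapdemo.lock");
      System.out.println(secondLockOverlaps(file));
   }
}
```

Note that tryLock does not return null in this situation. The null return value is reserved for locks that are held by another process.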
FileLock lock()
acquires an exclusive lock on the entire file. This method blocks until the lock is acquired.
FileLock tryLock()
acquires an exclusive lock on the entire file, or returns null
if the lock cannot be acquired.
FileLock lock(long position, long size, boolean shared)
FileLock tryLock(long position, long size, boolean shared)
acquire a lock on a region of the file. The first method blocks until the lock is acquired, and the second method returns null
if the lock cannot be acquired.
Parameters: |
position | The start of the region to be locked |
size | The size of the region to be locked |
shared | true for a shared lock, false for an exclusive lock |
void release()
releases this lock.
Regular expressions are used to specify string patterns. You can use regular expressions whenever you need to locate strings that match a particular pattern. For example, one of our sample programs locates all hyperlinks in an HTML file by looking for strings of the pattern <a href="...">
.
Of course, for specifying a pattern, the ...
notation is not precise enough. You need to specify precisely what sequence of characters is a legal match. You need to use a special syntax whenever you describe a pattern.
Here is a simple example. The regular expression
[Jj]ava.+
matches any string of the following form:
The first letter is a J
or j
.
The next three letters are ava
.
The remainder of the string consists of one or more arbitrary characters.
For example, the string "javanese"
matches the particular regular expression, but the string "Core Java"
does not.
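You can check these two claims with the classes of the java.util.regex package. The following is only a small sketch; the class name JavaPatternDemo and the method matchesJava are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class that tests strings against the pattern [Jj]ava.+
class JavaPatternDemo
{
   private static final Pattern JAVA = Pattern.compile("[Jj]ava.+");

   /** Returns true if the entire string matches the pattern. */
   static boolean matchesJava(String s)
   {
      return JAVA.matcher(s).matches();
   }

   public static void main(String[] args)
   {
      System.out.println(matchesJava("javanese"));  // starts with java, followed by more characters
      System.out.println(matchesJava("Core Java")); // does not start with J or j
   }
}
```

The string "Java" by itself does not match either, because .+ requires at least one character after ava.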
As you can see, you need to know a bit of syntax to understand the meaning of a regular expression. Fortunately, for most purposes, a small number of straightforward constructs are sufficient.
A character class is a set of character alternatives, enclosed in brackets, such as [Jj], [0-9]
, [A-Za-z]
, or [^0-9]
. Here the -
denotes a range (all characters whose Unicode value falls between the two bounds), and ^
denotes the complement (all characters except the ones specified).
There are many predefined character classes such as \d (digits) or \p{Sc} (Unicode currency symbol). See Tables 12–8 and 12–9.
Most characters match themselves, such as the ava
characters in the example above.
The .
symbol matches any character (except possibly line terminators, depending on flag settings).
Use \ as an escape character, for example \. matches a period and \\ matches a backslash.
^
and $
match the beginning and end of a line respectively.
If X and Y are regular expressions, then XY means “any match for X followed by a match for Y”. X |
Y means “any match for X or Y”.
You can apply quantifiers X+
(1 or more), X*
(0 or more), and X?
(0 or 1) to an expression X.
By default, a quantifier matches the largest possible repetition that makes the overall match succeed. You can modify that behavior with suffixes ?
(reluctant or stingy match—match the smallest repetition count) and +
(possessive or greedy match—match the largest count even if that makes the overall match fail).
For example, the string cab
matches [a-z]*ab
but not [a-z]*+ab
. In the first case, the expression [a-z]*
only matches the character c
, so that the characters ab
match the remainder of the pattern. But the greedy version [a-z]*+
matches the characters cab
, leaving the remainder of the pattern unmatched.
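The three quantifier modes are easy to compare in a few lines. In this sketch, the class QuantifierDemo and its helper methods are hypothetical, and the HTML fragment is just sample input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy, reluctant, and possessive quantifiers.
class QuantifierDemo
{
   /** Returns true if the entire input matches the regular expression. */
   static boolean matchesAll(String regex, String input)
   {
      return Pattern.matches(regex, input);
   }

   /** Returns the first substring that matches the regular expression, or null. */
   static String firstMatch(String regex, String input)
   {
      Matcher m = Pattern.compile(regex).matcher(input);
      return m.find() ? m.group() : null;
   }

   public static void main(String[] args)
   {
      System.out.println(matchesAll("[a-z]*ab", "cab"));  // greedy, gives back "ab"
      System.out.println(matchesAll("[a-z]*+ab", "cab")); // possessive, gives nothing back
      System.out.println(firstMatch("<.+>", "<b>bold</b>"));  // greedy: longest match
      System.out.println(firstMatch("<.+?>", "<b>bold</b>")); // reluctant: shortest match
   }
}
```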
You can use groups to define subexpressions. Enclose the groups in ( ), for example ([+-]?)([0-9]+). You can then ask the pattern matcher to return the match of each group or to refer back to a group with \n, where n is the group number (starting with \1).
For example, here is a somewhat complex but potentially useful regular expression—it describes decimal or hexadecimal integers:
[+-]?[0-9]+|0[Xx][0-9A-Fa-f]+
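Here is a quick way to try this expression on a few inputs. The class name IntegerPatternDemo is hypothetical. Note that the | operator applies to everything on either side of it, so the optional sign belongs to the decimal alternative only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the decimal-or-hexadecimal integer pattern.
class IntegerPatternDemo
{
   private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+|0[Xx][0-9A-Fa-f]+");

   /** Returns true if the entire string is a decimal or hexadecimal integer. */
   static boolean isInteger(String s)
   {
      return INTEGER.matcher(s).matches();
   }

   public static void main(String[] args)
   {
      String[] samples = { "-42", "0x1F", "0x", "1F", "+0x1F" };
      for (String s : samples)
         System.out.println(s + ": " + isInteger(s));
   }
}
```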
Unfortunately, the expression syntax is not completely standardized between the various programs and libraries that use regular expressions. While there is consensus on the basic constructs, there are many maddening differences in the details. The Java regular expression classes use a syntax that is similar to, but not quite the same as, the one used in the Perl language. Table 12–8 shows all constructs of the Java syntax. For more information on the regular expression syntax, consult the API documentation for the Pattern
class or the book Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly and Associates, 1997).
Table 12–8. Regular Expression Syntax
Syntax | Explanation |
---|---|
Characters | |
c | The character c |
\unnnn, \xnn, \0n, \0nn, \0nnn | The code unit with the given hex or octal value |
\t, \n, \r, \f, \a, \e | The control characters tab, newline, return, form feed, alert, and escape |
\cc | The control character corresponding to the character c |
Character Classes | |
[C1C2. . .] | Any of the characters represented by C1, C2, . . . The Ci are characters, character ranges (c1-c2), or character classes |
[^. . .] | Complement of character class |
[. . .&&. . .] | Intersection of two character classes |
Predefined Character Classes | |
. | Any character except line terminators (or any character if the DOTALL flag is set) |
\d | A digit [0-9] |
\D | A nondigit [^0-9] |
\s | A whitespace character [ \t\n\r\f\x0B] |
\S | A non-whitespace character |
\w | A word character [a-zA-Z0-9_] |
\W | A nonword character |
\p{name} | A named character class; see Table 12–9 |
\P{name} | The complement of a named character class |
Boundary Matchers | |
^ $ | Beginning, end of input (or beginning, end of line in multiline mode) |
\b | A word boundary |
\B | A nonword boundary |
\A | Beginning of input |
\z | End of input |
\Z | End of input except final line terminator |
\G | End of previous match |
Quantifiers | |
X? | Optional X |
X* | X, 0 or more times |
X+ | X, 1 or more times |
X{n} X{n,} X{n,m} | X n times, at least n times, between n and m times |
Quantifier Suffixes | |
? | Turn default (greedy) match into reluctant match |
+ | Turn default (greedy) match into possessive match |
Set Operations | |
XY | Any string from X, followed by any string from Y |
X|Y | Any string from X or Y |
Grouping | |
(X) | Capture the string matching X as a group |
\n | The match of the nth group |
Escapes | |
\c | The character c (must not be an alphabetic character) |
\Q. . .\E | Quote . . . verbatim |
(?. . .) | Special construct; see API notes of the Pattern class |
The simplest use for a regular expression is to test whether a particular string matches it. Here is how you program that test in Java. First construct a Pattern
object from the string denoting the regular expression. Then get a Matcher
object from the pattern, and call its matches
method:
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) . . .
Table 12–9. Predefined Character Class Names
Lower | ASCII lower case |
Upper | ASCII upper case |
Alpha | ASCII alphabetic |
Digit | ASCII digits |
Alnum | ASCII alphabetic or digit |
XDigit | Hex digits |
Print or Graph | Printable ASCII character |
Punct | ASCII non-alpha or digit |
ASCII | All ASCII |
Cntrl | ASCII Control character |
Blank | Space or tab |
Space | Whitespace |
javaLowerCase | Lower case, as determined by Character.isLowerCase() |
javaUpperCase | Upper case, as determined by Character.isUpperCase() |
javaWhitespace | Whitespace, as determined by Character.isWhitespace() |
javaMirrored | Mirrored, as determined by Character.isMirrored() |
InBlock | Block is the name of a Unicode character block, with spaces removed, such as BasicLatin or Mongolian. See http://www.unicode.org for a list of block names. |
Category or InCategory | Category is the name of a Unicode character category such as L (letter) or Sc (currency symbol). See http://www.unicode.org for a list of category names. |
The input of the matcher is an object of any class that implements the CharSequence
interface, such as a String
, StringBuilder
, or CharBuffer
.
When compiling the pattern, you can set one or more flags, for example,
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
The following six flags are supported:
CASE_INSENSITIVE
: Match characters independently of the letter case. By default, this flag takes only US ASCII characters into account.
UNICODE_CASE
: When used in combination with CASE_INSENSITIVE
, use Unicode letter case for matching.
MULTILINE
: ^
and $
match the beginning and end of a line, not the entire input.
UNIX_LINES
: Only '\n' is recognized as a line terminator when matching ^ and $ in multiline mode.
DOTALL
: When using this flag, the .
symbol matches all characters, including line terminators.
CANON_EQ
: Takes canonical equivalence of Unicode characters into account. For example, u
followed by ¨
(diaeresis) matches ü
.
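The following sketch shows the effect of some of these flags on small inputs. The class FlagDemo and its two helper methods are hypothetical. The non-ASCII letters are written as Unicode escapes so that the source file encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing how pattern flags change the match result.
class FlagDemo
{
   /** Returns true if the entire input matches the regex compiled with the given flags. */
   static boolean matches(String regex, int flags, String input)
   {
      return Pattern.compile(regex, flags).matcher(input).matches();
   }

   /** Returns true if some part of the input matches the regex compiled with the given flags. */
   static boolean finds(String regex, int flags, String input)
   {
      return Pattern.compile(regex, flags).matcher(input).find();
   }

   public static void main(String[] args)
   {
      // ASCII letters: CASE_INSENSITIVE is enough
      System.out.println(matches("java", Pattern.CASE_INSENSITIVE, "JAVA"));
      // e with acute accent against its upper case form: needs UNICODE_CASE as well
      System.out.println(matches("\u00e9", Pattern.CASE_INSENSITIVE, "\u00c9"));
      System.out.println(matches("\u00e9",
         Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, "\u00c9"));
      // MULTILINE: ^ and $ match at line boundaries inside the input
      System.out.println(finds("^b$", Pattern.MULTILINE, "a\nb\nc"));
      // DOTALL: the . symbol also matches a line terminator
      System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));
   }
}
```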
If the regular expression contains groups, then the Matcher
object can reveal the group boundaries. The methods
int start(int groupIndex)
int end(int groupIndex)
yield the starting index and the past-the-end index of a particular group.
You can simply extract the matched string by calling
String group(int groupIndex)
Group 0 is the entire input; the group index for the first actual group is 1. Call the groupCount
method to get the total group count.
Nested groups are ordered by the opening parentheses. For example, given the pattern
((1?[0-9]):([0-5][0-9]))[ap]m
and the input
11:59am
the matcher reports the following groups
Group Index | Start | End | String |
---|---|---|---|
0 | 0 | 7 | 11:59am |
1 | 0 | 5 | 11:59 |
2 | 0 | 2 | 11 |
3 | 3 | 5 | 59 |
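You can reproduce these group boundaries with a few lines of code. The class GroupDemo and the method describeGroups are hypothetical; the sketch simply formats the results of start, end, and group for every group index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that reports the boundaries of all groups in a match.
class GroupDemo
{
   /** Returns "start-end string" for group 0 through groupCount(), or null if there is no match. */
   static String[] describeGroups(String regex, String input)
   {
      Matcher matcher = Pattern.compile(regex).matcher(input);
      if (!matcher.matches()) return null;

      // groupCount does not include group 0, hence the + 1
      String[] result = new String[matcher.groupCount() + 1];
      for (int i = 0; i <= matcher.groupCount(); i++)
         result[i] = matcher.start(i) + "-" + matcher.end(i) + " " + matcher.group(i);
      return result;
   }

   public static void main(String[] args)
   {
      for (String line : describeGroups("((1?[0-9]):([0-5][0-9]))[ap]m", "11:59am"))
         System.out.println(line);
   }
}
```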
Example 12–9 prompts for a pattern, then for strings to match. It prints out whether or not the input matches the pattern. If the input matches and the pattern contains groups, then the program prints the group boundaries as parentheses, such as
((11):(59))am
Example 12–9. RegExTest.java
import java.util.*;
import java.util.regex.*;

/**
   This program tests regular expression matching.
   Enter a pattern and strings to match, or enter an
   empty line to exit. If the pattern contains groups, the group
   boundaries are displayed in the match.
*/
public class RegExTest
{
   public static void main(String[] args)
   {
      Scanner in = new Scanner(System.in);
      System.out.println("Enter pattern: ");
      String patternString = in.nextLine();

      Pattern pattern = null;
      try
      {
         pattern = Pattern.compile(patternString);
      }
      catch (PatternSyntaxException e)
      {
         System.out.println("Pattern syntax error");
         System.exit(1);
      }

      while (true)
      {
         System.out.println("Enter string to match: ");
         // stop at the end of input or on an empty line
         if (!in.hasNextLine()) return;
         String input = in.nextLine();
         if (input.equals("")) return;
         Matcher matcher = pattern.matcher(input);
         if (matcher.matches())
         {
            System.out.println("Match");
            int g = matcher.groupCount();
            if (g > 0)
            {
               for (int i = 0; i < input.length(); i++)
               {
                  // print an opening parenthesis for each group that starts here
                  for (int j = 1; j <= g; j++)
                     if (i == matcher.start(j))
                        System.out.print('(');
                  System.out.print(input.charAt(i));
                  // print a closing parenthesis for each group that ends here
                  for (int j = 1; j <= g; j++)
                     if (i + 1 == matcher.end(j))
                        System.out.print(')');
               }
               System.out.println();
            }
         }
         else
            System.out.println("No match");
      }
   }
}
Usually, you don’t want to match the entire input against a regular expression, but you want to find one or more matching substrings in the input. Use the find
method of the Matcher
class to find the next match. If it returns true
, use the start
and end
methods to find the extent of the match.
while (matcher.find())
{
   int start = matcher.start();
   int end = matcher.end();
   String match = input.substring(start, end);
   . . .
}
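As a self-contained variation of this loop, the following sketch collects all matches in a list. The class FindDemo, the method findAll, and the sample sentence are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class that collects every match of a pattern in the input.
class FindDemo
{
   static List<String> findAll(String regex, CharSequence input)
   {
      List<String> matches = new ArrayList<>();
      Matcher matcher = Pattern.compile(regex).matcher(input);
      while (matcher.find())
      {
         // start is the index of the first matched character, end is one past the last
         matches.add(input.subSequence(matcher.start(), matcher.end()).toString());
      }
      return matches;
   }

   public static void main(String[] args)
   {
      System.out.println(findAll("[0-9]+", "Call 555-1234 before 9am"));
   }
}
```

Because the input parameter is a CharSequence, the same method works for a String, a StringBuilder, or a CharBuffer.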
Example 12–10 puts this mechanism to work. It locates all hypertext references in a web page and prints them. To run the program, supply a URL on the command line, such as
java HrefMatch http://www.horstmann.com
Example 12–10. HrefMatch.java
import java.io.*;
import java.net.*;
import java.util.regex.*;

/**
   This program displays all URLs in a web page by
   matching a regular expression that describes the
   <a href=...> HTML tag. Start the program as
   java HrefMatch URL
*/
public class HrefMatch
{
   public static void main(String[] args)
   {
      try
      {
         // get URL string from command line or use default
         String urlString;
         if (args.length > 0) urlString = args[0];
         else urlString = "http://java.sun.com";

         // read contents into string builder
         StringBuilder input = new StringBuilder();
         try (InputStreamReader in = new InputStreamReader(new URL(urlString).openStream()))
         {
            int ch;
            while ((ch = in.read()) != -1) input.append((char) ch);
         }

         // search for all occurrences of pattern; backslashes and quotation
         // marks must be escaped inside a Java string literal, and the * after
         // [^\\s>] lets an unquoted URL be longer than one character
         String patternString = "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]*)\\s*>";
         Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
         Matcher matcher = pattern.matcher(input);

         while (matcher.find())
         {
            int start = matcher.start();
            int end = matcher.end();
            String match = input.substring(start, end);
            System.out.println(match);
         }
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
      catch (PatternSyntaxException e)
      {
         e.printStackTrace();
      }
   }
}
The replaceAll method of the Matcher class replaces all occurrences of a regular expression with a replacement string. For example, the following instructions replace all sequences of digits with a # character.
Pattern pattern = Pattern.compile("[0-9]+");
Matcher matcher = pattern.matcher(input);
String output = matcher.replaceAll("#");
The replacement string can contain references to groups in the pattern: $n is replaced with the nth group. Use \$ to include a $ character in the replacement text.
The replaceFirst method replaces only the first occurrence of the pattern.
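The following sketch illustrates group references and the escaped dollar sign in replacement strings, together with replaceFirst. The class name ReplaceDemo and its methods are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ReplaceDemo
{
   // Turns "last, first" into "first last" by swapping groups 1 and 2.
   static String swapNames(String input)
   {
      Pattern pattern = Pattern.compile("(\\w+),\\s*(\\w+)");
      return pattern.matcher(input).replaceAll("$2 $1");
   }

   // Puts a literal dollar sign in front of every sequence of digits.
   static String addDollar(String input)
   {
      Pattern pattern = Pattern.compile("[0-9]+");
      // \$ (written "\\$" in Java source) is a literal $; $0 is the entire match
      return pattern.matcher(input).replaceAll("\\$$0");
   }

   // Replaces only the first sequence of digits with #.
   static String maskFirstNumber(String input)
   {
      return Pattern.compile("[0-9]+").matcher(input).replaceFirst("#");
   }

   public static void main(String[] args)
   {
      System.out.println(swapNames("Gosling, James"));    // James Gosling
      System.out.println(addDollar("costs 25 or 30"));    // costs $25 or $30
      System.out.println(maskFirstNumber("a1b22c333"));   // a#b22c333
   }
}
```

If the replacement text comes from user input and should be taken literally, the static method Matcher.quoteReplacement (available since JDK 5.0) escapes any $ and \ characters in it.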
Finally, the Pattern class has a split method that works like a string tokenizer on steroids. It splits an input into an array of strings, using the regular expression matches as boundaries. For example, the following instructions split the input into tokens, where the delimiters are punctuation marks surrounded by optional whitespace.
Pattern pattern = Pattern.compile("\\s*\\p{Punct}\\s*");
String[] tokens = pattern.split(input);
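As a rough illustration, the sketch below applies this delimiter pattern to a sample string and also shows the effect of the optional limit argument. The class name SplitDemo and the sample inputs are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class SplitDemo
{
   // A punctuation mark surrounded by optional whitespace
   private static final Pattern DELIMITER = Pattern.compile("\\s*\\p{Punct}\\s*");

   // Splits the entire input; trailing empty strings are discarded.
   static String[] tokens(String input)
   {
      return DELIMITER.split(input);
   }

   // Splits the input, passing the given limit to Pattern.split.
   static String[] tokens(String input, int limit)
   {
      return DELIMITER.split(input, limit);
   }

   public static void main(String[] args)
   {
      String input = "apples , pears;plums :figs";

      // [apples, pears, plums, figs]
      System.out.println(Arrays.toString(tokens(input)));

      // with limit 2, the last entry holds the remaining unsplit input:
      // [apples, pears;plums :figs]
      System.out.println(Arrays.toString(tokens(input, 2)));

      // a negative limit keeps trailing empty strings, a limit of 0 drops them
      System.out.println(tokens("a,b,,", -1).length);   // 4
      System.out.println(tokens("a,b,,", 0).length);    // 2
   }
}
```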
java.util.regex.Pattern
static Pattern compile(String expression)
static Pattern compile(String expression, int flags)
compile the regular expression string into a pattern object for fast processing of matches.
Parameters: |
| expression | The regular expression |
| flags | One or more of the flags defined in the Pattern class, such as CASE_INSENSITIVE |
Matcher matcher(CharSequence input)
returns a matcher object that you can use to locate the matches of the pattern in the input.
String[] split(CharSequence input)
String[] split(CharSequence input, int limit)
split the input string into tokens, where the pattern specifies the form of the delimiters. Return an array of tokens. The delimiters are not part of the tokens.
Parameters: |
| input | The string to be split into tokens |
| limit | The maximum number of strings to produce |
java.util.regex.Matcher
boolean matches()
returns true if the input matches the pattern.
boolean lookingAt()
returns true if the beginning of the input matches the pattern.
boolean find()
boolean find(int start)
attempt to find the next match and return true if another match is found.
Parameters: |
| start | The index at which to start searching |
int start()
int end()
return the start and past-the-end position of the current match.
String group()
returns the current match.
int groupCount()
returns the number of groups in the input pattern.
int start(int groupIndex)
int end(int groupIndex)
return the start and past-the-end position of a given group in the current match.
Parameters: |
| groupIndex | The group index (starting with 1), or 0 to indicate the entire match |
String group(int groupIndex)
returns the string matching a given group.
Parameters: |
| groupIndex | The group index (starting with 1), or 0 to indicate the entire match |
String replaceAll(String replacement)
String replaceFirst(String replacement)
return a string obtained from the matcher input by replacing all matches, or the first match, with the replacement string.
Parameters: |
| replacement | The replacement string. It can contain references to a pattern group as $n. Use \$ to include a $ symbol |
Matcher reset()
Matcher reset(CharSequence input)
reset the matcher state. The second method makes the matcher work on a different input. Both methods return this.
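The differences between matches, lookingAt, and find, and the role of reset, are easiest to see on a single input. The following sketch is one way to try them out. The class name MatcherStateDemo, the helper countMatches, and the sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class MatcherStateDemo
{
   // Counts all matches in the matcher's input, starting from the beginning.
   static int countMatches(Matcher matcher)
   {
      matcher.reset(); // discard any state left over from earlier calls
      int count = 0;
      while (matcher.find()) count++;
      return count;
   }

   public static void main(String[] args)
   {
      Pattern digits = Pattern.compile("[0-9]+");
      Matcher matcher = digits.matcher("42 is the answer, not 43");

      System.out.println(matcher.matches());      // false: the entire input must match
      System.out.println(matcher.lookingAt());    // true: the input starts with digits
      System.out.println(countMatches(matcher));  // 2

      // find(int) resets the matcher and searches from the given index
      if (matcher.find(2))
         System.out.println(matcher.group() + " at " + matcher.start() + " to " + matcher.end());

      // reuse the same matcher object on a different input
      matcher.reset("no digits here");
      System.out.println(matcher.find());         // false
   }
}
```

Reusing a matcher through reset avoids creating a new Matcher object for every input, which can be convenient when the same pattern is applied to many strings.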