This chapter deals with a number of filesystem-related subjects, such as directory- or folder-based programming tasks. Some of the more advanced topics in filesystem I/O (input/output) are also touched on, such as:
Locking subsections of a file
Monitoring for certain filesystem actions
Version information in files
File compression
Various file and directory I/O techniques are used throughout the recipes to show you how to perform tasks such as creating, opening, deleting, reading, and writing with files and directories. This is fundamental knowledge that will help you understand the file I/O recipes and how to modify them for your purposes.
A number of the recipes have been updated to use the async and await keywords to reduce the impact of the latency you’d typically encounter when file I/O touches the filesystem or the network. Using async and await improves your code’s overall responsiveness: the I/O operations still run, but they no longer block the calling thread until they complete.
Unless otherwise specified, you need the following using statements in any program that uses snippets or methods from this chapter:
    using System;
    using System.IO;
You are attempting to find one or more specific files or directories that may or may not exist within the current filesystem. You might need to use wildcard characters to widen the search, for example, to find all user-mode dump files in a filesystem. These files have a .dmp extension.
There are several methods of obtaining this information. The first three methods return a string array containing the full path of each item. The next three methods return an object that encapsulates a directory, a file, or both.
The static GetFileSystemEntries method on the Directory class returns a string array containing the names of all files and directories within a single directory, for example:
    public static void DisplayFilesAndSubDirectories(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        string[] items = Directory.GetFileSystemEntries(path);
        Array.ForEach(items, item =>
        {
            Console.WriteLine(item);
        });
    }
The static GetDirectories method on the Directory class returns a string array containing the names of all directories within a single directory. The following method, DisplaySubDirectories, shows how you might use it:
    public static void DisplaySubDirectories(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        string[] items = Directory.GetDirectories(path);
        Array.ForEach(items, item =>
        {
            Console.WriteLine(item);
        });
    }
The static GetFiles method on the Directory class returns a string array containing the names of all files within a single directory. The following method is very similar to DisplaySubDirectories but calls Directory.GetFiles instead of Directory.GetDirectories:
    public static void DisplayFiles(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        string[] items = Directory.GetFiles(path);
        Array.ForEach(items, item =>
        {
            Console.WriteLine(item);
        });
    }
These next three methods return an object instead of simply a string. The GetFileSystemInfos method of the DirectoryInfo object returns a strongly typed array of FileSystemInfo objects (that is, of DirectoryInfo and FileInfo objects) representing the directories and files within a single directory. The following example calls the GetFileSystemInfos method to retrieve an array of FileSystemInfo objects representing all the items in a particular directory and then writes a display string for each FileSystemInfo to the console window. The display string is created by the extension method ToDisplayString on FileSystemInfo:
    // Requires: using System.Linq;
    public static void DisplayDirectoryContents(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        DirectoryInfo mainDir = new DirectoryInfo(path);
        var fileSystemDisplayInfos =
            (from fsi in mainDir.GetFileSystemInfos()
             where fsi is FileInfo || fsi is DirectoryInfo
             select fsi.ToDisplayString()).ToArray();

        Array.ForEach(fileSystemDisplayInfos, s =>
        {
            Console.WriteLine(s);
        });
    }

    // Extension methods must be declared in a non-nested static class.
    public static string ToDisplayString(this FileSystemInfo fileSystemInfo)
    {
        string type = fileSystemInfo.GetType().ToString();
        if (fileSystemInfo is DirectoryInfo)
            type = "DIRECTORY";
        else if (fileSystemInfo is FileInfo)
            type = "FILE";
        return $"{type}: {fileSystemInfo.Name}";
    }
The output for this code is shown here:
    DIRECTORY: MyNestedTempDir
    DIRECTORY: MyNestedTempDirPattern
    FILE: MyTempFile.PDB
    FILE: MyTempFile.TXT
The GetDirectories instance method of the DirectoryInfo object returns an array of DirectoryInfo objects representing only subdirectories in a single directory. For example, the following code calls the GetDirectories method to retrieve an array of DirectoryInfo objects and then displays the Name property of each object to the console window:
    public static void DisplayDirectoriesFromInfo(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        DirectoryInfo mainDir = new DirectoryInfo(path);
        DirectoryInfo[] items = mainDir.GetDirectories();
        Array.ForEach(items, item =>
        {
            Console.WriteLine($"DIRECTORY: {item.Name}");
        });
    }
The GetFiles instance method of the DirectoryInfo object returns an array of FileInfo objects representing only the files in a single directory. For example, the following code calls the GetFiles method to retrieve an array of FileInfo objects, and then it displays the Name property of each object to the console window:
    public static void DisplayFilesFromInfo(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        DirectoryInfo mainDir = new DirectoryInfo(path);
        FileInfo[] items = mainDir.GetFiles();
        Array.ForEach(items, item =>
        {
            Console.WriteLine($"FILE: {item.Name}");
        });
    }
The static GetFileSystemEntries method on the Directory class returns all files and directories in a single directory that match pattern:
    public static void DisplayFilesWithPattern(string path, string pattern)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(pattern))
            throw new ArgumentNullException(nameof(pattern));

        string[] items = Directory.GetFileSystemEntries(path, pattern);
        Array.ForEach(items, item =>
        {
            Console.WriteLine(item);
        });
    }
The static GetDirectories method on the Directory class returns only those directories in a single directory that match pattern:
    public static void DisplayDirectoriesWithPattern(string path, string pattern)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(pattern))
            throw new ArgumentNullException(nameof(pattern));

        string[] items = Directory.GetDirectories(path, pattern);
        Array.ForEach(items, item =>
        {
            Console.WriteLine(item);
        });
    }
The static GetFiles method on the Directory class returns only those files in a single directory that match pattern:
    public static void DisplayFilesWithGetFiles(string path, string pattern)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(pattern))
            throw new ArgumentNullException(nameof(pattern));

        string[] items = Directory.GetFiles(path, pattern);
        Array.ForEach(items, item =>
        {
            Console.WriteLine(item);
        });
    }
These next three methods return an object instead of simply a string. The first instance method is GetFileSystemInfos, which returns both directories and files in a single directory that match pattern:
    // Requires: using System.Linq;
    public static void DisplayDirectoryContentsWithPattern(string path, string pattern)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(pattern))
            throw new ArgumentNullException(nameof(pattern));

        DirectoryInfo mainDir = new DirectoryInfo(path);
        var fileSystemDisplayInfos =
            (from fsi in mainDir.GetFileSystemInfos(pattern)
             where fsi is FileInfo || fsi is DirectoryInfo
             select fsi.ToDisplayString()).ToArray();

        Array.ForEach(fileSystemDisplayInfos, s =>
        {
            Console.WriteLine(s);
        });
    }
The GetDirectories instance method returns only directories (contained in the DirectoryInfo object) in a single directory that match pattern:
    public static void DisplayDirectoriesWithPatternFromInfo(string path, string pattern)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(pattern))
            throw new ArgumentNullException(nameof(pattern));

        DirectoryInfo mainDir = new DirectoryInfo(path);
        DirectoryInfo[] items = mainDir.GetDirectories(pattern);
        Array.ForEach(items, item =>
        {
            Console.WriteLine($"DIRECTORY: {item.Name}");
        });
    }
The GetFiles instance method returns only file information (contained in the FileInfo object) in a single directory that matches pattern:
    public static void DisplayFilesWithInstanceGetFiles(string path, string pattern)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(pattern))
            throw new ArgumentNullException(nameof(pattern));

        DirectoryInfo mainDir = new DirectoryInfo(path);
        FileInfo[] items = mainDir.GetFiles(pattern);
        Array.ForEach(items, item =>
        {
            Console.WriteLine($"FILE: {item.Name}");
        });
    }
If you need just an array of strings containing paths to both directories and files, you can use the static method Directory.GetFileSystemEntries. The string array returned does not include any information about whether an individual element is a directory or a file. Each string element contains the entire path to either a directory or file contained within the specified path.
To quickly and easily distinguish between directories and files, use the Directory.GetDirectories and Directory.GetFiles static methods. These methods return arrays of strings; each element contains the full path to a directory or a file, respectively.
Returning a string is fine if you do not need any other information about the directory or file returned or if you are going to need more information for only one of the files returned. It is more efficient to use the static methods to get the list of filenames and then retrieve a FileInfo for just the ones you need than to have a FileInfo constructed for every file in the directory, as the instance methods will do. If you need to access attributes, lengths, or times on every one of the files, you should consider using the instance methods that retrieve the FileInfo details.
The instance method GetFileSystemInfos returns an array of strongly typed FileSystemInfo objects. (The FileSystemInfo object is the base class to the DirectoryInfo and FileInfo objects.) Therefore, you can test whether the returned type is a DirectoryInfo or FileInfo object using the is or as keyword. Once you know what subclass the object really is, you can cast the object to that type and begin using it.
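The type test described above might look like the following minimal sketch. The class name FileSystemInfoExamples and the method Describe are hypothetical names chosen for illustration; only the FileSystemInfo, FileInfo, and DirectoryInfo members come from the framework.

```csharp
using System;
using System.IO;

public static class FileSystemInfoExamples
{
    // Hypothetical helper: describes one entry returned by GetFileSystemInfos.
    public static string Describe(FileSystemInfo entry)
    {
        if (entry == null)
            throw new ArgumentNullException(nameof(entry));

        // "as" yields null when the entry is not a FileInfo.
        FileInfo file = entry as FileInfo;
        if (file != null)
            return $"FILE: {file.Name} ({file.Length} bytes)";

        // "is" only tests the type; the cast then gives typed access.
        if (entry is DirectoryInfo)
        {
            DirectoryInfo dir = (DirectoryInfo)entry;
            return $"DIRECTORY: {dir.Name} ({dir.GetFileSystemInfos().Length} entries)";
        }

        // Fallback for any other FileSystemInfo subclass.
        return $"{entry.GetType().Name}: {entry.Name}";
    }
}
```

A caller could pass each element of mainDir.GetFileSystemInfos() to Describe and print the result.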
To get only DirectoryInfo objects, use the overloaded GetDirectories instance method. To get only FileInfo objects, use the overloaded GetFiles instance method. These methods return an array of DirectoryInfo and FileInfo objects, respectively, each element of which encapsulates a directory or file.
There are certain behaviors to be aware of for the patterns you can provide when filtering the results from GetFiles or GetFileSystemInfos:
The pattern cannot contain any of the InvalidPathChars characters (see Path.GetInvalidPathChars) and cannot use the “go back up in the folder structure one level” symbol (..).
The order in which the items in the array are returned is not guaranteed, but you can use Sort or order the results in a query.
When an extension is exactly three characters, the behavior is different in that the pattern matches any file whose extension begins with those three characters. For example, *.htm returns files having an extension of .htm, .html, .htma, and so on.
When an extension has fewer than or more than three characters, the pattern performs exact matching. For example, *.cs returns only files having an extension of .cs.
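If the three-character matching or the unspecified ordering is a problem, one option is to post-filter and sort the results yourself. The following is a minimal sketch under that assumption; PatternExamples and GetFilesWithExactExtension are hypothetical names, not part of the framework.

```csharp
using System;
using System.IO;
using System.Linq;

public static class PatternExamples
{
    // Hypothetical helper: returns the files in one directory whose extension
    // is exactly `extension` (for example ".htm"), sorted by path.
    public static string[] GetFilesWithExactExtension(string path, string extension)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(extension))
            throw new ArgumentNullException(nameof(extension));

        return Directory.GetFiles(path, "*" + extension)
            // Re-check the extension, since "*.htm" may also match ".html".
            .Where(f => string.Equals(Path.GetExtension(f), extension,
                                      StringComparison.OrdinalIgnoreCase))
            // GetFiles does not guarantee an order, so impose one.
            .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

The extra Where clause costs one string comparison per returned path, which is usually negligible next to the directory read itself.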
The “DirectoryInfo Class,” “FileInfo Class,” and “FileSystemInfo Class” topics in the MSDN documentation.
You need to get a directory tree, potentially including filenames, extending from any point in the directory hierarchy. In addition, each directory or file returned must be in the form of an object encapsulating that item. This will allow you to perform operations on the returned objects, such as deleting the file, renaming the file, or examining/changing its attributes. Finally, you potentially need the ability to search for a specific subset of these items based on a pattern, such as finding only files with the .pdb extension.
By calling the GetFileSystemInfos instance method, you can retrieve all of the files and directories down the directory hierarchy from any starting point as an enumerable list:
    // Requires: using System.Collections.Generic;
    public static IEnumerable<FileSystemInfo> GetAllFilesAndDirectories(string dir)
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));

        DirectoryInfo dirInfo = new DirectoryInfo(dir);
        Stack<FileSystemInfo> stack = new Stack<FileSystemInfo>();
        stack.Push(dirInfo);

        // Loop on the stack alone: also testing dirInfo != null would call
        // Pop on an empty stack when the last entry processed is a directory.
        while (stack.Count > 0)
        {
            FileSystemInfo fileSystemInfo = stack.Pop();
            yield return fileSystemInfo;

            DirectoryInfo subDirectoryInfo = fileSystemInfo as DirectoryInfo;
            if (subDirectoryInfo != null)
            {
                foreach (FileSystemInfo fsi in subDirectoryInfo.GetFileSystemInfos())
                    stack.Push(fsi);
            }
        }
    }
To display the results of the file and directory retrieval, use the following query:
    public static void DisplayAllFilesAndDirectories(string dir)
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));

        var strings =
            (from fileSystemInfo in GetAllFilesAndDirectories(dir)
             select fileSystemInfo.ToDisplayString()).ToArray();

        Array.ForEach(strings, s =>
        {
            Console.WriteLine(s);
        });
    }
Since the results are queryable, you don’t have to retrieve information about all files and directories. The following query uses a case-insensitive comparison to obtain a listing of all files with the extension of .pdb that reside in directories that contain Chapter 1:
    public static void DisplayAllFilesWithExtension(string dir, string extension)
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));
        if (string.IsNullOrWhiteSpace(extension))
            throw new ArgumentNullException(nameof(extension));

        var strings =
            (from fileSystemInfo in GetAllFilesAndDirectories(dir)
             where fileSystemInfo is FileInfo &&
                   fileSystemInfo.FullName.Contains("Chapter 1") &&
                   (string.Compare(fileSystemInfo.Extension, extension,
                                   StringComparison.OrdinalIgnoreCase) == 0)
             select fileSystemInfo.ToDisplayString()).ToArray();

        Array.ForEach(strings, s =>
        {
            Console.WriteLine(s);
        });
    }
To obtain a tree representation of a directory and the files it contains, you could use recursive iterators in a method like this:
    public static IEnumerable<FileSystemInfo> GetAllFilesAndDirectoriesWithRecursion(
        string dir)
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));

        DirectoryInfo dirInfo = new DirectoryInfo(dir);
        FileSystemInfo[] fileSystemInfos = dirInfo.GetFileSystemInfos();
        foreach (FileSystemInfo fileSystemInfo in fileSystemInfos)
        {
            yield return fileSystemInfo;
            if (fileSystemInfo is DirectoryInfo)
            {
                foreach (FileSystemInfo fsi in
                    GetAllFilesAndDirectoriesWithRecursion(fileSystemInfo.FullName))
                    yield return fsi;
            }
        }
    }

    public static void DisplayAllFilesAndDirectoriesWithRecursion(string dir)
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));

        var strings =
            (from fileSystemInfo in GetAllFilesAndDirectoriesWithRecursion(dir)
             select fileSystemInfo.ToDisplayString()).ToArray();

        Array.ForEach(strings, s =>
        {
            Console.WriteLine(s);
        });
    }
The main difference between this and the Solution code is that this uses recursive iterators, and the Solution uses iterative iterators and an explicit stack.
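You would not want to use the recursive iterator method, as its performance is in fact O(n * d), where n is the number of FileSystemInfo objects and d is the depth of the directory hierarchy, which is typically log n. The extra factor arises because every item found at depth d is re-yielded through each of the d enclosing iterators. See the demonstration code.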
You can check the performance with the following code if the Solution methods are renamed to DisplayAllFilesAndDirectoriesWithoutRecursion and DisplayAllFilesWithExtensionWithoutRecursion, respectively:
    // Requires: using System.Diagnostics;
    string dir = Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles);

    // list all of the files without using recursion
    Stopwatch watch1 = Stopwatch.StartNew();
    DisplayAllFilesAndDirectoriesWithoutRecursion(dir);
    watch1.Stop();
    Console.WriteLine("*************************");

    // list all of the files using recursion
    Stopwatch watch2 = Stopwatch.StartNew();
    DisplayAllFilesAndDirectoriesWithRecursion(dir);
    watch2.Stop();
    Console.WriteLine("*************************");

    Console.WriteLine(
        $"Non-Recursive method time elapsed {watch1.Elapsed.ToString()}");
    Console.WriteLine($"Recursive method time elapsed {watch2.Elapsed.ToString()}");
Here are the methods that do not use recursion:
    public static void DisplayAllFilesAndDirectoriesWithoutRecursion(string dir)
    {
        var strings =
            from fileSystemInfo in GetAllFilesAndDirectoriesWithoutRecursion(dir)
            select fileSystemInfo.ToDisplayString();

        foreach (string s in strings)
            Console.WriteLine(s);
    }

    public static void DisplayAllFilesWithExtensionWithoutRecursion(string dir,
        string extension)
    {
        var strings =
            from fileSystemInfo in GetAllFilesAndDirectoriesWithoutRecursion(dir)
            where fileSystemInfo is FileInfo &&
                  fileSystemInfo.FullName.Contains("Chapter 1") &&
                  (string.Compare(fileSystemInfo.Extension, extension,
                                  StringComparison.OrdinalIgnoreCase) == 0)
            select fileSystemInfo.ToDisplayString();

        foreach (string s in strings)
            Console.WriteLine(s);
    }

    public static IEnumerable<FileSystemInfo> GetAllFilesAndDirectoriesWithoutRecursion(
        string dir)
    {
        DirectoryInfo dirInfo = new DirectoryInfo(dir);
        Stack<FileSystemInfo> stack = new Stack<FileSystemInfo>();
        stack.Push(dirInfo);

        // Loop on the stack alone: also testing dirInfo != null would call
        // Pop on an empty stack when the last entry processed is a directory.
        while (stack.Count > 0)
        {
            FileSystemInfo fileSystemInfo = stack.Pop();
            yield return fileSystemInfo;

            DirectoryInfo subDirectoryInfo = fileSystemInfo as DirectoryInfo;
            if (subDirectoryInfo != null)
            {
                foreach (FileSystemInfo fsi in subDirectoryInfo.GetFileSystemInfos())
                    stack.Push(fsi);
            }
        }
    }
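As a point of comparison, and not the approach this recipe takes, the framework can also walk the tree for you through DirectoryInfo.EnumerateFileSystemInfos with SearchOption.AllDirectories. The sketch below assumes that built-in traversal is acceptable for your case; TreeExamples and EnumerateTree are hypothetical names. Unlike the recipe's methods, it does not return the starting directory itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TreeExamples
{
    // Hypothetical helper: lazily enumerates everything below dir whose name
    // matches pattern, using the framework's own traversal.
    public static IEnumerable<FileSystemInfo> EnumerateTree(string dir, string pattern = "*")
    {
        if (string.IsNullOrWhiteSpace(dir))
            throw new ArgumentNullException(nameof(dir));

        DirectoryInfo root = new DirectoryInfo(dir);

        // SearchOption.AllDirectories descends into every subdirectory; the
        // Enumerate* methods yield entries lazily instead of building an array.
        return root.EnumerateFileSystemInfos(pattern, SearchOption.AllDirectories);
    }
}
```

Because the result is an ordinary IEnumerable of FileSystemInfo, it can feed the same LINQ queries shown earlier in this recipe.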
The “DirectoryInfo Class,” “FileInfo Class,” and “FileSystemInfo Class” topics in the MSDN documentation.
Use the static methods of the Path class:
    // Requires: using System.Text;
    public static void DisplayPathParts(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        string root = Path.GetPathRoot(path);
        string dirName = Path.GetDirectoryName(path);
        string fullFileName = Path.GetFileName(path);
        string fileExt = Path.GetExtension(path);
        string fileNameWithoutExt = Path.GetFileNameWithoutExtension(path);

        StringBuilder format = new StringBuilder();
        format.Append($"ParsePath of {path} breaks up into the following pieces:" +
            $"{Environment.NewLine}");
        format.Append($" Root: {root}{Environment.NewLine}");
        format.Append($" Directory Name: {dirName}{Environment.NewLine}");
        format.Append($" Full File Name: {fullFileName}{Environment.NewLine}");
        format.Append($" File Extension: {fileExt}{Environment.NewLine}");
        format.Append($" File Name Without Extension: {fileNameWithoutExt}" +
            $"{Environment.NewLine}");
        Console.WriteLine(format.ToString());
    }
If the string C:\test\tempfile.txt is passed to this method, the output looks like this:
    ParsePath of C:\test\tempfile.txt breaks up into the following pieces:
     Root: C:\
     Directory Name: C:\test
     Full File Name: tempfile.txt
     File Extension: .txt
     File Name Without Extension: tempfile
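The Path class contains methods that can be used to parse a given path. Using these methods is much easier and less error-prone than writing path- and filename-parsing code yourself. Hand-written parsing routines can also introduce security holes into your application if the information they gather is used in security decisions. There are five main methods used to parse a path: GetPathRoot, GetDirectoryName, GetFileName, GetExtension, and GetFileNameWithoutExtension. Each has a single parameter, path, which represents the path to be parsed:
GetPathRoot: Returns the root of the path. If the path contains no root, as with a relative path, it returns an empty string, not null.
GetDirectoryName: Returns the path of the directory that contains the file.
GetFileName: Returns the filename, including the extension. If the path contains no filename, it returns an empty string, not null.
GetExtension: Returns the file’s extension, including the leading period. If the path contains no extension, it returns an empty string, not null.
GetFileNameWithoutExtension: Returns the filename without its extension.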
Be aware that these methods do not actually determine whether the drives, directories, or even files exist on the system that runs these methods. These methods are string parsers, and if you pass one of them a string in some strange format (such as \ZY:foo), it will try to do what it can with it anyway:
    ParsePath of \ZY:foo breaks up into the following pieces:
     Root: \ZY:foo
     Directory Name:
     Full File Name: foo
     File Extension:
     File Name Without Extension: foo
These methods will, however, throw an exception if illegal characters are found in the path.
To determine whether files or directories exist, use the static Directory.Exists or File.Exists method.
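The difference between parsing a path and checking the filesystem can be sketched as follows. PathExamples and DescribePath are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

public static class PathExamples
{
    // Hypothetical helper: parses a path, then asks the filesystem about it.
    public static string DescribePath(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));

        // Pure string parsing: works whether or not the path exists.
        string name = Path.GetFileName(path);

        // These calls actually consult the filesystem.
        if (File.Exists(path))
            return $"{name}: existing file";
        if (Directory.Exists(path))
            return $"{name}: existing directory";
        return $"{name}: not found";
    }
}
```

Note that a result from File.Exists or Directory.Exists can be stale by the time you act on it, so code that opens the file afterward should still be prepared for an exception.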
The “Path Class” topic in the MSDN documentation.
Say you need to drive the cmd.exe application to display the current time with the TIME /T command. (You could just run this command from the command line, but this way we can demonstrate an alternative method to drive an application that responds to standard input.) The way to do this is to launch a process that is looking for input on the standard input stream. This is accomplished via the StartInfo property of the Process class, which is an instance of the ProcessStartInfo class. StartInfo has properties that control many details of the environment in which the new process will execute, and the Process.Start method will launch the new process with those options.
First, make sure that the StartInfo.RedirectStandardInput property is set to true. This setting notifies the process that it should read from standard input. Then, set the StartInfo.UseShellExecute property to false, because if you were to let the shell launch the process for you, it would prevent you from redirecting standard input.
Once this is done, launch the process and write to its standard input stream as shown in Example 8-1.
    // Requires: using System.Diagnostics;
    public static void RunProcessToReadStandardInput()
    {
        using (Process application = new Process())
        {
            // Run the command shell.
            application.StartInfo.FileName = @"cmd.exe";

            // Turn on command extensions for cmd.exe.
            application.StartInfo.Arguments = "/E:ON";

            application.StartInfo.RedirectStandardInput = true;
            application.StartInfo.UseShellExecute = false;

            application.Start();

            StreamWriter input = application.StandardInput;

            // Run the command to display the time.
            input.WriteLine("TIME /T");

            // Stop the application we launched.
            input.WriteLine("exit");

            // Let cmd.exe finish before the Process object is disposed.
            application.WaitForExit();
        }
    }
Redirecting the input stream for a process allows you to programmatically interact with certain applications and utilities that you would otherwise not be able to automate without additional tools. Once the input has been redirected, you can write into the standard input stream of the process by reading the Process.StandardInput property, which returns a StreamWriter. Once you have that, you can send things to the process via WriteLine calls, as shown earlier.
To use StandardInput, you have to specify true for the StartInfo property’s RedirectStandardInput property. Otherwise, reading the StandardInput property throws an exception.
When UseShellExecute is false, you can use Process only to create executable processes. Normally (when UseShellExecute is true) you can also use the Process class to perform shell operations on a file, such as printing a Microsoft Word document. Another difference when UseShellExecute is set to false is that the working directory is not used to find the executable, so you must be mindful to pass a full path or have the executable on your PATH environment variable.
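The same settings extend naturally to capturing what the process prints. The following is a minimal sketch, assuming you also want the output back as a string; ProcessExamples and RunWithInput are hypothetical names, and the approach of writing all input before reading any output is only safe for small amounts of data.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ProcessExamples
{
    // Hypothetical helper: starts fileName, feeds it the given lines on
    // standard input, and returns whatever it wrote to standard output.
    public static string RunWithInput(string fileName, string arguments,
                                      IEnumerable<string> inputLines)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));
        if (inputLines == null)
            throw new ArgumentNullException(nameof(inputLines));

        using (Process process = new Process())
        {
            process.StartInfo.FileName = fileName;
            process.StartInfo.Arguments = arguments ?? string.Empty;
            // Redirection requires that the shell not launch the process.
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.RedirectStandardInput = true;
            process.StartInfo.RedirectStandardOutput = true;
            process.Start();

            // Assumption: the input is small. With large input and output,
            // both pipes can fill up and the two processes can deadlock.
            foreach (string line in inputLines)
                process.StandardInput.WriteLine(line);
            process.StandardInput.Close();

            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }
}
```

For the scenario in this recipe, the call would be along the lines of ProcessExamples.RunWithInput("cmd.exe", "/Q", new[] { "TIME /T", "exit" }).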
The “Process Class,” “ProcessStartInfo Class,” “RedirectStandardInput Property,” and “UseShellExecute Property” topics in the MSDN documentation.
To lock out other processes from accessing your file while you are using it, you use the Lock method of the FileStream class. The following code creates a file from the fileName parameter and writes two lines to it. The entire file is then locked via the Lock method. While the file is locked, the code goes off and does some other processing; when that processing finishes, the file is unlocked, a third line is written, and the file is closed:
    // Requires: using System.Threading.Tasks;
    public static async Task CreateLockedFileAsync(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));

        FileStream fileStream = null;
        try
        {
            fileStream = new FileStream(fileName, FileMode.Create,
                FileAccess.ReadWrite, FileShare.ReadWrite, 4096, useAsync: true);

            using (StreamWriter writer = new StreamWriter(fileStream))
            {
                await writer.WriteLineAsync("The First Line");
                await writer.WriteLineAsync("The Second Line");
                await writer.FlushAsync();

                try
                {
                    // Lock all of the file.
                    fileStream.Lock(0, fileStream.Length);

                    // Do some lengthy processing here...
                    // (Task.Delay waits without blocking the thread.)
                    await Task.Delay(1000);
                }
                finally
                {
                    // Make sure we unlock the file.
                    // If a process terminates with part of a file locked or closes
                    // a file that has outstanding locks, the behavior is undefined
                    // which is MS speak for bad things....
                    fileStream.Unlock(0, fileStream.Length);
                }

                await writer.WriteLineAsync("The Third Line");
                fileStream = null;
            }
        }
        finally
        {
            if (fileStream != null)
                fileStream.Dispose();
        }
    }
Note that in the CreateLockedFileAsync method we are using the async modifier and the await operator. The async modifier indicates that a method is eligible for suspension at certain points, and the await operator designates those suspension points in your code, which means that the compiler knows that the async method can’t continue past that point until the awaited asynchronous operation is complete. While it waits, the caller gets control back. This helps your program in that the caller’s thread is not blocked and can perform other work, while the code in the method still reads as if it ran synchronously.
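A smaller example may make the suspension points easier to see. This sketch is not part of the recipe; AsyncExamples and CountLinesAsync are hypothetical names, and the method simply counts the lines in a text file.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncExamples
{
    // Hypothetical helper: counts the lines in a file without blocking
    // the calling thread while the reads are in flight.
    public static async Task<int> CountLinesAsync(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));

        using (FileStream stream = new FileStream(fileName, FileMode.Open,
            FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        using (StreamReader reader = new StreamReader(stream))
        {
            int count = 0;

            // Each await is a potential suspension point: if the read has not
            // completed yet, control returns to the caller until it has.
            while (await reader.ReadLineAsync() != null)
                count++;

            return count;
        }
    }
}
```

A caller that is itself async would write int lines = await AsyncExamples.CountLinesAsync(path); and remain free to do other work until the task completes.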
If a file is opened within your application and the FileShare parameter of the FileStream constructor (or of the File.Open call) is set to FileShare.ReadWrite or FileShare.Write, other code in your application can view or alter the contents of the file while you are using it. To handle file access with more granularity, use the Lock method of the FileStream object to prevent other code from overwriting all or a portion of your file. Once you are done with the locked portion of your file, you can call the Unlock method on the FileStream object to allow other code in your application to write data to that portion of the file.
To lock an entire file, use the following syntax:
fileStream.Lock(0, fileStream.Length);
To lock a portion of a file, use the following syntax:
fileStream.Lock(4, fileStream.Length - 4);
This line of code locks the entire file except for the first four bytes. Note that you can lock an entire file and still open it multiple times, as well as write to it.
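Locking a range is usually paired with an Unlock call over the same range in a finally block. The following is a minimal sketch of that pairing; LockExamples and WriteRangeLocked are hypothetical names, and note that FileStream.Lock is not available on every platform.

```csharp
using System;
using System.IO;

public static class LockExamples
{
    // Hypothetical helper: overwrites bytes starting at offset while holding
    // a lock on just that byte range of an existing file.
    public static void WriteRangeLocked(string fileName, long offset, byte[] data)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));
        if (data == null)
            throw new ArgumentNullException(nameof(data));

        using (FileStream stream = new FileStream(fileName, FileMode.Open,
            FileAccess.ReadWrite, FileShare.ReadWrite))
        {
            // Lock only the bytes we are about to change.
            stream.Lock(offset, data.Length);
            try
            {
                stream.Seek(offset, SeekOrigin.Begin);
                stream.Write(data, 0, data.Length);
                stream.Flush();
            }
            finally
            {
                // Unlock takes the same position and length that were locked.
                stream.Unlock(offset, data.Length);
            }
        }
    }
}
```

Keeping the position and length in one place, here the method parameters, avoids the mismatch that can occur when the file length changes between the Lock and Unlock calls.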
If another thread is accessing this file, you might see an IOException thrown during the call to one of the WriteAsync, FlushAsync, or Close methods. For example, the following code is prone to such an exception:
    public static async Task CreateLockedFileWithExceptionAsync(string fileName)
    {
        FileStream fileStream = null;
        try
        {
            fileStream = new FileStream(fileName, FileMode.Create,
                FileAccess.ReadWrite, FileShare.ReadWrite, 4096, useAsync: true);

            using (StreamWriter streamWriter = new StreamWriter(fileStream))
            {
                await streamWriter.WriteLineAsync("The First Line");
                await streamWriter.WriteLineAsync("The Second Line");
                await streamWriter.FlushAsync();

                // Lock all of the file.
                fileStream.Lock(0, fileStream.Length);

                FileStream writeFileStream = null;
                try
                {
                    writeFileStream = new FileStream(fileName, FileMode.Open,
                        FileAccess.Write, FileShare.ReadWrite, 4096, useAsync: true);

                    using (StreamWriter streamWriter2 = new StreamWriter(writeFileStream))
                    {
                        await streamWriter2.WriteAsync("foo ");
                        try
                        {
                            streamWriter2.Close(); // --> Exception occurs here!
                        }
                        catch
                        {
                            Console.WriteLine(
                                "The streamWriter2.Close call generated an exception.");
                        }
                        streamWriter.WriteLine("The Third Line");
                    }
                    writeFileStream = null;
                }
                finally
                {
                    if (writeFileStream != null)
                        writeFileStream.Dispose();
                }
            }
            fileStream = null;
        }
        finally
        {
            if (fileStream != null)
                fileStream.Dispose();
        }
    }
This code produces the following output:
The streamWriter2.Close call generated an exception.
Even though streamWriter2, the second StreamWriter object, writes to a locked file, it is only when the streamWriter2.Close method is executed that the IOException is thrown.
If the code for this recipe were rewritten as follows:
    public static async Task CreateLockedFileWithUnlockAsync(string fileName)
    {
        FileStream fileStream = null;
        try
        {
            fileStream = new FileStream(fileName, FileMode.Create,
                FileAccess.ReadWrite, FileShare.ReadWrite, 4096, useAsync: true);

            using (StreamWriter streamWriter = new StreamWriter(fileStream))
            {
                await streamWriter.WriteLineAsync("The First Line");
                await streamWriter.WriteLineAsync("The Second Line");
                await streamWriter.FlushAsync();

                // Lock all of the file.
                fileStream.Lock(0, fileStream.Length);

                // Try to access the locked file...
                FileStream writeFileStream = null;
                try
                {
                    writeFileStream = new FileStream(fileName, FileMode.Open,
                        FileAccess.Write, FileShare.ReadWrite, 4096, useAsync: true);

                    using (StreamWriter streamWriter2 = new StreamWriter(writeFileStream))
                    {
                        await streamWriter2.WriteAsync("foo");
                        fileStream.Unlock(0, fileStream.Length);
                        await streamWriter2.FlushAsync();
                    }
                    writeFileStream = null;
                }
                finally
                {
                    if (writeFileStream != null)
                        writeFileStream.Dispose();
                }
            }
            fileStream = null;
        }
        finally
        {
            if (fileStream != null)
                fileStream.Dispose();
        }
    }
no exception is thrown. This is because the code unlocked the FileStream object that initially locked the entire file. This action also freed all of the locks on the file that this FileStream object was holding onto. In the example, the streamWriter2.WriteAsync("foo") call had written foo to the stream’s buffer but had not flushed it, so the string foo was still waiting to be flushed and written to the actual file when the lock was released. Keep this situation in mind when interleaving the opening, locking, and closing of streams. Mistakes of this kind are not always caught during code reviews, unit testing, or formal quality assurance, and they can lead to bugs that are difficult to track down, so tread carefully when using file locking.
The “StreamWriter Class,” “FileStream Class,” and “Asynchronous Programming with Async and Await” topics in the MSDN documentation.
You need to be notified when a particular event occurs in the filesystem, such as the renaming of a file or directory, the increasing or decreasing of the size of a file, the deletion of a file or directory, the creation of a file or directory, or even the changing of a file’s or directory’s attribute(s). However, this notification must occur synchronously. In other words, the application cannot continue unless a specific action occurs to a file or directory.
The WaitForChanged method of the FileSystemWatcher class can be called to wait synchronously for an event notification. This is illustrated by the WaitForZipCreation method shown in Example 8-2, which waits for an action (more specifically, the creation of the Backup.zip file somewhere on the C: drive) to be performed before proceeding to the next line of code, which is the WriteLine statement. To trigger the notification, we spin off a task to do the actual work of creating the file. By doing this as a Task, we allow the processing to occur on a separate thread when one becomes available and the FileSystemWatcher to detect the file creation.
    // Requires: using System.Threading; using System.Threading.Tasks;
    public static void WaitForZipCreation(string path, string fileName)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentNullException(nameof(path));
        if (string.IsNullOrWhiteSpace(fileName))
            throw new ArgumentNullException(nameof(fileName));

        // Build the full path once so creation and cleanup agree on it.
        string dataFile = Path.Combine(path, fileName);
        FileSystemWatcher fsw = null;
        Task work = null;
        try
        {
            fsw = new FileSystemWatcher();
            fsw.Path = path;
            fsw.Filter = fileName;
            fsw.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
                | NotifyFilters.FileName | NotifyFilters.DirectoryName;

            // Run the code to generate the file we are looking for
            // Normally you wouldn't do this as another source is creating
            // this file
            work = Task.Run(() =>
            {
                try
                {
                    // wait a sec...
                    Thread.Sleep(1000);
                    // create the file in the watched directory
                    Console.WriteLine($"Creating {dataFile} in task...");
                    FileStream fileStream = File.Create(dataFile);
                    fileStream.Close();
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.ToString());
                }
            });

            // Don't await the work task finish, as we detect that
            // through the FileSystemWatcher
            WaitForChangedResult result =
                fsw.WaitForChanged(WatcherChangeTypes.Created);
            Console.WriteLine($"{result.Name} created at {path}.");
        }
        catch (Exception e)
        {
            Console.WriteLine(e.ToString());
        }
        finally
        {
            // clean it up once the task has closed the file
            work?.Wait();
            File.Delete(dataFile);
            fsw?.Dispose();
        }
    }
The WaitForChanged method returns a WaitForChangedResult structure that contains the properties listed in Table 8-1.
Property | Description |
---|---|
ChangeType | Lists the type of change that occurred. This change is returned as a WatcherChangeTypes enumeration. The values of this enumeration can be ORed together. |
Name | Holds the name of the file or directory that was changed. If the file or directory was renamed, this property returns the new name. Its value is set to null if the method call times out. |
OldName | The original name of the modified file or directory. If this file or directory was not renamed, this property will return the same value as the Name property. Its value is set to null if the method call times out. |
TimedOut | Holds a Boolean indicating whether the WaitForChanged method timed out (true) or not (false). |
As written, the WaitForChanged call can block indefinitely if the file is never created. To prevent the code from hanging, pass a timeout in milliseconds as the second argument. The following call waits at most three seconds:
WaitForChangedResult result = fsw.WaitForChanged(WatcherChangeTypes.Created, 3000);
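When a timeout is supplied, the caller has to check whether the wait actually observed the change. The following is a minimal sketch of that check, not part of the recipe: WatcherTimeoutSketch and WaitForCreation are hypothetical names, and the helper assumes the directory already exists.

```csharp
using System;
using System.IO;

public static class WatcherTimeoutSketch
{
    // Waits up to timeoutMs for a file matching 'filter' to be created in
    // 'directory'. Returns the created file's name, or null on timeout.
    // (Hypothetical helper for illustration only.)
    public static string WaitForCreation(string directory, string filter,
        int timeoutMs)
    {
        using (FileSystemWatcher fsw = new FileSystemWatcher(directory, filter))
        {
            fsw.NotifyFilter = NotifyFilters.FileName;
            WaitForChangedResult result =
                fsw.WaitForChanged(WatcherChangeTypes.Created, timeoutMs);

            // On timeout, TimedOut is true and Name is null
            return result.TimedOut ? null : result.Name;
        }
    }
}
```

A caller can treat a null return as "nothing was created in time" and decide whether to retry or give up.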
The NotifyFilters enumeration allows you to specify the types of changes to watch for in a file or folder, as shown in Table 8-2.
Enumeration value | Definition |
---|---|
FileName | Name of the file |
DirectoryName | Name of the directory |
Attributes | The file or folder attributes |
Size | The file or folder size |
LastWrite | The date the file or folder last had anything written to it |
LastAccess | The date the file or folder was last opened |
CreationTime | The time the file or folder was created |
Security | The security settings of the file or folder |
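Because NotifyFilters is a flags enumeration, its values can be combined with the bitwise OR operator and tested with a bitwise AND. The sketch below shows one way to do that; NotifyFilterSketch and its members are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

public static class NotifyFilterSketch
{
    // Builds a filter that reports file name changes (FileName) and
    // content writes (LastWrite), and nothing else.
    public static NotifyFilters NameAndWriteChanges() =>
        NotifyFilters.FileName | NotifyFilters.LastWrite;

    // True when 'filters' includes every flag set in 'wanted'.
    public static bool Includes(NotifyFilters filters, NotifyFilters wanted) =>
        (filters & wanted) == wanted;
}
```

The combined value can be assigned directly to the watcher, for example fsw.NotifyFilter = NotifyFilterSketch.NameAndWriteChanges();.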
The “FileSystemWatcher Class,” “NotifyFilters Enumeration,” and “WaitForChangedResult Structure” topics in the MSDN documentation.
You need to programmatically compare the version information of two executable modules. An executable module is a file that contains executable code, such as an .exe or .dll file. The ability to compare the version information of two executable modules can be very useful to an application in situations such as:
Trying to determine if it has all of the “right” pieces present to execute.
Deciding on an assembly to dynamically load through reflection.
Looking for the newest version of a file or .dll from many files spread out in the local filesystem or on a network.
Use the CompareFileVersions method to compare executable module version information. This method accepts two filenames, including their paths, as parameters. The version information of each module is retrieved and compared. This method returns a FileComparison enumeration value, defined as follows:
public enum FileComparison { Error = 0, Newer = 1, Older = 2, Same = 3 }
The code for the CompareFileVersions
method is shown in Example 8-3.
// Also requires: using System.Diagnostics;
private static FileComparison ComparePart(int p1, int p2) =>
    p1 > p2 ? FileComparison.Newer :
        (p1 < p2 ? FileComparison.Older : FileComparison.Same);

public static FileComparison CompareFileVersions(string file1, string file2)
{
    if (string.IsNullOrWhiteSpace(file1))
        throw new ArgumentNullException(nameof(file1));
    if (string.IsNullOrWhiteSpace(file2))
        throw new ArgumentNullException(nameof(file2));

    FileComparison retValue = FileComparison.Error;
    // get the version information
    FileVersionInfo file1Version = FileVersionInfo.GetVersionInfo(file1);
    FileVersionInfo file2Version = FileVersionInfo.GetVersionInfo(file2);

    // move on to the next part only while all earlier parts are equal
    retValue = ComparePart(file1Version.FileMajorPart,
        file2Version.FileMajorPart);
    if (retValue == FileComparison.Same)
    {
        retValue = ComparePart(file1Version.FileMinorPart,
            file2Version.FileMinorPart);
        if (retValue == FileComparison.Same)
        {
            retValue = ComparePart(file1Version.FileBuildPart,
                file2Version.FileBuildPart);
            if (retValue == FileComparison.Same)
                retValue = ComparePart(file1Version.FilePrivatePart,
                    file2Version.FilePrivatePart);
        }
    }
    return retValue;
}
Not all executable modules have version information. If you load a module with no version information using the FileVersionInfo class, you will not provoke an exception, nor will you get null back for the object reference. Instead, you will get a valid FileVersionInfo object with all data members in their initial state: null for reference types such as strings, and zero for the integer version parts.
Assemblies actually have two sets of version information: the assembly version recorded in the assembly manifest and the PE (portable executable) file version information. FileVersionInfo reads the file version information stored in the PE file's version resource, not the assembly version in the manifest.
The first action this method takes is to validate that the file1 and file2 parameters are not null, empty, or whitespace. It then calls the static GetVersionInfo method of the FileVersionInfo class to get version information for the two files. GetVersionInfo throws a FileNotFoundException if either file does not exist.
The CompareFileVersions
method attempts to compare each portion of the file’s version number using the following properties of the FileVersionInfo
object returned by GetVersionInfo
:
FileMajorPart
FileMinorPart
FileBuildPart
FilePrivatePart
These four parts, each a 16-bit value, together make up the 8-byte number that represents the file’s version.
The CompareFileVersions method first compares the FileMajorPart version information of the two files. If these are equal, the FileMinorPart version information of the two files is compared. This continues through the FileBuildPart and finally the FilePrivatePart version information values. If all four parts are equal, the files are considered to have the same version number. As soon as one file has a higher number in the part being compared, the comparison stops: the method returns Newer if that file is file1 and Older if it is file2.
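The part-by-part comparison described above is equivalent to packing the four parts into a single 64-bit value and comparing the packed numbers. The following sketch illustrates that idea under the assumption that each part fits in 16 bits; VersionPacking, Pack, and Compare are hypothetical names and are not part of the recipe.

```csharp
using System;

public static class VersionPacking
{
    // Packs four version parts (each assumed to fit in 16 bits) into one
    // 64-bit value: major in the top 16 bits, private part in the bottom 16.
    public static ulong Pack(int major, int minor, int build, int privatePart)
    {
        return ((ulong)(ushort)major << 48)
             | ((ulong)(ushort)minor << 32)
             | ((ulong)(ushort)build << 16)
             | (ushort)privatePart;
    }

    // Returns a negative number, zero, or a positive number when the first
    // packed version is older than, the same as, or newer than the second.
    public static int Compare(ulong first, ulong second) =>
        first.CompareTo(second);
}
```

Because the major part occupies the most significant bits, a difference there outweighs any difference in the later parts, which mirrors the order in which the parts are compared in the recipe.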
The “FileVersionInfo Class” topic in the MSDN documentation.
Use the various properties in the DriveInfo
class as shown here:
public static void DisplayAllDriveInfo()
{
    DriveInfo[] drives = DriveInfo.GetDrives();
    Array.ForEach(drives, drive =>
    {
        if (drive.IsReady)
        {
            Console.WriteLine($"Drive {drive.Name} is ready.");
            Console.WriteLine($"AvailableFreeSpace: {drive.AvailableFreeSpace}");
            Console.WriteLine($"DriveFormat: {drive.DriveFormat}");
            Console.WriteLine($"DriveType: {drive.DriveType}");
            Console.WriteLine($"Name: {drive.Name}");
            Console.WriteLine("RootDirectory.FullName: " +
                $"{drive.RootDirectory.FullName}");
            Console.WriteLine($"TotalFreeSpace: {drive.TotalFreeSpace}");
            Console.WriteLine($"TotalSize: {drive.TotalSize}");
            Console.WriteLine($"VolumeLabel: {drive.VolumeLabel}");
        }
        else
        {
            Console.WriteLine($"Drive {drive.Name} is not ready.");
        }
        Console.WriteLine();
    });
}
This code will display the results in the following format. Because each system is different, the results will vary:
Drive C:\ is ready.
AvailableFreeSpace: 143210795008
DriveFormat: NTFS
DriveType: Fixed
Name: C:\
RootDirectory.FullName: C:\
TotalFreeSpace: 143210795008
TotalSize: 159989886976
VolumeLabel: Vol1

Drive D:\ is ready.
AvailableFreeSpace: 0
DriveFormat: UDF
DriveType: CDRom
Name: D:\
RootDirectory.FullName: D:\
TotalFreeSpace: 0
TotalSize: 3305965568
VolumeLabel: Vol2

Drive E:\ is ready.
AvailableFreeSpace: 4649025536
DriveFormat: UDF
DriveType: CDRom
Name: E:\
RootDirectory.FullName: E:\
TotalFreeSpace: 4649025536
TotalSize: 4691197952
VolumeLabel: Vol3

Drive F:\ is not ready.
Of particular interest are the IsReady and AvailableFreeSpace properties. The IsReady property indicates whether the drive is ready to be queried, written to, or read from. It is only a snapshot, however: the state can change immediately after you check it, so be sure to also handle the case where the drive stops being ready between the check and the access. The AvailableFreeSpace property returns the free space on that drive in bytes.
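One way to account for a drive that stops being ready after the check is to treat IsReady as a first filter and still guard the property access itself. The helper below is a sketch of that pattern; DriveReadySketch and TryGetFreeSpace are hypothetical names, and the set of exceptions handled is an assumption about what a caller would want to tolerate.

```csharp
using System;
using System.IO;

public static class DriveReadySketch
{
    // Returns true and the free byte count when the drive can be queried;
    // returns false if it is not ready or stops being ready mid-query.
    // (Hypothetical helper for illustration only.)
    public static bool TryGetFreeSpace(DriveInfo drive, out long freeBytes)
    {
        freeBytes = 0;
        if (!drive.IsReady)
            return false;
        try
        {
            freeBytes = drive.AvailableFreeSpace;
            return true;
        }
        catch (IOException)
        {
            // The drive became unavailable after the IsReady check
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // The caller lacks permission to query this drive
            return false;
        }
    }
}
```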
The DriveInfo
class from the .NET Framework allows you to easily query information on one particular drive or on all drives in the system. To query the information from a single drive, use the code in Example 8-4.
DriveInfo drive = new DriveInfo("D");
if (drive.IsReady)
    Console.WriteLine($"The space available on the D:\\ drive: " +
        $"{drive.AvailableFreeSpace}");
else
    Console.WriteLine("Drive D:\\ is not ready.");
Notice that only the drive letter is passed in to the DriveInfo constructor. The drive letter can be either uppercase or lowercase; it does not matter. The next thing you will notice with the code in the Solution to this recipe is that the IsReady property is always tested for true before either using the drive or querying its properties. If we did not test this property for true and for some reason the drive was not ready (e.g., a CD was not in the drive at that time), a System.IO.IOException would be thrown stating “The device is not ready.” The DriveInfo constructor was not used for the Solution to this recipe. Instead, the static GetDrives method of the DriveInfo class was used to return an array of DriveInfo objects. Each DriveInfo object in this array corresponds to one drive on the current system.
The DriveType
property of the DriveInfo
class returns an enumeration value from the DriveType
enumeration. This enumeration value identifies what type of drive the current DriveInfo
object represents. Table 8-3 identifies the various values of the DriveType
enumeration.
Enum value | Description |
---|---|
CDRom | This can be a CD-ROM, CD writer, DVD-ROM, DVD, or Blu-ray writer drive. |
Fixed | This is the fixed drive, such as an HDD. Note that USB HDDs fall into this category. |
Network | A network drive. |
NoRootDirectory | No root directory was found on this drive. |
Ram | A RAM disk. |
Removable | A removable storage device. |
Unknown | Some other type of drive than those listed here. |
In the DriveInfo
class there are two very similar properties, AvailableFreeSpace
and TotalFreeSpace
. Both properties will return the same value in most cases. However, AvailableFreeSpace
also takes into account any disk-quota information for a particular drive. You can find disk-quota information by right-clicking a drive in Windows Explorer and selecting the Properties pop-up menu item. This displays the Properties page for the drive. Click the Quota tab on the Properties page to view the quota information for the drive. If the Enable Quota Management checkbox is unchecked, then disk-quota management is disabled, and the AvailableFreeSpace
and TotalFreeSpace
properties should be equal.
The “DriveInfo Class” topic in the MSDN documentation.
Use the System.IO.Compression.DeflateStream or the System.IO.Compression.GZipStream class to read and write compressed data to a file using a “chunking” routine. The CompressFileAsync and DecompressFileAsync methods shown in Example 8-5 demonstrate how to use these classes to compress and decompress files on the fly.
// Also requires: using System.Diagnostics; using System.IO.Compression;
// using System.Threading.Tasks;

/// <summary>
/// Compress the source file to the destination file.
/// This is done in 1MB chunks to not overwhelm the memory usage.
/// </summary>
/// <param name="sourceFile">the uncompressed file</param>
/// <param name="destinationFile">the compressed file</param>
/// <param name="compressionType">the type of compression to use</param>
public static async Task CompressFileAsync(string sourceFile,
    string destinationFile, CompressionType compressionType)
{
    if (string.IsNullOrWhiteSpace(sourceFile))
        throw new ArgumentNullException(nameof(sourceFile));
    if (string.IsNullOrWhiteSpace(destinationFile))
        throw new ArgumentNullException(nameof(destinationFile));

    FileStream streamSource = null;
    FileStream streamDestination = null;
    Stream streamCompressed = null;
    int bufferSize = 4096;
    using (streamSource = new FileStream(sourceFile, FileMode.OpenOrCreate,
        FileAccess.Read, FileShare.None, bufferSize, useAsync: true))
    {
        using (streamDestination = new FileStream(destinationFile,
            FileMode.OpenOrCreate, FileAccess.Write, FileShare.None,
            bufferSize, useAsync: true))
        {
            // read 1MB chunks and compress them
            long fileLength = streamSource.Length;
            // write out the fileLength size
            byte[] size = BitConverter.GetBytes(fileLength);
            await streamDestination.WriteAsync(size, 0, size.Length);

            // 1MB, or the whole file if it is smaller than that
            long chunkSize = Math.Min(1048576, fileLength);
            while (fileLength > 0)
            {
                // read the chunk
                byte[] data = new byte[chunkSize];
                await streamSource.ReadAsync(data, 0, data.Length);

                // compress the chunk
                MemoryStream compressedDataStream = new MemoryStream();
                if (compressionType == CompressionType.Deflate)
                    streamCompressed = new DeflateStream(compressedDataStream,
                        CompressionMode.Compress);
                else
                    streamCompressed = new GZipStream(compressedDataStream,
                        CompressionMode.Compress);
                using (streamCompressed)
                {
                    // write the chunk in the compressed stream
                    await streamCompressed.WriteAsync(data, 0, data.Length);
                }

                // get the bytes for the compressed chunk
                // (GetBuffer returns the stream's whole internal buffer,
                // which can be longer than the compressed data in it)
                byte[] compressedData = compressedDataStream.GetBuffer();

                // write out the chunk size
                size = BitConverter.GetBytes(chunkSize);
                await streamDestination.WriteAsync(size, 0, size.Length);
                // write out the compressed size
                size = BitConverter.GetBytes(compressedData.Length);
                await streamDestination.WriteAsync(size, 0, size.Length);
                // write out the compressed chunk
                await streamDestination.WriteAsync(compressedData, 0,
                    compressedData.Length);

                // subtract the chunk size from the file size
                fileLength -= chunkSize;
                // if less than a full chunk remains, shrink the chunk
                // to the remaining length
                if (fileLength < chunkSize)
                    chunkSize = fileLength;
            }
        }
    }
}

/// <summary>
/// This function will decompress the chunked compressed file
/// created by the CompressFileAsync function.
/// </summary>
/// <param name="sourceFile">the compressed file</param>
/// <param name="destinationFile">the destination file</param>
/// <param name="compressionType">the type of compression to use</param>
public static async Task DecompressFileAsync(string sourceFile,
    string destinationFile, CompressionType compressionType)
{
    if (string.IsNullOrWhiteSpace(sourceFile))
        throw new ArgumentNullException(nameof(sourceFile));
    if (string.IsNullOrWhiteSpace(destinationFile))
        throw new ArgumentNullException(nameof(destinationFile));

    FileStream streamSource = null;
    FileStream streamDestination = null;
    Stream streamUncompressed = null;
    int bufferSize = 4096;
    using (streamSource = new FileStream(sourceFile, FileMode.OpenOrCreate,
        FileAccess.Read, FileShare.None, bufferSize, useAsync: true))
    {
        using (streamDestination = new FileStream(destinationFile,
            FileMode.OpenOrCreate, FileAccess.Write, FileShare.None,
            bufferSize, useAsync: true))
        {
            // read the fileLength size
            byte[] size = new byte[sizeof(long)];
            await streamSource.ReadAsync(size, 0, size.Length);
            // convert the size back to a number
            long fileLength = BitConverter.ToInt64(size, 0);
            long chunkSize = 0;
            int storedSize = 0;
            long workingSet = Process.GetCurrentProcess().WorkingSet64;
            while (fileLength > 0)
            {
                // read the chunk size
                size = new byte[sizeof(long)];
                await streamSource.ReadAsync(size, 0, size.Length);
                // convert the size back to a number
                chunkSize = BitConverter.ToInt64(size, 0);
                if (chunkSize > fileLength || chunkSize > workingSet)
                    throw new InvalidDataException();

                // read the compressed size
                size = new byte[sizeof(int)];
                await streamSource.ReadAsync(size, 0, size.Length);
                // convert the size back to a number
                storedSize = BitConverter.ToInt32(size, 0);
                if (storedSize > fileLength || storedSize > workingSet)
                    throw new InvalidDataException();
                if (storedSize > chunkSize)
                    throw new InvalidDataException();

                byte[] uncompressedData = new byte[chunkSize];
                byte[] compressedData = new byte[storedSize];
                await streamSource.ReadAsync(compressedData, 0,
                    compressedData.Length);

                // uncompress the chunk
                MemoryStream uncompressedDataStream =
                    new MemoryStream(compressedData);
                if (compressionType == CompressionType.Deflate)
                    streamUncompressed = new DeflateStream(
                        uncompressedDataStream, CompressionMode.Decompress);
                else
                    streamUncompressed = new GZipStream(
                        uncompressedDataStream, CompressionMode.Decompress);
                using (streamUncompressed)
                {
                    // read the chunk in the compressed stream
                    await streamUncompressed.ReadAsync(uncompressedData, 0,
                        uncompressedData.Length);
                }

                // write out the uncompressed chunk
                await streamDestination.WriteAsync(uncompressedData, 0,
                    uncompressedData.Length);

                // subtract the chunk size from the file size
                fileLength -= chunkSize;
                // if less than a full chunk remains, shrink the chunk
                // to the remaining length
                if (fileLength < chunkSize)
                    chunkSize = fileLength;
            }
        }
    }
}
The CompressionType
enumeration is defined as follows:
public enum CompressionType { Deflate, GZip }
The CompressFileAsync
method accepts a path to the source file to compress, a path to the destination of the compressed file, and a CompressionType
enumeration value indicating which type of compression algorithm to use (Deflate or GZip). This method produces a file containing the compressed data.
The DecompressFileAsync
method accepts a path to the source compressed file to decompress, a path to the destination of the decompressed file, and a CompressionType
enumeration value indicating which type of decompression algorithm to use (Deflate or GZip).
The TestCompressNewFileAsync method shown in Example 8-6 exercises the CompressFileAsync and DecompressFileAsync methods defined in the Solution section of this recipe.
public static async Task TestCompressNewFileAsync()
{
    byte[] data = new byte[10000000];
    for (int i = 0; i < 10000000; i++)
        data[i] = (byte)i;

    using (FileStream fs = new FileStream(@"C:\NewNormalFile.txt",
        FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None,
        4096, useAsync: true))
    {
        await fs.WriteAsync(data, 0, data.Length);
    }

    await CompressFileAsync(@"C:\NewNormalFile.txt",
        @"C:\NewCompressedFile.txt", CompressionType.Deflate);
    await DecompressFileAsync(@"C:\NewCompressedFile.txt",
        @"C:\NewDecompressedFile.txt", CompressionType.Deflate);
    await CompressFileAsync(@"C:\NewNormalFile.txt",
        @"C:\NewGZCompressedFile.txt", CompressionType.GZip);
    await DecompressFileAsync(@"C:\NewGZCompressedFile.txt",
        @"C:\NewGZDecompressedFile.txt", CompressionType.GZip);

    //Normal file size == 10,000,000 bytes
    //GZipped file size == 84,362
    //Deflated file size == 42,145
    //Pre .NET 4.5 GZipped file size == 155,204
    //Pre .NET 4.5 Deflated file size == 155,168
    // 36 bytes are related to the GZip CRC
}
When this test code is run, we get three files with different sizes. The first file, NewNormalFile.txt, is 10,000,000 bytes in size. The NewCompressedFile.txt file is 42,145 bytes. The final file, NewGZCompressedFile.txt, is 84,362 bytes. Relative to the original, there is not much difference between the sizes of the files compressed with the DeflateStream class and the GZipStream class. The reason for this is that both compression classes use the same compression/decompression algorithm (i.e., the lossless Deflate algorithm as described in the RFC 1951: Deflate 1.3 specification).
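The relationship between the two formats can be observed in memory, without the chunked file layout used in the recipe: the GZip format wraps a Deflate stream in a header and a CRC trailer, so for the same input its output is somewhat larger. The sketch below is one way to compare the two; CompressionSizeSketch and Compress are hypothetical names and are not part of the recipe.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionSizeSketch
{
    // Compresses 'data' in memory and returns only the bytes written.
    // useGZip selects GZipStream; otherwise DeflateStream is used.
    public static byte[] Compress(byte[] data, bool useGZip)
    {
        using (MemoryStream output = new MemoryStream())
        {
            Stream compressor = useGZip
                ? (Stream)new GZipStream(output, CompressionMode.Compress)
                : new DeflateStream(output, CompressionMode.Compress);
            using (compressor)
            {
                compressor.Write(data, 0, data.Length);
            }
            // ToArray copies only the bytes written; it still works after
            // the compressor has closed the MemoryStream
            return output.ToArray();
        }
    }
}
```

Calling Compress twice on the same buffer, once with useGZip set to false and once with it set to true, gives two arrays whose lengths can be compared directly.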
In .NET 4.5, the GZipStream
and DeflateStream
classes have been updated to use the zlib library
behind the scenes to perform the compression, which has improved the compression ratios. You can see this if you run the older version of the CompressFile
and DecompressFile
methods on prior versions of the .NET Framework, as shown in Example 8-7.
/// <summary>
/// Compress the source file to the destination file.
/// This is done in 1MB chunks to not overwhelm the memory usage.
/// </summary>
/// <param name="sourceFile">the uncompressed file</param>
/// <param name="destinationFile">the compressed file</param>
/// <param name="compressionType">the type of compression to use</param>
public static void CompressFile(string sourceFile, string destinationFile,
    CompressionType compressionType)
{
    if (sourceFile != null)
    {
        FileStream streamSource = null;
        FileStream streamDestination = null;
        Stream streamCompressed = null;
        using (streamSource = File.OpenRead(sourceFile))
        {
            using (streamDestination = File.OpenWrite(destinationFile))
            {
                // read 1MB chunks and compress them
                long fileLength = streamSource.Length;
                // write out the fileLength size
                byte[] size = BitConverter.GetBytes(fileLength);
                streamDestination.Write(size, 0, size.Length);

                // 1MB, or the whole file if it is smaller than that
                long chunkSize = Math.Min(1048576, fileLength);
                while (fileLength > 0)
                {
                    // read the chunk
                    byte[] data = new byte[chunkSize];
                    streamSource.Read(data, 0, data.Length);

                    // compress the chunk
                    MemoryStream compressedDataStream = new MemoryStream();
                    if (compressionType == CompressionType.Deflate)
                        streamCompressed = new DeflateStream(
                            compressedDataStream, CompressionMode.Compress);
                    else
                        streamCompressed = new GZipStream(
                            compressedDataStream, CompressionMode.Compress);
                    using (streamCompressed)
                    {
                        // write the chunk in the compressed stream
                        streamCompressed.Write(data, 0, data.Length);
                    }

                    // get the bytes for the compressed chunk
                    // (GetBuffer returns the stream's whole internal buffer,
                    // which can be longer than the compressed data in it)
                    byte[] compressedData = compressedDataStream.GetBuffer();

                    // write out the chunk size
                    size = BitConverter.GetBytes(chunkSize);
                    streamDestination.Write(size, 0, size.Length);
                    // write out the compressed size
                    size = BitConverter.GetBytes(compressedData.Length);
                    streamDestination.Write(size, 0, size.Length);
                    // write out the compressed chunk
                    streamDestination.Write(compressedData, 0,
                        compressedData.Length);

                    // subtract the chunk size from the file size
                    fileLength -= chunkSize;
                    // if less than a full chunk remains, shrink the chunk
                    // to the remaining length
                    if (fileLength < chunkSize)
                        chunkSize = fileLength;
                }
            }
        }
    }
}

/// <summary>
/// This function will decompress the chunked compressed file
/// created by the CompressFile function.
/// </summary>
/// <param name="sourceFile">the compressed file</param>
/// <param name="destinationFile">the destination file</param>
/// <param name="compressionType">the type of compression to use</param>
public static void DecompressFile(string sourceFile, string destinationFile,
    CompressionType compressionType)
{
    FileStream streamSource = null;
    FileStream streamDestination = null;
    Stream streamUncompressed = null;
    using (streamSource = File.OpenRead(sourceFile))
    {
        using (streamDestination = File.OpenWrite(destinationFile))
        {
            // read the fileLength size
            byte[] size = new byte[sizeof(long)];
            streamSource.Read(size, 0, size.Length);
            // convert the size back to a number
            long fileLength = BitConverter.ToInt64(size, 0);
            long chunkSize = 0;
            int storedSize = 0;
            long workingSet = Process.GetCurrentProcess().WorkingSet64;
            while (fileLength > 0)
            {
                // read the chunk size
                size = new byte[sizeof(long)];
                streamSource.Read(size, 0, size.Length);
                // convert the size back to a number
                chunkSize = BitConverter.ToInt64(size, 0);
                if (chunkSize > fileLength || chunkSize > workingSet)
                    throw new InvalidDataException();

                // read the compressed size
                size = new byte[sizeof(int)];
                streamSource.Read(size, 0, size.Length);
                // convert the size back to a number
                storedSize = BitConverter.ToInt32(size, 0);
                if (storedSize > fileLength || storedSize > workingSet)
                    throw new InvalidDataException();
                if (storedSize > chunkSize)
                    throw new InvalidDataException();

                byte[] uncompressedData = new byte[chunkSize];
                byte[] compressedData = new byte[storedSize];
                streamSource.Read(compressedData, 0, compressedData.Length);

                // uncompress the chunk
                MemoryStream uncompressedDataStream =
                    new MemoryStream(compressedData);
                if (compressionType == CompressionType.Deflate)
                    streamUncompressed = new DeflateStream(
                        uncompressedDataStream, CompressionMode.Decompress);
                else
                    streamUncompressed = new GZipStream(
                        uncompressedDataStream, CompressionMode.Decompress);
                using (streamUncompressed)
                {
                    // read the chunk in the compressed stream
                    streamUncompressed.Read(uncompressedData, 0,
                        uncompressedData.Length);
                }

                // write out the uncompressed chunk
                streamDestination.Write(uncompressedData, 0,
                    uncompressedData.Length);

                // subtract the chunk size from the file size
                fileLength -= chunkSize;
                // if less than a full chunk remains, shrink the chunk
                // to the remaining length
                if (fileLength < chunkSize)
                    chunkSize = fileLength;
            }
        }
    }
}
You may be wondering why you would pick one class over the other if they use the same algorithm. One good reason is that the GZipStream
class adds a CRC (cyclic redundancy check) to the compressed data to determine if it has been corrupted. If the data has been corrupted, an InvalidDataException
is thrown with the statement “The CRC in GZip footer does not match the CRC calculated from the decompressed data.” By catching this exception, you can determine if your data is corrupted.
In the DecompressFile method (and in the same places in DecompressFileAsync), an InvalidDataException can be thrown at several points:
// read the chunk size
size = new byte[sizeof(long)];
streamSource.Read(size, 0, size.Length);
// convert the size back to a number
chunkSize = BitConverter.ToInt64(size, 0);
if (chunkSize > fileLength || chunkSize > workingSet)
    throw new InvalidDataException();

// read the compressed size
size = new byte[sizeof(int)];
streamSource.Read(size, 0, size.Length);
// convert the size back to a number
storedSize = BitConverter.ToInt32(size, 0);
if (storedSize > fileLength || storedSize > workingSet)
    throw new InvalidDataException();
if (storedSize > chunkSize)
    throw new InvalidDataException();

byte[] uncompressedData = new byte[chunkSize];
byte[] compressedData = new byte[storedSize];
The code is reading in a buffer that may have been tampered with, so we need to check not only for stability but also for security reasons. Since DecompressFile allocates memory based on the numbers read from the file, it needs to be careful about what those numbers turn out to be, and we don’t want to unwittingly bring in other code that has been injected into the stream either. The very basic checks being done here are to ensure that:
The size of the chunk is not bigger than the file length.
The size of the chunk is not bigger than the current program working set.
The size of the compressed chunk is not bigger than the file length.
The size of the compressed chunk is not bigger than the current program working set.
The size of the compressed chunk is not bigger than the actual chunk size.
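The checks listed above can be gathered into a single helper so that both decompression methods validate a chunk header the same way. The following is a sketch under stated assumptions: ChunkHeaderSketch and Validate are hypothetical names, the memoryLimit parameter stands in for the working-set value used in the recipe, and the negative-size check is an addition that the recipe itself does not perform.

```csharp
using System;
using System.IO;

public static class ChunkHeaderSketch
{
    // Throws InvalidDataException unless the sizes read from a chunk header
    // are plausible. 'remainingLength' is the uncompressed byte count still
    // expected; 'memoryLimit' plays the role of the working-set bound.
    public static void Validate(long chunkSize, int storedSize,
        long remainingLength, long memoryLimit)
    {
        // Extra check (an assumption, not in the recipe): negative sizes
        // can only come from a corrupted or tampered header
        if (chunkSize < 0 || storedSize < 0)
            throw new InvalidDataException("Negative size in chunk header.");
        if (chunkSize > remainingLength || chunkSize > memoryLimit)
            throw new InvalidDataException("Chunk size out of range.");
        if (storedSize > remainingLength || storedSize > memoryLimit)
            throw new InvalidDataException("Compressed size out of range.");
        if (storedSize > chunkSize)
            throw new InvalidDataException(
                "Compressed size exceeds chunk size.");
    }
}
```

Calling Validate before allocating the two byte arrays keeps the allocation sizes bounded by values the caller controls rather than by whatever the file claims.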
The “DeflateStream Class” and “GZipStream” topics in the MSDN documentation.