Chapter 15. File system essentials

Understanding the disk and file-system structure

Using FAT

Using NTFS

Advanced NTFS features

Using ReFS

Using file-based compression

Chapter 12 discussed storage management, which primarily focuses on storage technologies and techniques for configuring storage. As discussed in that chapter, disks can be apportioned in many ways but ultimately must be formatted with a particular file system. The file system provides the environment for working with files and folders. Windows Server 2012 R2 provides FAT and NTFS as the basic file-system types. These file systems and their various extensions, which include the Resilient File System (ReFS), are discussed in this chapter.

Understanding the disk and file-system structure

The basic unit of storage is a disk. Regardless of the partition style or disk type, Windows Server 2012 R2 reads data from disks and writes data to disks by using the disk input/output (I/O) subsystem. The I/O subsystem understands the physical and logical structures of disks, which enables it to perform read and write operations. The basic physical structure of a hard disk drive (HDD) includes the following items:

  • Platters

  • Cylinders

  • Tracks

  • Clusters

  • Sectors

Each hard disk drive has one or more platters. Platters are the physical media from which data is read and to which data is written. The disk head travels in a circular path over the platter. This circular path is called a track. Tracks are magnetically encoded when you format a disk. Tracks that reside in the same location on each platter form a cylinder. For example, if a hard disk drive has four platters, Cylinder 1 consists of Track 1 from all four platters.

Tracks are divided into sectors. Sectors represent a subsection within a track and consist of individual bytes. The number of sectors in a track depends on the hard disk drive type and the location of the track on the platter. Tracks closer to the outside of the platter can have more sectors than tracks near the center of the platter.

In contrast, solid-state drives (SSDs) have no moving parts because solid-state drives use flash memory modules rather than platters, and no disk heads need to travel over platters to read data. With solid-state drives, data is accessed directly from the flash memory over multiple internal flash buses. Typically, solid-state drives use NAND flash memory modules that have either multilevel cells (MLCs) storing two bits per cell or single-level cells (SLCs) storing one bit per cell.

Although solid-state drives that use multilevel cells might be cheaper than those with single-level cells, solid-state drives with single-level cells typically provide better reliability and performance. That said, the endurance of both types of solid-state drives isn’t as robust as that of hard disk drives, but it is improving thanks to wear-leveling algorithms and other techniques that distribute writes more evenly across memory modules, provide enhanced error correction, and might also compress data.

As with hard disks, solid-state drives rely on I/O subsystems that understand their physical and logical structure to perform read and write operations. Overall performance of solid-state drives depends on their controllers, firmware, and caching. Typically, solid-state drives are much faster than hard disk drives, especially for small-block I/O, such as reads of less than 32 kilobytes (KBs).

Solid-state hybrid drives bridge the gap between hard disk drives and solid-state drives. A typical hybrid drive combines a small solid-state drive with a large hard disk drive. The flash memory is used for critical operations, such as boot and initial application load, and the hard disk drive is used for all other operations. Special algorithms can be used to capture boot files and place them on the flash to ensure that the hybrid drive always boots from flash memory rather than from the spinning disk. Other algorithms also can be used to move applications and data that are initially loaded or frequently used with the system into the flash memory. The result is a drive that has some of the speed benefits of solid-state drives with a cost that is closer to that of a traditional hard disk drive.

When you format a disk with a file system, the file system structures the disk using clusters, which are logical groupings of sectors. Both FAT and NTFS use the fixed sector size of the underlying disk (which can be either 512 bytes per physical sector with 512b disks or 4,096 bytes per physical sector with 512e disks) but allow the cluster size to be variable. For example, the cluster size might be 4,096 bytes, and if there are 512 bytes per sector, each cluster consists of eight sectors. ReFS is an exception. In current implementations, as of this writing, ReFS has a fixed cluster size of 64 KBs.

Table 15-1 provides a summary of the default cluster sizes for FAT, FAT32, exFAT, NTFS, and ReFS. You can specify the cluster size when you create a file system on a disk, or you can accept the default cluster size setting. Either way, the cluster sizes available depend on the type of file system you are using.

Table 15-1. Default cluster sizes for Windows Server

| Volume Size | FAT16 | FAT32 | exFAT | NTFS | ReFS |
| --- | --- | --- | --- | --- | --- |
| 7 MBs to 16 MBs | 512 bytes | N/A | 4 KBs | 512 bytes | N/A |
| 17 MBs to 32 MBs | 512 bytes | N/A | 4 KBs | 512 bytes | N/A |
| 33 MBs to 64 MBs | 1 KB | 512 bytes | 4 KBs | 512 bytes | N/A |
| 65 MBs to 128 MBs | 2 KBs | 1 KB | 4 KBs | 512 bytes | N/A |
| 129 MBs to 256 MBs | 4 KBs | 2 KBs | 4 KBs | 512 bytes | N/A |
| 257 MBs to 512 MBs | 8 KBs | 4 KBs | 32 KBs | 512 bytes | N/A |
| 513 MBs to 1024 MBs | 16 KBs | 4 KBs | 32 KBs | 1 KB | 64 KBs |
| 1025 MBs to 2 GBs | 32 KBs | 4 KBs | 32 KBs | 4 KBs | 64 KBs |
| 2 GBs to 4 GBs | 64 KBs | 4 KBs | 32 KBs | 4 KBs | 64 KBs |
| 4 GBs to 8 GBs | N/A | 4 KBs | 32 KBs | 4 KBs | 64 KBs |
| 8 GBs to 16 GBs | N/A | 8 KBs | 32 KBs | 4 KBs | 64 KBs |
| 16 GBs to 32 GBs | N/A | 16 KBs | 32 KBs | 4 KBs | 64 KBs |
| 32 GBs to 2 TBs | N/A | * | 128 KBs | 4 KBs | 64 KBs |
| 2 TBs to 16 TBs | N/A | * | 128 KBs | 4 KBs | 64 KBs |
| 16 TBs to 32 TBs | N/A | * | 128 KBs | 8 KBs | 64 KBs |
| 32 TBs to 64 TBs | N/A | * | 128 KBs | 16 KBs | 64 KBs |
| 64 TBs to 128 TBs | N/A | * | 128 KBs | 32 KBs | 64 KBs |
| 128 TBs to 256 TBs | N/A | * | 128 KBs | 64 KBs | 64 KBs |

* Windows Server can work with FAT32 volumes of this size but cannot format them; as discussed later in this chapter, the operating system can format FAT32 volumes only up to 32 GBs.
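
You can override the defaults in Table 15-1 when you format a volume. For example, to format a volume as NTFS with 8 KB clusters rather than the default 4 KB clusters, you could type the following at an elevated command prompt (drive E here is only an example):

format e: /fs:ntfs /a:8192

Keep in mind that cluster size can affect feature availability; NTFS file-based compression, discussed later in this chapter, is supported only on volumes with cluster sizes of 4 KBs or less.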

The important thing to know about clusters is that they are the smallest unit in which disk space is allocated. Each cluster can hold data from only one file. So, if you create a 1 KB file and the cluster size is 4 KBs, the 3 KBs of empty space in the cluster won’t be available to other files. That’s just the way it is. If a single cluster isn’t big enough to hold an entire file, the remaining file data goes into the next available cluster and then the next until the file is completely stored.
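
To see how this plays out when a file spans clusters, consider a hypothetical 10 KB file on a volume with 4 KB clusters:

ClustersUsed = Ceiling(FileSize ÷ ClusterSize) = Ceiling(10 KBs ÷ 4 KBs) = 3

SpaceAllocated = ClustersUsed × ClusterSize = 3 × 4 KBs = 12 KBs

The file occupies three clusters and 12 KBs on disk, leaving 2 KBs of slack space in the final cluster that no other file can use.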

Although the disk I/O subsystem manages the physical structure of disks, Windows Server 2012 R2 manages the logical disk structure at the file-system level. The logical structure of a disk relates to the basic or dynamic volumes you create on a disk and the file systems with which those volumes are formatted. You can format both basic volumes and dynamic volumes by using FAT or NTFS. As discussed in the next section, each file system type has a different structure, and there are advantages and disadvantages of each.

Using FAT

FAT volumes use an allocation table to store information about disk space allocation. FAT can be used with both fixed disks and removable media. For both fixed disks and removable media, FAT is available in 16-bit and 32-bit versions, which are referred to as FAT16 and FAT32. For removable media, you can also use extended FAT (exFAT). The advantage of using exFAT instead of FAT with removable media is that exFAT removes the file-size and volume-size limits of FAT16 and FAT32 (see Table 15-2) yet can still be used with any operating system or device that supports this file-system type.

File allocation table structure

Disks formatted using FAT are organized as shown in Figure 15-1. They have a boot sector that stores information about the disk type, starting and ending sectors, the active partition, and a bootstrap program that executes at startup and boots the operating system. This is followed by a reserve area that can be one or more sectors in length.

Figure 15-1. Here is an overview of the FAT16 volume structure.

The reserve area is followed by the primary file allocation table, which provides a reference table for the clusters on the volume. Each reference in the table relates to a specific cluster and defines the cluster’s status as follows:

  • Available (unused)

  • In use (meaning a file is using it)

  • Bad (meaning it is marked as bad and won’t be written to)

  • Reserved (meaning it is reserved for the operating system)

If a cluster is in use, the cluster entry identifies the number of the next cluster in the file or indicates that it is the last cluster of a file—in which case, the end of the file has been reached.

FAT volumes also have the following features:

  • Duplicate file allocation table, which provides a backup of the primary file allocation table and can be used to restore the file system if the primary file allocation table becomes corrupted

  • Root directory table, which defines the starting cluster of each file in the root directory of the file system

  • Data area, which stores the actual data for user files and folders

When an application attempts to read a file, the operating system looks up the starting cluster of the file in the file allocation table and then uses the file allocation table to find and read all the clusters in the file.
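
For example, suppose the root directory table shows that a file’s data begins at cluster 5 (these cluster numbers are hypothetical). If the file allocation table entry for cluster 5 contains the value 9, the entry for cluster 9 contains 12, and the entry for cluster 12 contains the end-of-file marker, the operating system reads clusters 5, 9, and 12 in sequence to retrieve the complete file.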

FAT features

Although FAT supports basic file and folder operations, its features are rather limited. By using FAT, you have the following capabilities:

  • You can use Windows file sharing, and the share permissions you assign completely control remote access to files.

  • You can use long file names, meaning file and folder names containing up to 255 characters.

  • You can use FAT with floppy disks and removable disks.

  • You can use Unicode characters in file and folder names.

  • You can use uppercase and lowercase letters in file and folder names.

However, FAT has the following disadvantages:

  • You can’t control local access to files and folders by using Microsoft Windows file and folder access permissions.

  • You can’t use any advanced file-system features of NTFS, including compression, encryption, disk quotas, and remote storage.

In addition, although FAT16 and FAT32 support small cluster sizes, exFAT does not. Table 15-2 provides a summary of FAT16, FAT32, and exFAT.

Note

Although Windows Server 2012 R2 can read from or write to FAT32 volumes as large as 2 TBs, the operating system can only format FAT32 volumes up to 32 GBs in size.

Table 15-2. Comparison of FAT16, FAT32, and exFAT features

| Feature | FAT16 | FAT32 | exFAT |
| --- | --- | --- | --- |
| File allocation table size | 16-bit | 32-bit | 64-bit |
| Minimum volume size | See the following Inside Out tip | 33 MBs | 33 MBs |
| Maximum volume size | 4 GBs; best at 2 GBs or less | 2 TBs; limited in Windows Server to 32 GBs | 256 TBs |
| Maximum file size | 2 GBs | 4 GBs | Same as volume size |
| Supports small cluster size | Yes | Yes | No |
| Supports NTFS features | No | No | No |
| Use on fixed disks | Yes | Yes | Yes |
| Use on removable disks | Yes | Yes | Yes |
| Supports network file sharing | Yes | Yes | Yes |
| Supports customized disk and folder views | Yes | Yes | Yes |

By default, Windows Server sets the size of clusters and the number of sectors per cluster based on the size of the volume. Disk geometry also is a factor in determining cluster size because the number of clusters on the volume must fit into the number of bits the file system uses. The actual amount of data you can store on a single FAT volume is a factor of the maximum cluster size and the maximum number of clusters you can use per volume. This can be written out as a formula:

ClusterSize × MaximumNumberOfClusters = MaximumVolumeSize

FAT16 supports a maximum of 65,526 clusters and a maximum cluster size of 64 KBs. This is where the limitation of 4 GBs for volume size comes from. With disks less than 32 MBs but more than 16 MBs in size, the cluster size is 512 bytes and there is one sector per cluster with 512b disks. This changes as the volume size increases, up to the largest cluster size of 64 KBs with 128 sectors per cluster on 2 GB to 4 GB volumes.
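
Plugging the FAT16 limits into the formula confirms where the limitation comes from:

64 KBs × 65,526 = 4,294,311,936 bytes, or approximately 4 GBs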

FAT32 volumes using 512-byte sectors on 512b disks can be up to 2 TBs in size and can use clusters of up to 64 KBs. To control the maximum number of clusters allowed, however, the Windows operating system reserves the upper 4 bits of each 32-bit cluster value, limiting FAT32 to a maximum of 28 bits’ worth of clusters. With a maximum recommended cluster size of 32 KBs (instead of the maximum allowable 64 KBs), this means a FAT32 volume formatted by the Windows operating system can be up to 32 GBs in size. Because the smallest cluster size allowed for FAT32 volumes is 512 bytes, the smallest FAT32 volume you can create is 33 MBs.

Using NTFS

NTFS is an extensible and recoverable file system that offers many advantages over FAT, FAT32, and exFAT. Because it is extensible, the file system can be extended over time with various revisions. As you’ll learn shortly, the version of NTFS that ships with Windows Server 2008 and Windows Server 2008 R2 was extended with new features, as was the version of NTFS that ships with Windows Server 2012, but all share the same internal version number as the revision of NTFS that shipped with Microsoft Windows Server 2003. Because it is recoverable, volumes formatted with NTFS can be reconstructed if they contain structure errors. Typically, reconstructing NTFS volumes is a task performed at startup.

NTFS structure

NTFS volumes have a very different structure and feature set from FAT volumes. The first area of the volume is the boot sector, which is located at sector 0 on the volume. The boot sector stores information about the disk layout, and a bootstrap program executes at startup and boots the operating system. A backup boot sector is placed at the end of the volume for redundancy and fault tolerance.

Instead of a file allocation table, NTFS uses a relational database to store information about files. This database is called the master file table (MFT). It stores a file record of each file and folder on the volume, pertinent volume information, and details on the MFT itself. The first 16 records in the MFT (records 0 through 15) store NTFS metadata, as summarized in Table 15-3.

Table 15-3. NTFS metadata

| MFT Record | Record Type | File Name | Description |
| --- | --- | --- | --- |
| 0 | MFT | $Mft | Stores the base file record of each file and folder on the volume. As the number of files and folders grows, additional records are used as necessary. |
| 1 | MFT mirror | $MftMirr | Stores a partial duplicate of the MFT used for failure recovery. It’s also referred to as MFT2. |
| 2 | Log file | $LogFile | Stores a persistent history of all changes made to files on the volume, which can be used to recover files. |
| 3 | Volume | $Volume | Stores volume attributes, including the volume serial number, version, and number of sectors. |
| 4 | Attribute definitions | $AttrDef | Stores a table of attribute names, numbers, and descriptions. |
| 5 | Root file name index | $ | Stores the details on the volume’s root directory. |
| 6 | Cluster bitmap | $Bitmap | Stores a table that details the clusters in use. |
| 7 | Boot sector | $Boot | Stores the bootstrap program on bootable volumes. Also includes the locations of the MFT and MFT mirror. |
| 8 | Bad cluster file | $BadClus | Stores a table mapping bad clusters. |
| 9 | Security file | $Secure | Stores the unique security descriptor for all files and folders on the volume. |
| 10 | Upcase table | $Upcase | Stores a table used to convert lowercase to matching uppercase Unicode characters. |
| 11 | NTFS extension file | $Extend | Stores information on enabled file-system extensions. |
| 12–15 | To be determined | To be determined | Reserved records for future use. |

The MFT mirror stores a partial duplicate of the MFT that can be used to recover the MFT. If any of the records in the primary MFT become corrupted or are otherwise unreadable and there’s a duplicate record in the MFT mirror, NTFS uses the data in the MFT mirror and, if possible, uses this data to recover the records in the primary MFT. Note also that the NTFS version that shipped with Windows Server 2003 and later (NTFS 5.1) has a slightly different metadata mapping from the version that originally shipped with Windows 2000 (NTFS 5.0). In NTFS 5.1, the $LogFile and $Bitmap metadata files are located in a different position on disk than they were originally. This gives a performance advantage of 5 to 8 percent to disks formatted under NTFS 5.1 and comes close to approximating the performance of FAT.

Note

For NTFS, you typically refer to major version numbers rather than the major version and the revision number. Technically, however, shadow copies are a feature of NTFS 5.1 or later, not of NTFS 5.0.

The rest of the records in the MFT store file and folder information. Each of these regular entries includes the file or folder name, security descriptor, and other attributes, including file data or pointers to file data. The MFT record size is set when a volume is formatted and can be 1,024 bytes, 2,048 bytes, or 4,096 bytes, depending on the volume size. If a file is very small, all its contents might be able to fit in the data field of its record in the MFT. When all of a file’s attributes, including its data, can be stored in the MFT record, the attributes are called resident attributes. Figure 15-2 shows an example of a small file with resident attributes.

If a file is larger than a single record, it has what are called nonresident attributes. Here, the file has a base record in the MFT that details where to find the file data. NTFS creates additional areas called runs on the disk to store the additional file data. The size of data runs depends on the cluster size of the volume. If the cluster size is 2 KBs or less, data runs are 2 KBs. If the cluster size is larger than 2 KBs, data runs are 4 KBs.

Figure 15-2. This figure is a graphical depiction of the MFT and its records, showing an example of a small file with resident attributes.

As Figure 15-3 shows, clusters belonging to the file are referenced in the MFT, using virtual cluster numbers (VCNs). VCNs are numbered sequentially, starting with VCN 0. The Data field in the file’s MFT record maps the VCNs to a starting logical cluster number (LCN) on the disk and details the number of clusters to read for that VCN. When these mappings use up all the available space in a record, additional MFT records are created to store the additional mappings.
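
As an example (using hypothetical numbers), a 16 KB file on a volume with 4 KB clusters spans four clusters, VCN 0 through VCN 3. If the file’s data is stored contiguously beginning at cluster 128 on the disk, the Data field needs only a single mapping: VCN 0 maps to LCN 128 with a run length of four clusters. If the data were split across two locations on the disk, two mappings would be required, one for each data run.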

Figure 15-3. This figure shows a graphical depiction of a user file record with data runs; clusters belonging to the file are referenced in the MFT by using virtual cluster numbers.

In addition to the MFT, NTFS reserves a contiguous range of space past the end of the MFT called the MFT zone. By default, the MFT zone is approximately 12.5 percent of the total volume space. The MFT zone enables the MFT to grow without becoming fragmented. Typically, the MFT zone shrinks as the MFT grows.

The MFT zone is not used to store user data unless the remainder of the volume becomes full. Fragmentation can and still does occur, however. On volumes with lots of small files, the MFT can use up the MFT zone, and as additional files are added, the MFT has to grow into unreserved areas of the volume. On volumes with just a few large files, the unreserved space on a volume can be used up before the MFT, and in this case, the files start using the MFT zone space.

NTFS features

Several versions of NTFS are available. NTFS 5.1 is the version of NTFS that was first included in Windows XP and Windows Server 2003. NTFS 5.1 with Local File System (LFS) 2.0 was first included with Windows 8 and Windows Server 2012.

You have the following capabilities when you use NTFS 5.0:

  • Advanced file and folder access permissions

  • Data streams and change journals

  • Encrypting File System (EFS)

  • File sharing and full-control remote access to files and folders

  • Long file names, meaning file and folder names can contain up to 255 characters

  • Reparse points and remote storage

  • Sparse files, disk quotas, and object identifiers

  • Unicode characters in file and folder names

  • Uppercase and lowercase letters in file and folder names

NTFS 5.1 provides some additional enhancements, primarily the ability to use shadow copies. In similar fashion, NTFS 5.1 with Local File System (LFS) 2.0 also provides some additional enhancements, primarily related to self-healing technology used with Check Disk (Chkdsk.exe) and the update sequence number (USN) change journal. Specifically, NTFS and ReFS use version 2.0 change-journal records by default, which contain 64-bit identifiers. ReFS also implements version 3.0 records, which contain 128-bit identifiers.

Windows Server automatically sets the size of clusters and the number of sectors per cluster based on the size of the volume. Cluster sizes range from 512 bytes to 64 KBs. As with FAT, NTFS has the following characteristics:

  • Disk geometry also is a factor in determining cluster size because the number of clusters on the volume must fit into the number of bits the file system uses.

  • The actual amount of data you can store on a single NTFS volume is a factor of the maximum cluster size and the maximum number of clusters you can use per volume.

Thus, although volumes have a specific maximum size, the cluster size used on a volume can be a limiting factor. For example, a volume formatted with a 4 KB cluster size can be no larger than 16 TBs, which is well below the maximum volume size NTFS allows when larger cluster sizes are used.

Analyzing the NTFS structure

If you want to examine the structure of a volume formatted using NTFS, you can use the FSUtil FSinfo command to do this. Type fsutil fsinfo ntfsinfo DriveDesignator at the command prompt, where DriveDesignator is the drive letter of the volume followed by a colon. For example, if you want to obtain information on the C drive, you type

fsutil fsinfo ntfsinfo c:

The output would be similar to the following:

NTFS Volume Serial Number :       0xbcf4c873f4c82125
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000001d3c57ff
Total Clusters :                  0x0000000003a78aff
Free Clusters  :                  0x00000000035e477d
Total Reserved :                  0x000000000001e2b0
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000004e00000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c4e00
Mft Zone End   :                  0x00000000000cc820
Resource Manager Identifier :     CBDD98AD-E33F-11E1-95F2-C407271F80D4
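
You can derive the volume’s capacity from this output by multiplying the total clusters by the bytes per cluster. Converting the hexadecimal value, the example volume has 0x3a78aff (61,311,743) clusters of 4,096 bytes each:

61,311,743 clusters × 4,096 bytes per cluster = 251,132,899,328 bytes, or approximately 234 GBs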

As Table 15-4 shows, FSUtil FSinfo provides detailed information on the NTFS volume structure, including space usage and configuration.

Table 15-4. Details from FSUtil FSinfo

| Field | Description |
| --- | --- |
| NTFS Volume Serial Number | The unique serial number of the selected NTFS volume. |
| NTFS Version | The internal NTFS version. Here, 3.1 refers to NTFS 5.1. |
| Number Sectors | The total number of sectors on the volume in hexadecimal. |
| Total Clusters | The total number of clusters on the volume in hexadecimal. |
| Free Clusters | The number of unused clusters on the volume in hexadecimal. |
| Total Reserved | The total number of clusters reserved for NTFS metadata. |
| Bytes Per Sector | The number of bytes per sector. |
| Bytes Per Cluster | The number of bytes per cluster. |
| Bytes Per FileRecord Segment | The size of MFT file records. |
| Clusters Per FileRecord Segment | The number of clusters per file record segment, which is valid only if the file record size is as large as or larger than the volume cluster size. |
| Mft Valid Data Length | The current size of the MFT. |
| Mft Start Lcn | The location of the first LCN on the disk the MFT uses. |
| Mft2 Start Lcn | The location of the first LCN on the disk the MFT mirror uses. |
| Mft Zone Start | The cluster number that marks the start of the region on the disk the MFT reserves. |
| Mft Zone End | The cluster number that marks the end of the region on the disk the MFT reserves. |

Using FSUtil, you can also obtain detailed statistics on NTFS metadata and user file usage since a system was started. To view this information, type fsutil fsinfo statistics DriveDesignator at the command prompt, where DriveDesignator is the drive letter of the volume followed by a colon. For example, if you want to obtain information on the C drive, you type:

fsutil fsinfo statistics c:

The output is shown in two sections. The first section of the statistics details user file and disk activity and the overall usage of NTFS metadata. As shown in this example, the output shows the number of reads and writes and the number of bytes read or written:

File System Type :     NTFS
UserFileReads :        31441
UserFileReadBytes :    857374720
UserDiskReads :        31584
UserFileWrites :       6302
UserFileWriteBytes :   197198336
UserDiskWrites :       6505
MetaDataReads :        3168
MetaDataReadBytes :    21770240
MetaDataDiskReads :    4165
MetaDataWrites :       3883
MetaDataWriteBytes :   16805888
MetaDataDiskWrites :   4644
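
These counters make it easy to characterize a volume’s workload. For this example, dividing UserFileReadBytes by UserFileReads gives 857,374,720 ÷ 31,441, or approximately 27,269 bytes per read, which tells you the average read on this volume is a small-block operation of roughly 27 KBs.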

The second section of the statistics details usage of individual NTFS metadata files. As shown in this example, the output details the number of reads and writes and the number of bytes read or written for each NTFS metadata file:

MftReads :             2962
MftReadBytes :         12132352
MftWrites :            2460
MftWriteBytes :        10465280
Mft2Writes :           0
Mft2WriteBytes :       0
RootIndexReads :       0
RootIndexReadBytes :   0
RootIndexWrites :      0
RootIndexWriteBytes :  0
BitmapReads :          8
BitmapReadBytes :      8388608
BitmapWrites :         847
BitmapWriteBytes :     3796992
MftBitmapReads :       1
MftBitmapReadBytes :   65536
MftBitmapWrites :      107
MftBitmapWriteBytes :  442368
UserIndexReads :       1086
UserIndexReadBytes :   4448256
UserIndexWrites :      711
UserIndexWriteBytes :  3153920
LogFileReads :         8
LogFileReadBytes :     32768
LogFileWrites :        5895
LogFileWriteBytes :    36777984
LogFileFull :          0

Advanced NTFS features

NTFS has many advanced features that administrators should know about and understand. These features include the following:

  • Hard links

  • Data streams

  • Change journals

  • Object identifiers

  • Reparse points

  • Sparse files

  • Transactions

Each of these features is discussed in the sections that follow.

Hard links

Every file created on a volume has a hard link. The hard link is the directory entry for the file, and it is what enables the operating system to find files within folders. On NTFS volumes, files can have multiple hard links. This allows a single file to appear in the same directory with multiple names or to appear in multiple directories with the same name or different names. Applications can open a file by using any of its hard links and can modify the file. If you then use another hard link to open the file in another application, the application sees the changes.

Wondering why you’d want to use hard links? Hard links are useful when you want the same file to appear in several locations. For example, you might want a document to appear in a folder of a network share that is available to all users but have an application that requires the document to be in another directory so that it can be read and processed on a daily basis. Rather than moving the file to the application directory and giving every user in the company access to this protected directory, you decide to create a hard link to the document so that it can be accessed separately by both users and the application.

Regardless of how many hard links a file has, the related directory entries all point to the single file that exists in one location on the volume—and this is how hard links differ from copies. With a copy of a file, the file data exists in multiple locations. With a hard link, the file appears in multiple locations but exists in only one location. Thus, if you modify a file by using one of its hard links and save the file, and then someone opens the file using a different hard link, the changes are shown.

Note

Hard links have advantages and disadvantages. Hard links are not meant for environments in which multiple users can modify a file simultaneously. If Sandra opens a file using one hard link and is working on the file at the same time Bob is working on the file, there can be problems if they both try to save changes. Although this is a disadvantage of hard links, the big advantage of hard links shouldn’t be overlooked: if a file has multiple hard links, the file will not be deleted from the volume until all hard links are deleted. This means that if someone accidentally were to delete a file that had multiple hard links, the file wouldn’t actually be deleted. Instead, only the affected hard link would be deleted. Any other hard links and the file itself would remain.

Because there is only one physical copy of a file with multiple hard links, the hard links do not have separate security descriptors. Only the source file has security descriptors. Thus, if you were to change the access permissions of a file by using any of its hard links, you would actually change the security of the source file and all hard links that point to this file would have these security settings.

You can create hard links by using the FSUtil Hardlink command. Use the following syntax:

fsutil hardlink create NewFilePath CurrentFilePath

Here, NewFilePath is the file path for the hard link you want to create, and CurrentFilePath is the name of the existing file to which you are linking. For example, if the file ChangeLog.doc is found in the file path C:\CorpDocs and you want to create a new hard link to this file with the file path C:\UserData\Logs\CurrentLog.doc, you would type

fsutil hardlink create C:\UserData\Logs\CurrentLog.doc C:\CorpDocs\ChangeLog.doc

Hard links can be created only on NTFS volumes, and you cannot create a hard link on one volume that refers to another volume. Following this logic, you couldn’t create a hard link to the D drive for a file created on the C drive.
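
If you prefer, you can also create hard links with the built-in Mklink command by using its /H parameter, which takes the new link name followed by the existing file. Using the same example paths as before, you would type:

mklink /h C:\UserData\Logs\CurrentLog.doc C:\CorpDocs\ChangeLog.doc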

Data streams

Every file created on a volume has a data stream associated with it. A data stream is a sequence of bytes that contains the contents of the file. The main data stream for a file is unnamed and is visible to all file systems. On NTFS volumes, files can also have named data streams associated with them. Named data streams contain additional information about a file, such as custom properties or summary details. This enables you to associate additional information with a file but still manage the file as a single unit.

After you create a named data stream and associate it with a file, any applications that know how to work with named data streams can access the streams by their names and read the additional details. Many applications support named data streams, including Microsoft Office, Adobe Acrobat, and other productivity applications. This is how you can set summary properties for a Microsoft Word document—such as Title, Subject, and Author—and save that information with the file.

In fact, if you press and hold or right-click any file on an NTFS volume, select Properties, and then tap or click the Details tab, you can see information that is associated with the file using a data stream, as shown in Figure 15-4.

Figure 15-4. Information entered on the Details tab is saved to a named data stream.

Generally speaking, the named data streams associated with a file are used to set the names of its property tabs and to populate the fields of those tabs. This is how other tabs can be associated with some document types and how the Windows operating system can store a thumbnail image within an NTFS file containing an image.

The most important thing to know about streams is that they aren’t supported on FAT. If you move or copy a file containing named streams to a FAT volume, you might see the warning prompt labeled “Confirm Stream Loss” telling you additional information is associated with the file and asking you to confirm that it’s okay that the file is saved without this information. If you tap or click Yes, only the contents of the file are copied or moved to the FAT volume—and not the contents of the associated data streams. If you tap or click No, the copy or save operation is canceled.

In a file’s Properties dialog box on the Details tab, you also have the option of removing properties and personal information associated with a file. You do this by tapping or clicking the Remove Properties And Personal Information link and then selecting a Remove Properties method. Windows accomplishes this task by removing the values from the related data streams associated with the file.
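
If you want to see named data streams in action, you can create and read one from the command prompt by using the FileName:StreamName syntax (the file path and stream name here are arbitrary examples):

echo Review draft by Friday > C:\CorpDocs\Report.doc:Comments

more < C:\CorpDocs\Report.doc:Comments

You can then type dir /r C:\CorpDocs to list the file along with any named data streams it contains.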

Change journals

An NTFS volume can use an update sequence number (USN) change journal. A change journal provides a complete log of all changes made to the volume. It records additions, deletions, and modifications regardless of who made them or how the additions, deletions, and modifications occurred. As with system logs, the change log is persistent, so it isn’t reset if you shut down and restart the operating system. The operating system writes records to the NTFS change log when an NTFS checkpoint occurs. The checkpoint tells the operating system to write changes that would enable NTFS to recover from failure to a particular point in time.

The change journal is enabled when you install certain services, including distributed file system (DFS). Domain controllers and any other computer in the domain that uses these services rely heavily on the change journal. The change journal enables these services to be very efficient at determining when files, folders, and other NTFS objects have been modified. Rather than checking time stamps and registering for file notifications, these services perform direct lookups in the change journal to determine all the modifications made to a set of files. Not only is this faster, it also uses system resources more efficiently.

You can gather summary statistics about the change journal by typing fsutil usn queryjournal DriveDesignator at the command prompt, where DriveDesignator is the drive letter of the volume followed by a colon. For example, if you want to obtain change journal statistics on the C drive, you type:

fsutil usn queryjournal c:

The output is similar to the following:

Usn Journal ID   : 0x01cd77459da4462a
First Usn        : 0x0000000000000000
Next Usn         : 0x0000000002573bf8
Lowest Valid Usn : 0x0000000000000000
Max Usn          : 0x7fffffffffff0000
Maximum Size     : 0x0000000020000000
Allocation Delta : 0x0000000000400000
Minimum record version supported : 2
Maximum record version supported : 2

The details show the following information:

  • Usn Journal ID. The unique identifier of the current change journal. A journal is assigned an identifier on creation and can be stamped with a new ID. NTFS and ReFS use this identifier for an integrity check.

  • First Usn. The number of the first record that can be read from the journal.

  • Next Usn. The number of the next record to be written to the journal.

  • Lowest Valid Usn. The first record that was written into this journal instance. If a journal has a First Usn value lower than the Lowest Valid Usn, the journal has been stamped with a new identifier since the last USN was written (and this could indicate a discontinuity where changes to some or all files or directories on the volume might have occurred but are not recorded in the change journal).

  • Max Usn. The highest USN that can be assigned.

  • Maximum Size. The maximum size in bytes that the change journal can use. On NTFS, if the change journal exceeds this value, older entries are overwritten by truncating the journal at the next NTFS checkpoint.

  • Allocation Delta. On NTFS, the size in bytes of disk space that is added to the end and removed from the beginning of the change journal when it becomes full. This is not used with ReFS. You can set both this value and the maximum size when you create a change journal, as shown in the example following this list.

  • Minimum Record Version Supported. The minimum supported version of USN records, as supported by the file system.

  • Maximum Record Version Supported. The maximum supported version of USN records, as supported by the file system.
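
You can create a change journal on a volume, or resize an existing one, by using the FSUtil USN CreateJournal subcommand, specifying the maximum size in bytes with m= and the allocation delta in bytes with a=. For example, the following command creates a journal on the C drive with a maximum size of 512 MBs and an allocation delta of 4 MBs (values chosen only for illustration):

fsutil usn createjournal m=536870912 a=4194304 c: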

Individual records written to the change journal look like this:

File Ref#       :                                  0x18e90000000018e9
ParentFile Ref# :                                  0x17c00000000017c0
Usn             :                                  0x0000000000000000
SecurityId      :                                  0x00000119
Reason          :                                  0x00000000
Name (024)      :                                  ocmanage.dll

The most important information here is the name of the affected file and the security identifier of the object that made the change. You can get the most recent change journal entry for a file by typing fsutil usn readdata FilePath, where FilePath is the name of the file for which you want to retrieve change information. For example, if you want to obtain the most recent change journal information on a file with the path C:\DomainComputers.txt, you type:

fsutil usn readdata c:\domaincomputers.txt

The output is similar to the following:

Major Version    :                                 0x2
Minor Version    :                                 0x0
FileRef#         :                                 0x000800000001c306
Parent FileRef#  :                                 0x0005000000000005
Usn              :                                 0x00000000237cf7f0
Time Stamp       :                                 0x0000000000000000
Reason           :                                 0x0
Source Info      :                                 0x0
Security Id      :                                 0x45e
File Attributes  :                                 0x20
File Name Length :                                 0x26
File Name Offset :                                 0x3c
FileName         :                                 domaincomputers.txt

This data shows the file’s reference number in the root file index and that of its parent. It also shows the current USN associated with the file and the file attributes flag. The File Name Length element shows the length of the file name in bytes. This particular file has a file name length of 38 (0x26) because the 19-character file name domaincomputers.txt is stored in Unicode, with 2 bytes per character. The File Name Offset element indicates where the file name begins within the record. Because the file name has more than eight characters followed by a dot and a three-letter extension, NTFS also maintains a short file name, domain~1.txt, for the file.

Note

You can examine a file’s short file name by typing dir /x FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as: dir /x c:\domaincomputers.txt.

Important

Version 2 records will have a 64-bit FileReferenceNumber and a 64-bit ParentFileReferenceNumber. Version 3 records will have a 128-bit FileReferenceNumber and a 128-bit ParentFileReferenceNumber.

Object identifiers

Another feature of NTFS is the ability to use object identifiers. Object identifiers are 16 bytes in length and are unique on a per-volume basis. Any file that has an object identifier also has the following:

  • Birth volume identifier (BirthVolumeID), which is the object identifier for the volume in which the file was originally created

  • Birth object identifier (BirthObjectID), which is the object identifier assigned to the file when it was created

  • Domain identifier (DomainID), which is the object identifier for the domain in which the file was created

These values are also 16 bytes in length. If a file is moved within a volume or moved to a new volume, it is assigned a new object identifier, but information about the original object identifier assigned when the object was created can be retained by using the birth object identifier.

Several system services use object identifiers to identify files uniquely and identify the volumes with which they are associated. The Distributed Link Tracking (DLT) Client service uses object identifiers to track linked files that are moved within an NTFS volume, to another NTFS volume on the same computer, or to an NTFS volume on another computer.

Any file the DLT Client service uses has an object identifier field set containing values for the object ID, birth volume ID, birth object ID, and domain ID. The actual field set looks like this:

Object ID :                            52eac013e3d34445334345453533ab3d
BirthVolume ID :                       a23bc3243a5a3452d32424332c32343d
BirthObject ID :                       52eac013e3d34445334345453533ab3d
Domain ID :                            00000000000000000000000000000000

Here, the file has a specific object ID, birth volume ID, and birth object ID. The domain ID isn’t assigned, however, because this is not currently used. You can tell that the DLT Client service uses the file because the birth volume ID and birth object ID have been assigned, and these identifiers are used only by this service. Because the birth volume ID and birth object ID remain the same even if a file is moved, the DLT Client service uses these identifiers to find files no matter where they have been moved.

If you are trying to determine whether the DLT Client service uses a file, you could use the FSUtil ObjectID command to see whether the file has an object identifier field set. Type fsutil objectid query FilePath at the command prompt, where FilePath is the path to the file or folder you want to examine. If the file has an object identifier field set, it is displayed. If a file doesn’t have an object identifier field set, an error message appears, stating, “The specified file has no object ID.”
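
For example, to check whether an object identifier field set exists for the ChangeLog.doc file used in the earlier hard-link example, you would type:

fsutil objectid query C:\CorpDocs\ChangeLog.doc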

Reparse points

On NTFS volumes, a file or folder can contain a reparse point. Reparse points are file system objects with special attribute tags that are used to extend the functionality in the I/O subsystem. When a program sets a reparse point, it stores an attribute tag and a data segment. The attribute tag identifies the purpose of the reparse point and details how the reparse point is to be used. The data segment provides any additional data needed during reparsing.

Reparse points are used for directory junction points and volume mount points. Directory junctions enable you to create a single local namespace by using local folders, local volumes, and network shares. Mount points enable you to mount a local volume to an empty NTFS folder. Both directory junction points and volume mount points use reparse points to mark NTFS folders with surrogate names.

When a file or folder containing a reparse point used for a directory junction point or a volume mount point is read, the reparse point causes the path to be reparsed and a surrogate name to be substituted for the original name. For example, if you were to create a mount point with the file path C:\Data that is used to mount a hard disk drive, the reparse point is triggered whenever the file system opens C:\Data and points the file system to the volume you mounted in that folder. The actual attribute tag and data for the reparse point would look similar to the following:

Reparse Tag Value :  0xa0000003
Tag value : Microsoft
Tag value : Name Surrogate
Tag value : Mount Point
Substitute Name offset :   0
Substitute Name length :   98
Print Name offset :  100
Print Name Length :  0
Substitute Name :  \??\Volume{3796c3c1-5106-11d7-911c-806d6172696f}\
Reparse Data Length : 0x0000006e
Reparse Data :
0000 : 00 00 62 00 64 00 00 00  5c 00 3f 00 3f 00 5c 00  ..b.d...\.?.?.\.
0010 : 56 00 6f 00 6c 00 75 00  6d 00 65 00 7b 00 33 00  V.o.l.u.m.e.{.3.
0020 : 37 00 39 00 36 00 63 00  33 00 63 00 31 00 2d 00  7.9.6.c.3.c.1.-.
0030 : 35 00 31 00 30 00 36 00  2d 00 31 00 31 00 64 00  5.1.0.6.-.1.1.d.
0040 : 37 00 2d 00 39 00 31 00  31 00 63 00 2d 00 38 00  7.-.9.1.1.c.-.8.
0050 : 30 00 36 00 64 00 36 00  31 00 37 00 32 00 36 00  0.6.d.6.1.7.2.6.
0060 : 39 00 36 00 66 00 7d 00  5c 00 00 00 00 00        9.6.f.}.\.....

The reparse attribute tag is defined by the first series of values, which identifies the reparse point as a Microsoft Name Surrogate Mount Point and specifies the surrogate name to be substituted for the original name. The reparse data follows the attribute tag values and, in this case, provides the fully expressed surrogate name.

Reparse points are also used by file-system filter drivers to mark files so that they are used with that driver. When NTFS opens a file associated with a file-system filter driver, it locates the driver and uses the filter to process the file as directed by the reparse information. Reparse points are used in this way to implement Remote Storage.

Sparse files

Often, scientific or other data collected through sampling is stored in large files that are primarily empty except for sparsely populated sections that contain the actual data. For example, a broad-spectrum signal recorded digitally from space might have only several minutes of audio for each hour of actual recording. In this case, a multiple-gigabyte audio file such as the one depicted in Figure 15-5 might have only a few gigabytes of meaningful information. Because there are large sections of empty space and limited areas of meaningful data, the file is said to be sparsely populated and can be referred to as a sparse file.

Figure 15-5. This figure shows sparse file usage; although the audio file is multiple gigabytes in size, it has only a few gigabytes of meaningful information.

Stored normally, the file would use 20 GBs of space on the volume. If you mark the file as sparse, however, NTFS allocates space only for actual data and marks empty space as unallocated. In other words, any meaningful or nonzero data is marked as allocated and written to disk, and any data composed of zeros is marked as unallocated and is not explicitly written to disk. In this example, this means the file uses only 5 GBs of space, which is marked as allocated, and has unallocated space of 15 GBs.

For unallocated space, NTFS records only information about how much unallocated space there is, and when you try to read data in this space, it returns zeros. This enables NTFS to store the file in the smallest amount of disk space possible while still being able to reconstruct the file’s allocated and unallocated space.

In theory, all this works great, but it is up to the actual program working with the sparse file to determine which data is meaningful and which isn’t. Programs do this by explicitly specifying the data for which space should be allocated. In Windows Server 2012 R2, several services use sparse files. One of these is the Indexing Service, which stores its catalogs as sparse files.

Using the FSUtil Sparse command, you can easily determine whether a file has the sparse attribute set. Type fsutil sparse queryflag FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as:

fsutil sparse queryflag c:\data\catalog.wci\0010002.ci

If the file has the sparse attribute, this command returns:

This file is set as sparse

You can examine sparse files to determine where the byte ranges that contain meaningful (nonzero) data are located by using FSUtil Sparse as well. Type fsutil sparse queryrange FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as:

fsutil sparse queryrange c:\data\catalog.wci\0010002.ci

The output is the byte ranges of meaningful data within the file, such as:

sparse range [0] [28672]

In this particular case, the output specifies that there’s meaningful data from the start of the file to byte 28672. You can mark files as sparse as well. Type fsutil sparse setflag FilePath at the command prompt, where FilePath is the path to the file you want to mark as sparse.
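
For example, the following commands mark a hypothetical data file as sparse and then mark its first 1,048,576 bytes (1 MB) as a sparse, zeroed range so that NTFS can deallocate that space:

fsutil sparse setflag c:\data\sample.dat

fsutil sparse setrange c:\data\sample.dat 0 1048576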

Transactional NTFS

Windows Server 2012 R2 supports transactional NTFS and Self-Healing NTFS. Transactional NTFS allows file operations on an NTFS volume to be performed transactionally. This means programs can use a transaction to group sets of file and registry operations so that all of them succeed or none of them succeed. While a transaction is active, changes are not visible outside the transaction. Changes are committed and written fully to disk only when a transaction is completed successfully. If a transaction fails or is incomplete, the program rolls back the transactional work to restore the file system to the state it was in prior to the transaction.

Transactions that span multiple volumes are coordinated by the Kernel Transaction Manager (KTM). The KTM supports the independent recovery of volumes if a transaction fails. The local resource manager for a volume maintains a separate transaction log and is responsible for maintaining threads for transactions separate from threads that perform the file work.

By using the FSUtil Transaction command, you can easily determine transactional information. You can list currently running transactions by typing fsutil transaction list at the command prompt. You can display transactional information for a specific file by typing fsutil transaction fileinfo FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as:

fsutil transaction fileinfo c:\journal\ls-dts.mdb

Traditionally, you had to use the Check Disk tool to fix errors and inconsistencies in NTFS volumes on a disk. Because this process can disrupt the availability of Windows systems, Windows Server 2012 R2 uses Self-Healing NTFS to protect file systems without having to use separate maintenance tools to fix problems. Because much of the self-healing process is enabled and performed automatically, you might need to perform volume maintenance manually only when the operating system notifies you that a problem cannot be corrected automatically. If such an error occurs, Windows Server 2012 R2 notifies you about the problem and provides possible solutions.

That said, with Windows 8.1 and Windows Server 2012 R2, self-healing has been enhanced and extended to work better with Check Disk. These improvements enable you to use Check Disk to correct many types of inconsistencies and errors on live (online) volumes, whereas Check Disk previously could perform these types of corrections only with offline volumes.
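
For example, you can scan a volume while it remains online and check its self-healing state by typing the following commands (the C drive here is only an example):

chkdsk c: /scan

fsutil repair query c:

The /Scan option performs an online scan that identifies problems, fixing what it can while the volume stays available and queuing for offline repair only what can’t be fixed online; FSUtil Repair Query displays the self-healing flags currently set for the volume.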

By using Self-Healing NTFS, the file system is always available and does not need to be corrected offline (in most cases). Self-Healing NTFS does the following:

  • Attempts to preserve as much data as possible if corruption occurs, and reduces failed file-system mounting that previously could occur if a volume was known to have errors or inconsistencies. Self-Healing NTFS can repair a volume immediately so that it can be mounted.

  • Reports changes made to the volume during repair through existing Chkdsk.exe mechanisms, directory notifications, and USN journal entries. This feature also enables authorized users and administrators to monitor repair operations through status messages.

  • Can recover a volume if the boot sector is readable but does not identify an NTFS volume. In this case, you must run an offline tool that repairs the boot sector and then allow Self-Healing NTFS to initiate recovery.

Although Self-Healing NTFS can correct many types of inconsistencies and errors automatically, some issues can be resolved only by running Check Disk (Chkdsk.exe) and allowing Check Disk to work with NTFS to resolve the problems, as discussed earlier in this chapter under “NTFS features.”

Using ReFS

Resilient File System (ReFS), the next-generation file system available with Windows Server 2012 R2, is built on the foundation of NTFS and designed specifically for use with advanced storage technologies. As such, many of its best features are available only when the file system is used with the new storage technology from Microsoft called Storage Spaces. Although ReFS is not available for Windows desktop operating systems at the time of this writing, Windows desktop operating systems can access data stored on ReFS volumes just as they do data shared from NTFS volumes.

ReFS features

As Table 15-5 shows, ReFS maintains compatibility with key aspects of NTFS, particularly when it comes to security features such as access permissions and share permissions. However, ReFS diverges when it comes to extended features, including support for compression, encryption, and disk quotas. Furthermore, you cannot boot from ReFS or use ReFS with removable media.

Table 15-5. Comparing NTFS and ReFS

| Feature | NTFS | ReFS |
| --- | --- | --- |
| Preserves and enforces access control lists (ACLs) | Yes | Yes |
| Preserves the case of file names | Yes | Yes |
| Supports ACLs | Yes | Yes |
| Supports BitLocker encryption | Yes | Yes |
| Supports booting from the file system | Yes | No |
| Supports case-sensitive file names | Yes | Yes |
| Supports disk quotas | Yes | No |
| Supports Encrypting File System | Yes | No |
| Supports extended attributes | Yes | No |
| Supports file-based compression | Yes | No |
| Supports hard links | Yes | No |
| Supports named streams | Yes | No |
| Supports object identifiers | Yes | No |
| Supports opening by FileID | Yes | Yes |
| Supports removable media | Yes | No |
| Supports reparse points | Yes | Yes |
| Supports shadow copies | Yes | Yes |
| Supports short names | Yes | No |
| Supports sparse files | Yes | Yes |
| Supports Unicode in file names | Yes | Yes |
| Supports user data transactions | Yes | No |
| Supports USN journal | Yes | Yes |
| Supports volume snapshots | Yes | Yes |

Not only are the transactional and self-healing features of NTFS important components of ReFS, but ReFS extends these features in several ways to allow for the automatic verification and online correction of data. ReFS avoids the possibility of torn writes by not writing metadata in place and optimizes for extreme scale by using scalable structures. To provide full end-to-end resilience, ReFS integrates fully with Storage Spaces. This integration does the following:

  • Allows for large volume, file, and directory sizes

  • Provides data striping for performance and redundancy for fault tolerance

  • Provides disk scrubbing and salvage to provide online protection against latent disk errors

  • Ensures metadata integrity with checksums

  • Provides pooling and virtualizing storage with load balancing and sharing across servers

  • Provides optional user data integrity by using integrity streams

  • Uses copy on write for improved disk update performance

ReFS reuses the code that implements the file-system semantics of NTFS to ensure compatibility with existing file-system application programming interfaces (APIs). This ensures that the core of the file-system interface is the same and that file operations—including read, write, open, change notification, and close—work in exactly the same way. When working with ReFS, Windows maintains the in-memory file and volume state, enforces security, and maintains memory caching and the synchronization of file data in exactly the same way as with NTFS.
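
For example, on Windows Server 2012 R2 you can format a data volume as ReFS and explicitly enable integrity streams for user data by including the /I parameter of the Format command (drive E here is only an example):

format e: /fs:refs /i:enable

With integrity enabled, ReFS checksums file data in addition to metadata, which is what allows the automatic verification and online correction described earlier in this section.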

ReFS structures

Where NTFS and ReFS differ greatly is in the on-disk store engine underneath the file-system interface. The on-disk store engine is what implements the on-disk structures such as the MFT. As discussed earlier in the chapter, the MFT represents files and directories by storing a file record of each file and folder on the volume along with pertinent volume information and details on the MFT itself.

The on-disk store engine for NTFS is NTFS.SYS. The on-disk store engine for ReFS is REFS.SYS, which was designed specifically for ReFS rather than being adapted from NTFS.SYS.

ReFS uses B+ tree structures to represent all information on the disk. B+ trees scale well from very small, compact structures to very large, multilevel structures, and using B+ trees simplifies the architecture and reduces the size of the code base.

The on-disk store engine uses enumerable tables with sets of key-value pairs. Access into most tables is provided by a unique object identifier, which is stored in a special object table that forms the base of the B+ tree.

The object table at the base of the B+ tree contains a disk offset and checksum for each unique object ID. This makes the object table the root of all structures within the file system. The entries in the object table refer to directories and global system metadata.

As shown in Figure 15-6, directories are represented as tables rooted within the object table. Each directory has an object identifier that acts as a key in the object table, and it has a corresponding value that provides a disk offset for where the table is found on the volume along with a checksum. The directory table contains rows that identify the files in the directory by file name and metadata. File metadata, in turn, identifies file attributes and their actual values. Among these values is a table of offset mappings to file extents. This table contains rows identifying file extents, paired with values that provide the disk offset location for each file extent and an optional checksum. Each file extent contains a section of the data for the parent file.

A diagram of file structures in ReFS, where directories are represented as tables rooted within the object table.

Figure 15-6. This figure shows file structures in ReFS.

Put another way, directories are represented as tables in the file structure. Files are embedded within rows of a directory table and are themselves tables containing rows of file metadata. The file metadata, which is itself represented as a table, has a row for each file attribute paired with the related value. Within the file metadata is an embedded table containing rows that identify file extents and provide offset locations to the extents on the volume, along with optional checksums.

Other global structures are represented within the file system as tables as well. As an example, ACLs are represented as tables rooted within the object table.
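
If it helps to picture this nesting, the following Windows PowerShell sketch models the object table, a directory table, and an embedded extent table as nested key-value tables. The sketch is purely illustrative; the identifiers, offsets, and checksums are invented for this example and do not reflect the actual on-disk format.

# Illustrative model only: object table -> directory table -> file
# metadata -> extent table, mirroring the nesting described above
$objectTable = @{
    'Dir-0001' = @{ Offset = 0x4000; Checksum = 0x1A2B }   # object ID -> table location
}
$directoryTable = @{
    'report.docx' = @{                                     # file name -> file metadata
        Attributes = @{ Size = 10240; Created = '2014-01-15' }
        Extents    = @{                                    # extent ID -> offset, checksum
            0 = @{ Offset = 0x9000; Checksum = 0x3C4D }
            1 = @{ Offset = 0xC000; Checksum = $null }     # checksums are optional
        }
    }
}
# Walk the tables to resolve the first extent of a file
$meta = $directoryTable['report.docx']
'First extent offset: 0x{0:X}' -f $meta.Extents[0].Offset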

ReFS advantages

ReFS supports file sizes up to 2^64 – 1 bytes, 2^64 files in a directory, 2^64 directories on a volume, and volume sizes up to 2^78 bytes using 16-KB cluster sizes (in contrast, Windows stack addressing allows 2^64 bytes). Because B+ trees scale with extreme efficiency, ReFS volumes can perform well whether they contain very large directories, very large files, or both. Disk space allocation is managed using a hierarchical allocator, which represents free space as tables of free-space ranges. Each table has a different level of granularity so that large free-space ranges can be allocated as easily as medium or small ones, all relative to the volume size and available free space.

Note

ReFS supports large numbers of files and directories by using 128-bit file identifiers. ReFS returns a 128-bit file identifier associated with an opened handle, along with the 64-bit volume identifier. For backward compatibility, a 64-bit file identifier can be obtained from the API, but applications that incorrectly assume all file identifiers fit in 64 bits might crash when calling this API.
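
For example, you can display a file's identifier by using the queryfileid subcommand of the Fsutil utility from an elevated prompt. The paths here are hypothetical:

# On an NTFS volume, the file ID returned is 64 bits (path is an example)
fsutil file queryfileid C:\Data\report.docx
# On an ReFS volume, the file ID returned is 128 bits
fsutil file queryfileid M:\Data\report.docx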

Important

ReFS uses hierarchical allocators to find optimal allocation quickly. Having a hierarchical allocation system allows related metadata blocks to be placed closer to one another naturally. By consulting the proper layer of the allocator hierarchy, ReFS can quickly determine the best possible placement for small, medium, or large allocations.

One of the disadvantages of NTFS is that it maintains metadata in place, which can result in randomized writes and leaves open the possibility of torn writes. ReFS improves reliability and eliminates torn writes by using an allocate-on-write approach: rather than updating metadata in place, the file system writes it to a different location. This technique is sometimes also referred to as shadow paging. The transaction architecture, derived from NTFS, is built on top of the allocate-on-write framework to provide failure recovery.

ReFS allocates metadata by using B+ tree structures that allow for fewer, larger reads and writes. It does this by combining related data, such as stream allocations, file attributes, file names, and directory pages. The approach offers read/write efficiencies whether hard disk drives or solid-state drives are used.

ReFS and Storage Spaces were designed to work together. Using mirroring or disk striping with parity, Storage Spaces can safeguard data against disk failures by maintaining copies of data on multiple disks. Whether you are using NTFS or ReFS, these multiple copies of data enable Storage Spaces to correct read failures by reading alternate copies of data, to correct write failures by reallocating data transparently, and to correct complete media loss on read/write. Storage Spaces gains efficiencies with ReFS when it comes to detecting data corruption and lost and misdirected writes. Here, ReFS can detect metadata corruption and lost and misdirected writes by using its checksums and then interface with Storage Spaces to read all the available copies of metadata and choose the correct one by validating the checksum. Next, ReFS instructs Storage Spaces to fix the bad metadata by using the good copies of the metadata. The error detection and correction happens transparently. When integrity streams are enabled for files, this automatic error detection and correction process is applied to each individual extent of a file as well.
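
If you want to experiment with this integration, the following PowerShell sketch shows one way to pool available disks, create a two-way mirrored space, and format it with ReFS. The pool and disk friendly names and the volume label are assumptions for this example, as is the presence of poolable physical disks.

# Pool the available physical disks (friendly names are examples only)
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "Pool1" -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks

# Create a mirrored virtual disk, then initialize, partition, and format it with ReFS
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "Mirror1" -ResiliencySettingName Mirror -UseMaximumSize
Get-VirtualDisk -FriendlyName "Mirror1" | Get-Disk | Initialize-Disk -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem ReFS -NewFileSystemLabel "Data"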

Important

There is a small CPU overhead for computing checksums and a small additional overhead for storing updated checksums with new data. That said, ReFS uses checksums to detect data corruption and log related events that can help you identify the corruption. Redundant Storage Spaces can correct corruption ReFS detects by using good copies of data to repair bad copies of data.

ReFS integrity streams, data scrubbing, and salvage

ReFS supports two types of data streams: conventional streams and integrity streams. Conventional streams behave identically to NTFS streams but might have metadata associated with them that is integrity protected. Integrity streams are streams that are integrity protected, meaning data is checksummed and updates to data are handled using copy-on-write.

With ReFS, keep in mind that integrity is an attribute that can be applied to files and directories. When a file or directory has the integrity attribute (FILE_ATTRIBUTE_INTEGRITY_STREAM) set, it uses integrity streams to protect against data corruption. Integrity streams are enabled by default only on Storage Spaces that have redundancy.

The integrity attribute is inheritable. When you enable the integrity attribute on a directory, the attribute is inherited by all files and directories created in the directory. Because of this, if you enable the integrity attribute on the root directory of a volume, you can ensure that every file and directory on the volume uses integrity streams by default.

You can enable integrity streams on the root directory of a volume when you format it. Use the following command syntax:

format /fs:refs /i:enable Volume

Here, Volume is the drive designator for the volume to format, such as:

format /fs:refs /i:enable m:
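
If you prefer Windows PowerShell, the Format-Volume cmdlet exposes the same option through its -SetIntegrityStreams parameter. A minimal sketch, assuming volume M: already exists:

# Format volume M: with ReFS and enable integrity streams on its root
Format-Volume -DriveLetter M -FileSystem ReFS -SetIntegrityStreams $true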

For empty files, the integrity attribute can be set and unset. For nonempty files, the integrity attribute can be removed only by moving the file to a file system that doesn’t support integrity, such as NTFS.
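
You can also inspect and change the integrity attribute on individual files and directories by using the Get-FileIntegrity and Set-FileIntegrity cmdlets in the Storage module. The paths in this sketch are hypothetical:

# Check whether integrity streams are enabled for a file
Get-FileIntegrity -FileName 'M:\Data\report.docx'

# Enable integrity on a directory; files created there inherit the attribute
Set-FileIntegrity -FileName 'M:\Data' -Enable $true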

ReFS safeguards against data loss from parts of a volume becoming corrupted over time by periodically scrubbing all metadata and integrity stream data. Data is scrubbed by reading all the redundant copies and validating their correctness against checksums. If checksums do not match, bad copies are repaired by using good copies. Typically, this automatic process occurs only with Storage Spaces that have redundancy enabled.

If metadata or data corruption cannot be automatically repaired, ReFS performs a salvage operation to remove the corrupt metadata or data from the namespace. The salvage operation ensures that irreparable corruption cannot adversely affect sound data. Without salvage, the file system could not even open or delete a corrupt file or directory. By removing the corrupt file or directory, ReFS ensures that an administrator can recover it from backup or have an application re-create it. When ReFS is running on top of redundant Storage Spaces with integrity streams, an automatic error-detection and correction process, applied to each extent of a file, can recover file and directory data.

Using file-based compression

You can use file-based compression to reduce the number of bits and bytes in files so that they use less space on a disk. The Windows operating system supports two types of compression: NTFS compression, which is a built-in feature of NTFS, and compressed (zipped) folders, which is an additional feature of Windows available on any type of volume. ReFS does not support NTFS compression.

NTFS compression

Windows allows you to enable compression when you format a volume by using NTFS. When a drive is compressed, all files and folders stored on the drive are automatically compressed when they are created. This compression is transparent to users, who can open and work with compressed files and folders just as they do regular files and folders. Behind the scenes, Windows expands the file or folder when it is opened and compresses it again when it is closed. Although this can decrease a computer’s performance, it saves space on the disk because compressed files and folders use less space.

You can turn on compression after formatting volumes as well or, if desired, turn on compression only for specific files and folders. After you compress a folder, any new files added or copied to the folder are compressed automatically. If you move a compressed file to a folder on the same volume, it remains compressed. If you move a compressed file to a folder on a different volume, it inherits the compression attribute of the folder.

Moving uncompressed files to compressed folders affects their compression attribute as well. If you move an uncompressed file from a different drive to a compressed drive or folder, the file is compressed. However, if you move an uncompressed file to a compressed folder on the same NTFS drive, the file isn’t compressed. Finally, if you move a compressed file to a FAT16, FAT32, exFAT, or ReFS volume, the file is uncompressed because NTFS compression is not supported.
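
Because the compression attribute can change as files are moved and copied, it is sometimes useful to verify a file's state afterward. One way is to test for the Compressed flag in the file's attributes from Windows PowerShell; the path here is hypothetical:

# Returns True if the file currently carries the Compressed attribute
$item = Get-Item 'D:\Data\report.docx'
($item.Attributes -band [IO.FileAttributes]::Compressed) -ne 0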

To compress or expand a drive, follow these steps:

  1. Press and hold or right-click the drive that you want to compress or expand in File Explorer or in the Disk Management Volume List view and then select Properties. This opens the disk’s Properties dialog box, as shown in Figure 15-7.

    A screen shot of a disk’s Properties dialog box, showing compression options on the General tab. The capacity is shown graphically as a disk with free space, and used space is shown by color. This information is also presented as bytes, gigabytes, and terabytes.

    Figure 15-7. You can compress entire volumes or perform selective compression for specific files and folders.

  2. Select or clear the Compress This Drive To Save Disk Space check box as appropriate. When you tap or click OK, the Confirm Attribute Changes dialog box shown in Figure 15-8 opens.

    A screen shot of the Confirm Attribute Changes dialog box, showing compression options.

    Figure 15-8. Choose a compression option.

  3. If you want to apply changes only to the root folder of the disk, select Apply Changes To Drive X Only. Otherwise, accept the default, which compresses the entire contents of the disk. Tap or click OK.

Caution

Although Windows Server 2012 R2 allows you to compress system volumes, this is not recommended because the operating system needs to expand and compress system files each time they are opened, which can seriously affect server performance. In addition, you can’t use compression and encryption together. You can use one feature or the other, but not both.

You can selectively compress and expand files and folders as well. The advantage here is that this affects only part of a disk, such as a folder and its subfolders, rather than the entire disk. To compress or expand a file or folder, follow these steps:

  1. In File Explorer, press and hold or right-click the file or folder you want to compress or expand and then select Properties.

  2. On the General tab of the related Properties dialog box, tap or click Advanced. This opens the Advanced Attributes dialog box shown in Figure 15-9. Select or clear the Compress Contents To Save Disk Space check box as appropriate. Tap or click OK twice.

    A screen shot of the Advanced Attributes dialog box, showing options to Compress Contents To Save Disk Space or Encrypt Contents To Secure Data.

    Figure 15-9. Use the Advanced Attributes dialog box to compress or expand the file or folder.

  3. If you are changing the compression attributes of a folder with subfolders, the Confirm Attribute Changes dialog box opens. If you want to apply the changes only to the files in the folder and not to files in subfolders of the folder, select Apply Changes To X Only. Otherwise, accept the default, which applies the changes to the files in the folder and its subfolders. Tap or click OK.

Windows Server 2012 R2 also provides command-line utilities for compressing and expanding your data. The compression utility is called Compact (Compact.exe). The expansion utility is called Expand (Expand.exe).

You can use Compact to determine quickly whether files in a directory are compressed. At the command line, change to the directory you want to examine and type compact without any additional parameters. If you want to check the directory and all subdirectories, type compact /s. The output lists the compression status and compression ratio on every file, and the final summary details tell you exactly how many files and directories were examined and found to be compressed, such as:

Of 15435 files within 822 directories
0 are compressed and 15435 are not compressed.
2,411,539,448 total bytes of data are stored in 2,411,539,448 bytes.
The compression ratio is 1.0 to 1.
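
Compact can apply compression as well as report on it. For example, the following command compresses a folder and all its subdirectories; the path is an example:

# Compress D:\Logs and everything beneath it
compact /c /s:D:\Logs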

Compressed (zipped) folders

Compressed (zipped) folders are another option for compressing files and folders. When you compress data by using this technique, you use zip compression technology to reduce the number of bits and bytes in files and folders so that they use less space on a disk. Compressed (zipped) folders are identified with a zipper on the folder icon and are saved with the .zip file extension.

Compressed (zipped) folders have several advantages over NTFS compression. Because zip technology is an extension of the operating system rather than of the file system, compressed (zipped) folders can be used on any type of volume. Zipped folders can be password protected to safeguard their contents and can be sent by email. They can also be transferred using File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), or other protocols. An added benefit of zipped folders is that some programs can be run directly from compressed folders without having to be expanded. You can also open files directly from zipped folders.

You can create a zipped folder by selecting a file, folder, or group of files and folders in File Explorer, pressing and holding or right-clicking it, pointing to Send To, and tapping or clicking Compressed (Zipped) Folder. The zipped folder is named automatically by using the file name of the last item selected and adding the .zip extension. If you double-tap or double-click a zipped folder in File Explorer, you can access and work with its contents. As shown in Figure 15-10, the zipped folder’s contents are listed according to file name, type, and date. The file information also shows the packed file size, original file size, and compression ratio. Double-tapping or double-clicking a program in a zipped folder runs it (as long as it doesn’t require access to other files). Double-tapping or double-clicking a file in a zipped folder opens it for viewing or editing.

A screen shot of a compressed folder being accessed in File Explorer.

Figure 15-10. Compressed (zipped) folders can be accessed and used like other folders.

While you’re working with a zipped folder, you can perform tasks similar to those you can do with regular folders. You can do the following:

  • Add other files, programs, or folders to the zipped folder by dragging them to it.

  • Copy a file in the zipped folder and paste it into a different folder.

  • Remove a file from the zipped folder by using the Cut command so that you can paste it into a different folder.

  • Delete a file or folder by selecting it and tapping or clicking Delete.

You can also perform additional tasks that are unique to zipped folders. Press and hold or right-click and then choose Extract All to start the Extraction Wizard, which you can use to extract all the files in the zipped folder and copy them to a new location.
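
You can script these zip operations as well. Windows Server 2012 R2 includes Windows PowerShell 4.0, which has no built-in archive cmdlet, but the ZipFile class in the Microsoft .NET Framework 4.5 works. The paths in this sketch are examples:

# Load zip support from the .NET Framework, then create and extract an archive
Add-Type -AssemblyName System.IO.Compression.FileSystem
[IO.Compression.ZipFile]::CreateFromDirectory('D:\Reports', 'D:\Reports.zip')
[IO.Compression.ZipFile]::ExtractToDirectory('D:\Reports.zip', 'D:\Extracted')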
