Advanced NTFS Features

NTFS has many advanced features that administrators should know about and understand. These features include the following:

  • Hard links

  • Data streams

  • Change journals

  • Object identifiers

  • Reparse points

  • Remote storage

  • Sparse files

Each of these features is discussed in the sections that follow.

Hard Links

Every file created on a volume has a hard link. The hard link is the directory entry for the file and it is what allows the operating system to find files within folders. On NTFS volumes, files can have multiple hard links. This allows a single file to appear in the same directory with multiple names or to appear in multiple directories with the same name or different names. As with file copies, applications can open a file using any of the hard links you've created and modify the file. If you use another hard link to open the file in another application, the application can detect the changes.

Wondering why you'd want to use hard links? Hard links are useful when you want the same file to appear in several locations. For example, you might want a document to appear in a folder of a network share that is available to all users but have an application that requires the document to be in another directory so that it can be read and processed on a daily basis. Rather than moving the file to the application directory and giving every user in the company access to this protected directory, you decide to create a hard link to the document so that it can be accessed separately by both users and the application.

Regardless of how many hard links a file has, however, the related directory entries all point to the single file that exists in one location on the volume—and this is how hard links differ from copies. With a copy of a file, the file data exists in multiple locations. With a hard link, the file appears in multiple locations but exists in only one location. Thus, if you modify a file using one of its hard links and save, and then someone opens the file using a different hard link, the changes are shown.
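The single-file behavior described above can be sketched in a few lines of Python; os.link creates hard links on NTFS and POSIX file systems alike (the file and directory names here are hypothetical):

```python
import os
import tempfile

# Sketch: one file body, multiple directory entries.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "ChangeLog.txt")
alias = os.path.join(workdir, "CurrentLog.txt")

with open(original, "w") as f:
    f.write("first entry\n")

os.link(original, alias)                 # second directory entry, same file data
assert os.stat(original).st_nlink == 2   # the file now has two hard links

# A change made through one link is visible through the other ...
with open(alias, "a") as f:
    f.write("second entry\n")
assert open(original).read() == "first entry\nsecond entry\n"

# ... and deleting one link leaves the file intact until the last link goes.
os.remove(original)
print(open(alias).read())
```

Note that no data is copied at any point; both names resolve to the same storage on the volume.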

Note

Hard links have advantages and disadvantages. Hard links are not meant for environments where multiple users can modify a file simultaneously. If Sandra opens a file using one hard link and is working on the file at the same time Bob is working on the file, there can be problems if they both try to save changes. Although this is a disadvantage of hard links, the really big advantage of hard links shouldn't be overlooked: If a file has multiple hard links, the file will not be deleted from the volume until all hard links are deleted. This means that if someone were to accidentally delete a file that had multiple hard links, the file wouldn't actually be deleted. Instead, only the affected hard link would be deleted and any other hard links and the file itself would remain.

Because there is only one physical copy of a file with multiple hard links, the hard links do not have separate security descriptors. Only the source file has security descriptors. Thus, if you were to change the access permissions of a file using any of its hard links, you would actually change the security of the source file and all hard links that point to this file would have these security settings.

You can create hard links by using the FSUTIL HARDLINK command. Use the following syntax:

fsutil hardlink create NewFilePath CurrentFilePath

where NewFilePath is the file path for the hard link you want to create and CurrentFilePath is the name of the existing file to which you are linking. For example, if the file ChangeLog.doc is found in the file path C:\CorpDocs and you want to create a new hard link to this file with the file path C:\UserData\Logs\CurrentLog.doc, you would type

fsutil hardlink create C:\UserData\Logs\CurrentLog.doc C:\CorpDocs\ChangeLog.doc

Hard links can be created only on NTFS volumes, and a hard link cannot refer to a file on a different volume. For example, you couldn't create a hard link on drive D for a file stored on drive C.

Data Streams

Every file created on a volume has a data stream associated with it. A data stream is a sequence of bytes that contains the contents of the file. The main data stream for a file is unnamed and is visible to all file systems. On NTFS volumes, files can also have named data streams associated with them. Named data streams contain additional information about a file, such as custom properties or summary details. This allows you to associate additional information with a file but still be able to manage the file as a single unit.

Once you create a named data stream and associate it with a file, any application that knows how to work with named data streams can access the streams by name and read the additional details. Many applications support named data streams, including Microsoft Office, Adobe Acrobat, and other productivity applications. This is how you can set summary properties for a Microsoft Word document, such as Title, Subject, and Author, and save that information with the file. In fact, if you right-click any file on an NTFS volume, select Properties, and then click the Summary tab, you can view or set this same information, as shown in Figure 20-4.


Figure 20-4. Information entered in the Summary tab is saved to a named data stream

Generally speaking, the named data streams associated with a file are used to set the names of its property tabs and to populate the fields of those tabs. This is how some document types can have other tabs associated with them and how the Windows operating system can store a thumbnail image within an NTFS file containing an image.

The most important thing to know about streams is that they aren't supported on FAT. If you move or copy a file containing named streams to a FAT volume, you will see the warning prompt labeled "Confirm Stream Loss" telling you the file has additional information associated with it and asking you to confirm that it's okay that the file is saved without this information. If you click Yes, only the contents of the file are copied or moved to the FAT volume—and not the contents of the associated data streams. If you click No, the copy or save operation is canceled.

Change Journals

In Windows Server 2003, an NTFS volume can use an update sequence number (USN) change journal. A change journal provides a complete log of all changes made to the volume. It records additions, deletions, and modifications regardless of who made them or how they occurred. As with system logs, the change journal is persistent, so it isn't reset if you shut down and restart the operating system. The operating system writes records to the NTFS change journal when an NTFS checkpoint occurs. The checkpoint tells the operating system to write records that would allow NTFS to recover to that particular point in time after a failure.

The change journal is enabled when you install any of the following services:

  • File Replication Service

  • Indexing Service

  • Remote Installation Services (RIS)

  • Remote Storage

Domain controllers and any other computer in the domain that uses these services rely heavily on the change journal. The change journal allows these services to be very efficient at determining when files, folders, and other NTFS objects have been modified. Rather than checking time stamps and registering for file notifications, these services perform direct lookups in the change journal to determine all the modifications made to a set of files. Not only is this faster, it uses system resources more efficiently as well.

You can gather summary statistics about the change journal by typing fsutil usn queryjournal DriveDesignator at the command prompt, where DriveDesignator is the drive letter of the volume followed by a colon. For example, if you want to obtain change journal statistics on the C drive, you'd type

fsutil usn queryjournal c:

The output is similar to the following:

Usn Journal ID   :  0x01c2ed7bd1b73670
First Usn        :  0x000000001b700000
Next Usn         :  0x00000000237ceb40
Lowest Valid Usn :  0x0000000000000000
Max Usn          :  0x00000fffffff0000
Maximum Size     :  0x0000000008000000
Allocation Delta :  0x0000000000100000

The details show the following information:

  • Usn Journal ID The unique identifier of the change journal.

  • First Usn The first USN in the change journal.

  • Next Usn The next USN that can be written to the change journal.

  • Lowest Valid Usn The lowest valid USN that can be written to the change journal.

  • Max Usn The highest USN that can be assigned.

  • Maximum Size The maximum size in bytes that the change journal can use. If the change journal exceeds this value, older entries are overwritten.

  • Allocation Delta The size in bytes of memory allocation that is added to the end and removed from the beginning of the change journal when it becomes full.
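As a rough illustration, the hexadecimal fields in this output can be parsed into ordinary integers to make the sizes easier to reason about. The helper below is hypothetical (not part of any Windows tool) and operates on the sample output reproduced above:

```python
# Sample "fsutil usn queryjournal" output, as shown in this section.
sample = """\
Usn Journal ID   :  0x01c2ed7bd1b73670
First Usn        :  0x000000001b700000
Next Usn         :  0x00000000237ceb40
Lowest Valid Usn :  0x0000000000000000
Max Usn          :  0x00000fffffff0000
Maximum Size     :  0x0000000008000000
Allocation Delta :  0x0000000000100000
"""

def parse_usn_journal(text):
    """Map each 'Label : 0x...' line to an integer keyed by its label."""
    fields = {}
    for line in text.splitlines():
        label, _, value = line.partition(":")
        fields[label.strip()] = int(value.strip(), 16)
    return fields

journal = parse_usn_journal(sample)
# Maximum Size 0x08000000 bytes is 128 MB; Allocation Delta 0x100000 is 1 MB.
print(journal["Maximum Size"] // (1024 * 1024))      # -> 128
print(journal["Allocation Delta"] // (1024 * 1024))  # -> 1
```

So in this sample, the journal is capped at 128 MB and grows or shrinks in 1-MB increments.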

Individual records written to the change journal look like this:

File Ref#       :  0x18e90000000018e9
ParentFile Ref# :  0x17c00000000017c0
Usn             :  0x0000000000000000
SecurityId      :  0x00000119
Reason          :  0x00000000
Name (024)      :  ocmanage.dll

The most important information here is the name of the affected file and the security identifier of the object that made the change. You can get the most recent change journal entry for a file by typing fsutil usn readdata FilePath, where FilePath is the name of the file for which you want to retrieve change information. For example, if you want to obtain the most recent change journal information on a file with the path C:\DomainComputers.txt, you'd type

fsutil usn readdata c:\domaincomputers.txt

The output is similar to the following:

Major Version    :  0x2
Minor Version    :  0x0
FileRef#         :  0x000800000001c306
Parent FileRef#  :  0x0005000000000005
Usn              :  0x00000000237cf7f0
Time Stamp       :  0x0000000000000000
Reason           :  0x0
Source Info      :  0x0
Security Id      :  0x45e
File Attributes  :  0x20
File Name Length :  0x26
File Name Offset :  0x3c
FileName         :  domaincomputers.txt

This data shows the file's reference number in the root file index and that of its parent. It also shows the current USN associated with the file and the file attributes flag. The File Name Length element shows the length of the file name in bytes. This particular file has a file name length of 38 (0x26) because file names are stored in Unicode at 2 bytes per character, and the long file name, domaincomputers.txt, is 19 characters. Because the name has more than eight characters before the dot and a three-letter extension, NTFS also represents the file with a short (8.3) file name, domain~1.txt. The File Name Offset element indicates where the file name begins within the record.
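A quick check of that arithmetic, assuming the record stores the name in Unicode (UTF-16) at 2 bytes per character:

```python
# The 19-character long name, encoded as UTF-16, accounts for the
# 0x26 (38) value reported as File Name Length.
name = "domaincomputers.txt"
name_bytes = len(name.encode("utf-16-le"))
print(len(name), name_bytes, hex(name_bytes))  # -> 19 38 0x26
```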

Tip

You can examine a file's short file name by typing dir /x FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as: dir /x c:\domaincomputers.txt.

Object Identifiers

Another feature of NTFS is the ability to use object identifiers. Object identifiers are 16 bytes in length and are unique on a per-volume basis. Any file that has an object identifier also has the following:

  • Birth volume identifier (BirthVolumeID), which is the object identifier for the volume in which the file was originally created

  • Birth object identifier (BirthObjectID), which is the object identifier assigned to the file when it was created

  • Domain identifier (DomainID), which is the object identifier for the domain in which the file was created

These values are also 16 bytes in length. If a file is moved within a volume or moved to a new volume, it is assigned a new object identifier, but information about the original object identifier assigned when the object was created can be retained using the birth object identifier.

Object identifiers are used by the File Replication Service (FRS) and the Distributed Link Tracking (DLT) Client service to uniquely identify files and the volumes with which they are associated. FRS uses object identifiers to locate files for replication. The DLT Client service uses object identifiers to track linked files that are moved within an NTFS volume, to another NTFS volume on the same computer, or to an NTFS volume on another computer.

Any file used by FRS or the DLT Client service has an object identifier field set containing values for the object ID, birth volume ID, birth object ID, and domain ID. The actual field set looks like this:

Object ID :       52eac013e3d34445334345453533ab3d
BirthVolume ID :  a23bc3243a5a3452d32424332c32343d
BirthObject ID :  52eac013e3d34445334345453533ab3d
Domain ID :       00000000000000000000000000000000

Here, the file has a specific object ID, birth volume ID, and birth object ID. The domain ID isn't assigned, however, because this is not currently used. You can tell that the file is used by the DLT Client service because the birth volume ID and birth object ID have been assigned and these identifiers are used only by this service. Because the birth volume ID and birth object ID remain the same even if a file is moved, the DLT Client service uses these identifiers to find files no matter where they have been moved.

In contrast, FRS uses only the object ID, so the object identifier field set for a file used by FRS looks like this:

Object ID :       52eac013e3d34445334345453533ab3d
BirthVolume ID :  00000000000000000000000000000000
BirthObject ID :  00000000000000000000000000000000
Domain ID :       00000000000000000000000000000000

If you are trying to determine whether a file is used by FRS or the DLT Client service, you could use the FSUTIL OBJECTID command to see if the file has an object identifier field set. Type fsutil objectid query FilePath at the command prompt, where FilePath is the path to the file or folder you want to examine. If the file has an object identifier field set, it is displayed. If a file doesn't have an object identifier field set, an error message is displayed stating "The specified file has no object ID."
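The distinction between the two field-set patterns can be expressed as a simple check. The sketch below is hypothetical; it simply encodes the rule described above, using the sample field sets from this section:

```python
# If the birth volume and birth object IDs are all zeros, only the object ID
# is in use (the FRS pattern); otherwise the DLT Client pattern applies.
def uses_birth_ids(field_set):
    zero = "0" * 32
    return (field_set["BirthVolume ID"] != zero
            or field_set["BirthObject ID"] != zero)

dlt_example = {
    "Object ID":      "52eac013e3d34445334345453533ab3d",
    "BirthVolume ID": "a23bc3243a5a3452d32424332c32343d",
    "BirthObject ID": "52eac013e3d34445334345453533ab3d",
}
frs_example = {
    "Object ID":      "52eac013e3d34445334345453533ab3d",
    "BirthVolume ID": "0" * 32,
    "BirthObject ID": "0" * 32,
}

print(uses_birth_ids(dlt_example))  # -> True  (DLT Client pattern)
print(uses_birth_ids(frs_example))  # -> False (FRS-only pattern)
```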

Reparse Points

On NTFS volumes, a file or folder can contain a reparse point. Reparse points are file system objects with special attribute tags that are used to extend the functionality in the I/O subsystem. When a program sets a reparse point, it stores an attribute tag as well as a data segment. The attribute tag identifies the purpose of the reparse point and details how the reparse point is to be used. The data segment provides any additional data needed during reparsing.

Reparse points are used for directory junction points and volume mount points. Directory junctions enable you to create a single local namespace using local folders, local volumes, and network shares. Mount points enable you to mount a local volume to an empty NTFS folder. Both directory junction points and volume mount points use reparse points to mark NTFS folders with surrogate names.

When a file or folder containing a reparse point used for a directory junction point or a volume mount point is read, the reparse point causes the pathname to be reparsed and a surrogate name to be substituted for the original name. For example, if you were to create a mount point with the file path C:\Data that is used to mount a hard disk drive, the reparse point is triggered whenever the file system opens C:\Data and points the file system to the volume you've mounted in that folder. The actual attribute tag and data for the reparse point would look similar to the following:

Reparse Tag Value :  0xa0000003
Tag value: Microsoft
Tag value: Name Surrogate
Tag value: Mount Point
Substitute Name offset:   0
Substitute Name length:   98
Print Name offset:       100
Print Name Length:       0
Substitute Name:         \??\Volume{3796c3c1-5106-11d7-911c-806d6172696f}\

Reparse Data Length: 0x0000006e
Reparse Data:
0000: 00 00 62 00 64 00 00 00 5c 00 3f 00 3f 00 5c 00  ..b.d...\.?.?.\.
0010: 56 00 6f 00 6c 00 75 00 6d 00 65 00 7b 00 33 00  V.o.l.u.m.e.{.3.
0020: 37 00 39 00 36 00 63 00 33 00 63 00 31 00 2d 00  7.9.6.c.3.c.1.-.
0030: 35 00 31 00 30 00 36 00 2d 00 31 00 31 00 64 00  5.1.0.6.-.1.1.d.
0040: 37 00 2d 00 39 00 31 00 31 00 63 00 2d 00 38 00  7.-.9.1.1.c.-.8.
0050: 30 00 36 00 64 00 36 00 31 00 37 00 32 00 36 00  0.6.d.6.1.7.2.6.
0060: 39 00 36 00 66 00 7d 00 5c 00 00 00 00 00        9.6.f.}.\.....

The reparse attribute tag is defined by the first series of values, which identifies the reparse point as a Microsoft Name Surrogate Mount Point and specifies the surrogate name to be substituted for the original name. The reparse data follows the attribute tag values and in this case provides the fully expressed surrogate name.
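The attribute tag shown above can be decoded bit by bit. In the Windows SDK headers, bit 31 of a reparse tag marks a Microsoft-defined tag, bit 29 marks a name surrogate, and the low 16 bits give the tag type; 0xA0000003 is defined as IO_REPARSE_TAG_MOUNT_POINT. A short sketch:

```python
# Decode the reparse tag value 0xA0000003 from the example above.
tag = 0xA0000003

is_microsoft = bool(tag & 0x80000000)      # bit 31: Microsoft-defined tag
is_name_surrogate = bool(tag & 0x20000000)  # bit 29: name surrogate
tag_type = tag & 0x0000FFFF                 # low 16 bits: tag type

print(is_microsoft, is_name_surrogate, hex(tag_type))  # -> True True 0x3
```

The three decoded pieces correspond exactly to the "Microsoft", "Name Surrogate", and "Mount Point" tag values in the listing.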

Tip

Examine reparse points

Using the FSUTIL REPARSEPOINT command, you can examine reparse information associated with a file or folder. Type fsutil reparsepoint query FilePath at the command prompt, where FilePath is the path to the file or folder you want to examine.

Reparse points are also used by file system filter drivers to mark files so they are used with that driver. When NTFS opens a file associated with a file system filter driver, it locates the driver and uses the filter to process the file as directed by the reparse information. Reparse points are used in this way to implement Remote Storage, which is discussed in the next section.

Remote Storage

Remote Storage is Microsoft's implementation of Hierarchical Storage Management (HSM). By using Remote Storage, you can define a set of rules that allow infrequently used files to be moved automatically and transparently to long-term storage on tape or other media yet still be accessible to users. How this works is that Remote Storage moves the data for a file that meets your rule set to long-term storage and replaces the file with a stub file that contains a reparse point. When a file or folder containing this reparse point is read, the reparse point causes the pathname to be reparsed, and the actual location of the file in long-term storage is substituted for the original file path. This allows NTFS to retrieve the file from long-term storage.

To users, the retrieval process is fairly transparent. They simply access a file at its regular location on a disk or shared folder, and Windows handles the task of retrieving the file from long-term storage. Because retrieval from tape storage is slower than from disk, the user will see a dialog box specifying that the file is being recalled from Remote Storage and asking the user to wait.

With the low cost of hard disk drives today, Remote Storage isn't used as frequently as it once was. Still, it is worth considering if you have already made an investment in a storage system, such as an autoloader tape system or an optical jukebox that uses magneto-optical disks. What Remote Storage allows you to do is extend a disk-based volume onto the storage system. Say, for instance, you have a 60-GB disk-based volume and an autoloader tape system that can automatically mount any of its 16 tapes into one of the available tape drives. If each tape has a 100-GB capacity, as with AIT-3 tapes (uncompressed), you would have about 1600 GB of tape storage.

When you extend the volume onto the available tape storage space using Remote Storage, the total space available appears to be about 1660 GB. When users write data to the volume, the data is written first to the disk area, and then as the drive spaces fill, older files are moved to tape storage. Very handy, if users don't mind the delays they might encounter when trying to access files that have been moved to tape.

Sparse Files

Often scientific or other data collected through sampling is stored in large files that are primarily empty except for sparsely populated sections that contain the actual data. For example, a broad-spectrum signal recorded digitally from space might have only several minutes of audio for each hour of actual recording. In this case, a multiple-gigabyte audio file such as the one depicted in Figure 20-5 might have only a few gigabytes of meaningful information. Because there are large sections of empty space and limited areas of meaningful data, the file is said to be sparsely populated and can also be referred to as a sparse file.


Figure 20-5. Using sparse files

Stored normally, the file would use 20 GB of space on the volume. If you mark the file as sparse, however, NTFS allocates space only for actual data and marks empty space as nonallocated. In other words, any meaningful or nonzero data is marked as allocated and written to disk, and any data composed of zeros is marked as nonallocated and is not explicitly written to disk. In this example, this means the file uses only 5 GB of space, which is marked as allocated, and has nonallocated space of 15 GB.

For nonallocated space, NTFS records only information about how much nonallocated space there is, and when you try to read data in this space, it returns zeros. This allows NTFS to store the file in the smallest amount of disk space possible while still being able to reconstruct the file's allocated and nonallocated space.

In theory, all this works great, but it is up to the actual program working with the sparse file to determine which data is meaningful and which isn't. Programs do this by explicitly specifying the data for which space should be allocated. In Windows Server 2003, several services use sparse files. One of these is the Indexing Service, which stores its catalogs as sparse files.
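Sparse allocation can be demonstrated on most modern file systems with a short Python sketch: seek past the end of a new file and write a single byte. The logical file size covers the hole, and reads from the never-written region come back as zeros, mirroring the NTFS behavior described above (whether the hole actually consumes disk space depends on the underlying file system):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.dat")
hole = 1024 * 1024  # 1 MB of data that is never explicitly written

with open(path, "wb") as f:
    f.seek(hole)      # skip over a region without writing it
    f.write(b"\x01")  # one byte of real data at the end

# The logical size includes the hole ...
print(os.path.getsize(path))  # -> 1048577

# ... and reading from the hole returns zeros.
with open(path, "rb") as f:
    head = f.read(4096)
assert head == b"\x00" * 4096
```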

Using the FSUTIL SPARSE command, you can easily determine whether a file has the sparse attribute set. Type fsutil sparse queryflag FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as

fsutil sparse queryflag c:\data\catalog.wci\0010002.ci

If the file has the sparse attribute, this command returns

This file is set as sparse

You can examine sparse files to determine where the byte ranges that contain meaningful (nonzero) data are located by using FSUTIL SPARSE as well. Type fsutil sparse queryrange FilePath at the command prompt, where FilePath is the path to the file you want to examine, such as

fsutil sparse queryrange c:\data\catalog.wci\0010002.ci

The output is the byte ranges of meaningful data within the file, such as

sparse range [0] [28672]

In this particular case, the output specifies that there's meaningful data beginning at offset 0 and extending for 28672 bytes. You can mark files as sparse as well. Type fsutil sparse setflag FilePath at the command prompt, where FilePath is the path to the file you want to mark as sparse.
