Linux Systems and Artifacts
Information in this Chapter
Modern Linux systems have come a long way from their humble roots as a free Unix-like system for home computers. Over the past 20 years, Linux has found its way into everything, from children’s toys and networking devices to the most powerful supercomputing clusters in the world. While we can’t teach you everything you will need to know to examine a supercomputing cluster, we can get you started with an exploration of standard Linux file system artifacts, directory structures, and artifacts of system and user activity.
At the time of this writing, most current Linux systems use the Ext3 file system. Ext3 is the successor of Ext2, which added journaling but retained Ext2’s underlying structure otherwise. In fact, an Ext3 volume will happily mount as Ext2 if the user issues the mount command appropriately. Many other file systems are available via the Linux kernel, including ReiserFS, XFS, and JFS. Because these file systems are not generally used in a default Linux installation, their presence may indicate a purpose-built system (as opposed to a general-use desktop system).
This section explores some of the Ext2 and 3 specific structures and forensically interesting information available, using the file system abstraction model described in Chapter 3 as a framework.
Ext file systems have two major components that make up their file system layer structures: the superblock and the group descriptor tables. The superblock is a data structure found 1024 bytes from the start of an Ext file system. It contains information about the layout of the file system and includes block and inode allocation information, and metadata indicating the last time the file system was mounted or read. The group descriptor table is found in the block immediately following the superblock. This table contains allocation status information for each block group found on the file system [1]. The fsstat tool in the Sleuth Kit can be used to parse the content of these data structures and display information about the file system.
To demonstrate, we will create a small 10-Megabyte Ext2 file system. First we need to generate a 10-Megabyte file to act as the container for our file system.
user@ubuntu:~/images$ dd if=/dev/zero of=testimage.img bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 0.033177 s, 309 MB/s
Next, we can build the file system using the mke2fs command.
user@ubuntu:~/images$ mke2fs testimage.img
mke2fs 1.41.11 (14-Mar-2010)
testimage.img is not a block special device.
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
2512 inodes, 10000 blocks
500 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=10485760
2 block groups
8192 blocks per group, 8192 fragments per group
1256 inodes per group
Superblock backups stored on blocks:
8193
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Running the fsstat command against our newly created file system yields the following output:
user@ubuntu:~/images$ fsstat testimage.img
FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: Ext2
Volume Name:
Volume ID: 1c0806ef7431d187bb4c63d11ab0842e
Last Written at: Tue Oct 19 16:24:39 2010
Last Checked at: Tue Oct 19 16:24:39 2010
Last Mounted at: empty
Unmounted properly
Last mounted on:
Source OS: Linux
Dynamic Structure
Compat Features: Ext Attributes, Resize Inode, Dir Index
InCompat Features: Filetype,
Read Only Compat Features: Sparse Super,
…
Of particular interest in the preceding output are the “Last Mounted at:” and “Last mounted on:” fields, which are empty. Because this file system has just been created, this is to be expected. On a heavily used file system, empty values here would indicate an error or possibly intentional tampering.
Continuing with the fsstat output, we begin to see the information the file system layer has about lower layers.
METADATA INFORMATION
--------------------------------------------
Inode Range: 1 - 2513
Root Directory: 2
Free Inodes: 2501
The “Root Directory” entry provides the inode number of the root directory—this is the value the fls command uses by default. The next section of output details the layout of the blocks of the file system.
CONTENT INFORMATION
--------------------------------------------
Block Range: 0 - 9999
Block Size: 1024
Reserved Blocks Before Block Groups: 1
Free Blocks: 9585
BLOCK GROUP INFORMATION
--------------------------------------------
Number of Block Groups: 2
Inodes per group: 1256
Blocks per group: 8192
Group: 0:
Inode Range: 1 - 1256
Block Range: 1 - 8192
Layout:
Super Block: 1 - 1
Group Descriptor Table: 2 - 2
Data bitmap: 42 - 42
Inode bitmap: 43 - 43
Inode Table: 44 - 200
Data Blocks: 201 - 8192
Free Inodes: 1245 (99%)
Free Blocks: 7978 (97%)
Total Directories: 2
Group: 1:
Inode Range: 1257 - 2512
Block Range: 8193 - 9999
Layout:
Super Block: 8193 - 8193
Group Descriptor Table: 8194 - 8194
Data bitmap: 8234 - 8234
Inode bitmap: 8235 - 8235
Inode Table: 8236 - 8392
Data Blocks: 8393 - 9999
Free Inodes: 1256 (100%)
Free Blocks: 1607 (88%)
Total Directories: 0
In this output we have the majority of the information needed to extract raw data from the file system. We know that the file system is divided into two block groups, each with 8192 1024-byte blocks. We know which inodes are associated with which block groups, information that can be of use when recovering deleted data. We also know the location of the backup superblock, which can be used for sanity checking in the case of a corrupted or inconsistent primary superblock.
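To illustrate that sanity check, the following sketch rebuilds the small test image from scratch, destroys its primary superblock, and then repairs the file system from the backup superblock at block 8193 noted in the mke2fs output. This assumes e2fsprogs is installed; the block and offset numbers come from the 1024-byte-block layout shown above.

```shell
# Recreate the 10-MB test image and make an Ext2 file system on it.
dd if=/dev/zero of=testimage.img bs=1024 count=10000 2>/dev/null
mke2fs -q -F testimage.img
# Zero the primary superblock (the 1024 bytes starting at offset 1024).
dd if=/dev/zero of=testimage.img bs=1024 seek=1 count=1 conv=notrunc 2>/dev/null
# Repair using the backup superblock at block 8193, block size 1024.
e2fsck -p -b 8193 -B 1024 testimage.img
```

On a real examination, never run repair tools against original evidence; restrict this technique to a working copy of the image.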
File names in Ext file systems are stored as directory entries. These entries are stored in directories, which are simply blocks filled with directory entries. Each directory entry contains the file name, the address of the inode associated with the file, and a flag indicating whether the name refers to a directory or a normal file.
Ext file systems allow multiple file names to point to the same file—these additional names are known as hard links. A hard link is an additional directory entry that points to the same inode. Each hard link increments the inode’s link count by one.
To demonstrate this, we can create a simple file, add some text to it, and examine the file.
user@ubuntu:~$ touch file1
user@ubuntu:~$ echo "i am file1" > file1
user@ubuntu:~$ cat file1
i am file1
user@ubuntu:~$ stat file1
File: ’file1’
Size: 11 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 452126 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2010-10-19 21:06:36.534649312 -0700
Modify: 2010-10-19 21:06:34.798639051 -0700
Change: 2010-10-19 21:06:46.694615623 -0700
Here we have created “file1” and added some identifying text. We can use the stat command to display the file’s inode information. Next, we use the ln command to create a “hard link” to file1.
user@ubuntu:~$ ln file1 file2
user@ubuntu:~$ stat file2
File: ’file2’
Size: 11 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 452126 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2010-10-19 21:06:36.534649312 -0700
Modify: 2010-10-19 21:06:34.798639051 -0700
Change: 2010-10-19 21:06:46.694615623 -0700
Note that file2 has the exact same inode number shown in the stat output of file1. Also note that the “Links” value is incremented.
user@ubuntu:~$ cat file2
i am file1
user@ubuntu:~$ stat file1
File: ’file1’
Size: 11 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 452126 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2010-10-19 21:06:56.798612306 -0700
Modify: 2010-10-19 21:06:34.798639051 -0700
Change: 2010-10-19 21:06:46.694615623 -0700
Dumping the content of file2 and reviewing the stat output of file1 one more time reinforce that these are effectively the same “file.” file1 and file2 are both simply file names that reference the same inode.
A second type of link exists on Ext file systems—soft links. A soft link is a special file that has a path to another file in place of the block pointers in its inode. The soft link then serves as an indirect reference to the actual file.
We can add a soft link to our link chain by using the -s flag to the ln command.
user@ubuntu:~$ ln -s file1 file3
user@ubuntu:~$ stat file1
File: ’file1’
Size: 11 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 452126 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2010-10-19 21:06:56.798612306 -0700
Modify: 2010-10-19 21:06:34.798639051 -0700
Change: 2010-10-19 21:06:46.694615623 -0700
Note that the stat information for file1 has remained unchanged—file1 is “unaware” that it is also “file3.”
user@ubuntu:~$ stat file3
File: ’file3’ -> ’file1’
Size: 5 Blocks: 0 IO Block: 4096 symbolic link
Device: 801h/2049d Inode: 452127 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2010-10-19 21:07:33.382618755 -0700
Modify: 2010-10-19 21:07:33.382618755 -0700
Change: 2010-10-19 21:07:33.382618755 -0700
By running stat against file3 we can get a better idea of what is occurring. The “Size” value is the number of bytes in the target file name (five). As a soft link, file3 has no data allocated, so the “Blocks” value is zero. In addition, because file3 has its own inode, it gets its own independent set of time stamps.
Metadata for files on Ext file systems is stored in inodes. Forensically interesting items contained in Ext inodes include the file’s size and allocated blocks, ownership and permissions information, and the time stamps associated with the file. In addition, an inode contains a flag indicating whether it belongs to a directory or a regular file. As mentioned previously, each inode also has a link count: the number of file names that refer to that inode.
Ownership information includes User Identifier (UID) and Group Identifier (GID) values, which can be of importance in many different examinations. We will discuss more about mapping numeric UIDs and GIDs into their human-readable equivalent later.
Ext inodes store four time stamps, commonly referred to as MAC times.
• The (M)odified time stamp is updated when the content of the file or directory is written. So, if a file is edited or entries are added to or removed from a directory, this time stamp will update.
• The (A)ccessed time stamp is updated when the content of the file or directory is read. Any activity that opens a file for reading or lists the contents of a directory will cause this time stamp to be updated.
• The (C)hanged time stamp is updated when the inode is modified. Any permission changes or changes that cause the Modified time stamp to update will cause this time stamp to update as well.
• The (D)eleted time stamp is updated only when the file is deleted.
It is important to note that altering the modification or access time is quite simple using the touch command. The options from the touch command’s usage output that can be used to set specific time values appear in the excerpt that follows.
Usage: touch [OPTION]… FILE…
Update the access and modification times of each FILE to the current time.
A FILE argument that does not exist is created empty.
….
-a change only the access time
-c, --no-create do not create any files
-d, --date=STRING parse STRING and use it instead of current time
…
-m change only the modification time
-r, --reference=FILE use this file’s times instead of current time
-t STAMP use [[CC]YY]MMDDhhmm[.ss] instead of current time
--time=WORD change the specified time:
WORD is access, atime, or use: equivalent to -a
WORD is modify or mtime: equivalent to -m
…
While this is trivial to do, altering the C-time (inode change time) is not possible using the touch command—in fact, the C-time will be updated to record the time at which the time stamp alteration occurred! In a case where time stamps appear to have been modified, the C-time can end up being the “truest” time stamp available.
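A quick sketch (GNU coreutils assumed, scratch file created via mktemp) demonstrates the behavior: touch -t back-dates the M and A times, while the C-time records the moment of the alteration itself.

```shell
# Back-date the M/A times of a scratch file, then compare against the
# C-time, which touch cannot set.
f="$(mktemp)"
touch -t 200101011200 "$f"        # set M and A times to 2001-01-01 12:00
stat -c 'Modify: %y' "$f"         # reports 2001-01-01
stat -c 'Change: %z' "$f"         # reports the current time
```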
The inode also contains pointers to blocks allocated to the file. The inode can store the addresses of the first 12 blocks of a file; however, if more than 12 pointers are required, a block is allocated and used to store them. These are called indirect block pointers. Note that this indirection can occur two more times if the number of block addresses requires creating double and triple indirect block pointers.
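To put rough numbers on this: with 1024-byte blocks, a block address is 4 bytes, so each indirect block holds 256 pointers. A back-of-the-envelope sketch of how much file data each pointer level can reach:

```shell
# File size reachable at each pointer level, for 1024-byte blocks and
# 4-byte block addresses.
bs=1024
ptrs=$((bs / 4))                                      # 256 pointers per indirect block
echo "direct (12 ptrs): $((12 * bs)) bytes"           # 12288
echo "single indirect:  $((ptrs * bs)) bytes"         # 262144
echo "double indirect:  $((ptrs * ptrs * bs)) bytes"  # 67108864
```

Larger block sizes raise these limits substantially, since both the block size and the pointers-per-block count grow.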
Data units in Ext file systems are called blocks. Blocks are 1, 2, or 4K in size as denoted in the superblock. Each block has an address and is part of a block allocation group as described in the block descriptor table. Block addresses and groups start from 0 at the beginning of the file system and increment. As noted in the Metadata section, pointers to the blocks allocated to a file are stored in the inode. When writing data into a block, current Linux kernels will fill the block slack space with zeroes, so no “file slack” should be present. Note that the allocation strategy used by the Linux kernel places blocks in the same group as the inode to which they are allocated.
The core functional difference between Ext2 and Ext3 is the journal present in Ext3. Current Ext3 journal implementations record only metadata changes, and they do so at the block level. The journal is transaction based, and each recorded transaction has a sequence number. A transaction begins with a descriptor block, followed by one or more file system blocks, and is finalized with a commit block. See the jcat output that follows for an excerpt showing a simple metadata update.
…
4060: Allocated Descriptor Block (seq: 10968)
4061: Allocated FS Block 65578
4062: Allocated Commit Block (seq: 10968)
…
The usefulness of the information extracted from the journal is going to be highly dependent on the nature of your specific investigation, including the amount of time that has passed since the event of interest and the amount of file system activity that has occurred in the meantime. It is possible that old inode data may be present in the journal, which can provide a transaction log of old time stamps or old ownership information. Additionally, old inode information recovered from the journal may contain block pointers that have subsequently been wiped from a deleted inode.
As demonstrated earlier, for each directory entry that points to a given inode, that inode’s link count is incremented by one. When directory entries pointing to a given inode are removed, the inode’s link count is subsequently decremented by one. When all directory entries pointing to a given inode are removed, the inode has a link count of zero and is considered “deleted.” On Ext2 systems, this is where the process stops, so recovery in this case is fairly easy. On Ext3 systems, when the link count of an inode hits zero, the block pointers are also zeroed out. While the content is still present in the freed blocks (until these are reallocated and overwritten), the link between metadata and data has been scrubbed.
In Forensic Discovery, Dan Farmer and Wietse Venema make many interesting observations with regard to deletion of data. One item of note is the fact that deleting a block or inode effectively “freezes” that item until it is reused. If an attacker places their malware in a relatively low-use area of the file system and then later deletes it, it is quite possible that the deleted blocks and inode will remain preserved in digital amber, Jurassic Park–style, for quite some time [2].
This idea has some effect on data recovery. For example, if you are attempting to recover data that existed previously in the /usr/share/ directory and all files in that directory have blocks allocated in block group 45, restricting your carving attempts to unallocated blocks from group 45 may prove a time (and sanity) saver.
Some Linux systems may have one or more partitions allocated to the Logical Volume Manager (LVM). This system combines one or more partitions across one or more disks into Volume Groups and then divides these Volume Groups into Logical Volumes. The presence of an LVM-configured disk can be detected by looking for partition type 8e, which is identified as “Linux_LVM” in the fdisk command output shown here:
# fdisk -l
Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0006159f
Device Boot Start End Blocks Id System
/dev/sda1 * 1 25 200781 83 Linux
/dev/sda2 26 1044 8185117+ 8e Linux LVM
To gain access to the actual file systems contained inside the LVM, we will need to first identify and activate the volume group(s) and then process any discovered logical volume(s). As LVM is a Linux-specific technology, this can only be performed from a Linux system.
First, we will need to scan all disks and display the name associated with the LVM as shown here for a volume group named “VolGroup00.”
# pvscan
PV /dev/sda2 VG VolGroup00 lvm2 [7.78 GB / 32.00 MB free]
Total: 1 [7.78 GB] / in use: 1 [7.78 GB] / in no VG: 0 [0 ]
In order to access the logical volumes contained within this Volume Group, it is necessary to activate VolGroup00 as shown here:
# vgchange -a y VolGroup00
2 logical volume(s) in volume group VolGroup00 now active.
# lvs
LV VG Attr Lsize Origin Snap% Move Log Copy%
LogVol00 VolGroup00 -wi-a- 7.25G
LogVol01 VolGroup00 -wi-a- 512.00M
At this point we can image each logical volume directly as if it were a normal volume on a physical disk.
# dd if=/dev/VolGroup00/LogVol00 bs=4k of=/mnt/images/LogVol00.dd
# dd if=/dev/VolGroup00/LogVol01 bs=4k of=/mnt/images/LogVol01.dd
Understanding the Linux boot process is important when performing an investigation of a Linux system. Knowledge of the files used during system startup can help the examiner determine which version of the operating system was running and when it was installed. Additionally, because of its open nature, a sufficiently privileged user can alter many aspects of the boot process, so you need to know where to look for malicious modification. A complete review of the Linux boot process is outside the scope of this book, but a brief description of the process follows.
The first step of the Linux boot process is execution of the boot loader, which locates and loads the kernel. The kernel is the core of the operating system and is generally found in the /boot directory. Next, the initial ramdisk (initrd) is loaded. The initrd file contains device drivers, file system modules, logical volume modules, and other items required for boot but not built directly into the kernel.
Once the kernel and initial ramdisk are loaded, the kernel proceeds to initialize the system hardware. After this, the kernel begins executing what we recognize as the operating system, starting the /sbin/init process. Once init starts, there are two primary methods by which it will proceed to bring up a Linux operating system—System V style and BSD style. Linux distributions generally follow System V examples for most things, including init’s tasks and processing runlevels.
The System V init system is the most common init style across Linux distributions. Under System V, the init process reads the /etc/inittab file to determine the default “runlevel.” A runlevel is a numeric description for the set of scripts a machine will execute for a given state. For example, on most Linux distributions, runlevel 3 will provide a full multiuser console environment, while runlevel 5 will produce a graphical environment.
Note that each entry in a runlevel directory is actually a soft link to a script in /etc/init.d/, which will be started or stopped depending on the name of the link. Links whose names begin with “S” indicate the startup order, and links beginning with “K” indicate the “kill” order. Each script can contain many variables and actions that will be taken to start or stop the service gracefully.
user@ubuntu:/etc/rc3.d$ ls -l
total 4
-rw-r--r-- 1 root root 677 2010-03-30 00:17 README
lrwxrwxrwx 1 root root 20 2010-07-21 20:17 S20fancontrol -> ../init.d/fancontrol
lrwxrwxrwx 1 root root 20 2010-07-21 20:17 S20kerneloops -> ../init.d/kerneloops
lrwxrwxrwx 1 root root 27 2010-07-21 20:17 S20speech-dispatcher -> ../init.d/speech-dispatcher
lrwxrwxrwx 1 root root 24 2010-08-21 00:57 S20virtualbox-ose -> ../init.d/virtualbox-ose
lrwxrwxrwx 1 root root 19 2010-07-21 20:17 S25bluetooth -> ../init.d/bluetooth
lrwxrwxrwx 1 root root 17 2010-08-21 08:28 S30vboxadd -> ../init.d/vboxadd
lrwxrwxrwx 1 root root 21 2010-08-21 08:32 S30vboxadd-x11 -> ../init.d/vboxadd-x11
lrwxrwxrwx 1 root root 25 2010-08-21 08:32 S35vboxadd-service -> ../init.d/vboxadd-service
lrwxrwxrwx 1 root root 14 2010-07-21 20:17 S50cups -> ../init.d/cups
lrwxrwxrwx 1 root root 20 2010-07-21 20:17 S50pulseaudio -> ../init.d/pulseaudio
lrwxrwxrwx 1 root root 15 2010-07-21 20:17 S50rsync -> ../init.d/rsync
lrwxrwxrwx 1 root root 15 2010-07-21 20:17 S50saned -> ../init.d/saned
lrwxrwxrwx 1 root root 19 2010-07-21 20:17 S70dns-clean -> ../init.d/dns-clean
lrwxrwxrwx 1 root root 18 2010-07-21 20:17 S70pppd-dns -> ../init.d/pppd-dns
lrwxrwxrwx 1 root root 24 2010-07-21 20:17 S90binfmt-support -> ../init.d/binfmt-support
lrwxrwxrwx 1 root root 22 2010-07-21 20:17 S99acpi-support -> ../init.d/acpi-support
lrwxrwxrwx 1 root root 21 2010-07-21 20:17 S99grub-common -> ../init.d/grub-common
lrwxrwxrwx 1 root root 18 2010-07-21 20:17 S99ondemand -> ../init.d/ondemand
lrwxrwxrwx 1 root root 18 2010-07-21 20:17 S99rc.local -> ../init.d/rc.local
As you can see, there are numerous places an intruder can set up a script to help them maintain persistent access to a compromised system. Careful review of all scripts involved in the boot process is recommended in an intrusion investigation.
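One quick triage technique is to list the links in the order init will process them, since the numeric prefix after “S” determines sequencing. The sketch below builds a mock runlevel directory so it is self-contained; the service names are illustrative.

```shell
# Build a mock rc3.d directory and list it in init's start order.
rcdir="$(mktemp -d)"
ln -s ../init.d/cups      "$rcdir/S50cups"
ln -s ../init.d/bluetooth "$rcdir/S25bluetooth"
ln -s ../init.d/rc.local  "$rcdir/S99rc.local"
ls -1 "$rcdir" | sort     # S25bluetooth starts first, S99rc.local last
```

Against a mounted image, the same listing (plus a check of each link’s target and modification time) helps surface entries that do not belong to the original install.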
The BSD-style init process is a bit less complex. BSD init reads the script at /etc/rc to determine which system services are to be run; configuration information is read from /etc/rc.conf, and additional services to run are read from /etc/rc.local. In some cases this is the extent of init configuration, but other implementations may also read additional startup scripts from the /etc/rc.d/ directory. BSD-style init is currently used by Slackware and Arch Linux, among others.
To be able to locate and identify Linux system artifacts, you will need to understand how a typical Linux system is structured. This section discusses how directories and files are organized in the file system, how users are managed, and the meaning of the file metadata you will be examining.
Linux file systems operate from a single, unified namespace. Remember, everything is a file, and all files exist under the root directory, “/”. File systems on different local disks, removable media, and even remote servers will all appear underneath a single directory hierarchy, beginning from the root.
The standard directory structure Linux systems should adhere to is defined in the Filesystem Hierarchy Standard (FHS). This standard describes proper organization and use of the various directories found on Linux systems. The FHS is not enforced per se, but most Linux distributions adhere to it as best practice. The main directories found on a Linux system and the contents you should expect to find in them are shown in Table 5.1.
Table 5.1
Standard Linux Directories
/bin | essential command binaries (for all users) |
/boot | files needed for the system bootloader |
/dev | device files |
/etc | system configuration files |
/home | user home directories |
/lib | essential shared libraries and kernel modules |
/media | mount points for removable media (usually for automounts) |
/mnt | temporary mount points (usually mounted manually) |
/opt | add-on application packages (outside of system package manager) |
/root | root user’s home directory |
/sbin | system binaries |
/tmp | temporary files |
Understanding file ownership and permission information is key to performing a successful examination of a Linux system. Ownership refers to the user and/or group that a file or directory belongs to, whereas permissions refer to the things these (and other) users can do with or to the file or directory. Access to files and directories on Linux systems is controlled by these two concepts. To examine this, we will refer back to the test “file1” created earlier in the chapter.
user@ubuntu:~$ stat file1
File: ’file1’
Size: 11 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 452126 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2010-10-19 21:06:36.534649312 -0700
Modify: 2010-10-19 21:06:34.798639051 -0700
Change: 2010-10-19 21:06:34.798639051 -0700
The fifth line contains the information of interest: the “Access: (0644/-rw-r--r--)” entry shows the permissions, and the rest of the line shows the ownership information. This file is owned by User ID 1000 and by Group ID 1000. We will discuss users and groups in detail later in the chapter.
Linux permissions are divided among three classes of users and three tasks. Files and directories can be read, written, and executed, and permission to perform each of these tasks can be granted to the owner, the group, or the world (anyone with access to the system). This file has the default permissions a file is assigned upon creation. Reading from left to right, the owner (UID 1000) can read and write the file, anyone with a GID of 1000 can read it, and anyone with an account on the system can also read it.
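A short sketch (GNU stat assumed, scratch file via mktemp) shows how the octal mode maps onto the rwx string, with each octal digit acting as a 3-bit read/write/execute field for owner, group, and world:

```shell
# Each octal digit is a 3-bit rwx field: owner, group, world.
f="$(mktemp)"
chmod 0644 "$f"
stat -c '%a %A' "$f"    # 644 -rw-r--r--  owner rw, group r, world r
chmod 0640 "$f"
stat -c '%a %A' "$f"    # 640 -rw-r-----  world now has no access
```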
In addition to standard read/write/execute permissions, Ext file systems support “attributes.” These attributes are stored in a special “attribute block” referenced by the inode. On a Linux system, these can be viewed using the lsattr command. Attributes that may be of investigative interest include
Remember that we are working outside of file system-imposed restrictions when we use forensic tools and techniques so these attributes do not impact our examination of data in question. The presence of specific attributes may be of investigative interest, however.
On Linux systems, files are “hidden” from normal view by beginning the file name with a dot (.). These files are known as dotfiles and will not be displayed by default in most graphical applications and command line utilities. Hidden files and directories are a very rudimentary way to hide data and should not be considered overtly suspicious, as many applications use them to store their nonuser-serviceable bits.
/tmp is the virtual dumping ground of a Linux system—it is a shared scratch space, and as such all users have write permissions to this directory. It is typically used for system-wide lock files and nonuser-specific temporary files. One example of a service that uses /tmp to store lock files is the X Window Server, which provides the back end used by Linux graphical user interfaces. The fact that all users and processes can write here means that the /tmp directory is a great choice for a staging or initial entry point for an attacker to place data on the system. As an added bonus, most users never examine /tmp and would not know which random files or directories are to be expected and which are not.
Another item to note with regard to the /tmp directory can be seen in the following directory listing:
drwxrwxrwt 13 root root 4.0K 2010-10-15 13:38 tmp
Note that the directory itself is world readable, writable, and executable, but the last permission entry is a “t,” not an “x” as we would expect. This indicates that the directory has the “sticky bit” set. Files under a directory with the sticky bit set can only be deleted by the user that owns them (or the root user), even if they are world or group writable. In effect, stickiness overrules other permissions.
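This is easy to reproduce on a scratch directory: mode 1777 (the leading 1 is the sticky bit) yields exactly the permissions seen on /tmp.

```shell
# Set mode 1777 on a throwaway directory and observe the trailing "t".
d="$(mktemp -d)"
chmod 1777 "$d"
stat -c '%A' "$d"       # drwxrwxrwt
```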
The first place to begin looking for information related to user accounts is the “/etc/passwd” file. It contains a list of users and the full path of their home directories. The passwords for user accounts are generally stored in the “/etc/shadow” file.
A typical entry in the “/etc/passwd” file is shown here with a description of each field:
forensics:x:500:500::/home/forensics:/bin/bash
1. The login (user) name
2. Hashed password field (deprecated; password hashes now live in /etc/shadow)
3. The numeric user ID (UID)
4. The numeric group ID (GID)
5. The “GECOS” comment field. This is generally used for the user’s full name or a more descriptive name for a service account
6. The path of the user’s home directory
7. The program to run upon initial login (normally the user’s default shell)
The “/etc/passwd” file will usually be fairly lengthy, even on a single user system. A fairly old trick that is still occasionally seen in compromises in the wild is to add an additional “UID 0” user somewhere in the middle of these default accounts in an attempt to fade into the noise.
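Hunting for this trick is a one-line awk filter on the third field. The sketch below runs against an inline sample file (the “toor” entry is a hypothetical planted account) rather than the live /etc/passwd; in practice, point it at the passwd file recovered from the image.

```shell
# Print every account whose UID (third colon-delimited field) is 0.
cat > /tmp/passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
toor:x:0:0::/home/toor:/bin/sh
EOF
awk -F: '$3 == 0 {print $1}' /tmp/passwd.sample    # prints: root, toor
```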
The “/etc/group” file has a format similar to /etc/passwd, but with fewer fields. Examples of typical entries can be seen here:
root:x:0:root
bin:x:1:root,bin,daemon
daemon:x:2:root,bin,daemon
wheel:x:10:root
The first field is the group name, second is the hash of the group password (password-protected groups are not typically used), the third is the group ID, and the fourth is a comma-separated list of members of the group. Additional unauthorized users in the root or wheel groups may be suspicious and warrant further investigation.
The “/etc/shadow” file is the third item required for basic Linux authentication. It contains hashed user passwords and password-related information.
root:$1$gsGAI2/j$jWMnLc0zHFtlBDveRqw3i/:13977:0:99999:7:::
bin:*:13826:0:99999:7:::
…
gdm:!!:13826:0:99999:7:::
user:$1$xSS1eCUL$jrGLlZPGmD7ia61kIdrTV.:13978:0:99999:7:::
The fields of the shadow file are as follows:
1. Login name
2. Hashed password
3. Number of days since the Unix epoch (1 Jan 1970) that the password was last changed
4. Minimum days between password changes
5. Maximum number of days the password is valid
6. Number of days before password expiration that the user is warned
7. Number of days after password expiration until the account is disabled
8. Account expiration date, expressed as days since the Unix epoch
9. A reserved field
One item to note is that the “*” and the “!!” in the password fields for daemon accounts “bin” and “gdm” indicate that these accounts do not have encrypted passwords. Because these are not user accounts, they have a null or invalid password field to prevent them from being used for an interactive login. Any nonuser accounts that do have encrypted password fields should be investigated.
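The day counts in field 3 are easy to decode with GNU date. A sketch, using the value 13977 from the root entry above:

```shell
# Convert a shadow "last changed" value (days since 1970-01-01) to a date.
days=13977
date -u -d "1970-01-01 UTC + ${days} days" '+%Y-%m-%d'   # 2008-04-08
```

This tells you when the password was last set, which can be correlated with other indications of account activity.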
Many Linux distributions also keep a backup copy of the passwd file at “/etc/passwd-”; comparing the two files can quickly reveal recently added accounts, as in the following example, where the diff exposes an added “hacker” account.
user@ubuntu:~/images$ ls /etc/passwd*
/etc/passwd /etc/passwd-
user@ubuntu:~/images$ diff /etc/passwd /etc/passwd-
diff: /etc/passwd-: Permission denied
user@ubuntu:~/images$ sudo diff /etc/passwd /etc/passwd-
37d36
< hacker:x:1001:1001::/home/hacker:/bin/sh
On a Linux system, user home directories serve pretty much the same purpose they do on any other operating system—they provide users a location to store data specific to them. Well-behaved processes and services specific to an individual user will also store automatically created data in subdirectories. These are the standard visible subdirectories found on a Linux system using the GNOME desktop environment:
• Desktop—The user’s Desktop directory. Any files present in this directory should be visible on the user’s desktop in interactive graphical sessions.
• Documents—The default directory for office-type document files—text, spreadsheets, presentations, and the like.
• Downloads—Default directory for files downloaded from remote hosts; GNOME-aware Web browsers, file-sharing clients, and the like should deposit their data here.
• Music—Default location for music files.
• Pictures—Default location for pictures. Note that scanned images or images from attached imaging devices (webcams, cameras) will likely end up here unless otherwise directed.
• Public—Files to be shared with others.
• Templates—Holds document templates. New files can be generated from a given template via a right click in GNOME. Note that this directory is empty by default so any additions may indicate a frequently used file type.
• Videos—Default location for videos. Locally recorded or generated video should end up in this directory unless redirected by the user.
In addition to these “user-accessible” directories, various hidden directories and files are present. Some of these can contain valuable forensic data generated automatically or as a secondary effect of a user’s activity.
The default command shell on most Linux distributions is the Bourne Again Shell, aka “BASH.” Commands typed in any shell session will usually be stored in a file in the user’s home directory called “.bash_history.” Shell sessions include direct virtual terminals, GUI terminal application windows, and remote logins via SSH. Unfortunately, the bash shell records history as a simple list of commands that have been executed, with no time stamps or other indication of when the commands were entered. Correlation of history entries with file system or log file time information will be important if the time a specific command was executed matters to your investigation.
The .ssh directory contains files related to the use of the Secure Shell (ssh) client. SSH is used frequently on Linux and Unix-like systems to connect to a remote system via a text console. SSH also offers file transfer, connection tunneling, and proxying capabilities. There may be client configuration files present, which can indicate a particular use case for SSH.
When a user connects to a remote host using the ssh program, the remote host’s hostname or IP address and the host’s public key are recorded in the “.ssh/known_hosts” file. Entries in this file can be correlated with server logs to tie suspect activity to a specific machine. A traditional known_hosts entry looks like the following:
$ cat .ssh/known_hosts
192.168.0.106 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDRtd74Cp19PO44zRDUdMk0EmkuD/d4WAefzPaf55L5Dh5C06Sq+xG543sw0i1LjMN7CIJbz+AnSd967aX/BZZimUchHk8gm2BzoAEbp0EPIJ+G2vLOrc+faM1NZhDDzGuoFV7tMnQQLOrqD9/4PfC1yLGVlIJ9obd+6BR78yeBRdqHVjYsKUtJl46aKoVwV60dafV1EfbOjh1/ZKhhliKAaYlLhXALnp8/l8EBj5CDqsTKCcGQbhkSPgYgxuDg8qD7ngLpB9oUvV9QSDZkmR0R937MYiIpUYPqdK5opLVnKn81B1r+TsTxiI7RJ7M53pOcvx8nNfjwAuNzWTLJz6zr
Some distributions enable the hashing of entries in the known_hosts file—a hashed entry for the same host looks like this:
|1|rjAWXFqldZmjmgJnaw7HJ04KtAg=|qfrtMVerwngkTaWC7mdEF3HNx/o= ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDRtd74Cp19PO44zRDUdMk0EmkuD/d4WAefzPaf55L5Dh5C06Sq+xG543sw0i1LjMN7CIJbz+AnSd967aX/BZZimUchHk8gm2BzoAEbp0EPIJ+G2vLOrc+faM1NZhDDzGuoFV7tMnQQLOrqD9/4PfC1yLGVlIJ9obd+6BR78yeBRdqHVjYsKUtJl46aKoVwV60dafV1EfbOjh1/ZKhhliKAaYlLhXALnp8/l8EBj5CDqsTKCcGQbhkSPgYgxuDg8qD7ngLpB9oUvV9QSDZkmR0R937MYiIpUYPqdK5opLVnKn81B1r+TsTxiI7RJ7M53pOcvx8nNfjwAuNzWTLJz6zr
Note that in both cases the stored public key is identical, indicating that the hashed host |1| is the same machine as host 192.168.0.106 from the first known_hosts file.
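For unhashed files, a quick way to enumerate every host the user has connected to is to split out the first field, which may hold several comma-separated names for a single key. The following is a sketch against a fabricated known_hosts file (hostnames, addresses, and truncated keys are hypothetical):

```shell
# Fabricated known_hosts with truncated keys, for illustration only
cat > /tmp/kh_sample <<'EOF'
192.168.0.106 ssh-rsa AAAAB3NzaC1yc2E...
fileserver.example.com,10.0.0.7 ssh-rsa AAAAB3NzaC1yc2E...
EOF
# One contacted host or address per line
awk '{print $1}' /tmp/kh_sample | tr ',' '\n' | sort -u
```

When entries are hashed, ssh-keygen -F <hostname> -f known_hosts can be used to test whether a suspected hostname is present in the file.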
Because each Linux system can be quite different from any other Linux system, attempting to create an exhaustive list of possible user artifacts would be an exercise in futility. That said, some additional files generated by user activity on a default GNOME desktop installation are worth exploring. Because these artifacts are usually plain text, no special tools are needed to process them. Simply looking in the right location and being able to understand the significance of a given artifact is all that is required for many Linux artifacts. We will discuss a few of these artifacts in this section.
The hidden “.gconf” directory contains various GNOME application configuration files under a logical directory structure. Of particular interest in this structure is “.gconf/apps/nautilus/desktop-metadata/,” which will contain subdirectories for any media handled by the GNOME automounter. If an icon for the volume appears on the user’s desktop, an entry will be present in this directory. Each volume directory will contain a “%gconf.xml” file. An example of the content found inside this file is shown here:
user@ubuntu:~$ cat .gconf/apps/nautilus/desktop-metadata/EXTDISK@46@volume/\%gconf.xml
<?xml version="1.0"?>
<gconf>
<entry name="nautilus-icon-position" mtime="1287452747" type="string">
<stringvalue>64,222</stringvalue>
</entry>
</gconf>
The “C-time” of the %gconf.xml file should correspond to the first time the volume was connected to the system in question. In the case of this file, the embedded icon-position “mtime” value also matches this time, as the icon was never repositioned.
File: ’.gconf/apps/nautilus/desktop-metadata/EXTDISK@46@volume/%gconf.xml’
Size: 157 Blocks: 8 IO Block: 8192 regular file
Device: 1ch/28d Inode: 23498767 Links: 1
Access: (0600/-rw-------) Uid: (1000/ user) Gid: ( 1000/ user)
Access: 2010-12-28 16:24:06.276887000 -0800
Modify: 2010-10-18 18:45:50.283574000 -0700
Change: 2010-10-18 18:45:50.324528000 -0700
The “.gnome2” subdirectory contains additional GNOME application-related artifacts. One of the items of interest here is “.gnome2/evince/ev-metadata.xml,” which stores recently opened file information for items viewed with “evince,” GNOME’s native file viewer. This can provide information about files viewed on external media or inside of encrypted volumes. A similar file that may be present is “.gnome2/gedit-metadata.xml,” which stores similar information for files opened in GNOME’s native text editor “gedit.”
The .recently-used.xbel file in the user’s home is yet another cache of recently accessed files. An XML entry is added each time the user opens a file using a GTK application, and it does not appear that this file is ever purged automatically. On a heavily used system this file may grow quite large. An example entry is shown here:
<bookmark href="file:///tmp/HOWTO-BootingAcquiredWindows.pdf" added="2010-04-16T18:04:35Z" modified="2010-04-16T18:04:35Z" visited="2010-04-16T19:51:34Z">
<info>
<metadata owner="http://freedesktop.org">
<mime:mime-type type="application/pdf"/>
<bookmark:applications>
<bookmark:application name="Evince Document Viewer" exec="'evince %u'" timestamp="1271441075" count="1"/>
</bookmark:applications>
</metadata>
</info>
</bookmark>
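Because the file is plain XML, a pair of grep patterns is enough to pull out each file reference and its last-visited time. The following is a sketch over a fabricated one-entry file (the path and time stamps are hypothetical):

```shell
# Minimal fabricated .recently-used.xbel for illustration
cat > /tmp/recent_sample.xbel <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xbel version="1.0">
<bookmark href="file:///tmp/HOWTO.pdf" added="2010-04-16T18:04:35Z" modified="2010-04-16T18:04:35Z" visited="2010-04-16T19:51:34Z">
</bookmark>
</xbel>
EOF
# List each accessed file alongside its last-visited time stamp
grep -Eo '(href|visited)="[^"]*"' /tmp/recent_sample.xbel
```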
Linux applications can cache various bits of data in the user’s home under the appropriately named directory “.cache.” One item of note in this directory is the Ubuntu/GNOME on-screen display notification log, which contains a time-stamped history of items displayed to the user via the notify-osd daemon. This can include items such as network connections and disconnections, which can be useful to determine if a laptop system was moved from one location to another.
user@ubuntu:~$ cat .cache/notify-osd.log
[2010-10-15T02:36:54-00:00, NetworkManager ] Wired network
Disconnected
[2010-10-15T02:37:30-00:00, NetworkManager ] Wired network
Disconnected
[2010-10-15T13:38:15-00:00, NetworkManager ] Wired network
Disconnected - you are now offline
[2010-10-15T13:39:03-00:00, NetworkManager ] Wired network
Disconnected - you are now offline
The “.gtk-bookmarks” file in the user’s home directory is used by the GNOME file manager (Nautilus) to generate the “Places” drop-down list of locations. The default values are shown here:
-rw-r--r-- 1 user user 132 2010-10-13 18:21 .gtk-bookmarks
file:///home/user/Documents
file:///home/user/Music
file:///home/user/Pictures
file:///home/user/Videos
file:///home/user/Downloads
Any additional or altered values in this file may indicate a user-created shortcut to a frequently accessed directory. Additionally, this file may contain links to external or portable volumes that may not be immediately apparent or that exist inside of encrypted containers.
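Filtering out the default entries makes user-added shortcuts easy to spot. The following is a sketch with one fabricated non-default entry (paths are hypothetical):

```shell
# Fabricated .gtk-bookmarks content for illustration
cat > /tmp/gtk_bookmarks_sample <<'EOF'
file:///home/user/Documents
file:///home/user/Music
file:///home/user/Pictures
file:///media/EXTDISK/private
EOF
# Anything outside the user's home tree merits a closer look
grep -v '^file:///home/user/' /tmp/gtk_bookmarks_sample
```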
Log analysis on Linux is generally quite straightforward. Most logs are stored in clear text, with a single line per event. Identifying which logs contain data you are after may be challenging, but processing these logs once you’ve found them is usually less involved than on Windows systems. Unfortunately, the amount of log information and the ease of access cut both ways—logs on Linux systems tend to “roll over” after 28–30 days by default, and deleting or modifying logs is one of the most basic tasks an attacker may perform.
We will examine two types of logs: logs that are generated by or that track user activity, and logs generated by system activity.
We discussed shell history files previously—these are a great source of information about user activity. Unfortunately, because they usually do not contain any time information, their usefulness may be limited. There are additional logs that hold information about user access to the system that do record time stamps, however. Direct records of user activity on a Linux system are stored in three primary files: “/var/run/utmp,” “/var/log/wtmp,” and “/var/log/lastlog.”
The “utmp” and “wtmp” files record user logons and logoffs in a binary format. The major difference between these files is that “utmp” only holds information about active system logons, whereas “wtmp” stores logon information long term (per the system log rotation period). These files can both be accessed via the last command using the -f flag.
user@ubuntu:~$ last -f wtmp.1
user pts/2 :0.0 Thu Oct 14 19:40 still logged in
user pts/0 :0.0 Wed Oct 13 18:36 still logged in
user pts/0 cory-macbookpro. Wed Oct 13 18:22 - 18:35 (00:12)
user tty7 :0 Wed Oct 13 18:21 still logged in
reboot system boot 2.6.32-24-generi Wed Oct 13 18:17 - 21:49 (12+03:32)
user pts/0 :0.0 Wed Oct 13 18:05 - 18:05 (00:00)
user tty7 :0 Wed Oct 13 18:04 - crash (00:13)
reboot system boot 2.6.32-24-generi Wed Oct 13 18:01 - 21:49 (12+03:48)
user tty7 :0 Sat Aug 21 09:46 - crash (53+08:15)
reboot system boot 2.6.32-24-generi Sat Aug 21 08:46 - 21:49 (65+13:03)
user pts/0 :0.0 Sat Aug 21 08:23 - 08:44 (00:21)
user tty7 :0 Sat Aug 21 08:21 - down (00:22)
wtmp.1 begins Sat Aug 21 08:21:52 2010
The “lastlog” is a binary log file that stores the last logon time and remote host for each user on the system. On a live system, this file is processed via the lastlog command. Simple Perl scripts exist for parsing the file offline [3], but these scripts need to be modified to match the format definition of “lastlog” for the given system. The structures for “utmp,” “wtmp,” and “lastlog” are all defined in the “/usr/include/bits/utmp.h” header file on Linux distributions.
The bulk of system logs on a Linux system are stored under the “/var/log” directory, either in the root of this directory or in various subdirectories specific to the application generating the logs. Syslog operates on a client/server model, which enables events to be recorded to a remote, dedicated syslog server. However, on a standalone Linux system, events are usually written directly to files on the local host.
Syslog uses a “facility/priority” system to classify logged events. The “facility” is the application or class of application that generated the event. The defined syslog facilities are listed in Table 5.2.
Table 5.2
Syslog Facilities
auth | Authentication activity |
authpriv | Authentication and PAM messages |
cron | Cron/At/Task Scheduler messages |
daemon | Daemons/service messages |
kern | Kernel messages |
lpr | Printing services |
mail | Email (imap, pop, smtp) messages |
news | Usenet News Server messages |
syslog | Messages from syslog |
user | User program messages |
local* | Locally defined |
Syslog levels indicate the severity of the issue being reported. The available levels and the urgency they are intended to relay are displayed in Table 5.3.
Table 5.3
Syslog Severities
Emerg or panic | System is unusable |
Alert | Action must be taken immediately |
Crit | Critical conditions |
Err | Error conditions |
Warning | Warning conditions |
Notice | Normal but significant conditions |
Info | Informational messages |
Debug | Debugging level messages, very noisy |
None | Used to override (*) wildcard |
* | All levels except none |
Syslog events are a single line made up of five fields: the date, the time, the hostname of the reporting system, the name of the process (usually with its PID) that generated the event, and the event message itself.
This uniform logging format makes searching for log entries of note on a Linux system relatively easy. It is important to note that most Linux systems implement some level of log rotation. For example, a default Ubuntu Linux desktop installation rotates logs every month, compressing the older log file with GZip for archival. Server systems will likely archive logs more rapidly and are more likely to delete logs from active systems after a shorter retention period.
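When searching rotated logs, the compressed archives do not need to be unpacked first: zgrep reads plain and gzipped files alike, so a single command can cover the entire rotation history. The following sketch builds fabricated current and rotated logs and searches both in one pass:

```shell
# Build a current log and a rotated, compressed predecessor
# (all log lines here are fabricated)
printf 'Dec 18 12:55:09 host sshd[2]: Accepted password for root\n' > /tmp/demo_auth.log
printf 'Dec 17 13:50:51 host sshd[1]: Accepted password for root\n' > /tmp/demo_auth.log.1
gzip -f /tmp/demo_auth.log.1
# Search the active log and the archive together; -h suppresses filenames
zgrep -h Accepted /tmp/demo_auth.log /tmp/demo_auth.log.1.gz
```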
Table 5.4 contains the default paths of some common logs of interest in many Linux examinations.
Table 5.4
Common Log Files of Interest
/var/log/messages | Catch-all, nonspecified logs |
/var/log/auth.log, /var/log/secure | User authentication successes/failures |
/var/log/sulog | “su” attempts/success |
/var/log/httpd/* | Apache Web Server |
/var/log/samba/smbd.log, /var/log/samba/nmbd.log | Samba (Windows File Sharing) |
/var/log/audit/audit.log | Auditd/SELinux |
/var/log/maillog | Mail servers (sendmail/postfix) |
/var/log/cups/access_log | CUPS Printer Services |
/var/log/cron | Anacron/cron |
/var/log/xferlog | FTP servers |
Linux system administrators are generally quite familiar with processing system log files using command line tools. Because Linux system logs are by and large plain text, this is done quite easily by chaining together a handful of text-processing and searching tools. The primary tools used for log file processing on Linux systems are sed, awk, and grep.
Sed is a stream editor. It is designed to take an input stream, edit the content, and output the altered result. It reads input line by line and performs specified actions on lines matching given criteria, usually line numbers or patterns. Sed operations are passed as single characters; the most basic are “s” (substitute), “d” (delete), and “p” (print).
For purposes of log analysis, sed is generally used to quickly eliminate log lines that are not of interest. For example, to delete all log lines containing the word “DEBUG,” we would use the following sed command:
sed '/DEBUG/d' logfile.txt
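Run against a short fabricated log, the deletion looks like this (the log content is hypothetical):

```shell
# Three-line fabricated log
printf 'INFO start\nDEBUG probe\nINFO done\n' > /tmp/sed_demo.log
# "d" drops matching lines from the output stream;
# the file on disk is left untouched
sed '/DEBUG/d' /tmp/sed_demo.log
```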
Awk is a more robust text processing utility, but this additional power is wrapped in additional complexity. Sed and awk are used together frequently in Linux and Unix shell scripts to perform text transformations. While sed is line based, awk can perform field operations, so it is useful when you want to compare multiple text fields in a single log line. The default field separator in awk is any white space, but this can be changed to any character using the -F argument. So, to print the fifth field from every line in a log file, you would use:
awk '{print $5}' logfile.txt
To print the third field from every line in a comma-separated log file, you would use:
awk -F, '{print $3}' logfile.txt
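The same field extraction, run against a fabricated two-line comma-separated log:

```shell
# Two fabricated comma-separated records
printf 'alice,login,192.168.1.5\nbob,logout,192.168.1.9\n' > /tmp/awk_demo.csv
# -F, makes the comma the field separator; $3 is the third field
awk -F, '{print $3}' /tmp/awk_demo.csv
```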
The grep command is a powerful text-searching utility. For log analysis, this is generally used to return lines that match specific criteria. Pattern matching in grep is handled by regular expressions (see the Regular Expressions sidebar for more information). For basic use, though, simply supplying a literal string to match is effective enough.
Using these three commands together can be quite effective. Let’s say you have an SSH brute force login attack that may have been successful. A fast or long-lived SSH brute force attack can generate thousands to hundreds of thousands of log lines. Using these utilities we can whittle our log down to relevant entries very quickly. First, we will use sed to eliminate all lines that aren’t generated by the SSH daemon (sshd). Next, we will use grep to extract only lines pertaining to accepted connections. Finally, we will use awk to reduce some of the extraneous fields in the log line.
user@ubuntu:/var/log$ sudo sed '/sshd/!d' auth.log | grep Accepted | awk '{print $1, $2, $3, $9, $11}'
Dec 17 13:50:51 root 192.168.7.81
Dec 18 12:55:09 root 192.168.7.81
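The same toolchain can also summarize the attack itself. Counting failed attempts per source address quickly shows where the brute forcing came from. The following is a sketch against fabricated log lines; the $(NF-3) field position assumes the standard sshd “Failed password ... from <address> port <port> ssh2” message layout:

```shell
# Fabricated auth.log excerpt: two sources, one noisier than the other
cat > /tmp/bf_auth.log <<'EOF'
Dec 17 13:50:40 host sshd[9]: Failed password for root from 10.1.1.2 port 40000 ssh2
Dec 17 13:50:41 host sshd[9]: Failed password for root from 10.1.1.2 port 40001 ssh2
Dec 17 13:50:42 host sshd[9]: Failed password for admin from 10.9.9.9 port 40002 ssh2
EOF
# Attempts per source address, busiest first
grep 'Failed password' /tmp/bf_auth.log | awk '{print $(NF-3)}' \
    | sort | uniq -c | sort -rn
```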
On Linux systems there are two main mechanisms for scheduling a job to run in the future: at and cron. The at command is used to run a task once, at a specific point in the future. The cron process is used to schedule repeating tasks: processes to be run every night, once a week, every other week, and so on. Cron jobs are stored in two locations. System cron jobs are found in a set of directories defined in the “/etc/crontab” file, typically the aptly named “/etc/cron.hourly,” “/etc/cron.daily,” “/etc/cron.weekly,” and “/etc/cron.monthly.” Any scheduled tasks added by users, including jobs queued via the at command, will be found under “/var/spool/cron.” As you can surmise, cron jobs are a terrific way for an attacker to maintain persistence on a compromised system, so verifying these jobs will be critical in an intrusion investigation.
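When working from a mounted image rather than a live system, the job files can simply be enumerated and read. The following sketch uses a fabricated directory tree standing in for a hypothetical mount point (the job files and the suspicious crontab entry are invented for illustration):

```shell
# Fabricated mount point populated with sample job files
mnt=$(mktemp -d)
mkdir -p "$mnt/etc/cron.daily" "$mnt/var/spool/cron"
printf '#!/bin/sh\n/usr/sbin/logrotate /etc/logrotate.conf\n' \
    > "$mnt/etc/cron.daily/logrotate"
printf '* * * * * /tmp/.x/backdoor\n' > "$mnt/var/spool/cron/hacker"
# Enumerate every job file for review...
find "$mnt/etc" "$mnt/var/spool/cron" -type f | sort
# ...then dump user crontabs, labeled by owner
for f in "$mnt/var/spool/cron"/*; do
    printf '== crontab for %s ==\n' "${f##*/}"
    cat "$f"
done
```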
While desktop adoption of Linux is still fairly sparse, many of the skills involved in processing a Linux system are applicable to other Unix-like systems, including Mac OS X, which is discussed in Chapter 6. While it is not taking the desktop world by storm, Linux is becoming more and more popular in embedded devices such as mobile phones and tablet computers. Examiners capable of exploiting these data sources for artifacts of interest will be in high demand in the years to come.
1. Card, R., Ts'o, T., Tweedie, S. Design and Implementation of the Second Extended Filesystem. http://web.mit.edu/tytso/www/linux/ext2intro.html (accessed 9.10.10).
2. Farmer, D., Venema, W. Forensic Discovery. Upper Saddle River, NJ: Addison-Wesley; 2005.
3. Formatting and Printing Lastlog. http://www.hcidata.info/lastlog.htm (accessed 9.11.10).