The kdump mechanism is a Linux kernel feature that allows you to create dumps if your kernel crashes. It produces an exact copy of the memory, which can be analyzed to find the root cause of the crash.
SysRq is a feature supported by the Linux kernel that allows you to send key combinations to it even when your system becomes unresponsive.
First, we'll set up kdump and SysRq, and afterwards, I'll show you how to debug a dump.
Let's take a look at how this is installed and configured:
Install the kexec-tools package:
~]# yum install -y kexec-tools
Ensure that crashkernel=auto is present in the GRUB_CMDLINE_LINUX variable declaration in the /etc/sysconfig/grub file, as in this example:
GRUB_CMDLINE_LINUX="rd.lvm.lv=system/usr rd.lvm.lv=system/swap vconsole.keymap=us rd.lvm.lv=system/root vconsole.font=latarcyrheb-sun16 crashkernel=auto"
Start kdump by running the following:
~]# systemctl start kdump
Enable kdump to start at boot, as follows:
~]# systemctl enable kdump
Enable SysRq and load the new setting:
~]# echo "kernel.sysrq = 1" >> /etc/sysctl.d/sysrq.conf
~]# sysctl -q -p /etc/sysctl.d/sysrq.conf
Regenerate the initramfs:
~]# dracut --force
Reboot the system:
~]# reboot
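After the reboot, it can be worth confirming that the settings took effect. The following is a minimal sketch, not part of kexec-tools: the helper names has_crashkernel and check_kdump_setup are made up for illustration, and the script assumes a systemd-based system where /proc/cmdline and /proc/sys/kernel/sysrq are readable.

```shell
#!/bin/sh
# Hypothetical helpers for checking the kdump/SysRq setup described above.

# Succeeds if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a short report for the running system.
check_kdump_setup() {
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel parameter: present"
    else
        echo "crashkernel parameter: missing"
    fi
    # systemctl is-active prints the unit state, for example active or inactive.
    echo "kdump service: $(systemctl is-active kdump 2>/dev/null)"
    echo "kernel.sysrq: $(cat /proc/sys/kernel/sysrq)"
}
```

Calling check_kdump_setup in a root shell prints the three values; if the crashkernel parameter is reported as missing, kdump has no reserved memory to work with.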
Although you'll find most of the information you're looking for in the vmcore-dmesg.txt file, it can sometimes be useful to look into the bits and bytes of the vmcore dump, even if it is just to know what the people at Red Hat do when they ask you to send them a vmcore dump. Perform the following steps:
Install the tools needed to analyze the vmcore dump via the following command:
~]# yum install -y --enablerepo=*debuginfo crash kernel-debuginfo
Locate your vmcore by executing the following:
~]# find /var/crash -name 'vmcore'
/var/crash/127.0.0.1-2015.10.31-12:03:06/vmcore
Use crash to analyze the contents, as follows:
~]# crash /var/crash/127.0.0.1-2015.10.31-12:03:06/vmcore /usr/lib/debug/lib/modules/<kernel>/vmlinux
Here, <kernel> must be the same kernel as the one that the dump was created for.
Display the kernel log (the same content as the vmcore-dmesg.txt file) by running the following command:
crash> log
Display the kernel stack backtrace of the task that was active at the time of the crash:
crash> bt
List the processes that existed at the time of the crash:
crash> ps
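The <kernel> part of the vmlinux path used with crash is easy to get wrong. As a rough sketch, the path can be built from a kernel release string. The helper name debug_vmlinux_path is hypothetical, and the example assumes that the dump was written by the kernel that is currently running, which is not necessarily true on your system.

```shell
#!/bin/sh
# Build the path to the debug vmlinux for a given kernel release.
# debug_vmlinux_path is an illustrative helper, not a crash or yum feature.
debug_vmlinux_path() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Assumption: the dump came from the kernel that is running right now.
vmlinux=$(debug_vmlinux_path "$(uname -r)")
if [ -f "$vmlinux" ]; then
    echo "found $vmlinux"
else
    echo "missing $vmlinux (is kernel-debuginfo installed for this release?)"
fi
```

If the dump came from a different kernel, pass that kernel's release string to debug_vmlinux_path instead of the output of uname -r.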
The default kdump configuration uses /var/crash as the location for its memory dumps. This MUST be on the root filesystem. Some systems are configured with a separate filesystem for /var, in which case you need to change the location in /etc/kdump.conf or use a different target type, such as raw, nfs, and so on. If your crash directory is located on a nonroot filesystem, the kdump service will fail!
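To find out whether this applies to your system, you can compare the mount point of the crash directory with the root filesystem. This is a minimal sketch using df; the helper names mount_point_of and same_filesystem are made up for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Print the mount point of the filesystem that holds the given path.
# Takes the last column of the POSIX df output (breaks on mount points with spaces).
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeeds if both paths live on the same mounted filesystem.
same_filesystem() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

crash_dir=/var/crash   # default kdump location, as described above
if [ ! -d "$crash_dir" ]; then
    echo "$crash_dir does not exist"
elif same_filesystem / "$crash_dir"; then
    echo "$crash_dir is on the root filesystem"
else
    echo "$crash_dir is on a separate filesystem: $(mount_point_of "$crash_dir")"
fi
```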
Although the crash utility can provide a lot of details about the crash, the contents of the vmcore-dmesg.txt file, which resides in the same directory as the vmcore file, are usually all you need. So, I suggest that you read through this file before digging into the bits and bytes of the memory dump.
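A quick first pass over vmcore-dmesg.txt could look like the following sketch. The helper name crash_markers is hypothetical, and the search patterns are only a guess at strings that commonly introduce a kernel failure report; your dump may contain none of them.

```shell
#!/bin/sh
# Print (with line numbers) the lines that often mark a kernel failure report.
# The pattern list is an assumption, not an exhaustive set of kernel messages.
crash_markers() {
    grep -n -E 'BUG:|Oops|Kernel panic|Call Trace' "$1"
}
```

For example, crash_markers /var/crash/127.0.0.1-2015.10.31-12:03:06/vmcore-dmesg.txt lists the matching lines, which gives you a line number to jump to when reading the full file.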
SysRq, as stated before, allows you to control your system even if it is in a state that doesn't allow you to do anything at all. However, it does require you to have access to the system's console.
By default, kdump creates a dump and reboots your system. In the event that this doesn't happen and you don't want to push the power button on your (virtual) system, SysRq allows you to send commands through the console to your kernel.
The key combination needed to send the information differs a little from architecture to architecture. Take a look at the following table for reference:
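Architecture | Key combination |
---|---|
x86 | <Alt><SysRq><command key> |
Sparc | <Alt><Stop><command key> |
Serial console (PC style only) | This sends a BREAK and, within 5 seconds, the command key. Sending BREAK twice is interpreted as a normal BREAK. |
PowerPC | <Alt><Print Screen> (or <F13>)<command key> |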
So, on an x86 system, you would attempt to sync your disks before rebooting it by pressing the following key combinations:
<Alt><SysRq><s>
<Alt><SysRq><b>
Alternatively, if you still have access to your terminal, you can do the same by sending characters to /proc/sysrq-trigger, as follows:
~]# echo s > /proc/sysrq-trigger
~]# echo b > /proc/sysrq-trigger
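Because a typo in one of these commands can reboot or crash the machine, a small wrapper that accepts only a single command key may be useful. This is a sketch under assumptions: sysrq_send is a made-up helper name, and the optional second argument exists only so that the function can be tried against an ordinary file instead of the real trigger.

```shell
#!/bin/sh
# Send one SysRq command key to the trigger file.
# $1: command key (one character, a-z or 0-9)
# $2: trigger file, defaults to /proc/sysrq-trigger (writing there requires root)
sysrq_send() {
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    case "$key" in
        [a-z0-9]) ;;
        *)
            echo "sysrq_send: expected one key in a-z or 0-9, got '$key'" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$key" > "$trigger"
}
```

Run as root, sysrq_send s followed by sysrq_send b corresponds to the sync-then-reboot sequence shown above.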
The following key commands are available:
Command key | Function |
---|---|
b | This immediately reboots your system. It does not sync or unmount disks. This can result in data corruption! |
c | This performs a system crash by a NULL pointer dereference. A crash dump is taken if configured. |
d | This shows all the locks held. |
e | This sends a SIGTERM to all processes, except init. |
f | This calls oom_kill to kill a memory-hogging process. |
g | This is used by kgdb (the kernel debugger). |
h | This shows help. (Memorize this option!) |
i | This sends a SIGKILL to all processes, except init. |
j | This forcibly thaws filesystems that were frozen with the FIFREEZE ioctl. |
k | This kills all the programs on the current virtual console. It enables a secure login from the console as this kills all malware attempting to grab your keyboard input, for example. |
l | This shows a stack trace for all active CPUs. |
m | This dumps the current memory info to your console. |
n | You can use this to make real-time tasks niceable. |
o | This shuts down your system and turns it off (if configured and supported). |
p | This dumps the current registers and flags to your console. |
q | This will dump a list of all armed hrtimers (but not regular timer_list timers) and detailed information about all clockevent devices. |
r | This turns off your keyboard's raw mode and sets it to XLATE. |
s | This attempts to sync all your mounted filesystems, committing unwritten data to them. |
t | This dumps a list of current tasks and their information to your console. |
u | This attempts to remount all your filesystems as read-only volumes. |
v | This causes the ETM buffer to dump (this is ARM-specific). |
w | This dumps all the tasks that are in an uninterruptible (blocked) state. |
x | This is used by xmon on ppc/powerpc platforms. This shows the global PMU registers on SPARC64. |
y | This shows global CPU registers (this is SPARC64-specific). |
z | This dumps the ftrace buffer. |
0-9 | This sets the console's log level, controlling which messages will be printed. The higher the number, the more the output. |
For more information about SysRq and systemd, refer to the following page: https://github.com/systemdaemon/systemd/blob/master/src/linux/Documentation/sysrq.txt
Red Hat has a complete crash dump guide at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Kernel_Crash_Dump_Guide/index.html.