Using kdump and SysRq

The kdump mechanism is a Linux kernel feature that lets you capture a memory dump when your kernel crashes. It produces an exact copy of the memory, which can then be analyzed to find the root cause of the crash.

SysRq is a Linux kernel feature that lets you send key combinations to the kernel even when your system has become unresponsive.

How to do it…

First, we'll set up kdump and SysRq, and afterwards, I'll show you how to use the crash utility to debug a dump.

Installing and configuring kdump and SysRq

Let's take a look at how this is installed and configured:

  1. Install the necessary packages for kdump by executing the following command:
    ~]# yum install -y kexec-tools
    
  2. Ensure that crashkernel=auto is present in the GRUB_CMDLINE_LINUX variable declaration in the /etc/sysconfig/grub file, as in this example:
    GRUB_CMDLINE_LINUX="rd.lvm.lv=system/usr rd.lvm.lv=system/swap vconsole.keymap=us rd.lvm.lv=system/root vconsole.font=latarcyrheb-sun16 crashkernel=auto"
  3. Start kdump by running the following:
    ~]# systemctl start kdump
    
  4. Now, enable kdump to start at boot, as follows:
    ~]# systemctl enable kdump
    
  5. Configure SysRq to accept all commands via the following commands:
    ~]# echo "kernel.sysrq = 1" >> /etc/sysctl.d/sysrq.conf
    ~]# sysctl -q -p /etc/sysctl.d/sysrq.conf
    
  6. Regenerate your initramfs (initial RAM filesystem) so that it contains the necessary information for kdump by executing the following command:
    ~]# dracut --force
    
  7. Finally, reboot through the following command:
    ~]# reboot
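
Once the system is back up, it is worth confirming that the settings above actually took effect. The following is a minimal sketch of such a check, not part of the original recipe: the function names has_crashkernel and check_kdump_setup are made up for illustration, and it assumes a systemd-based system where /sys/kernel/kexec_crash_loaded is available.

```shell
# Succeeds if a kernel command line string contains a crashkernel= option.
# Pass the contents of /proc/cmdline as the first argument.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a short post-reboot report for kdump and SysRq.
# (Hypothetical helper; run it as root on the configured system.)
check_kdump_setup() {
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel:   present on the kernel command line"
    else
        echo "crashkernel:   MISSING from the kernel command line"
    fi
    echo "kdump active:  $(systemctl is-active kdump)"
    echo "kdump enabled: $(systemctl is-enabled kdump)"
    # 1 means a crash kernel is loaded and ready to take over on a panic.
    echo "crash kernel:  $(cat /sys/kernel/kexec_crash_loaded)"
    echo "kernel.sysrq:  $(sysctl -n kernel.sysrq)"
}
```

After sourcing these functions, running check_kdump_setup as root should report an active and enabled kdump service and a kernel.sysrq value of 1 if every step succeeded.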
    

Using kdump tools to analyze the dump

Although you'll find most of the information you're looking for in the vmcore-dmesg.txt file, it can sometimes be useful to look into the bits and bytes of the vmcore dump, even if it is just to know what the people at Red Hat do when they ask you to send them a vmcore dump. Perform the following steps:

  1. Install the necessary tools to debug the vmcore dump via the following command:
    ~]# yum install -y --enablerepo=*debuginfo crash kernel-debuginfo
    
  2. Locate your vmcore by executing the following:
    ~]# find /var/crash -name 'vmcore'
    /var/crash/127.0.0.1-2015.10.31-12:03:06/vmcore
    

    Note

    If you don't have a core dump, you can trigger one yourself by crashing the kernel with the following command:

    ~]# echo c > /proc/sysrq-trigger
    
  3. Use crash to analyze the contents, as follows:
    ~]# crash /var/crash/127.0.0.1-2015.10.31-12:03:06/vmcore /usr/lib/debug/lib/modules/<kernel>/vmlinux
    

    Here, <kernel> must be the same kernel version as the one that the dump was created on.
  4. Display the kernel message buffer (this can also be found in the vmcore-dmesg.txt dump file) by running the following command:
    crash> log
    

  5. Display the kernel stack trace through the following:
    crash> bt
    

  6. Now, show the processes at the time of the core dump, as follows:
    crash> ps
    

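
If you want the same three views without typing them at the crash> prompt, the commands can be fed to crash on standard input. This is only a sketch under a few assumptions: vmlinux_path and crash_report are hypothetical helper names, the debuginfo path layout is the one used in step 3, and the kernel version defaults to the running kernel, which is only right if the dump was taken by that same kernel.

```shell
# Prints the debuginfo vmlinux path for a given kernel version,
# following the layout used in step 3.
vmlinux_path() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Runs log, bt and ps against a vmcore and prints the results.
# $1: path to the vmcore
# $2: kernel version (assumption: defaults to the running kernel)
crash_report() {
    vmcore=$1
    kernel=${2:-$(uname -r)}
    # -s suppresses the startup banner; the commands are read from stdin.
    printf 'log\nbt\nps\nquit\n' | crash -s "$vmcore" "$(vmlinux_path "$kernel")"
}
```

For example, crash_report /var/crash/127.0.0.1-2015.10.31-12:03:06/vmcore > report.txt would collect all three outputs in one file that you can attach to a support case.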

There's more…

By default, kdump writes its memory dumps to /var/crash. This MUST be on the root filesystem. Some systems are configured with a separate filesystem for /var, in which case you need to change the location in /etc/kdump.conf or use a different target type, such as raw, nfs, and so on. If your crash directory is located on a non-root filesystem, the kdump service will fail!
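
A quick way to find out whether your system is affected is to compare the filesystem of the dump directory with that of /. The sketch below is one possible check, not an official tool: kdump_path and same_filesystem are hypothetical names, it only looks at the path directive in the configuration file, and it assumes GNU stat.

```shell
# Prints the dump directory from a kdump.conf-style file (first argument),
# falling back to the default /var/crash when no "path" line is present.
kdump_path() {
    awk '$1 == "path" { p = $2 } END { print (p == "" ? "/var/crash" : p) }' "$1"
}

# Succeeds when both directories exist and live on the same filesystem,
# judged by comparing the device numbers reported by stat.
same_filesystem() {
    [ -d "$1" ] && [ -d "$2" ] &&
        [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}
```

You could then run something like same_filesystem / "$(kdump_path /etc/kdump.conf)" || echo "crash directory is not on the root filesystem" before relying on the default configuration.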

Although the crash utility can provide a lot of details about the crash, the contents of the vmcore-dmesg.txt file, which resides in the same directory as the vmcore file, are usually all you need. So, I suggest that you read this file before digging into the bits and bytes of the memory dump.
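
As a starting point, the following sketch prints the tail of the most recent vmcore-dmesg.txt, which is usually where the panic message and call trace end up. The helper names latest_crash_dir and show_latest_dmesg are made up for this example, and it assumes one subdirectory per crash under the base directory, as in the find output shown earlier.

```shell
# Prints the most recently modified crash directory under the given base
# directory (default: /var/crash), or nothing if there is none.
latest_crash_dir() {
    ls -1dt "${1:-/var/crash}"/*/ 2>/dev/null | head -n 1
}

# Shows the last lines of the newest vmcore-dmesg.txt.
# $1: base directory (optional), $2: number of lines (default: 40)
show_latest_dmesg() {
    dir=$(latest_crash_dir "$1")
    if [ -z "$dir" ]; then
        echo "no crash directory found" >&2
        return 1
    fi
    # $dir already ends with a slash because of the */ glob above.
    tail -n "${2:-40}" "${dir}vmcore-dmesg.txt"
}
```

Calling show_latest_dmesg without arguments looks in /var/crash; pass a different base directory if you changed the path in /etc/kdump.conf.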

SysRq, as stated before, allows you to control your system even if it is in a state that doesn't allow you to do anything at all. However, it does require you to have access to the system's console.

By default, kdump creates a dump and reboots your system. In the event that this doesn't happen and you don't want to push the power button on your (virtual) system, SysRq allows you to send commands through the console to your kernel.

The key combination needed to send the information differs a little from architecture to architecture. Take a look at the following table for reference:

Architecture                      Key combination
------------                      ---------------
x86                               <Alt><SysRq><command key>
Sparc                             <Alt><Stop><command key>
Serial console (PC style only)    Send a BREAK and, within 5 seconds, the command key. Sending BREAK twice is interpreted as a normal BREAK.
PowerPC                           <Alt><Print Screen> (or <F13>) <command key>

So, on an x86 system, you would attempt to sync your disks before rebooting by pressing the following key combinations:

<Alt><SysRq><s>
<Alt><SysRq><b>

Alternatively, if you still have access to your terminal, you can do the same by sending characters to /proc/sysrq-trigger, as follows:

~]# echo s > /proc/sysrq-trigger
~]# echo b > /proc/sysrq-trigger
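
When you send several keys in a row, a short pause between them gives the kernel time to act on each one, for instance to finish the sync before the reboot. Here is a small sketch of a helper that does this. The name sysrq_send and the SYSRQ_TRIGGER and SYSRQ_DELAY variables are my own inventions for the example (they let you try the function against an ordinary file first), and the default delay of 2 seconds is an arbitrary choice.

```shell
# Writes each given SysRq command key to the trigger file, one at a time,
# pausing between keys so the kernel can act on each of them.
# SYSRQ_TRIGGER and SYSRQ_DELAY are hypothetical overrides for dry runs.
sysrq_send() {
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    for key in "$@"; do
        printf '%s' "$key" > "$trigger" || return 1
        sleep "${SYSRQ_DELAY:-2}"
    done
}
```

With this in place, sysrq_send s u b run as root would sync the filesystems, remount them read-only, and then reboot. Setting SYSRQ_TRIGGER to a scratch file lets you rehearse the call without touching the kernel.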

The following key commands are available:

Command key    Function
-----------    --------
b              This immediately reboots your system. It does not sync or unmount disks. This can result in data corruption!
c              This performs a system crash by a NULL pointer dereference. A crashdump is taken if kdump is configured.
d              This shows all the locks held.
e              This sends a SIGTERM signal to all your processes, except for init.
f              This calls oom_kill to kill any process hogging the memory.
g              This is used by the kernel debugger (kgdb).
h              This shows help. (Memorize this option!)
i              This sends a SIGKILL signal to all your processes, except for init.
j              This freezes your filesystems with the FIFREEZE ioctl.
k              This kills all the programs on the current virtual console. It enables a secure login from the console as this kills all malware attempting to grab your keyboard input, for example.
l              This shows a stack trace for all active CPUs.
m              This dumps the current memory info to your console.
n              You can use this to make real-time tasks niceable.
o              This shuts down your system and turns it off (if configured and supported).
p              This dumps the current registers and flags to your console.
q              This will dump a list of all armed hrtimers (except for timer_list timers) per CPU together with detailed information about all clockevent devices.
r              This turns off your keyboard's raw mode and sets it to XLATE.
s              This attempts to sync all your mounted filesystems, committing unwritten data to them.
t              This dumps a list of current tasks and their information to your console.
u              This attempts to remount all your filesystems as read-only volumes.
v              This causes the ETM buffer to dump (this is ARM-specific).
w              This dumps all the tasks that are in an uninterruptible (blocked) state.
x              This is used by xmon on ppc/powerpc platforms. This shows the global PMU registers on SPARC64.
y              This shows global CPU registers (this is SPARC64-specific).
z              This dumps the ftrace buffer.
0 - 9          This sets the console's log level, controlling which messages will be printed. The higher the number, the more the output.
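
Which of these keys the keyboard will accept depends on the kernel.sysrq value configured earlier: 0 disables SysRq, 1 enables everything, and larger values are a bitmask of allowed function groups, as described in the kernel's SysRq documentation. Writes to /proc/sysrq-trigger by root are not restricted by this value. The following sketch decodes a value into those groups; sysrq_describe is a hypothetical helper name and the labels are my own paraphrase of the kernel documentation.

```shell
# Prints, one per line, the SysRq function groups enabled by a given
# kernel.sysrq value (bit meanings taken from the kernel's sysrq docs).
sysrq_describe() {
    value=$1
    if [ "$value" -eq 0 ]; then echo "disabled"; return 0; fi
    if [ "$value" -eq 1 ]; then echo "all functions enabled"; return 0; fi
    while IFS=: read -r bit label; do
        if [ $((value & bit)) -ne 0 ]; then echo "$label"; fi
    done <<'EOF'
2:console log level control
4:keyboard control (SAK, unraw)
8:debugging dumps of processes
16:sync
32:remount read-only
64:signalling of processes (term, kill, oom-kill)
128:reboot and poweroff
256:nicing of all real-time tasks
EOF
}
```

For example, sysrq_describe "$(sysctl -n kernel.sysrq)" shows what your current setting permits from the keyboard.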
