7.1. Working with core files

A core file is generated when an error causes an application process to abort. Typical causes are memory-address violations, illegal instructions, bus errors, and user-generated quit signals. The core file contains a memory image of the aborted process. For a detailed definition of the contents of a core file, please refer to AIX 5L Version 5.2 Files Reference.

Even if the application does not stop, it can be useful to have a core file as an image of the memory state of the process at a specific time. Core files can be used for debugging and examination on your own system or by remote technical support specialists on theirs. To distribute a core file, there is a special way to collect all the needed information.

7.1.1. Core file naming

Before AIX 5L Version 5.1, a core file was always stored in a file named core. If the same or another application generated another core file before you renamed the previous one, the original content was lost.

Beginning with AIX 5L Version 5.1, you can enable unique naming of core files, but be aware that the default behavior is still to name the files core. You enable the enhancement by setting the environment variable CORE_NAMING to a non-NULL value, for example:

CORE_NAMING=yes

To disable the feature again, set the variable to the NULL value. For example, if you are using the Korn shell, do the following:

export CORE_NAMING=

While CORE_NAMING is set, all new core files are stored in files named in the format core.pid.ddhhmmss, where:

pid    Process ID
dd     Day of the month
hh     Hours
mm     Minutes
ss     Seconds

In the following example, two core files were generated at different times by processes with the PIDs 30480 and 30482:

$ ls -l core*
-rw-r--r--   1 ausres01 itsores      8179 Jan 28 2003    core.30480.28232347
-rw-r--r--   1 ausres01 itsores      8179 Jan 28 2003    core.30482.28232349

The time stamp is expressed in GMT[1]; your local time zone is not used.

[1] Greenwich Mean Time
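
The naming rule above can be sketched in C. This is only an illustration of the core.pid.ddhhmmss format as described in this section; format_core_name() is a hypothetical helper, not an AIX interface, and the system generates the real names itself.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Build a name of the form core.pid.ddhhmmss from a process ID and a
 * time stamp. format_core_name() is a hypothetical helper written for
 * this illustration only.
 * Returns 0 on success, -1 if the buffer is too small or the time
 * cannot be converted.
 */
int format_core_name(char *buf, size_t size, long pid, time_t when)
{
    struct tm *utc;
    int n;

    assert(buf != NULL);

    /* The time stamp is expressed in GMT, not in the local time zone. */
    utc = gmtime(&when);
    if (utc == NULL)
        return -1;

    n = snprintf(buf, size, "core.%ld.%02d%02d%02d%02d", pid,
                 utc->tm_mday, utc->tm_hour, utc->tm_min, utc->tm_sec);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For PID 30480 and a time stamp of 23:23:47 GMT on the 28th day of the month, the helper produces core.30480.28232347, which matches the first file in the listing above.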

7.1.2. Creating core files with assert()

The assert() macro, provided by the assert.h header file, is a common way to produce a core file when you know where the application logic goes wrong. For example, if you are sure that the integer variable cnt should never be greater than 100 at a certain point in your code, you can insert the following assert() call into your source code:

#include <assert.h>
int func(void)
{
    ...
    /* You are sure that cnt should not be greater than 100 in here. */
    assert(cnt <= 100);
    ...
    /* Application logic continues. */
}

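If the variable cnt is greater than 100 when the assert() line in the above example is reached, the assert() macro calls the abort() routine, which terminates the process and generates a core file. Note that assert() has no effect if the source is compiled with the NDEBUG macro defined.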

7.1.3. Creating core files with coredump()

If an application behaves unexpectedly, you can obtain a core file without terminating the process by calling the coredump() subroutine in your source code. The generated core file, which contains the memory image of the process, can be used for debugging the problem with dbx.

In a multi-threaded process, only one thread at a time should attempt to call coredump(). Subsequent calls to coredump() while a core dump (initiated by another thread) is in progress will fail.

To use coredump(), you must compile your source code with the -bM:UR option; otherwise, the routine will fail with an error code of ENOTSUP.

The syntax of coredump() is:

#include <core.h>
int coredump(struct coredumpinfo *coredumpinfop);

By specifying a file name and its length in the coredumpinfo structure, you can choose any name for the generated core file. For example, if we compile the source file shown in Example 7-1 with cc -bM:UR coredump.c and then run a.out, the core file mycore is generated, as shown in the following example:

$ cc -bM:UR coredump.c
$ ./a.out; echo $?
0
$ ls -l mycore
-rw-r--r--   1 ausres01 itsores      8183 Feb 05 10:55 mycore

Example 7-1. coredump.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <core.h>

int
main(int argc, char *argv[])
{
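    struct coredumpinfo cdinfo;

    /* Name of the core file to create and the length of that name. */
    cdinfo.name = "mycore";
    cdinfo.length = strlen("mycore");

    /* coredump() returns -1 and sets errno on failure. */
    if (coredump(&cdinfo) == -1) {
        perror("coredump()");
        exit(1);
    }

    exit(0);
}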

For more information about coredump(), please refer to AIX 5L Version 5.2 Technical Reference: Base Operating System and Extensions.

7.1.4. Including shared memory information in the core file

On AIX, by default, detailed information about shared memory segments and thread stacks is not collected when core files are generated. If you need core files that contain this detailed information (called full-core), there are two ways to get them.

System wide full-core setting

Use the chdev command to change the fullcore attribute system wide. By default, full core dumps are not taken, as shown in the following example:

# lsattr -El sys0 -a fullcore
fullcore false Enable full CORE dump True

To change the setting, issue the chdev -l sys0 -a fullcore=true command as the root user.

Add a signal handler with the SA_FULLDUMP flag

You can modify your application source code by adding a signal handler with the SA_FULLDUMP flag set for the signal that will cause the core file. To register a signal handler, use the sigaction() subroutine.

Note

The full-core setting is required to debug multi-threaded applications with most debugging methods.


7.1.5. Gathering core files

All associated information of a core file can be packed and archived in a pax file, which can be stored on disk or tape or sent to another system for investigation. At the time of the writing of this redbook, the Distributed Debugger only supports debugging of core files on the machine that created them.

The snapcore command

Use the snapcore command to gather the core file, its program binary executable file, and its dependent library files. The syntax of the command is:

snapcore core_filename [program_filename]

Specify full path names for the core file and the program file. If you omit the program name, snapcore reads the program name from the core file and searches for it in the directories defined by PATH.

The command produces a compressed pax file in the /tmp/snapcore directory by default. Use the -d directory option to specify an alternative directory. In the following example, a core file generated by prog1 and its dependent library files are archived in a compressed pax file, /tmp/snapcore/snapcore_31442.pax.Z:

$ snapcore core.31374.29164438 prog1
Core file "core.31374.29164438" created by "prog1"
pass1() in progress ....
                Calculating space required .
                Total space required is 6578 kbytes ..
                Checking for available space ...
                Available space is 119148 kbytes
pass1 complete.
pass2() in progress ....
                Collecting fileset information .
                Collecting error report of CORE_DUMP errors ..
Creating readme file ..
                Creating archive file ...
                Compressing archive file ....
pass2 completed.
Snapcore completed successfully. Archive created in /tmp/snapcore.
$ ls -l /tmp/snapcore
total 5960
-rw-r--r--   1 ausres01 itsores     3049565 Jan 29 10:52 snapcore_31442.pax.Z

You can check what files are gathered in the archive file, as shown in the following example:

$ uncompress -c /tmp/snapcore/snapcore_31442.pax.Z | pax
core.31374.29164438
README
lslpp.out
errpt.out
prog1
./usr/lib/libc.a
./usr/lib/libcrypt.a
./usr/ccs/lib/libc.a

For more information about the snapcore command, refer to AIX 5L Version 5.2 Reference Documentation: Commands Reference.

The check_core command

You can determine the program that caused an existing core file, and the libraries that program depends on, by using the check_core[2] command.

[2] The check_core command is included in the bos.rte.serv_aid fileset.

The following example shows that the core file, core.31374.29164438, was generated by the program file prog1, and lists the libraries the program depends on:

$ /usr/lib/ras/check_core core.31374.29164438
/usr/lib/libc.a
/usr/lib/libcrypt.a
prog1

7.1.6. AIX error log entry

Each core dump creates a new entry in the AIX error log, which can help you identify an application that dumps core. Use the following command to examine all entries caused by a core dump:

errpt -aJ CORE_DUMP

Use the -s mmddhhmmyy option to limit the output to error log entries logged from the given time onward (the first mm = month, dd = day, hh = hours, the second mm = minutes, yy = two-digit year).
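
If you build the errpt command line from a program, the mmddhhmmyy string can be produced with strftime(). The following sketch only formats the value; format_errpt_start() is a hypothetical helper name, and it is up to the caller to supply the time to use.

```c
#include <assert.h>
#include <time.h>

/*
 * Format a broken-down time as mmddhhmmyy for the errpt -s option.
 * format_errpt_start() is a hypothetical helper written for this
 * illustration. buf must have room for 10 characters plus the
 * terminating null byte. Returns 0 on success, -1 otherwise.
 */
int format_errpt_start(char *buf, size_t size, const struct tm *when)
{
    assert(buf != NULL && when != NULL);

    /* %m month, %d day, %H hour (00-23), %M minute, %y two-digit year */
    return strftime(buf, size, "%m%d%H%M%y", when) == 10 ? 0 : -1;
}
```

For January 29, 2003, 13:00, the helper produces 0129130003, the value passed to -s in Example 7-2.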

If you use the -A option instead of -a, you will see a more condensed output. Example 7-2 shows that the program prog1, running as PID 31242, generated a core file after receiving signal 11 (SIGSEGV).

Example 7-2. AIX error log: CORE_DUMP
$ errpt -A -J CORE_DUMP -s 0129130003
----------------------------------------------------------------
LABEL:          CORE_DUMP
Date/Time:       Wed Jan 29 13:20:35 CST
Type:            PERM
Resource Name:   SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       31242
FILE SYSTEM SERIAL NUMBER
          13
INODE NUMBER
        2048
PROGRAM NAME
prog1
ADDITIONAL INFORMATION
main 10C
main F8
__start 8C

7.1.7. Lightweight core file support

Besides the standard core file format, you can use the lightweight core file format, which complies with the Parallel Tools Consortium Lightweight Core File Format. If a multi-threaded program crashes or hangs, it is nearly impossible to find out how far the execution got. Dumping the memory information of many processors requires time and disk space, and the resulting information in a standard core file is of little use in this case. The lightweight core file format is a platform-independent format that records a snapshot of the current location of an application as a high-level, symbolic collection of the program state. It is an ASCII file, readable by humans and analysis programs.

Running prog2 from Example 7-3, we receive the lightweight core file named lwcore:

$ more lwcore
+++PARALLEL TOOLS CONSORTIUM LIGHTWEIGHT COREFILE FORMAT version 1.0
+++LCB 1.0 Wed Feb 12 08:19:07 2003 Generated by IBM AIX 5.1
#
+++ID Node 0 Process 30820 Thread 1
***FAULT "SIGSEGV - Segmentation violation"
+++STACK
main : 0x0000004c
---STACK
---ID Node 0 Process 30820 Thread 1
---LCB
+++PARALLEL TOOLS CONSORTIUM LIGHTWEIGHT COREFILE FORMAT version 1.0
+++LCB 1.0 Wed Feb 12 08:31:09 2003 Generated by IBM AIX 5.1
#
+++ID Node 0 Process 37548 Thread 1
***FAULT "SIGSEGV - Segmentation violation"
+++STACK
main : 0x0000004c
---STACK
---ID Node 0 Process 37548 Thread 1
---LCB
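
Because the file is plain ASCII, an analysis program can pick out the stack entries with very little code. The following sketch assumes that a stack entry always looks like the "main : 0x0000004c" lines in the listing above; that assumption is based on this sample output only, not on the full format specification, and parse_lwcf_frame() and struct lwcf_frame are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* One stack entry of a lightweight core file (hypothetical type). */
struct lwcf_frame {
    char name[64];          /* routine name, for example "main" */
    unsigned long offset;   /* offset within the routine */
};

/*
 * Parse one line of the form "name : 0xoffset".
 * Returns 0 if the line is a stack entry, -1 for any other line, such
 * as the +++STACK, ---STACK, +++ID, or ***FAULT records.
 */
int parse_lwcf_frame(const char *line, struct lwcf_frame *out)
{
    assert(line != NULL && out != NULL);

    /* %63s leaves room for the terminating null byte in name[64]. */
    if (sscanf(line, "%63s : %lx", out->name, &out->offset) != 2)
        return -1;
    return 0;
}
```

A caller would read the file line by line, for example with fgets(), and pass each line between the +++STACK and ---STACK records to parse_lwcf_frame().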

Creating a lightweight core file with install_lwcf_handler()

The install_lwcf_handler() subroutine provides a lightweight core file instead of a standard core file when an application crashes. It is part of the PTools Library (libptools_ptr.a). There are two ways to create the lightweight core file:

  • Call the install_lwcf_handler() subroutine directly in your application to register a signal handler (Example 7-3).

  • Use the linker option -binitfini:install_lwcf_handler, so that the function is called automatically when the program starts. In this way, you do not have to change your application code.

The default file name for the lightweight core file is lw_core. Use the LIGHTWEIGHT_CORE environment variable to change it to your desired file name, or set it to STDERR to redirect the lightweight core file content to standard error.
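
The selection rule for the output destination can be summarized in a few lines of C. This is only a sketch of the rule as described above, not the PTools library code; lwcf_destination() and enum lwcf_dest are hypothetical names, and treating an empty value like an unset variable is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Where the lightweight core file content goes (hypothetical type). */
enum lwcf_dest { LWCF_TO_FILE, LWCF_TO_STDERR };

/*
 * Decide the destination from the LIGHTWEIGHT_CORE environment
 * variable. For LWCF_TO_FILE, *name is set to the file name;
 * for LWCF_TO_STDERR, *name is set to NULL.
 */
enum lwcf_dest lwcf_destination(const char **name)
{
    const char *value = getenv("LIGHTWEIGHT_CORE");

    assert(name != NULL);

    /* Assumption: an empty value is treated like an unset variable. */
    if (value == NULL || value[0] == '\0') {
        *name = "lw_core";          /* default file name */
        return LWCF_TO_FILE;
    }
    if (strcmp(value, "STDERR") == 0) {
        *name = NULL;               /* content goes to standard error */
        return LWCF_TO_STDERR;
    }
    *name = value;                  /* user-chosen file name */
    return LWCF_TO_FILE;
}
```

With LIGHTWEIGHT_CORE set to lwcore, as in Example 7-3, the sketch selects the file name lwcore.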

Example 7-3. prog2.c with install_lwcf_handler() subroutine
#include <stdlib.h>
#include <stdio.h>

void install_lwcf_handler (void);

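int
main(int argc, char *argv[])
{
        int i = 0, j = 0;
        char s[10];

        /* Register the lightweight core file signal handler. */
        install_lwcf_handler();
        printf("I will start counting!\n");
        /* Deliberately faulty loop: j runs below s[0] until SIGSEGV. */
        while (i >= j) {
                s[j] = j;
                j--;
                i++;
        }
        printf("My result is %s\n",s);
        exit(0);
}

$ cc prog2.c -o prog2 -lptools_ptr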
$ export LIGHTWEIGHT_CORE="lwcore";
$ prog2
I will start counting!
$

Please refer to AIX 5L Version 5.2 Technical Reference: Base Operating System and Extensions for more information.

Creating a lightweight core file with mt_trce()

The mt_trce() subroutine provides a lightweight core file with the traceback information of all threads allocated in the process space. All threads except the calling thread are suspended while the mt_trce() subroutine executes.

Refer to the AIX 5L Version 5.2 Technical Reference: Base Operating System and Extensions for a complete overview of this topic.
