#37 Dead Code Locator

There's an urban legend about a group of programmers who were working on a government contract changing some code from one version of Jovial to another. One of them came to a function with obscure and very confused logic, so he decided that instead of just mechanically translating the code, he would see how the function was used and then perhaps write a better one.

Imagine his surprise when he discovered that the function was not called at all.

So he went to his boss and said, "This function is never used. We can eliminate it."

"We already know that," responded the boss. "But the cost of doing the paperwork to eliminate this function is far greater than the cost of converting it. So go back and update it."

The programmer went back to his job with a wiser understanding of how government contracts really work.

Back in the real world, in most cases it is better to delete unused code than it is to maintain it. But how do you know what's used and what's not? That's where Perl comes in.

The Code

   1 use strict;
   2 use warnings;
   3
   4 my %symbols;
   5
   6 open IN_FILE, "nm @ARGV|" or
   7 die("Could not connect to nm command");
   8
   9 my $cur_file;   # File we are looking at
  10
  11 while (<IN_FILE>) {
  12     if (/(.*):$/) {
  13         $cur_file = $1;
  14         next;
  15     }
  16     if (length($_) < 12) {
  17         next;   # Blank line or other junk
  18     }
  19
  20     my $type = substr($_, 9, 1);
  21     my $name = substr($_, 11);
  22     chomp($name);
  23
  24     if ($type eq "U") {
  25         $symbols{$name}->{'undefined'} = $cur_file;
  26     } else {
  27         $symbols{$name}->{'defined'} = $cur_file;
  28     }
  29 }
  30
  31 foreach my $cur_symbol (sort keys %symbols) {
  32     if (not defined($symbols{$cur_symbol}->{undefined})) {
  33         print "Not used.\n";
  34         print "  Symbol: $cur_symbol\n";
  35         print "  Defined in: $symbols{$cur_symbol}->{'defined'}\n";
  36     }
  37 }

Running the Script

The script takes a set of object files as input. Any symbols in the files defined as external but not used in another object file will be printed:

$ dead.pl test-code.o test-sub.o

The Results

Not used.
  Symbol: bar
  Defined in: test-sub.o
Not used.
  Symbol: main
  Defined in: test-code.o

How It Works

The program starts by running every object file through the nm command. This command lists the global symbols defined and used by each object file. More importantly, it also lists each symbol's type. The type is "U" for an undefined symbol (one the file uses but does not define); any other code letter marks a definition. (The code letter tells us what sort of definition it is, but for this program we don't care. Defined is defined, and the type does not matter.)

For example, let's look at what happens when nm is run on some test files:

$ nm test-code.o test-sub.o
test-code.o:
         U foo
00000000 T main

test-sub.o:
00000004 C bar
00000004 C foo

The file test-code.o uses the symbol foo and defines the symbol main. The file test-sub.o defines the symbols foo and bar.

The Perl script reads in the output of the nm command and figures out where each symbol is defined and used. Any symbol that is defined but not used is considered dead code.

Let's take a look at the process in detail. The first thing the script does is open a pipe that reads the output of the nm command:

   6 open IN_FILE, "nm @ARGV|" or
   7 die("Could not connect to nm command");
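
The two-argument open above hands the whole command string to the shell, so a filename containing a space or a shell metacharacter could be misread. As a rough sketch of an alternative, the list form of a pipe open passes each argument to the command untouched. The helper name read_command_lines is made up for this example and is not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): run a command and
# return its output lines without going through the shell.
sub read_command_lines {
    my ($cmd, @args) = @_;

    # List-form pipe open: each argument reaches the command as-is.
    open(my $fh, '-|', $cmd, @args)
        or die("Could not run $cmd: $!");
    my @lines = <$fh>;

    # Closing a pipe also reports a non-zero exit status from the command.
    close($fh)
        or die("$cmd failed (exit status $?)");
    return @lines;
}

# Intended use in the script (assumes nm is on the PATH):
#   my @nm_output = read_command_lines('nm', @ARGV);
```

The lexical filehandle ($fh) also avoids the global IN_FILE bareword, which matters little in a script this size but is the habit most current Perl code follows.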

Next, each line in the input stream is processed. The first thing you check for is a filename line. These lines end in a colon (:) and are the only lines that do. If you find one, you set the current filename:

  12     if (/(.*):$/) {
  13         $cur_file = $1;
  14         next;
  15     }

Next you check for blank lines (or any other type of short line). These are ignored:

  16     if (length($_) < 12) {
  17         next;   # Blank line or other junk
  18     }

At this point you have a line that contains symbol information. The first eight characters of the line are the value of the symbol (if any). The type character is the tenth character on the line (offset 9 for substr), and the symbol name begins at the twelfth character (offset 11).

The program extracts the type and symbol name from the line:

  20     my $type = substr($_, 9, 1);
  21     my $name = substr($_, 11);
  22     chomp($name);
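
The fixed offsets work for the eight-digit values shown in the sample output. When nm is run on 64-bit object files it prints sixteen-digit values, which shifts the type and name to the right. A possible workaround, sketched here under the assumption that symbol names contain no leading whitespace, is to match the fields with a regular expression instead of counting columns. The helper name parse_symbol_line is invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of nm output into (type, name).
# Returns an empty list for lines that are not symbol lines.
sub parse_symbol_line {
    my ($line) = @_;
    chomp($line);

    # Optional hex value, then one type character, then the symbol name.
    if ($line =~ /^\s*(?:[0-9A-Fa-f]+\s+)?([A-Za-z?-])\s+(\S.*)$/) {
        return ($1, $2);
    }
    return;   # Filename line, blank line, or other junk
}
```

Because the value field is optional and of any width, the same pattern accepts both the undefined-symbol lines (which have no value) and definitions of either width.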

If the symbol type is "U", then the symbol is undefined in the current file, which means the file uses it and expects some other file to define it. Any other symbol type code indicates a definition. The use or definition of the symbol is recorded:

  24     if ($type eq "U") {
  25         $symbols{$name}->{'undefined'} = $cur_file;
  26     } else {
  27         $symbols{$name}->{'defined'} = $cur_file;
  28     }
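
To make the data structure concrete, here is what %symbols would hold after the script has read the sample nm output shown earlier. This is a hand-built illustration, not output captured from the script:

```perl
use strict;
use warnings;

# Contents of %symbols after reading the sample nm output
# (written out by hand for illustration).
my %symbols = (
    foo  => { undefined => 'test-code.o', defined => 'test-sub.o' },
    main => { defined   => 'test-code.o' },
    bar  => { defined   => 'test-sub.o' },
);

# Symbols with no 'undefined' entry are the ones nobody references.
my @unused = grep { not defined($symbols{$_}->{undefined}) }
             sort keys %symbols;
```

Note that each slot holds a single filename, so if several files use or define the same symbol, only the last one seen is remembered. That is enough for finding dead code.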

Once all the information has been processed, all you have to do is identify the dead code and print the results. A dead symbol is one that's defined but not used; in other words, one for which there is no undefined entry:

  31 foreach my $cur_symbol (sort keys %symbols) {
  32     if (not defined($symbols{$cur_symbol}->{undefined})) {
  33         print "Not used.\n";
  34         print "  Symbol: $cur_symbol\n";
  35         print "  Defined in: $symbols{$cur_symbol}->{'defined'}\n";
  36     }
  37 }

The result is a list of symbols that are not used and are candidates for potential elimination.
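
If you want to try the logic without any object files on hand, the steps above can be folded into one function that works on a list of captured nm output lines. This is only a sketch: the name find_unused is made up, and it keeps the same fixed column offsets as the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: given lines of nm output, return a list of
# [symbol, file] pairs for symbols that are defined but never used.
sub find_unused {
    my (@lines) = @_;
    my (%defined_in, %used);
    my $cur_file = '';

    foreach my $line (@lines) {
        if ($line =~ /^(.*):$/) {       # Filename line
            $cur_file = $1;
            next;
        }
        next if (length($line) < 12);   # Blank line or other junk

        my $type = substr($line, 9, 1);
        my $name = substr($line, 11);
        chomp($name);

        if ($type eq 'U') {
            $used{$name} = 1;
        } else {
            $defined_in{$name} = $cur_file;
        }
    }

    # Keep only the defined symbols that no file marked as "U".
    return map  { [$_, $defined_in{$_}] }
           grep { !$used{$_} }
           sort keys %defined_in;
}
```

Feeding it the sample nm output from earlier should report bar and main, the same two symbols the script prints. The entry point main shows up because nothing in the object files refers to it, so treat the list as candidates to review rather than a list of things to delete.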

Hacking the Script

Currently the script is designed to handle individual object files, not libraries. Libraries are a little tricky because only the library members that are actually needed are included in the final executable, so you'd have to add logic to ignore the members that are never pulled in.

This program illustrates how Perl can be used on object files for data mining. Dead code is just one type of information that can be obtained. You can also find other information, such as module dependencies and how many modules use a global symbol.
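
As one possible starting point for that kind of mining, the sketch below records every file that uses a symbol (instead of only the last one) and derives which files depend on which. The name symbol_report and the shape of the returned hashes are assumptions made for this example, and the parsing uses the same fixed columns as the script:

```perl
use strict;
use warnings;

# Hypothetical helper: from lines of nm output, build two hashes:
#   $users->{symbol}{file} = 1   files that reference the symbol
#   $deps->{file}{other}   = 1   files that "file" takes symbols from
sub symbol_report {
    my (@lines) = @_;
    my (%users, %defined_in);
    my $cur_file = '';

    foreach my $line (@lines) {
        if ($line =~ /^(.*):$/) {       # Filename line
            $cur_file = $1;
            next;
        }
        next if (length($line) < 12);   # Blank line or other junk

        my $type = substr($line, 9, 1);
        my $name = substr($line, 11);
        chomp($name);

        if ($type eq 'U') {
            $users{$name}{$cur_file} = 1;
        } else {
            $defined_in{$name} = $cur_file;
        }
    }

    my %deps;
    foreach my $name (keys %users) {
        my $provider = $defined_in{$name};
        next if (not defined($provider));   # Defined outside these files
        foreach my $user (keys %{ $users{$name} }) {
            $deps{$user}{$provider} = 1;
        }
    }
    return (\%users, \%deps);
}
```

The number of modules using a symbol is then scalar(keys %{ $users->{$name} }), and the keys of $deps->{$file} list the modules that file depends on.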
