#20 Fixing Bad Filenames

In the beginning there was the command line—and the filename had form and consistency. Then came the GUI-based file manager. And people could put just about anything they wanted to in a filename. This may look nice in the GUI, but it creates real problems for those of us who still use the command line.

For example, I've had to deal with files with names that looked like this:

Fibber&Molly [10-1-47] "Fibber's lost $" (v\g snd!).mp3

Now I count no fewer than 17 nasty characters in that string that require special handling. So if I want to play it from the command line, I must type this:

$ mpg123 Fibber\&Molly\ \[10-1-47\]\ \"Fibber\'s\ lost\ \$\"\ \(v\\g\ snd\!\).mp3

It would be nice if there was a program that would take mean filenames and get rid of all the mean characters. That is what this script does.

The Code

  1 #!/usr/bin/perl
  2 foreach my $file_name (@ARGV)
  3 {
  4     # Compute the new name
  5     my $new_name = $file_name;
  6
  7     $new_name =~ s/[ \t]/_/g;
  8     $new_name =~ s/[()\[\]<>\\]/x/g;
  9     $new_name =~ s/['`]/=/g;
 10     $new_name =~ s/&/_and_/g;
 11     $new_name =~ s/\$/_dol_/g;
 12     $new_name =~ s/;/:/g;
 13
 14     # Make sure the names are different
 15     if ($file_name ne $new_name)
 16     {
 17         # If a file already exists by that name
 18         # compute a new name.
 19         if (-f $new_name)
 20         {
 21             my $ext = 0;
 22
 23             while (-f $new_name.".".$ext)
 24             {
 25                 $ext++;
 26             }
 27             $new_name = $new_name.".".$ext;
 28         }
 29         print "$file_name -> $new_name\n";
 30         rename($file_name, $new_name);
 31     }
 32 }
 33

Running the Script

To run the script, just specify the bad filenames on the command line:

$ fix-names.pl Fibb*

(Wildcards work very nicely when it comes to dealing with rotten filenames. This wildcard matches the bad filename used as an example.)

The Results

Fibber&Molly [10-1-47] "Fibber's lost $" (v\g snd!).mp3 ->
Fibber_and_Molly_x10-1-47x_"Fibber=s_lost__dol_"_xvxg_snd!x.mp3

How It Works

The script loops through each file on the command line:

  2 foreach my $file_name (@ARGV)

It then computes a new filename by replacing all the bad stuff in the name with something typeable. For example, the first substitution changes all spaces and tabs to _. An underscore may not be a space, but it looks like one:

  7     $new_name =~ s/[ \t]/_/g;

A similar edit is applied for all the other bad things you see in filenames:

  8     $new_name =~ s/[()\[\]<>\\]/x/g;
  9     $new_name =~ s/['`]/=/g;
 10     $new_name =~ s/&/_and_/g;
 11     $new_name =~ s/\$/_dol_/g;
 12     $new_name =~ s/;/:/g;
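
To see the substitutions at work without touching any files, they can be wrapped in a small function and fed a string. This is only a sketch: clean_name is a hypothetical helper that does not appear in the script, and the sample name is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# clean_name is a hypothetical helper (not in the original script).
# It applies the same six substitutions, in the same order.
sub clean_name {
    my ($name) = @_;

    $name =~ s/[ \t]/_/g;           # spaces and tabs become underscores
    $name =~ s/[()\[\]<>\\]/x/g;    # brackets and backslashes become x
    $name =~ s/['`]/=/g;            # single quotes and backticks become =
    $name =~ s/&/_and_/g;           # ampersand becomes _and_
    $name =~ s/\$/_dol_/g;          # the \ makes $ a literal dollar sign
    $name =~ s/;/:/g;               # semicolon becomes colon
    return $name;
}

# A made-up filename, single-quoted so Perl leaves $ alone.
print clean_name(q{Tom & Jerry's [best] $5.mp3}), "\n";
```

Note the backslash in front of the $ in the fifth substitution. Without it, $ means "end of string" and _dol_ would simply be tacked onto the end of every name.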

Next, make sure that the name actually changed. If it didn't, there's no work to be done since the filename is already sane.

 14     # Make sure the names are different
 15     if ($file_name ne $new_name)
 16     {

Renaming will fail if a file with the new name already exists. To avoid this problem, check to see if you are about to have a name collision, and if one is imminent, change your filename. This is done by adding a numerical extension to the name.

In other words, if you are renaming the file to the_file and the_file exists, try the_file.0, the_file.1, the_file.2 until you find a name that won't cause trouble:

 17         # If a file already exists by that name
 18         # compute a new name.
 19         if (-f $new_name)
 20         {
 21             my $ext = 0;
 22
 23             while (-f $new_name.".".$ext)
 24             {
 25                 $ext++;
 26             }
 27             $new_name = $new_name.".".$ext;
 28         }
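
The same search for a free name can be tried out without creating any files. The sketch below is an illustration under a few assumptions: unique_name is a hypothetical helper, and it takes the "does this name exist?" test as a code reference so the logic can be exercised against a pretend directory. When no test is supplied, it falls back to -e (any kind of directory entry) rather than the -f (plain files only) used in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# unique_name is a hypothetical helper (not in the original script).
# $exists is a code reference that returns true if a name is taken.
sub unique_name {
    my ($name, $exists) = @_;
    $exists ||= sub { -e $_[0] };    # assumption: -e instead of the script's -f

    # No collision: keep the name as it is.
    return $name unless $exists->($name);

    # Otherwise try name.0, name.1, name.2, ... until one is free.
    my $ext = 0;
    $ext++ while $exists->("$name.$ext");
    return "$name.$ext";
}

# A pretend directory in which three names are already taken.
my %taken = map { $_ => 1 } qw(the_file the_file.0 the_file.1);
my $in_use = sub { $taken{ $_[0] } };

print unique_name('the_file', $in_use), "\n";
print unique_name('other_file', $in_use), "\n";
```

With this pretend directory the first call skips the_file.0 and the_file.1 and settles on the_file.2, while the second call returns other_file unchanged.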

You've gone through all the transformations; now you're ready to do the renaming:

 29         print "$file_name -> $new_name\n";
 30         rename($file_name, $new_name);
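
The script does not look at the value rename returns, so a failure (a read-only directory, for instance) goes by silently. One possible refinement, offered as a sketch rather than as part of the original, is to wrap the call and report the system error held in $!. The name safe_rename is hypothetical, and the demonstration works in a scratch directory created by the core File::Temp module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# safe_rename is a hypothetical wrapper (not in the original script).
# It returns 1 on success and 0 on failure, warning with the reason.
sub safe_rename {
    my ($old, $new) = @_;

    if (rename($old, $new)) {
        print "$old -> $new\n";
        return 1;
    }
    warn "Could not rename $old to $new: $!\n";
    return 0;
}

# Demonstration in a scratch directory that is removed on exit.
my $dir = tempdir(CLEANUP => 1);
my $old = "$dir/a b.txt";
open(my $fh, '>', $old) or die "Cannot create $old: $!\n";
close($fh);

safe_rename($old, "$dir/a_b.txt");            # succeeds
safe_rename("$dir/missing", "$dir/nowhere");  # fails and warns
```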

The filename is fixed and you're ready for the next one.

Hacking the Script

This script doesn't get rid of all the bad characters. It just eliminates the ones I've seen in the files I've downloaded. You can easily add to the script to take care of any bad stuff you find. I've also tried to leave as much of the original filename intact as possible; for example, $ is mapped to _dol_. If you want a different mapping, feel free to change the script.
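
One way to make new mappings easy to add is to keep them in a table and loop over it. The following is a sketch of that idea, not the original script: @rules and apply_rules are hypothetical names, the first six rules copy the script's substitutions, and the last two (for ! and ") are made-up additions included only to show how the table grows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ordered list of [pattern, replacement] pairs (hypothetical layout).
my @rules = (
    [ qr/[ \t]/,        '_'      ],
    [ qr/[()\[\]<>\\]/, 'x'      ],
    [ qr/['`]/,         '='      ],
    [ qr/&/,            '_and_'  ],
    [ qr/\$/,           '_dol_'  ],
    [ qr/;/,            ':'      ],
    [ qr/!/,            '_bang_' ],   # illustrative addition
    [ qr/"/,            ''       ],   # illustrative addition: drop double quotes
);

# apply_rules runs every rule, in order, over the name.
sub apply_rules {
    my ($name) = @_;

    for my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        $name =~ s/$pattern/$replacement/g;
    }
    return $name;
}

print apply_rules(q{"Fibber's lost $" (vg snd!).mp3}), "\n";
```

Order matters in a table like this. A rule whose replacement contains a character that a later rule matches would be rewritten a second time, so new rules are safest when added at the end and checked against the ones before them.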

During my college days, I got into a contest with one of my fellow computer science students. My goal was to create a file in his directory that he could not delete. And I created some files with some mean names, such as "delete.me " (note the trailing space), "-f", and others with special characters in them. Eventually he learned how to delete them all.

In the end, I exploited a system bug that allowed me to stick the file seven levels deep on a system in which the directory nesting was limited to six. The operating system refused to let him even look at the file, much less delete it. (The machine was a DECsystem-10, if you're interested.)

