Metadata diff of binary files

Binary files can be hard to diff, depending on the file type. Often, the only option is to open both versions in the program that displays them and compare the differences visually. In this recipe, we'll see how to use EXIF metadata to diff images in the repository.

Getting ready

We'll use the same repository as in the last example; either re-clone it or check out the exif branch:

$ git clone https://github.com/dvaske/attributes_example.git
$ cd attributes_example
$ git checkout exif

How to do it...

To use EXIF data when diffing binary files, we need to set up a diff filter (Git calls it a diff driver) that tells Git what to do when a *.jpg file is diffed. EXIF data is metadata embedded in images; digital cameras often use it to record timestamps, image dimensions, and so on.

We'll write the following line to .gitattributes:

*.jpg diff=exif-diff
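To confirm that Git picks up the new attribute, you can ask it directly with git check-attr. The following is a small sketch that wraps this in a helper; add_exif_attr is a hypothetical name, not part of the recipe. It appends the rule only once and then prints the diff attribute Git resolves for hello_world.jpg. Run it from the repository root.

```shell
# add_exif_attr: hypothetical helper; run it from the repository root.
add_exif_attr() {
    # Append the rule only if the exact line is not already present.
    if ! grep -qxF '*.jpg diff=exif-diff' .gitattributes 2>/dev/null; then
        echo '*.jpg diff=exif-diff' >> .gitattributes
    fi
    # Ask Git which diff driver it resolves for a JPG path.
    git check-attr diff -- hello_world.jpg
}
```

Running add_exif_attr in the repository should print hello_world.jpg: diff: exif-diff.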

This only tells Git that JPG files should use the exif-diff filter; we still need to set it up. Several programs can extract EXIF metadata, such as exiftool and jhead. In this example, we use exiftool, so make sure it is installed and available on your PATH. To set up the exif-diff filter with exiftool, add the following Git configuration:

$ git config diff.exif-diff.textconv exiftool
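Git runs the textconv command only when it actually produces a diff, so a missing exiftool only shows up later, when git diff fails to run it. A minimal sketch of a guard, assuming a POSIX shell; setup_exif_diff is a hypothetical helper name. Note that, unlike .gitattributes, which is committed with the repository, this setting is stored in the repository's local configuration and is not copied by git clone, so each clone needs it (or you can set it once with git config --global).

```shell
# setup_exif_diff: hypothetical helper; run it inside the repository.
# Configures the exif-diff driver only if exiftool can be found.
setup_exif_diff() {
    if ! command -v exiftool >/dev/null 2>&1; then
        echo "exiftool not found on PATH; install it first" >&2
        return 1
    fi
    # Repository-local by default; add --global to apply it to every repository.
    git config diff.exif-diff.textconv exiftool
    # Print the stored value as a check.
    git config --get diff.exif-diff.textconv
}
```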

From now on, whenever a JPG file is diffed, Git runs exiftool on both versions and shows a comparison of their EXIF data instead of just reporting that the binary files differ. To see the actual change in the image, you still have to open both versions and compare them visually.
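You can try the textconv mechanism itself without exiftool or the example repository. The following sketch builds a throwaway repository and uses od -c as a stand-in converter. This substitution is an assumption made only so that the demo runs on any POSIX system. The demo prints the same change twice: once through the converter, and once with --no-textconv, which makes Git fall back to its plain binary-file message. The names textconv_demo and od-diff are hypothetical.

```shell
# textconv_demo: hypothetical demo of a textconv diff driver.
# Assumption: `od -c` stands in for exiftool so the demo runs anywhere.
textconv_demo() {
    demo_repo=$(mktemp -d) || return 1
    (
        cd "$demo_repo" || exit 1
        git init -q
        git config user.name "Demo"
        git config user.email "demo@example.com"
        # Route *.bin files through the od-diff driver.
        echo '*.bin diff=od-diff' > .gitattributes
        git config diff.od-diff.textconv 'od -c'
        # The NUL byte makes Git treat the file as binary.
        printf 'abc\000' > sample.bin
        git add .gitattributes sample.bin
        git commit -q -m "Add binary sample"
        printf 'abd\000' > sample.bin
        echo "== with textconv =="
        git diff sample.bin
        echo "== without textconv =="
        git diff --no-textconv sample.bin
    )
    rm -rf "$demo_repo"
}
```

Git passes the converter the path of a file that holds each version. Any command that accepts a file name and writes text to standard output can therefore serve as a textconv command.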

How it works…

Now that the filter is set up, let's check its output. The last two commits on the exif branch contain pictures whose size has changed; let's see how they look with the exif-diff filter. First, check the log for the last two commits:

$ git log --name-status -2
commit 0beb82c65d8cd667e1ffe61860a42a106be3c1a6
Author: Aske Olsson <[email protected]>
Date:   Sat May 3 14:55:50 2014 +0200

    Changes sizes of images

M       europe_needles.jpg
M       hello_world.jpg
M       pic_credits.txt

commit a25d0defc70b9a1842463c1e9894a88dfb897cd8
Author: Aske Olsson <[email protected]>
Date:   Sun Apr 27 16:02:51 2014 +0200

    Adds pictures to repository

    Picture credits found in pic_credits.txt

M       README.md
A       europe_needles.jpg
A       hello_world.jpg
A       pic_credits.txt

Let's look at the diff between the two commits. The output you get may not match the following exactly, as it depends on the exiftool version, OS, and so on; the following output was generated with exiftool 9.61 on OS X 10.9.3:

$ git diff HEAD^..HEAD
diff --git a/europe_needles.jpg b/europe_needles.jpg
index 7291028..44e98e3 100644
--- a/europe_needles.jpg
+++ b/europe_needles.jpg
@@ -1,11 +1,11 @@
 ExifTool Version Number         : 9.54
-File Name                       : Gnepvw_europe_needles.jpg
-Directory                       : /var/folders/3r/6f35b4t11rv2nmbrx5x32t4w0000gn/T
-File Size                       : 813 kB
+File Name                       : europe_needles.jpg
+Directory                       : .
+File Size                       : 328 kB
 File Modification Date/Time     : 2014:05:03 22:08:05+02:00
-File Access Date/Time           : 2014:05:03 22:08:05+02:00
+File Access Date/Time           : 2014:05:03 22:08:06+02:00
 File Inode Change Date/Time     : 2014:05:03 22:08:05+02:00
-File Permissions                : rw-------
+File Permissions                : rw-r--r--
 File Type                       : JPEG
 MIME Type                       : image/jpeg
 JFIF Version                    : 1.01
@@ -79,8 +79,8 @@ Sub Sec Time Original           : 00
 Sub Sec Time Digitized          : 00
 Flashpix Version                : 0100
 Color Space                     : sRGB
-Exif Image Width                : 1620
-Exif Image Height               : 1080
+Exif Image Width                : 1024
+Exif Image Height               : 683
...
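Several of the changed lines above, such as File Name, Directory, File Access Date/Time, and File Permissions, describe the temporary file that Git writes the old version to before passing it to exiftool, not the image itself. The following is a minimal sketch of a wrapper that filters those lines out. The name exif_textconv is hypothetical, and the list of excluded tags is an assumption based on the output above.

```shell
# exif_textconv: hypothetical wrapper around exiftool for use as textconv.
# Drops tags that describe the file on disk (name, location, timestamps,
# permissions) rather than the image, so only real image changes remain.
exif_textconv() {
    exiftool "$1" |
        grep -v -E '^(ExifTool Version Number|File Name|Directory|File (Modification|Access|Inode Change) Date/Time|File Permissions) *:'
}
```

To use it as the textconv command, put the function body in an executable script on your PATH (for example, one named exif-textconv, also a hypothetical name). Then point diff.exif-diff.textconv at that script instead of exiftool. Lines that reflect the real change, such as File Size and Exif Image Width/Height, are kept.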

There's more…

You can also set up diffing for binary file types other than images. As long as useful data can be extracted from a file type, you can create a custom diff filter for it. For example, the catdoc program can convert Microsoft Word files from the .doc format to plain text. This makes it easy to diff the text content of two files, though not their formatting.
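The same pattern used for JPG files could look like the following minimal sketch for .doc files. It assumes that catdoc is installed and on your PATH. catdoc prints the text of a .doc file to standard output, which is what a textconv command must do. The driver name word and the helper name setup_doc_diff are hypothetical choices.

```shell
# setup_doc_diff: hypothetical helper; run it from the repository root.
# Assumption: catdoc is installed and available on PATH.
setup_doc_diff() {
    # Route .doc files through a driver named "word".
    echo '*.doc diff=word' >> .gitattributes
    # Let catdoc convert each version to plain text before diffing.
    git config diff.word.textconv catdoc
}
```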
