Binary files can be hard to diff, depending on the type of the file. Often, the only option is to load two instances of the program to show the files and check the differences visually. In this recipe we'll see how we can use EXIF metadata to diff images in the repository.
We'll use the same repository as in the last example; either re-clone it or check out the exif branch:
$ git clone https://github.com/dvaske/attributes_example.git
$ cd attributes_example
$ git checkout exif
In order to use the EXIF data while diffing binary files, we need to set up a filter that tells Git what to do when a *.jpg file is to be diffed. EXIF data is metadata embedded in images and is often used by digital cameras to record timestamps, the size of an image, and so on.
We'll write the following line to .gitattributes:
*.jpg diff=exif-diff
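As a quick sanity check, git check-attr can report which diff driver Git will pick for a given path. The sketch below uses a scratch repository created with mktemp (an assumption made only so the snippet can be run anywhere); inside attributes_example, only the echo and git check-attr lines would be needed.

```shell
# Scratch repository, used here only so the example is self-contained.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Append the attribute line from the recipe.
echo '*.jpg diff=exif-diff' >> .gitattributes

# Ask Git which value the "diff" attribute has for a JPG path.
# The file does not need to exist for check-attr to answer.
attr=$(git check-attr diff -- hello_world.jpg)
echo "$attr"
```

If the attribute is picked up, the reported value for hello_world.jpg is exif-diff.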
This only tells Git that JPG files should use the exif-diff filter; we still need to set it up. Several programs can extract EXIF metadata, such as exiftool and jhead. In this example, we're using exiftool, so make sure you have it installed and available on your PATH. To set up the exiftool diff filter, we create the following Git config:
$ git config diff.exif-diff.textconv exiftool
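The setting can be read back to confirm it was stored. The following is a minimal sketch in a scratch repository (again an assumption for the sake of a runnable example); it also warns if exiftool is missing and shows the optional cachetextconv setting, which asks Git to cache conversion results.

```shell
# Scratch repository so the example does not touch an existing one.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Without --global, the setting is stored in this repository's .git/config.
git config diff.exif-diff.textconv exiftool

# Read the value back to confirm it is in place.
conv=$(git config --get diff.exif-diff.textconv)
echo "$conv"

# The filter can only work if the converter is actually installed.
command -v exiftool >/dev/null 2>&1 || echo "warning: exiftool not found on PATH" >&2

# Optional: let Git cache the converted text for each blob.
git config diff.exif-diff.cachetextconv true
```

Because the configuration is local, every clone of the repository has to repeat this step; only .gitattributes is shared through the repository itself.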
From now on, every time a JPG file is to be diffed, you'll just see a comparison of its EXIF data. To see the actual change in the image, you still have to open the two images and compare them visually.
Now that the filter is set up, we can check its output. The last two commits on the exif branch of the repository contain pictures that have had their size changed; let's see how they look with the exif-diff filter. First, check the log for the last two commits:
$ git log --name-status -2
commit 0beb82c65d8cd667e1ffe61860a42a106be3c1a6
Author: Aske Olsson <[email protected]>
Date: Sat May 3 14:55:50 2014 +0200

    Changes sizes of images

M europe_needles.jpg
M hello_world.jpg
M pic_credits.txt

commit a25d0defc70b9a1842463c1e9894a88dfb897cd8
Author: Aske Olsson <[email protected]>
Date: Sun Apr 27 16:02:51 2014 +0200

    Adds pictures to repository

    Picture credits found in pic_credits.txt

M README.md
A europe_needles.jpg
A hello_world.jpg
A pic_credits.txt
Let's look at the diff between the two commits. The output you get might not match the following exactly, as it depends on the exiftool version, OS, and so on; the following output was generated with exiftool 9.61 on OS X 10.9.3:
$ git diff HEAD^..HEAD
diff --git a/europe_needles.jpg b/europe_needles.jpg
index 7291028..44e98e3 100644
--- a/europe_needles.jpg
+++ b/europe_needles.jpg
@@ -1,11 +1,11 @@
 ExifTool Version Number : 9.54
-File Name : Gnepvw_europe_needles.jpg
-Directory : /var/folders/3r/6f35b4t11rv2nmbrx5x32t4w0000gn/T
-File Size : 813 kB
+File Name : europe_needles.jpg
+Directory : .
+File Size : 328 kB
 File Modification Date/Time : 2014:05:03 22:08:05+02:00
-File Access Date/Time : 2014:05:03 22:08:05+02:00
+File Access Date/Time : 2014:05:03 22:08:06+02:00
 File Inode Change Date/Time : 2014:05:03 22:08:05+02:00
-File Permissions : rw-------
+File Permissions : rw-r--r--
 File Type : JPEG
 MIME Type : image/jpeg
 JFIF Version : 1.01
@@ -79,8 +79,8 @@ Sub Sec Time Original : 00
 Sub Sec Time Digitized : 00
 Flashpix Version : 0100
 Color Space : sRGB
-Exif Image Width : 1620
-Exif Image Height : 1080
+Exif Image Width : 1024
+Exif Image Height : 683
...
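Some lines in this diff (File Name, Directory, File Access Date/Time, File Permissions) appear to describe the temporary file Git hands to exiftool rather than the image itself. One way to reduce that noise is to filter those lines out before Git compares the text. The sketch below is not part of the recipe: filter_exif is a hypothetical helper, and the sample text is copied from the output above so the snippet runs without exiftool installed.

```shell
# Hypothetical helper: drop exiftool lines that describe the file on disk
# (name, directory, access time, permissions) rather than the image content.
filter_exif() {
  grep -v -E '^(File Name|Directory|File Access Date/Time|File Permissions) +:'
}

# Sample exiftool-style lines taken from the diff above.
sample='ExifTool Version Number : 9.54
File Name : europe_needles.jpg
Directory : .
File Size : 328 kB
File Access Date/Time : 2014:05:03 22:08:06+02:00
File Permissions : rw-r--r--
File Type : JPEG'

# Only the version, size, and type lines should survive the filter.
filtered=$(printf '%s\n' "$sample" | filter_exif)
echo "$filtered"
```

To use such a filter with Git, one option is to save a small script (say exif-textconv.sh, a hypothetical name) that pipes the output of exiftool "$1" through the same grep, and to point diff.exif-diff.textconv at that script. Git passes the path of the file to convert as the single argument and reads the converted text from standard output.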
It is also possible to set up diffing of binary files for types other than images. As long as some useful data can be extracted from the file type, it's possible to create a custom diff filter for it. With the catdoc program, for example, Microsoft Word files can be translated from the .doc format to plain text, which makes it easy to diff the text content of two files, but not their formatting.
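The setup for Word files would follow the same two steps as for JPG files. A minimal sketch, assuming catdoc is installed and using doc-diff as a made-up driver name, run in a scratch repository so it can be tried anywhere:

```shell
# Scratch repository so the example is self-contained.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Step 1: route .doc files to a diff driver (doc-diff is a hypothetical name).
echo '*.doc diff=doc-diff' >> .gitattributes

# Step 2: tell Git to convert such files to text with catdoc before diffing.
git config diff.doc-diff.textconv catdoc

# Confirm both halves of the setup.
doc_attr=$(git check-attr diff -- report.doc)
doc_conv=$(git config --get diff.doc-diff.textconv)
echo "$doc_attr"
echo "$doc_conv"
```

The driver name is arbitrary; it only has to match between .gitattributes and the diff.<name>.textconv configuration key.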