While investigating some test failures on a project involving ImageMagick (a parser for the metadata that ImageMagick extracts from documents), I noticed that different versions of ImageMagick produced output in different formats. Not only did the formats differ, but some of the data was missing or had different values.

I only knew about two versions that were affected:

  • 6.7.7-10 2014-03-06 Q16
  • 6.9.2-4 Q16 x86_64 2015-11-27

The tests failed on the newer one, and I fixed those failures quickly. But I wanted to go a bit further and find out which of the versions between these two also failed any tests.

I found the Debian git repository for ImageMagick, cloned it and looked at its branches. There were many branches, but the ones I was interested in were those under origin/debian/*.
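Picking the branches for a version range can be scripted rather than typed by hand. The sketch below is illustrative only: branches_between and version_le are hypothetical helper names, and GNU sort -V is assumed for the version comparison. It reads branch names on stdin and keeps those whose version lies between two bounds, inclusive, in version order.

```shell
# version_le A B: succeeds if A <= B in version order (GNU sort -V assumed)
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# branches_between LOW HIGH: read branch names on stdin and print those whose
# last path component (the Debian version) is between LOW and HIGH, inclusive
branches_between() {
    local low=$1 high=$2 branch ver
    # read -r also trims the leading spaces that `git branch` prints
    while read -r branch; do
        ver=${branch##*/}   # keep only the part after the last '/'
        if version_le "$low" "$ver" && version_le "$ver" "$high"; then
            printf '%s\n' "$branch"
        fi
    done | sort -V
}
```

A possible invocation is `git -C imagemagick branch -r --list 'origin/debian/*' | branches_between 6.7.7.9-1 6.9.2.3-1`. Note that this prints names like origin/debian/6.8.0.7-1 (without the remotes/ prefix), which git checkout accepts just as well.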

Next, I wrote a bash script to build every version from 6.7.7.9-1 all the way up to 6.9.2.3-1.

#!/bin/bash
GIT_REPO=https://anonscm.debian.org/git/collab-maint/imagemagick.git
GIT_BRANCHES=()
GIT_BRANCHES+=('remotes/origin/debian/6.7.7.9-1')
# ...
GIT_BRANCHES+=('remotes/origin/debian/6.9.2.3-1')

# Clone the debian repository
git clone $GIT_REPO imagemagick

# We're only interested in the identify binary so we aim
# at making a custom build for each branch. We're not
# building any documentation, we're also not going to let
# the IM build run any tests, we only want the binaries.
for BRANCH in "${GIT_BRANCHES[@]}"; do
    BRANCH_VER=$(echo "$BRANCH" | perl -ne 'm{(\d+(?:\.\d+)+.*)$} && print "$1"')
    IM_VER=../custom-build/im-$BRANCH_VER/
    pushd .
    cd imagemagick
    # remove any unversioned files created by the previous build
    git clean -dfx .
    # undo any changes to versioned files
    git reset --hard HEAD
    # check out the Debian branch to build (this leaves a detached HEAD)
    git checkout "$BRANCH"
    # building a minimal imagemagick for testing purposes
    autoreconf -i
    ./configure --without-perl --without-magick-plus-plus --disable-docs --prefix="$PWD/$IM_VER"
    # speed up the build with 4 parallel jobs
    make -j4
    mkdir -p "$IM_VER"
    make install
    popd
done

It took one hour to build 57 different versions [1] of ImageMagick, resulting in a 4.2 GB directory containing each version's binaries.
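With that many builds, a quick sanity check that each one installed a working identify binary can save confusion later. This is a sketch assuming the layout produced by the script above (custom-build/im-VERSION/bin/identify); report_builds is a hypothetical name. It prints the first line of each build's `identify -version` banner, or notes that the binary is missing.

```shell
# report_builds [ROOT]: show the version banner of every custom build under ROOT
# (assumed layout: ROOT/im-<version>/bin/identify, as created by the build script)
report_builds() {
    local root=${1:-custom-build} dir
    for dir in "$root"/im-*/; do
        [ -d "$dir" ] || continue   # glob matched nothing
        if [ ! -x "$dir/bin/identify" ]; then
            printf '%s: missing identify\n' "$(basename "$dir")"
            continue
        fi
        # the first line of `identify -version` starts with "Version: ImageMagick ..."
        printf '%s: %s\n' "$(basename "$dir")" "$("$dir/bin/identify" -version | head -n 1)"
    done
}
```

A build whose banner does not match its directory name, or that shows up as missing, is probably one whose configure or make step failed, since the build loop does not stop on errors.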

Next, I wanted to know which versions of ImageMagick made the unit tests fail, so I wrote another bash script that runs the tests against each version in parallel. For the parallelism I only used xargs -P, which I had read about some time ago [2] (there are other ways to do this, but this one seemed the easiest in this situation).
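Stripped down to the xargs -P part, the idea can be sketched as follows. run_parallel is a hypothetical helper, and GNU xargs is assumed for -d. Unlike the script that follows, it passes each input line to the command as $1 instead of splicing it into a command string, so spaces or quotes in the line cannot break the command.

```shell
# run_parallel JOBS CMD: run bash -c CMD once per stdin line, with the line
# available as $1, keeping at most JOBS processes running at the same time
run_parallel() {
    local jobs=$1 cmd=$2
    # -d '\n': one item per line, no quote processing (GNU xargs)
    # -n 1: one item per command; "_" becomes $0 inside CMD
    xargs -d '\n' -n 1 -P "$jobs" bash -c "$cmd" _
}

# Example: three jobs, two at a time; completion order varies, hence the sort
printf '%s\n' a b c | run_parallel 2 'echo "item $1"' | sort
```

The trade-off is that CMD has to be written in terms of $1, but nothing from the input is ever re-parsed as shell code.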

#!/bin/bash
BUILD_PATHS=$(find custom-build/ -maxdepth 1 -mindepth 1)
OUT_DIR=$(mktemp -d)
for P in $BUILD_PATHS; do
    # extract IM version from the directory
    DIR_VERSION=$(echo "$P" | perl -ne 'm{(\d+(?:\.\d+)+[^\/]*)\/?} && print "$1"')
    # build output path
    OUT_FILE="$OUT_DIR/$DIR_VERSION"'.txt'
    # run the tests using a custom built version of imagemagick
    # (prepend custom built path so it can be used by the tests)
    NEW_PATH="$PWD/$P/bin:$PATH"
    CMD="PATH=\"$NEW_PATH\" ./run-tests.sh >\"$OUT_FILE\" 2>&1"
    echo "$CMD"
done | xargs -d '\n' -I% -P 6 /bin/bash -c '%'  # -d '\n' (GNU xargs) keeps the quotes in each line intact

echo "$OUT_DIR"
echo "Test reports with failures:"
grep -lir FAIL "$OUT_DIR"

Running the tests against all the versions took under 33 seconds and gave this result:

Test reports with failures:
/tmp/tmp.8f4zL7ImPd/6.8.0.7-1.txt
/tmp/tmp.8f4zL7ImPd/6.8.1.5-1.txt
/tmp/tmp.8f4zL7ImPd/6.8.1.0-1.txt
/tmp/tmp.8f4zL7ImPd/6.8.0.10-1.txt
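The report paths come out in whatever order grep finds them. A small, hypothetical helper could turn them into a version-sorted list of names, assuming each report is named VERSION.txt as in the script above and that GNU sort -V is available.

```shell
# failing_versions OUT_DIR: list the versions whose report mentions FAIL,
# using the same grep as the test script and sorting by version
failing_versions() {
    local out_dir=$1 f
    grep -lir FAIL "$out_dir" | while read -r f; do
        basename "$f" .txt   # /tmp/.../6.8.0.7-1.txt -> 6.8.0.7-1
    done | sort -V
}
```

For the four reports above, this ordering would be 6.8.0.7-1, 6.8.0.10-1, 6.8.1.0-1, 6.8.1.5-1, since sort -V compares 7 and 10 numerically.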

That left four ImageMagick versions to look into. The failing tests all checked properties of a tree built from PDF metadata. These four versions output data only for the first page of a PDF, rather than for every page as most other versions do, which is why the tests failed.
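One way to look at this behaviour directly might be to count how many frames a given build lists for a multi-page PDF. This is a sketch under several assumptions: pages_reported is a hypothetical name, the plain identify listing is assumed to print one line per frame, and reading PDFs requires Ghostscript. The project's parser may use a different identify mode, so this only approximates what the tests see.

```shell
# pages_reported FILE [IDENTIFY]: number of lines (one per frame, by assumption)
# that the given identify binary prints for FILE; defaults to identify on PATH
pages_reported() {
    local file=$1 identify=${2:-identify}
    "$identify" "$file" | grep -c ''   # grep -c '' counts lines portably
}
```

For example, `pages_reported sample.pdf custom-build/im-6.8.0.7-1/bin/identify` (sample.pdf being a placeholder) could be compared against the same call with an unaffected build.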

I then used unittest.skipIf to skip those tests when the IM version in use was one of these four.
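The skip condition needs to know which ImageMagick version is on the PATH. Below is a shell sketch of that check, not the project's actual code: im_version and is_affected are hypothetical helpers, and mapping the Debian branch names (6.8.0.7-1) to the upstream versions that identify reports (6.8.0-7) is an assumption about Debian's version naming.

```shell
# im_version: read `identify -version` output on stdin and print the version,
# e.g. "Version: ImageMagick 6.9.2-4 Q16 x86_64 ..." -> "6.9.2-4"
im_version() {
    sed -n '1s/^Version: ImageMagick \([^ ]*\).*/\1/p'
}

# is_affected VERSION: succeeds for the four versions whose PDF output covers
# only the first page (upstream numbering, assumed from the Debian branch names)
is_affected() {
    case "$1" in
        6.8.0-7|6.8.0-10|6.8.1-0|6.8.1-5) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage: compute the version once and test it
#   v=$(identify -version | im_version)
#   if is_affected "$v"; then echo "skipping multi-page PDF tests"; fi
```

On the Python side, the equivalent would be running `identify -version` once at import time and passing the resulting boolean to unittest.skipIf.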

This made for an interesting analysis, and I was happy to learn more about the way IM works.

Footnotes:

[1] There's a more complicated scenario in which all dependencies (the libraries IM depends on) would be taken into consideration, not just IM's source code. The assumption here is that IM alone is responsible for the output of the identify utility, and that's a fair assumption, at least for the format of the output.