Breaking the MintEye image CAPTCHA in 34 lines of Python

Several people have suggested that importing OpenCV is cheating, and that the claim that MintEye was broken in 23 lines of Python is therefore disingenuous. So here’s a solution in 34 lines of Python, with no OpenCV:

import sys
import os
import matplotlib.pyplot as plt
import math
from PIL import Image

for dir in range(1,14):
    dir = str(dir)

    total_images = len(os.listdir(dir))+1

    points_sob = []
    for i in xrange(1,total_images):
        image = Image.open(dir+'/'+str(i)+'.jpg')
        im = image.load()

        #convert to grayscale (ITU-R 601-2 luma transform)
        grey = [[None for _ in range(image.size[1])] for _ in range(image.size[0])]
        for x in xrange(image.size[0]):
            for y in xrange(image.size[1]):
                grey[x][y] = im[x,y][0]*299/1000 + im[x,y][1]*587/1000 + im[x,y][2]*114/1000

        sum_of_sob = 0
        for x in xrange(1,image.size[0]-1):
            for y in xrange(1,image.size[1]-1):
                gx = -grey[x-1][y-1] + grey[x+1][y-1] - 2*grey[x-1][y] + 2*grey[x+1][y] - grey[x-1][y+1] + grey[x+1][y+1]
                gy = -grey[x-1][y-1] - 2*grey[x][y-1] - grey[x+1][y-1] + grey[x-1][y+1] + 2*grey[x][y+1] + grey[x+1][y+1]
                sum_of_sob = sum_of_sob + math.sqrt(gx*gx + gy*gy)

        print sum_of_sob
        # record this image's edge sum so the minimum can be found below
        points_sob.append(sum_of_sob)

    res = points_sob.index(min(points_sob)) + 1
    x = xrange(1,total_images)
    plt.plot(res,points_sob[res-1], marker='o', color='r', ls='')
    plt.plot(x, points_sob)
    plt.show()


The point I’m trying to make is that Sobel is a very simple operator – that’s why it was created, as a crude approximation of the derivative of a 2D signal. In fact, the only non-trivial maths imported from OpenCV previously are the DCTs to decode the JPEGs.
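To show how little is going on, here is a minimal sketch of the same two 3x3 Sobel sums applied to a single pixel of a tiny hand-made greyscale grid. The function name `sobel_magnitude` and the two test grids are assumptions made up for illustration; they are not part of the solver above.

```python
import math

def sobel_magnitude(grey, x, y):
    """Gradient magnitude at (x, y) of a 2D list indexed as grey[x][y]."""
    gx = (-grey[x-1][y-1] + grey[x+1][y-1]
          - 2*grey[x-1][y] + 2*grey[x+1][y]
          - grey[x-1][y+1] + grey[x+1][y+1])
    gy = (-grey[x-1][y-1] - 2*grey[x][y-1] - grey[x+1][y-1]
          + grey[x-1][y+1] + 2*grey[x][y+1] + grey[x+1][y+1])
    return math.sqrt(gx*gx + gy*gy)

# Illustrative grids, 4 columns (x) by 3 rows (y), indexed grey[x][y].
# 'edge' is dark in the two left columns and bright in the two right ones.
edge = [[0, 0, 0], [0, 0, 0], [255, 255, 255], [255, 255, 255]]
# 'flat' is a uniform grey with no edges at all.
flat = [[128] * 3 for _ in range(4)]

print(sobel_magnitude(edge, 1, 1))  # 1020.0: strong response next to the edge
print(sobel_magnitude(flat, 1, 1))  # 0.0: no response on a flat region
```

The weights are just differences of neighbouring pixels, which is why the whole operator fits in two lines of arithmetic.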

Breaking the MintEye image CAPTCHA in 23 lines of Python

As an avid reader of HAD, I was intrigued by this post explaining how someone had broken MintEye’s audio-based CAPTCHA. The image version of the CAPTCHA looked interesting, so I thought it might be fun to try to break it.

For those unfamiliar with MintEye, the image based CAPTCHAs look as follows:

You must adjust a slider to select the undistorted version of the image.  Several somewhat naive approaches (in my opinion) were proposed in the HAD comments to solve this captcha, based on looking for straight lines.  However, such solutions are likely to fall down for images containing few straight lines (e.g. the CAPTCHA above).

After a little thought (and unfruitful musings with optical flow) I found a good, robust and remarkably simple solution. Here it is:

import cv2
import sys
import numpy as np
import os
import matplotlib.pyplot as plt

if __name__ == '__main__':

    for dir in range(1,14):
        dir = str(dir)

        total_images = len(os.listdir(dir))+1
        points_sob = []

        for i in range(1,total_images):
            img = cv2.imread(dir+'/'+str(i)+'.jpg')
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

            sob = cv2.Sobel(gray, -1, 1, 1)
            # store the 'sum of edges' for this image
            points_sob.append(np.sum(sob))

        x = range(1,total_images)
        res = np.argmin(points_sob)+1
        print res
        plt.plot(res,points_sob[res-1], marker='o', color='r', ls='')
        plt.plot(x, points_sob)
        plt.show()


(Note the majority of the code is image loading and graph plotting. Automatically fetching the images and returning the answer is left as an exercise for the dirty spammers)

The theory is this: the more you ‘swirl’ up the image, the longer the edges in the image become. You can see this in the example above, but a simpler example is more obvious:

See how the length of the square box clearly increases? To exploit this, we want to sum the length of the edges in the picture. The simplest way of doing this is to take the derivative of the image (in the Python above, by using the Sobel operator) and sum the result (take a look at the Wikipedia article to see how Sobel picks out the edges). We then select the image with the lowest ‘sum of edges’ as the correct answer.
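The selection rule can be sketched without any image files. The example below is a rough illustration under stated assumptions: `make_frame` is a hypothetical stand-in for the CAPTCHA frames (a dark/bright boundary that wobbles more as `amplitude` grows, rather than a real swirl), and `edge_sum` and `least_distorted` are names invented here. It uses the gradient-magnitude form of Sobel in pure Python.

```python
import math

def edge_sum(grey):
    """Sum of Sobel gradient magnitudes over the interior of grey[x][y]."""
    width, height = len(grey), len(grey[0])
    total = 0.0
    for x in range(1, width - 1):
        for y in range(1, height - 1):
            gx = (-grey[x-1][y-1] + grey[x+1][y-1]
                  - 2*grey[x-1][y] + 2*grey[x+1][y]
                  - grey[x-1][y+1] + grey[x+1][y+1])
            gy = (-grey[x-1][y-1] - 2*grey[x][y-1] - grey[x+1][y-1]
                  + grey[x-1][y+1] + 2*grey[x][y+1] + grey[x+1][y+1])
            total += math.sqrt(gx*gx + gy*gy)
    return total

def least_distorted(frames):
    """1-based index of the frame with the smallest edge sum."""
    sums = [edge_sum(f) for f in frames]
    return sums.index(min(sums)) + 1

def make_frame(amplitude, size=8):
    """Hypothetical test image: a boundary that wobbles by `amplitude` pixels."""
    wobble = [0, 1, 0, -1]
    return [[255 if x >= size // 2 + amplitude * wobble[y % 4] else 0
             for y in range(size)] for x in range(size)]

# Five frames, most distorted at both ends and undistorted in the middle.
frames = [make_frame(a) for a in (2, 1, 0, 1, 2)]
print(least_distorted(frames))  # 3: the straight-edged frame wins
```

The straight boundary is the shortest possible edge between the two regions, so any wobble adds to the total and the minimum lands on the undistorted frame.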

The results are excellent, as you can see below. All 13 test CAPTCHAs I downloaded were solved successfully. The following graphs show image number on the x axis and ‘sum of edges’ on the y axis. The red dot is the selected answer:

An interesting feature is that the completely undistorted image is often a peak in the graphs. This means we usually select one image to the right or left of the correct image (which is still happily accepted as the correct answer by MintEye). This seems to be because the undistorted image is somewhat sharper than the distorted images and hence has sharper gradients, resulting in larger derivative values. Obviously it would be trivial to do a local search for this peak, but it isn’t required to break this CAPTCHA.
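One possible shape for that local search is sketched below. It is not something I needed or ran against MintEye: the function name `refine_to_peak`, the `radius` parameter, and the scores are all made up for illustration. It looks for a point near the global minimum that is higher than both of its neighbours, and falls back to the minimum if there is none.

```python
def refine_to_peak(scores, radius=2):
    """1-based index of a local peak near the global minimum.

    Falls back to the global minimum itself when no peak lies
    within `radius` positions of it.
    """
    best = scores.index(min(scores))
    lo = max(1, best - radius)
    hi = min(len(scores) - 2, best + radius)
    peaks = [i for i in range(lo, hi + 1)
             if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]]
    if not peaks:
        return best + 1
    # Prefer the peak closest to the minimum.
    return min(peaks, key=lambda i: abs(i - best)) + 1

# Invented 'sum of edges' values with a small bump at image 5.
scores = [90, 70, 50, 41, 48, 40, 52, 75, 95]
print(scores.index(min(scores)) + 1)  # 6: the plain minimum
print(refine_to_peak(scores))         # 5: the bump next to it
```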

In conclusion, it would seem this method of image based CAPTCHA is fundamentally flawed.  A simple ‘swirl’ operation will always be detectable by this method, no matter the image being swirled.  The increased sharpness also gives the game away – an FFT or autocorrelation could easily be used to detect this change in sharpness, just like autofocus algorithms.
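As a rough sketch of the frequency-domain idea, the example below compares a sharp and a blurred signal by the share of spectral energy at high frequencies. Everything here is an assumption for illustration: it works on a single 1D row rather than a full image, it uses a naive DFT written out by hand instead of a real FFT library, and the name `high_freq_ratio`, the `cutoff` value, and the test signals are invented.

```python
import cmath

def high_freq_ratio(signal, cutoff=0.25):
    """Share of non-DC spectral energy at or above `cutoff` cycles/sample.

    Uses a naive O(n^2) DFT, which is fine for a short illustrative row.
    """
    n = len(signal)
    energy = []
    for k in range(1, n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        energy.append(abs(coeff) ** 2)
    first_high = max(1, int(cutoff * n))  # lowest bin counted as 'high'
    return sum(energy[first_high - 1:]) / sum(energy)

# A sharp step, and the same step after a circular 3-tap box blur.
sharp = [0] * 8 + [255] * 8
n = len(sharp)
blurred = [(sharp[t - 1] + sharp[t] + sharp[(t + 1) % n]) / 3.0
           for t in range(n)]

# The sharp row keeps a larger share of its energy at high frequencies.
print(high_freq_ratio(sharp) > high_freq_ratio(blurred))  # True
```

Applied per frame, the image with the largest ratio would be the candidate for the undistorted one, the same way a contrast-based autofocus picks the sharpest lens position.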