As an avid reader of HAD I was intrigued by this post explaining how someone had broken MintEye’s audio-based CAPTCHA. The image version of the CAPTCHA looked interesting, so I thought it might be fun to try and break it.
For those unfamiliar with MintEye, the image-based CAPTCHAs look as follows:
You must adjust a slider to select the undistorted version of the image. Several somewhat naive approaches (in my opinion) were proposed in the HAD comments to solve this CAPTCHA, based on looking for straight lines. However, such solutions are likely to fall down for images containing few straight lines (e.g. the CAPTCHA above).
After a little thought (and unfruitful musings with optical flow) I found a good, robust and remarkably simple solution. Here it is:
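    import os

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    if __name__ == '__main__':
        # One directory per downloaded CAPTCHA (1..13), frames saved as 1.jpg, 2.jpg, ...
        for dir in range(1, 14):
            dir = str(dir)
            total_images = len(os.listdir(dir)) + 1
            points_sob = []  # 'sum of edges' for each frame
            for i in range(1, total_images):
                img = cv2.imread(dir + '/' + str(i) + '.jpg')
                gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                # First derivative in x and y picks out the edges
                sob = cv2.Sobel(gray, -1, 1, 1)
                points_sob.append(np.sum(sob))
            x = range(1, total_images)
            # Frames are numbered from 1, argmin counts from 0
            res = np.argmin(points_sob) + 1
            plt.plot(x, points_sob)
            plt.plot(res, points_sob[res-1], marker='o', color='r', ls='')
            plt.show()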
(Note that the majority of the code is image loading and graph plotting. Automatically fetching the images and returning the answer is left as an exercise for the dirty spammers.)
The theory is this: the more you ‘swirl’ up the image, the longer the edges in the image become. You can see this in the example above, but a simpler example is more obvious:
See how the length of the square box's edges clearly increases? To exploit this, we want to sum the length of the edges in the picture. The simplest way of doing this is to take the derivative of the image (in the Python above, by using the Sobel operator) and sum the result (take a look at the Wikipedia article to see how Sobel picks out the edges). We then select the image with the lowest ‘sum of edges’ as the correct answer.
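To make the square-box argument concrete, here is a minimal sketch that swirls a synthetic white square and scores each version. It makes a few assumptions: `swirl` and `edge_score` are hypothetical helpers written for this illustration, the swirl model (a rotation that fades linearly to zero at `radius`) is a guess and not MintEye's actual distortion, and plain NumPy finite differences stand in for `cv2.Sobel` so the example runs without OpenCV.

```python
import numpy as np

def swirl(img, strength, radius):
    """Rotate each pixel about the image centre by an angle that fades to 0 at `radius`."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    r = np.hypot(dx, dy)
    # Assumed swirl model: full `strength` (radians) at the centre, none beyond `radius`.
    angle = strength * np.clip(1.0 - r / radius, 0.0, None)
    # Inverse mapping: for every output pixel, look up the nearest source pixel.
    src_x = cx + dx * np.cos(angle) - dy * np.sin(angle)
    src_y = cy + dx * np.sin(angle) + dy * np.cos(angle)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return img[src_y, src_x]

def edge_score(img):
    """'Sum of edges': total absolute difference between neighbouring pixels."""
    img = np.asarray(img, dtype=float)
    return float(np.abs(np.diff(img, axis=0)).sum()
                 + np.abs(np.diff(img, axis=1)).sum())

# A white square on a black background, like the simple example above.
square = np.zeros((64, 64))
square[16:48, 16:48] = 1.0

strengths = [0.0, 1.0, 2.0, 3.0]
scores = [edge_score(swirl(square, s, radius=30.0)) for s in strengths]
best = int(np.argmin(scores))  # index of the frame with the lowest 'sum of edges'
```

In this toy setup the unswirled square (strength 0) has the lowest score, which is the same selection rule the real script applies to the downloaded frames.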
The results are excellent as you can see below. 100% of the 13 test CAPTCHAs I downloaded were successfully solved. The following graphs show image number on the x axis and ‘sum of edges’ on the y. The red dot is the selected answer:
An interesting feature is that the completely undistorted image is often a peak in the graphs. This means we usually select one image to the right or left of the correct image (which is still happily accepted as the correct answer by MintEye). This seems to be because the undistorted image is somewhat sharper than the distorted images and hence has sharper gradients, resulting in larger derivative values. Obviously it would be trivial to do a local search for this peak, but it isn’t required to break this CAPTCHA.
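For the curious, the local search could look something like the sketch below. `refine_to_peak` is a hypothetical helper, not part of the script above, and it assumes the scores behave as described: a valley with a one-frame spike sitting right next to the minimum. Indices are 0-based here, so add 1 to get the frame number used in the script.

```python
import numpy as np

def refine_to_peak(scores, window=1):
    """Return the index of a local peak next to the minimum, or the minimum itself."""
    scores = np.asarray(scores, dtype=float)
    best = int(np.argmin(scores))
    for cand in range(best - window, best + window + 1):
        # Skip the minimum itself and anything without two neighbours.
        if cand == best or cand < 1 or cand > len(scores) - 2:
            continue
        # A peak is strictly higher than both of its neighbours.
        if scores[cand] > scores[cand - 1] and scores[cand] > scores[cand + 1]:
            return cand
    return best

# Made-up scores: a valley with a spike (45) beside the minimum (30).
with_spike = [90, 70, 50, 30, 45, 32, 55, 75]
without_spike = [90, 70, 50, 30, 50, 70]
```

With the made-up `with_spike` list the search moves from the minimum at index 3 to the spike at index 4, and with `without_spike` it simply keeps the minimum.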
In conclusion, it would seem this method of image based CAPTCHA is fundamentally flawed. A simple ‘swirl’ operation will always be detectable by this method, no matter the image being swirled. The increased sharpness also gives the game away – an FFT or autocorrelation could easily be used to detect this change in sharpness, just like autofocus algorithms.
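As a rough illustration of the FFT idea, the sketch below measures what fraction of an image's spectral energy sits at high spatial frequencies, which is one simple way to put a number on sharpness. Everything here is an assumption made for the example: `high_freq_fraction` and `box_blur` are hypothetical helpers, the 0.25 cycles/pixel cutoff is arbitrary, and a 3x3 blur of random noise stands in for a softer, distorted frame. I have not run it against real MintEye images.

```python
import numpy as np

def high_freq_fraction(gray, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` cycles/pixel (DC removed)."""
    gray = np.asarray(gray, dtype=float)
    gray = gray - gray.mean()  # drop the DC term so brightness does not dominate
    power = np.abs(np.fft.fft2(gray)) ** 2
    fy = np.fft.fftfreq(gray.shape[0])[:, None]
    fx = np.fft.fftfreq(gray.shape[1])[None, :]
    radius = np.hypot(fx, fy)  # radial spatial frequency of each FFT bin
    total = power.sum()
    return float(power[radius > cutoff].sum() / total) if total > 0 else 0.0

def box_blur(gray):
    """3x3 mean filter with wrap-around edges, used only to fake a softer frame."""
    gray = np.asarray(gray, dtype=float)
    acc = np.zeros_like(gray)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(gray, dy, axis=0), dx, axis=1)
    return acc / 9.0

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))  # stand-in for the sharp, undistorted frame
soft = box_blur(sharp)        # stand-in for a softer, distorted frame

sharp_score = high_freq_fraction(sharp)
soft_score = high_freq_fraction(soft)
```

Under these assumptions the sharper image scores higher, so picking the frame with the largest `high_freq_fraction` would be the autofocus-style equivalent of the ‘sum of edges’ test.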