Feb 20, 2017

Machine Learning in Cyber Security Domain - 5: Captcha Bypassing


Before we explain how a captcha mechanism can be bypassed, we want to give you a brief introduction to what a captcha is and how it works.
The main purpose of a captcha is to protect authentication and other entry points by asking questions that are easy for humans but tough for bots. Solving a captcha challenge must be as effortless as possible for legitimate users while remaining robust against automated solvers. Thus, bots cannot try to enter systems automatically.
The first captcha mechanisms showed a single image and asked the user to type the numbers or characters in that image into a textbox. Sample images are given to the right of the paragraph. You may have noticed the line noise in the images. Those lines are there for a reason: clean digits or characters are easy to detect with image processing techniques. Therefore the images used in captchas are transformed into more complex forms to make them harder to break. At first, these transformations consisted of adding noise to the image. Such images are given below.

Nevertheless, these images were broken over time. All images of this type are now crackable with a 100 percent success rate. Naturally, captcha mechanisms then started to use more complex images. Such images are given below. Cracking these images with image processing is harder than cracking the pictures above.
As the complexity of the images increased, image processing techniques advanced, too. While systems are improved to make authentication more secure, cracking systems are improved in step to break them.
ReCaptcha, which is developed by Google, is one of the best-known systems. With its new techniques, this system took captchas one step further. The rest of this chapter gives a brief introduction to how reCaptcha works and then explains the ways to crack it. There is a news article about this topic, published in April 2016. If you want more detailed information about how reCaptcha works and how it can be cracked, we recommend taking a close look at this paper. This section draws heavily on that work.
ReCaptcha has two main verification modules. The first requires only a single click from the user. In this module, the system analyzes the user's cookies and browser characteristics. From this analysis, a confidence score is calculated for every user. The score indicates whether the request originates from an honest, non-suspicious user or from a bot. For high confidence scores, the user is only required to click within a checkbox. For lower scores, the user may be presented with new challenges. In the second module, the user has to deal with harder questions based on images or text.
There are three types of challenge, which vary from user to user: (1) No captcha reCaptcha, (2) Image reCaptcha, (3) Text-based reCaptcha. No captcha reCaptcha is used in the first module. In the second module, users with a low confidence score encounter one of the other two types: Image reCaptcha or Text-based reCaptcha.
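The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how reCaptcha is implemented: the real risk analysis and its thresholds are not public, so the 0.8 cut-off, the 0 to 1 score range, and the names Challenge and choose_challenge are all assumptions made for this example.

```python
from enum import Enum


class Challenge(Enum):
    CHECKBOX_ONLY = "No captcha reCaptcha"
    IMAGE = "Image reCaptcha"
    TEXT = "Text-based reCaptcha"


# Hypothetical cut-off: the real threshold used by the risk analysis is unknown.
HIGH_CONFIDENCE = 0.8


def choose_challenge(confidence, prefer_image=True):
    """Map a confidence score in [0, 1] to one of the three challenge types."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= HIGH_CONFIDENCE:
        # Module one: a single click on the checkbox is enough.
        return Challenge.CHECKBOX_ONLY
    # Module two: the user has to solve an image or a text challenge.
    return Challenge.IMAGE if prefer_image else Challenge.TEXT


if __name__ == "__main__":
    for score in (0.95, 0.40):
        print(score, choose_challenge(score).value)
```

The point of the sketch is only that a single score decides between "one click" and "solve something"; everything the attacker does against module one is aimed at pushing that score up.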

No captcha reCaptcha (Checkbox Captcha): The new user-friendly version is designed to remove the difficulty of solving captchas completely. Upon clicking the checkbox in the widget, if the advanced risk analysis system considers that the user has a high reputation, the challenge will be considered solved and no action will be required from the user.
Image reCaptcha: This new version is built on identifying images with similar content. The challenge contains a sample image and 9 candidate images, and the user is requested to select those that are similar to the sample. The challenge usually contains a keyword describing the content of the images that the user is required to select. The number of correct images varies between 2 and 4.


Text reCaptcha: Examples of this type are given below. These distorted texts are returned when the advanced risk analysis considers the user to have a lower reputation. Type (e) is the fallback captcha: when the User-Agent fails certain browser checks, the widget automatically fetches and presents a challenge of this type before the checkbox is clicked. Over the following 6 months, text captchas appeared to be gradually “phased out”, with the image captcha now being the default type returned, as text captchas are harder for humans to solve despite being solvable by bots. Text-based reCaptchas can be cracked with nearly 100% accuracy using deep learning techniques. (If you want to read how this is possible, take a look at this blog.)
(Note: The definitions of the captcha types are copied from the referenced paper.)
As a reminder, reCaptcha has two completely different modules. The first works by tracking cookies and checking browser characteristics; the second asks the user a question based on an image or text. Because two different modules operate in the system, captcha cracking methods can be applied in two different ways, one for each module. In short, two components have been developed to crack the captcha mechanism. The first component creates artificial cookies and browser characteristics to mislead module one (checkbox captcha), so that it can influence the risk analysis process. The referenced paper states that building up a cookie over 9 days is enough to bypass module one. Of course, the cookie must look like the result of normal user activity and must be undetectable, so it has to be created intelligently.


Text-based captchas are not widely used anymore; image-based captchas have taken their place. Because of this, the second component is designed to crack only image-based reCaptcha. Before we explain how this module can be cracked, take a quick look at the sample image and try to figure out what the system wants from the user.
In the example image given above, the question is “Select all wine below.” A real user can easily understand what the system wants and select all pictures related to wine. Now the question is: how can an automated system do it? Identifying the objects in an image and assigning semantic information to them is considered a complex computer vision problem. To do this job, the system first has to understand what is being asked. Keywords can be extracted from the question with NLP (Natural Language Processing), or the sample image can be processed by a tool to detect what is in it. Google has a tool for this, named GRIS (Google Reverse Image Search), and there are many other successful online tools for retrieving information from images. After the keywords are detected, the same process is applied to every candidate image in the challenge. As a result, we have images with tags. Finally, the sample image tags and the candidate image tags are compared. Candidate images with relevant tags are marked; the other images are left unmarked.
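The final comparison step can be sketched as follows. This is a minimal illustration and not the code of the referenced paper: it assumes the tags have already been produced by some annotation tool, matches them by exact (case-insensitive) equality, and the names normalize, select_matching, hint and min_shared are made up for this example.

```python
def normalize(tags):
    """Lower-case and strip tags so that 'Wine ' and 'wine' compare equal."""
    return {tag.strip().lower() for tag in tags if tag.strip()}


def select_matching(sample_tags, candidate_tags, hint=None, min_shared=1):
    """Return the indexes of candidate images that share tags with the sample.

    sample_tags    -- tags assigned to the sample image
    candidate_tags -- one list of tags per candidate image
    hint           -- optional keyword taken from the question text
    min_shared     -- how many shared tags count as a match (an assumption)
    """
    wanted = normalize(sample_tags)
    if hint:
        wanted |= normalize([hint])

    selected = []
    for index, tags in enumerate(candidate_tags):
        shared = wanted & normalize(tags)
        if len(shared) >= min_shared:
            selected.append(index)
    return selected


if __name__ == "__main__":
    # Hand-written tags, standing in for the output of an annotation tool.
    sample = ["wine", "glass"]
    candidates = [
        ["Wine", "bottle"],
        ["dog", "grass"],
        ["glass", "drink"],
    ]
    print(select_matching(sample, candidates, hint="wine"))
```

Exact matching is the simplest possible choice. In practice, different tools describe the same object with different words, so a real system would need some notion of tag similarity rather than plain set intersection.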
The success rate of this type of system depends strongly on the image processing tools. As we said before, retrieving information from an image is a difficult image processing task. Academic studies show that deep learning based approaches achieve high accuracy on information retrieval problems. Many online tools use deep learning for this purpose, and there are also several free online services and libraries that offer relevant functionality, ranging from assigning tags (keywords) to providing free-form descriptions of images. Some example outputs from these tools are given in the figure below. (If you want to try these tools, here are the links: GRIS, Alchemy, Clarifai, TDL, NeuralTalk, Caffe.)
Brief information about the tools is given below.
GRIS can conduct a search based on an image. If the search is successful, it may return a “best guess” description of the image. Alchemy is also built upon deep learning and offers an API for image recognition. For each submitted image, the service returns a set of tags and a confidence score for each tag. Clarifai is built on deconvolutional neural networks (so it uses deep learning) and returns a set of 20 tags describing the image. TDL was released as an app for demonstrating the image classification capabilities of its authors' deep learning system. NeuralTalk was developed for generating free-form descriptions of an image’s contents using a Recurrent Neural Network architecture. Caffe was released as a deep learning framework, which the authors of the referenced paper also leverage for processing images locally. Caffe returns a set of 10 labels: 5 with the highest confidence scores and 5 that are more specific as keywords but may have lower confidence scores.
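Since each tool returns its own set of tags, some with confidence scores, the results have to be combined before they can be compared. One simple way to do that is sketched below. It is an assumption for illustration, not the method of the referenced paper: it keeps the highest score seen for each tag and returns the top few. The function name merge_tags, the (tag, score) input format, and the sample data are all hypothetical, and no real service is called.

```python
def merge_tags(results, top_k=5):
    """Combine (tag, score) lists from several tools into one ranked list.

    results -- dict mapping a tool name to a list of (tag, score) pairs
    top_k   -- how many tags to keep after ranking
    """
    best = {}
    for tagged in results.values():
        for tag, score in tagged:
            key = tag.strip().lower()
            # Keep the highest confidence reported for this tag by any tool.
            if key not in best or score > best[key]:
                best[key] = score

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hand-written sample data, not real output from any of the tools.
    sample_results = {
        "tool_a": [("Wine", 0.9), ("glass", 0.5)],
        "tool_b": [("wine", 0.7), ("bottle", 0.6)],
    }
    for tag, score in merge_tags(sample_results, top_k=3):
        print(tag, score)
```

Taking the maximum score favors any tool that is confident about a tag. Averaging the scores, or requiring that at least two tools agree, would be more conservative alternatives.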

As a result, using the methods explained above, the referenced study automatically solved 70.78% of image reCaptcha challenges. The same system was also applied to Facebook image captcha challenges, where it achieved an 83.5% success rate.