Research on the Security of Microsoft's Two-Layer Captcha
Abstract— Captcha is a security mechanism designed to differentiate between computers and humans, and is used to defend against malicious bot programs. Text-based Captchas are the most widely deployed differentiation mechanism, and almost all text-based Captchas are single layered. Numerous successful attacks on the single-layer text-based Captchas deployed by Google, Yahoo!, and Amazon have been reported. In 2015, Microsoft deployed a new two-layer Captcha scheme. This appears to be the first application of two-layer Captchas. It is, therefore, natural to ask a fundamental question: is the two-layer Captcha as secure as its designers expected? Intrigued by this question, we have for the first time systematically analyzed the security of the two-layer Captcha in this paper. We propose a simple but effective method to attack the two-layer Captcha deployed by Microsoft, and achieve a success rate of 44.6% with an average speed of 9.05 s on a standard desktop computer (with a 3.3-GHz Intel Core i3 CPU and 2-GB RAM), thus demonstrating clear security issues. We also discuss the originality and applicability of our attack, and offer guidelines for designing Captchas with better security and usability.
Index Terms— Captcha, security, text-based, two-layer.
Manuscript received May 11, 2016; revised October 11, 2016, December 13, 2016, and February 7, 2017; accepted March 1, 2017. Date of publication March 15, 2017; date of current version April 13, 2017. This work was supported in part by the National Natural Science Foundation of China under Grant 61472311 and in part by the Fundamental Research Funds for the Central Universities. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Karen Renaud. (Corresponding author: Haichang Gao.)
The authors are with the Institute of Software Engineering, Xidian University, Xi'an 710071, China (e-mail: hchgao@xidian.edu.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2017.2682704
I. INTRODUCTION
CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is used to prevent automated registration, spam or malicious bot programs [1], [2]. It automatically generates and evaluates a test that is difficult for computers to solve, but easy for humans. If the success rate of solving a Captcha for humans reaches 90% or higher, and computer programs only achieve a success rate of less than 1%, the Captcha can be considered secure [3].
Current Captchas can be divided into three categories: text-based, image-based and audio-based. Text-based Captcha is usually based on English letters and Arabic numerals, and uses sophisticated distortion, rotation or noise interference to prevent recognition by a machine. Compared to the latter two Captcha categories, text-based Captcha is the most widely used scheme [4]. This widespread usage is due to its obvious advantages [5], [6]: first, the user's task is a text recognition problem, which is intuitive to users worldwide; second, people around the globe can recognize English letters and Arabic numerals, thus text-based Captcha has few localization issues; finally, text-based Captcha was the earliest form of Captcha, and people are more willing to undertake this challenge compared to other Captcha forms.
Security is the most significant concern for text-based Captchas. There have been numerous examples of Captcha failure, and many attacks have been proposed, as noted in [3], [7], [8]. Researchers have recently claimed that their simple generic attacks have broken a wide range of text-based Captchas in a single step [9], [10]. Although many text-based Captchas have been proven insecure, the most recent studies [11], [12] suggest that Captcha is still a useful security mechanism, and the security of text-based Captchas is currently a hot topic in the academic field.
With each failed Captcha scheme, Captcha designers accumulated experience. Then they designed better schemes, which aimed to improve both usability and security. These prior attempts also promoted a scientific understanding of Captcha robustness. As [6] suggested, the robustness of text-based Captchas should rely on the difficulty of finding where each character is (segmentation), rather than what each character is (recognition). Current machine learning algorithms, such as Neural Networks or K-Nearest Neighbors (KNN), can easily recognize distorted or rotated single characters correctly [6]. Therefore, text-based Captchas should be segmentation-resistant, and this principle has become the basis for designing text-based Captchas.
Based on these findings, Microsoft deployed a two-layer Captcha in 2015. It connected the sides of each character, across both top-to-bottom and right-to-left, in order to increase the difficulty of detecting where each character is. This was the first application of two-layer Captchas, and it appeared more secure than previous schemes. All attacks proposed were unsuccessful, including the powerful generic methods proposed in [10] and [11]. It is therefore natural to ask a fundamental question: is the two-layer Captcha as secure as its designers expected? This open question precipitated our study.
In this paper, we examine the security of the two-layer Captcha. We attack this scheme by a series of processing steps. First, a novel two-dimensional segmentation approach is proposed to separate a Captcha image along both vertical and horizontal directions, which helps create many single characters and is unlike traditional segmentation techniques. Second, we improved LeNet-5 [13], a radical Convolutional Neural Network (CNN) model, as our recognition engine. Our CNN model, as compared with the traditional template-based KNN, shows improved performance and effectiveness. Third, a Captcha generating imitator was designed which is
1556-6013 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on September 13,2022 at 15:14:39 UTC from IEEE Xplore. Restrictions apply.
1672 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 7, JULY 2017
able to automatically produce Captchas similar to the real-world ones deployed by Microsoft. We adopt it to generate a significant amount of data for our training sets to train the CNN. Then, the trained CNN is used to recognize the Captchas deployed in the wild. Our attack has achieved a success rate of 44.6% with an average speed of 9.05 seconds on a standard desktop computer (with a 3.3 GHz Intel Core i3 CPU and 2GB RAM). Judging by both criteria commonly used in the Captcha community [3], [5], we have successfully broken the two-layer Captcha deployed by Microsoft.
This appears to be the first systematic analysis of two-layer Captchas. Our two-dimensional segmentation technique is innovative; it can be applied as a basis for other successful attacks. Additionally, our data preparation method, imitating the Captcha generation process to build training sets and using them to train the recognition engine to recognize objects in the real world, is the first attempt of its kind in the field of analyzing Captcha robustness. Our experiments demonstrate that it decreases the time spent on data preparation and reduces manual labor. Also, this approach can be adapted for other projects requiring a large amount of data to train machine learning systems, and for people working in Captcha design, our work provides guidelines for designing Captchas with better security and usability.
The remainder of the paper is organized into the following sections. A review of related work is provided in Section II. Then, Section III introduces and analyzes Microsoft's two-layer Captcha. Following that, the whole attack process is presented in Section IV, and Section V details the experiment process and presents attack results. In Section VI, we discuss other design choices and the applicability and novelty of this work, and also provide guidelines for designing Captchas with better security and usability. Finally, Section VII concludes the paper with a discussion of the implications of our work.
II. RELATED WORK
The Automated Turing Tests were first proposed by Naor [14], but they did not provide a formal definition. Lillibridge et al. [15] developed the first practical Automated Turing Test to prevent bots from automatically registering web pages. This system was effective for a while and then was defeated by common Optical Character Recognition (OCR) technology.
Text-based Captcha, based on English letters and Arabic numerals, is still the most widely deployed scheme. Many research communities focus on developing attack approaches for existing text-based Captchas, and then explore guidelines for better designs.
In [16], Mori and Malik used sophisticated object recognition algorithms to break Gimpy and EZ-Gimpy with success rates of 33% and 92% respectively. In [17], Simard broke a number of Captchas with a success rate ranging from 4.89% to 66.2%. Other attack efforts also include the PWNmad [18].
In 2006, Yan and El Ahmad [7] broke most visual schemes provided at Captchaservice.org by simply counting the number of pixels of each segmented character, even though these schemes are resistant to the best OCR software on the market. In 2008, Yan and El Ahmad [8] developed new character segmentation techniques for attacking a number of text-based Captchas, which included the earlier mechanisms designed and deployed by Microsoft, Yahoo! and Google.
Elie Bursztein et al. [3] carried out a systematic study of existing visual Captchas based on distorted characters. They found that 13 of the 15 current Captcha schemes from popular websites were vulnerable to automated attacks, but Google and reCAPTCHA resisted their attack attempts.
In 2013, Gao et al. [19] proposed a generic method to break a family of hollow Captchas, and this is the first application of solving Captchas in a single step. They used the characteristics of hollow fonts to extract character strokes, and then tested different combinations of adjacent character strokes to form individual characters. They tested their method on five hollow Captchas, and the success rates they achieved ranged from 36% to 89%.
Elie Bursztein et al. [9] reported another attack that solved Captchas in a single step by addressing the segmentation and the recognition problems simultaneously. It analyzed all possible ways of segmenting a Captcha, and then used ensemble learning to identify among each sequence of segments the best possible one as the result. This attack broke a wide range of text-based Captchas with success rates ranging between 5.33% and 55.22%.
In 2015, Karthik et al. [20] proposed two methods to automatically classify Microsoft Captcha samples. One was based on a fine-grain segmentation combined with template matching, and the second was based on CNN and used state-of-the-art recognition techniques. The former achieved a success rate of 5.56%, and the latter achieved a success rate of 57.05%.
More recently at NDSS'16, Gao et al. [10] reported a simple generic attack that was the first to use Gabor filters to extract character components along four different directions. It is the most recent research on the analysis of Captcha robustness, and is effective for many text-based Captchas, including character isolated Captchas, hollow character Captchas, and Crowding Characters Together (CCT) Captchas. The success rates they achieved ranged from 5% to 77%.
There has been a lot of fine research done on text-based Captchas prior to this paper. However, two-layer Captchas have never before been discussed, and they are distinctly different from the other text-based Captchas discussed to date.
III. TWO-LAYER CAPTCHAS
According to font styles and positional relationships between adjacent characters, most of the current text-based Captchas can be classified into three categories: character isolated Captcha, hollow character Captcha and CCT Captcha [10] (see Figures 1(a), (b), and (c) respectively). Some Captchas use a combination of the above-mentioned styles, while others also use some noise arcs or a complex background to enhance their security (Figures 1(d) and (e)). No matter what form of Captcha, there were no two-layer Captchas until Microsoft first deployed one.
The original Captcha used by Microsoft was the character isolated Captcha (see Figure 2(a)). This character isolated scheme had been used for many years before the two-layer
GAO et al.: RESEARCH ON THE SECURITY OF MICROSOFT’S TWO-LAYER CAPTCHA 1673
Fig. 7. Process of separating the solid characters and filled hollow characters.
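The separation that Fig. 7 illustrates (erode to remove thin strokes, dilate to recover the thick ones, then subtract) can be sketched on a toy binary grid. This is a minimal illustration, not the authors' C# implementation: the 3x3 structuring element, the single erosion pass, and the function names `erode`, `dilate` and `split_thick_and_thin` are all assumptions made for the example.

```python
def erode(img):
    """Binary erosion with a 3x3 square (assumed structuring element):
    a pixel stays 1 only if its whole neighbourhood lies inside the image and is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def dilate(img):
    """Binary dilation with the same 3x3 square: a pixel becomes 1 if any neighbour is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


def split_thick_and_thin(img):
    """Erode then dilate to keep only thick strokes; the remainder is the thin-stroke part."""
    opened = dilate(erode(img))
    # Restrict the recovered strokes to pixels that were set in the input.
    thick = [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(opened, img)]
    # Whatever the input had beyond the thick part is treated as thin strokes.
    thin = [[a & (1 - b) for a, b in zip(r1, r2)] for r1, r2 in zip(img, thick)]
    return thick, thin
```

On an image holding a 5x5 solid block and a one-pixel-wide line, the block ends up in `thick` and the line in `thin`. In the paper the subtraction is taken against the CFS output of Figure 6(c), where the hollow characters are already filled.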
Fig. 6. Noise removal: (a) Edge pixels and components pixels in noise components, (b) Pixels within the bounding box but belong to light-gray background or other color components in character component and noise component, (c) Image after CFS.
Fig. 8. The generation of envelope lines and borderline: (a) Two curves and inflexions, (b) Connected inflexions, (c) Envelope lines, (d) Borderline.
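Two pieces of the envelope-line procedure in Fig. 8 are simple enough to sketch: the slope α between two inflexion points from Equation (1), and the borderline taken as the middle line of the two envelope lines. The sketch below assumes each envelope line is given as one y value per image column, that y grows downward, and that a pixel lying exactly on the borderline goes to the lower layer. None of these conventions is stated in the paper, and the function names are hypothetical.

```python
def slope(p, q):
    """Equation (1): alpha = |y1 - y2| / |x1 - x2| for two inflexion points (x, y)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("inflexion points must lie in different columns")
    return abs(y1 - y2) / abs(x1 - x2)


def borderline(top_env, bottom_env):
    """Column-wise midpoint of the top and bottom envelope lines."""
    return [(t + b) / 2 for t, b in zip(top_env, bottom_env)]


def split_layers(img, border):
    """Send foreground pixels above the borderline to the upper layer, the rest to the lower."""
    h, w = len(img), len(img[0])
    upper = [[0] * w for _ in range(h)]
    lower = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Assumed tie rule: y == border[x] counts as lower layer.
                (upper if y < border[x] else lower)[y][x] = 1
    return upper, lower
```

For example, `slope((0, 0), (4, 2))` gives 0.5, and envelope lines `[0, 0, 2]` and `[6, 8, 8]` give the borderline `[3.0, 4.0, 5.0]`. The smoothing of the curves between inflexion points is not covered here.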
Due to the various shapes and thin strokes of hollow English letters and Arabic numerals, their color components always have a relatively larger C and a smaller S. Noise components, however, are usually simple with a plump shape. Besides, the surroundings of character components always have a large section of pixels belonging to a light-gray background or other components, whereas the remnant noise components are surrounded by character strokes, with few (or no) such pixels (see Figure 6(a) and (b)). Therefore, with properly chosen threshold values a and b, we have the following: if δ < a or θ < b for a component, we consider it noise and remove it. The values a and b are determined by analyzing a small sample set of data.
After deleting all of the noise components, the color of the character components is changed to black, and Figure 6(c) shows the final result of CFS. The two major contributions of CFS are: first, it reduces the burden on the recognition engine, since the diversity of characters significantly decreases the effectiveness of the recognition engine; second, it avoids failed cases that may occur in the later extraction of character components, where we adopted the state-of-the-art method in [10] to extract components from connected characters based on Gabor filters. Reference [10] noted that they received the lowest success rate on a hollow scheme deployed by Yahoo!. This extraction method breaks a long hollow text string into a large number of tiny components, and further produces a huge set of possible combinations during partition and recognition, both decreasing the success rate and slowing down the attack. Learning from this prior experience, we have converted hollow characters into solid characters in advance.
3) Separating Filled Hollow Characters From Solid Characters: This step is designed to obtain the highest number of complete individual characters. It makes use of the fact that hollow characters are formed by thin contour lines that can be easily removed. First, we erode the binarized image (Figure 6(c)) to remove the hollow characters part. The solid characters part is also damaged during this step, but we simply recover it by dilation. Finally, by removing the solid characters part from the image received by CFS (Figure 6(c)), the filled hollow characters part is found, as shown in Figure 7. The outputs, the filled hollow characters part and the solid characters part, marked with red rectangles in Figure 7, are the inputs of the next process.
It may seem that this step can be skipped and that we could treat these filled hollow characters the same as solid ones. However, it is necessary in our attack. This process, as the first step of our two-dimensional segmentation algorithm, helps to obtain as many individual characters as possible. Reference [6] suggested that the challenge of breaking a Captcha depends on the difficulty of finding where each character is. Thus we add this step to our attack in order to increase segmentation accuracy.
4) Separating Upper and Lower Layers: The two-layer structure is used as a defense measure to disguise where a single character is, and it is necessary to separate a two-layer challenge image into two one-layer images. We have found that the borderline between the upper layer and the lower layer varies across challenges, but it follows the variation trend of the envelope lines of a challenge. Based on this characteristic of Microsoft's two-layer Captcha, we start by finding the top and bottom envelope lines for each challenge image, and then generate the middle line of these two envelope lines as the borderline of a challenge image's upper and lower layers. Then, we apply the borderline to the filled hollow characters and the solid characters parts, and finally separate the upper and lower layers.
Generating these two envelope lines and the borderline is the key step of this process and works as follows:
i) Generate two curves by following the top pixels and bottom pixels of the challenge image (as the red and green pixels shown in Figure 8(a)).
ii) Find all inflexion points of the curves (those pixels marked with green in Figure 8(a)). Note that inflexion
points include both crest and trough pixels. We define crest pixels as those pixels whose left neighbor on the curve has an equal or larger y-ordinate, while the right neighbor on the curve has a larger y-ordinate. Similarly, trough pixels are those pixels whose left neighbor on the curve has an equal or smaller y-ordinate, while the right neighbor on the curve has a smaller y-ordinate.
iii) Smooth curves and generate envelope lines. We first simply define the slope α of two inflexion points as follows:
$$\alpha = \frac{|y_1 - y_2|}{|x_1 - x_2|} \tag{1}$$
where (x1, y1) and (x2, y2) are the coordinates of these two inflexion points. Then we examine the α value of each pair of inflexion points. With a properly chosen threshold t, this works as follows: if α > t and the two inflexion points are both crest pixels or both trough pixels, connect them directly with a smooth curve (as in the case of inflexion points 3 and 7 in Figure 8(a)); if α > t but one of the two inflexion points is a crest pixel and the other is a trough pixel, connect the current inflexion point to the inflexion point that follows the latter one with a smooth curve (as in the case of inflexion points 1 and 2 in Figure 8(a)). The whole sub-step is performed twice: once for the top curve and once for the bottom curve. Those connected curves are marked with blue in Figure 8(b). Figure 8(c) depicts the final envelope lines, which are marked with red. Those connected curves are converted to red as a part of the envelope lines. If the end of an envelope line does not connect with the edge of the image, we extend it to the edge.
iv) Create the borderline. We find the middle line between the two envelope lines and use it as the borderline of the upper layer and lower layer, as shown by the green line in Figure 8(d).
We create the borderline of the upper and lower layers, apply it to the filled hollow characters and the solid characters parts, and then separate their upper and lower layers (see Figure 9). Note that if the solid characters part or the hollow characters part only appears in either the upper layer or the lower layer in a challenge, it is possible to generate a blank image without characters, as in the case of our example image.
5) Rotation: Because the characters in Microsoft's Captcha are obviously rotated, and the rotation angle of each character is different, we rotate the Captcha in pre-processing to make the characters upright. The three lines found in the previous step, the two envelope lines and the borderline, also play an important role in this step. First, we copy the two upper lines to the two upper layer images found from the prior step, and copy the two lower lines to the two lower layer images. Then we use the guide line projection technique to find the most likely rotation angle [27]. Finally, we rotate the image using this rotation angle (see Figure 9).
Thus a challenge image has been divided into four images with all characters rotated: upper solid characters part, lower solid characters part, upper filled hollow characters part, and lower filled hollow characters part.
B. Extracting Character Components
After pre-processing, we have many separated individual characters, e.g. 'V' in the example image. For images like Figure 2(b), where hollow characters and solid characters are alternately located, we can successfully separate each individual character via our two-dimensional segmentation approach alone. We then utilize the CNN engine to recognize the separated single characters, and for other images containing connected characters, we adopt the method presented in [10] to break them, similar to solving a simple single-layer Captcha with three connected characters at most. Note that we have improved the approach in [10] with regard to many details, especially the graph search algorithm and the use of CNN to replace the template-based KNN.
First, the approach uses Log-Gabor filters [28] to extract connected characters information along four directions by convolving a Captcha image with each of the filters. Then, it binarizes the resulting images to create black and white character components. Table I shows the extraction results along each of the four directions for each part of the example Captcha. This step only applies to those parts with connected characters, as in the example image: the upper solid characters and the lower filled hollow characters. Extracted components are stored in four separate images of the same dimension. We leverage CFS [8] to collect these components from each image. Then sort them according to the coordinates (x, y),
TABLE I
EXTRACTION RESULTS
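Section IV-B says the extracted components are collected from each directional image with CFS [8] and then sorted by their coordinates. A rough sketch of that bookkeeping follows, using a plain flood fill as a stand-in for CFS. The 8-connectivity, the sort key (leftmost x first, then topmost y) and the function names are assumptions; the paper's exact ordering rule is not given in this part of the text.

```python
from collections import deque


def collect_components(img):
    """Flood-fill each connected group of foreground pixels (8-connectivity assumed).
    Returns a list of components, each a list of (x, y) pixel coordinates."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            queue, pixels = deque([(x0, y0)]), []
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            comps.append(pixels)
    return comps


def sort_components(comps):
    """Order components left to right, breaking ties top to bottom (assumed key)."""
    return sorted(comps, key=lambda px: (min(x for x, _ in px),
                                         min(y for _, y in px)))
```

Sorting left to right gives each component a stable index, which is what a later graph search over components needs.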
Fig. 10. The updated equivalent graphs of our example: (a) The final
equivalent graph of the upper solid characters part; (b) The final equivalent
graph of the lower filled hollow characters part.
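Section VI-A explains that the authors' graph search selects the path by average confidence level, whereas DP [10] selects it by confidence sum, which tends to favor paths with more steps. The sketch below shows that difference on a toy graph. The encoding is an assumption made for illustration: nodes are numbered 1 to n+1, an edge (i, j) with i < j stands for "components i to j-1 form one candidate character", and each edge carries a label and a confidence. The function name and parameters are hypothetical, and the node pruning of [10] is not modeled.

```python
def best_path(n_nodes, edges, min_len, max_len, use_average=True):
    """Pick the path from node 1 to node n_nodes whose length is within
    [min_len, max_len] and whose score is largest.

    edges: {(i, j): (label, confidence)} with i < j (assumed encoding).
    Score is the average confidence per step, or the plain sum if use_average is False.
    Returns (score, text) or None if no admissible path exists.
    """
    neg = float("-inf")
    incoming = {}
    for (i, j), (label, conf) in edges.items():
        incoming.setdefault(j, []).append((i, label, conf))

    # best[j][k] = (largest confidence sum of a k-step path 1 -> j, text along it)
    best = [[(neg, "")] * (max_len + 1) for _ in range(n_nodes + 1)]
    best[1][0] = (0.0, "")
    for j in range(2, n_nodes + 1):          # edges only go forward, so i < j is final
        for i, label, conf in incoming.get(j, []):
            for k in range(1, max_len + 1):
                prev_sum, prev_text = best[i][k - 1]
                if prev_sum == neg:
                    continue
                if prev_sum + conf > best[j][k][0]:
                    best[j][k] = (prev_sum + conf, prev_text + label)

    scored = []
    for k in range(max(1, min_len), max_len + 1):
        total, text = best[n_nodes][k]
        if total != neg:
            scored.append((total / k if use_average else total, text))
    return max(scored) if scored else None
```

With edges `{(1, 2): ("V", 0.9), (2, 4): ("W", 0.8), (2, 3): ("I", 0.5), (3, 4): ("I", 0.5), (1, 4): ("M", 0.6)}` and four nodes, the sum criterion returns the three-step reading "VII", while the average criterion returns "VW".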
TABLE IV
SUCCESS RATES VS. RECOGNITION ENGINES
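The KNN engine compared in Table IV is described in Section VI-A as template matching: matching black pixels get a larger weight, matching white pixels a smaller one, and mismatches add a negative value. A minimal sketch of such a scoring rule is given here. The concrete weights (2.0, 0.5, -1.0), the 0/1 image encoding with 1 as black, and the function names are assumptions; [10] may use different values and a different way of deriving the confidence level.

```python
def similarity(img, template, w_black=2.0, w_white=0.5, penalty=-1.0):
    """Weighted pixel-wise agreement between two binary images of equal size.
    1 = black (stroke) pixel, 0 = white (background) pixel."""
    if len(img) != len(template) or any(
            len(a) != len(b) for a, b in zip(img, template)):
        raise ValueError("image and template must have the same dimensions")
    score = 0.0
    for row_a, row_b in zip(img, template):
        for a, b in zip(row_a, row_b):
            if a != b:
                score += penalty      # mismatch lowers the similarity
            elif a == 1:
                score += w_black      # matching stroke pixels count most
            else:
                score += w_white      # matching background counts least
    return score


def classify(img, templates):
    """Return (score, label) of the best-matching template.
    templates: list of (label, template_image) pairs."""
    return max((similarity(img, t), label) for label, t in templates)
```

Because every query is compared against every template, the cost of this scheme grows with the number of templates, which is consistent with the slowdown the paper reports for larger KNN training sets.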
Fig. 13. A modified LeNet-5.
convolutional layer and each subsampling layer contains 32 feature maps. Every convolutional layer has a convolution filter of 4×4, and the slide step of each subsampling layer is 2.
C. Results
We have implemented all the algorithms, attacks and the Captcha generating imitator in C# on Microsoft Visual Studio. The modified LeNet-5 is developed on Caffe [29], a deep learning framework created by a UC Berkeley team. We trained the LeNet-5 network with a GPU (NVIDIA TITAN X GPU, 32GB RAM), which significantly reduced the training time, and tested our attack on a desktop computer with a 3.3 GHz Intel Core i3 CPU and 2GB RAM.
1) Success Rate: Our attack has achieved a success rate of 44.6% on the test set. Reference [5] proposed a commonly accepted standard for Captcha robustness, namely preventing automated attacks from achieving higher than 0.01% success, but it was considered too ambitious by some researchers. Reference [3] suggested that a Captcha scheme is broken if an automated attack achieves a success rate higher than 1%. According to either criterion, the high success rate we have achieved clearly indicates that our attack has broken the two-layer Captcha deployed by Microsoft. Therefore, we can state that the two-layer Captcha is not as robust as its designers expected. Our results also show that the novel data preparation approach we proposed, in which mass data automatically generated by our Captcha generating imitator is used to train the deep learning system that then recognizes real-world objects, is feasible and effective.
2) Attack Speed: The average attack speed of our attack on the test set is 9.05 seconds, demonstrating its efficiency. The usability requirement demands that each human user solve a Captcha in less than 30 seconds. Some Captchas deployed in the wild reported an average solving speed of more than 46 seconds [30].
VI. DISCUSSION
A. Other Design Choices
1) Recognition Engines: Before we chose CNN as our recognition engine, we tested the KNN reported in [10] as a candidate. The KNN used in [10] works by measuring the similarity between corresponding pixels of two images. The confidence level of a recognition result is also derived from this similarity value. In order to decrease the importance of matching background pixels in decision making, it assigns a larger weight to similar black pixels, but a smaller weight to similar white pixels. If two corresponding pixels do not match, a negative value will be added to the similarity calculation. In essence, the KNN used in [10] is based on template matching.
For the attempt using KNN, we mine 300 random Captchas from Microsoft's website as the training set to use as templates, and use the same test set as for CNN, as mentioned before. The results achieved by KNN are compared with CNN in Table IV. It shows that KNN only achieves a success rate of 13.4%, much lower than the success rate finally achieved by our CNN engine. However, the average attack speed of CNN is a little slower than that of KNN (9.05 seconds vs. 8.39 seconds). Note that we also considered using a larger training set for KNN; however, this causes the attack speed to drop off sharply.
To compare the effectiveness of CNN and KNN in more detail, we conduct another group of experiments to evaluate the effects of the size of training sets on the recognition engines' speed and recognition results. We utilize five training sets including 100, 200, 300, 400 and 500 random Captcha images, respectively, generated by our Captcha generating imitator, and 500 random Captchas mined from Microsoft's website are used as the test set. The sample images extracted from the training sets are used for making templates for KNN and for training the CNN model. We not only test the effect of training set size on the success rate and average attack speed for full Captcha images, but also verify its influence on recognizing well separated individual character images extracted manually from the test set.
Table V lists the recognition rates and recognition speeds obtained on individual character images and the success rates and average attack speeds on full Captcha images with different sized training sets. It shows that the recognition rates of both CNN and KNN improve as the size of the training set increases. Their corresponding success rates on full Captcha images also increase. However, as noted previously, the average recognition speed of KNN drops off sharply with the enlargement of training sets, whereas CNN's speed remains relatively stable. Using the same training set, the recognition rates and success rates of CNN are much higher than those of KNN, which may be the major factor in understanding the gap between the final success rates achieved by CNN and KNN. It seems that KNN can utilize a larger training set to improve the recognition rate, but the decrease in speed is also an issue. Therefore, the effectiveness of KNN is limited.
Even though both experiments use a training set of 300, the success rate achieved by KNN in Table V is a little lower
TABLE V
THE SIZE OF TRAINING SETS EFFECT ON RECOGNITION ENGINES
than listed in Table IV (9.8% vs. 13.4%). This is because the TABLE VI
former uses Captcha images generated by our imitator as the ATTACK S PEEDS VS . G RAPH S EARCH A LGORITHMS
training set, whereas the latter utilizes real-world Captchas.
According to our experience, the more similar the training
and test images, the better the recognition results will be. This
comparison also indicates that our final attack success rate can
be improved if real-world Captchas are used for training.
Of course, these results do not mean that CNN is better than KNN in every respect. A CNN requires a large amount of training data, and training generally relies on advanced hardware (e.g., a GPU), whereas KNN is less demanding.
2) Graph Search Algorithms: We tried several other graph search algorithms for finding the most likely result and compared them with the algorithm proposed in Section IV.
a) Depth-first search (DFS): was used by Gao et al. in [19]. It starts from node 1 in the graph and explores each branch until the path length reaches the Captcha string length before backtracking. DFS finds all paths whose number of steps falls within the range of the Captcha string length, and then chooses the path ending at node n+1 with the largest confidence level sum. DFS is not optimal, since it often explores paths that cannot reach node n+1, and it re-explores previously visited nodes even after the path it is looking for has been discovered.
b) Integer partition algorithm (IP): is another graph search algorithm we considered. It works by finding the most likely way of forming m characters from n character components, where m is the Captcha string length. All combinations are recorded, and the one with the largest confidence level sum is selected. This algorithm reduces the search space compared with DFS, since it skips all paths that do not reach node n+1. However, it is still time-consuming, as it has to enumerate all combinations.
c) Dynamic programming graph search algorithm proposed in [10] (DP [10]): prunes redundant nodes from the graph, which not only saves time spent on character recognition but also reduces the search space of the graph. Thus, it is much more effective than the two algorithms above. DP [10] is similar to ours, but instead of searching for the target path with the largest confidence level sum, we search for the path with the highest average confidence level as our final result. In our experience, the confidence level sum of a path with more steps (characters) is often larger than that of a path with fewer steps (characters). When a Captcha scheme has a variable string length, DP [10] is therefore more likely to return the path with the most steps as the final result, even when the Captcha image does not have the largest string length. Our algorithm improves on [10] to avoid this problem. However, the speed of DP [10] is close to the speed we achieved, as their search spaces are the same.
We compare the average attack speed of our DP algorithm, DFS, IP and DP [10] in Table VI and find that our DP algorithm (marked in red in Table VI) is the fastest.
B. The Latest Generation of Microsoft’s Two-Layer Captcha
The two-layer Captcha that we analyzed previously is the first generation, deployed by Microsoft in January 2015. There are now several newer generations, and here we analyze the robustness of the latest one.
We collected the latest generation of Microsoft’s two-layer Captcha from its account creation page in November 2016. As the first image in Figure 14 shows, the Captcha has been improved: all characters are hollow; each layer contains a random number of characters (3 to 6 characters in each layer and 6 to 12 characters in total); and it uses 45 character categories, far more than the 23 categories used in the first generation.
To evaluate the security of the latest two-layer Captcha, we follow the main steps presented in Section IV. Since all characters are hollow in the new scheme, we skip the process of separating filled characters from solid ones after CFS, and separate the upper and lower layers directly (see Figure 14).
Our attack achieved a success rate of 28.0% on the latest generation of Microsoft’s two-layer Captchas with an average attack speed of 12.56 seconds. The result shows that, even though the new scheme applies several new resistance mechanisms to enhance its security, it is still weak in the face of our attack. It is also worth mentioning that the new generation uses many similar characters, e.g. ‘6’ and ‘b’, ‘a’ and ‘9’. This does increase the burden on the recognition engine. Unfortunately, it is clear that it also decreases
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on September 13,2022 at 15:14:39 UTC from IEEE Xplore. Restrictions apply.
1682 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 7, JULY 2017
Fig. 14. The attack process of the latest two-layer Captcha deployed by Microsoft.
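The difference between selecting a path by confidence level sum and by average confidence level, as discussed in the graph search comparison, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names best_path_dp and best_path_enumeration, the edge confidences, and the string-length bounds are all hypothetical, and the node pruning of [10] is omitted. An edge (i, j) stands for recognizing components i to j-1 as one character, and a valid result is a path from node 1 to node n+1.

```python
from itertools import combinations

def best_path_dp(n, conf, min_len, max_len):
    """Return (best path by average confidence, best path by confidence sum).

    conf[(i, j)] is an assumed recognition confidence for the edge i -> j.
    """
    end = n + 1
    # best[k][j] = (largest confidence sum, path) reaching node j in exactly k steps
    best = [dict() for _ in range(max_len + 1)]
    best[0][1] = (0.0, [1])
    for k in range(1, max_len + 1):
        for (i, j), c in conf.items():
            if i in best[k - 1]:
                total = best[k - 1][i][0] + c
                if j not in best[k] or total > best[k][j][0]:
                    best[k][j] = (total, best[k - 1][i][1] + [j])
    # Candidate results: one best path per allowed string length k
    finished = [(k, *best[k][end])
                for k in range(min_len, max_len + 1) if end in best[k]]
    if not finished:
        return None, None
    by_avg = max(finished, key=lambda t: t[1] / t[0])
    by_sum = max(finished, key=lambda t: t[1])
    return by_avg[2], by_sum[2]

def best_path_enumeration(n, conf, min_len, max_len):
    """Brute-force check in the spirit of IP: try every way of cutting
    n components into k characters and keep the best average."""
    end = n + 1
    best = None
    for k in range(min_len, max_len + 1):
        for cuts in combinations(range(2, end), k - 1):
            path = [1, *cuts, end]
            edges = list(zip(path, path[1:]))
            if all(e in conf for e in edges):
                avg = sum(conf[e] for e in edges) / k
                if best is None or avg > best[0]:
                    best = (avg, path)
    return best[1] if best else None

# Hypothetical graph with n = 4 components (nodes 1..5).
conf = {
    (1, 2): 0.6, (2, 3): 0.6, (3, 4): 0.6, (4, 5): 0.6,  # four narrow slices
    (1, 3): 0.95, (3, 5): 0.9,                           # two wide characters
}
avg_path, sum_path = best_path_dp(4, conf, min_len=2, max_len=4)
print("by average:", avg_path)
print("by sum:    ", sum_path)
```

With these made-up confidences, the sum criterion prefers the four-step path because more steps accumulate a larger total, while the average criterion prefers the two-step path with the two high-confidence characters. For a fixed number of steps the two criteria agree, which is why the sketch keeps one best-sum path per length and only applies the average when comparing lengths.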
TABLE VII
ATTACK RESULTS OF SINA SCHEME
Fig. 15. An example image of reCAPTCHA: (a) Binarized image; (b) Find
rotation angle; (c) Rotated image; (d) After extracting character components.
GAO et al.: RESEARCH ON THE SECURITY OF MICROSOFT’S TWO-LAYER CAPTCHA 1683
[4] J. Yan and A. S. El Ahmad, “Usability of CAPTCHAs or usability issues in CAPTCHA design,” in Proc. 4th Symp. Usable Privacy Secur., 2008, pp. 44–52.
[5] K. Chellapilla, K. Larson, P. Y. Simard, and M. Czerwinski, “Building segmentation based human-friendly human interaction proofs (HIPs),” in Human Interactive Proofs. Heidelberg, Germany: Springer, 2005, pp. 1–26.
[6] K. Chellapilla, K. Larson, P. Y. Simard, and M. Czerwinski, “Computers beat humans at single character recognition in reading based human interaction proofs (HIPs),” in Proc. 2nd Conf. Email Anti-Spam (CEAS), Jul. 2005, pp. 1–8.
[7] J. Yan and A. S. El Ahmad, “Breaking visual CAPTCHAs with naive pattern recognition algorithms,” in Proc. 23rd Annu. Comput. Secur. Appl. Conf. (ACSAC), Dec. 2007, pp. 279–291.
[8] J. Yan and A. S. El Ahmad, “A low-cost attack on a microsoft CAPTCHA,” in Proc. 15th ACM Conf. Comput. Commun. Secur., 2008, pp. 543–554.
[9] E. Bursztein, J. Aigrain, A. Moscicki, and J. C. Mitchell, “The end is nigh: Generic solving of text-based CAPTCHAs,” in Proc. 8th USENIX Workshop Offensive Technol. (WOOT), 2014, pp. 1–15.
[10] H. Gao et al., “A simple generic attack on text CAPTCHAs,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS), San Diego, CA, USA, 2016, pp. 1–14.
[11] K. Thomas, D. McCoy, C. Grier, A. Kolcz, and V. Paxson, “Trafficking fraudulent accounts: The role of the underground market in twitter spam and abuse,” in Proc. USENIX Secur. Symp., 2013, pp. 195–210.
[12] E. Bursztein, A. Moscicki, C. Fabry, S. Bethard, J. C. Mitchell, and D. Jurafsky, “Easy does it: More usable CAPTCHAs,” in Proc. Sigchi Conf. Human Factors Comput. Syst., 2014, pp. 2637–2646.
[13] Y. LeCun et al. (2015). LeNet-5, Convolutional Neural Networks. [Online]. Available: http://yann.lecun.com/exdb/lenet
[14] M. Naor, “Verification of a human in the loop or identification via the Turing Test,” 1996. [Online]. Available: http://www.wisdom.weizmann.ac.il/~naor/PAPERS/human.ps
[15] M. D. Lillibridge, M. Abadi, K. Bharat, and A. Z. Broder, “Method for selectively restricting access to computer systems,” U.S. Patent 6 195 698 B1, Feb. 27, 2001.
[16] G. Mori and J. Malik, “Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Jun. 2003, pp. I-134–I-141.
[17] P. Simard, “Using machine learning to break visual human interaction proofs (HIPs),” in Proc. Adv. Neural Inf. Process. Syst., vol. 17. 2005, pp. 265–272.
[18] S. Hocevar, “Pwntcha-CAPTCHA decoder Web site,” 2007. [Online]. Available: http://sam.zoy.org/pwntcha/
[19] H. Gao, W. Wang, J. Qi, X. Wang, X. Liu, and J. Yan, “The robustness of hollow CAPTCHAs,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2013, pp. 1075–1086.
[20] C. H. B. L.-P. Karthik and R. A. Recasens, “Breaking microsoft’s CAPTCHA,” Tech. Rep., 2015.
[21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[22] M. O’Neill, “Neural network for recognition of handwritten digits,” Standard Ref. Data Program, Nat. Inst. Standards Technol., Tech. Rep., 2006.
[23] P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Proc. 7th Int. Conf. Document Anal. Recognit., Aug. 2003, pp. 958–963.
[24] B. B. Le Cun, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Handwritten digit recognition with a back-propagation network,” in Proc. Adv. Neural Inf. Process. Syst., 1990, pp. 396–404.
[25] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet. (Dec. 2013). “Multi-digit number recognition from street view imagery using deep convolutional neural networks.” [Online]. Available: https://arxiv.org/abs/1312.6082
[26] N. Otsu, “A threshold selection method from gray-level histograms,” Automatica, vol. 11, nos. 285–296, pp. 23–27, 1975.
[27] H. Gao, W. Wang, Y. Fan, J. Qi, and X. Liu, “The robustness of ‘connecting characters together’ CAPTCHAs,” J. Inf. Sci. Eng., vol. 30, no. 2, pp. 347–369, 2014.
[28] D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 4, no. 12, pp. 2379–2394, 1987.
[29] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675–678.
[30] M. Mohamed et al., “A three-way investigation of a game-CAPTCHA: Automated attacks, relay attacks and usability,” in Proc. ACM Symp. Inf., Comput. Commun. Secur., 2014, pp. 195–206.
Haichang Gao (M’07) is currently an Associate Professor with Xidian University. He is also in charge of a project of the National Natural Science Foundation of China. He has published over 30 papers. His current research interests include Captcha, computer security, and machine learning.
Mengyun Tang is currently pursuing the master’s degree in computer science with Xidian University. Her current research interest is Captcha.
Yi Liu is currently pursuing the master’s degree in computer science with Xidian University. His current research interest is Captcha.
Ping Zhang is currently pursuing the master’s degree in computer science with Xidian University. His current research interest is Captcha.
Xiyang Liu (M’14) is currently a Professor with Xidian University. He is in charge of a project of the National Science and Technology Major Project. He also leads the Software Engineering Institute, Xidian University. He has authored over 20 papers. His research interests include computer security and trustworthy computing.