Research On Deep Learning Techniques in Breaking Text-Based Captchas and Designing Image-Based Captcha
Abstract— The ability of hackers to infiltrate computer systems using computer attack programs and bots led to the development of Captchas, or Completely Automated Public Turing Tests to Tell Computers and Humans Apart. The text Captcha is the most popular Captcha scheme given its ease of construction and user friendliness. However, the next generation of hackers and programmers has decreased the expected security of these mechanisms, leaving websites open to attack. Text Captchas are still widely used, because it is believed that the attack speeds are slow, typically two to five seconds per image, and this is not seen as a critical threat. In this paper, we introduce a simple, generic, and fast attack on text Captchas that effectively challenges that supposition. With deep learning techniques, our attack demonstrates a high success rate in breaking the Roman-character-based text Captchas deployed by the top 50 most popular international websites and three Chinese Captchas that use a larger character set. These targeted schemes cover almost all existing resistance mechanisms, demonstrating that our attack techniques are also applicable to other existing Captchas. Does this work then spell the beginning of the end for text-based Captcha? We believe so. A novel image-based Captcha named Style Area Captcha (SACaptcha) is proposed in this paper, which is based on semantic information understanding, pixel-level segmentation, and deep learning techniques. Having demonstrated that text Captchas are no longer secure, we hope that our proposal shows promise in the development of image-based Captchas using deep learning techniques.
Index Terms— Captcha, text-based, security, deep learning, convolutional neural network, image-based.
Manuscript received July 18, 2017; revised December 25, 2017 and March 6, 2018; accepted March 21, 2018. Date of publication March 29, 2018; date of current version May 9, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant 61472311 and in part by the Fundamental Research Funds for the Central Universities. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Karen Renaud. (Corresponding author: Haichang Gao.)
The authors are with the Institute of Software Engineering, Xidian University, Xi’an 710071, China (e-mail: hchgao@[Link]).
Color versions of one or more of the figures in this paper are available online at [Link]
Digital Object Identifier 10.1109/TIFS.2018.2821096
1556-6013 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See [Link] for more information.
TANG et al.: RESEARCH ON DEEP LEARNING TECHNIQUES 2523
I. INTRODUCTION
CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart”. Since it was proposed by Von Ahn et al. in 2003 [1], Captchas have been widely used in commercial applications to provide security against malicious computer programs and bots [2]. Captchas automatically generate and evaluate a test that is difficult for a computer to solve, but easy for humans. If the success rate of solving a Captcha for humans reaches 90% or higher and computer programs only achieve a success rate of less than 1%, the Captcha can be considered good [3].
Multiple variations have been proposed, but they always use the same fundamental idea: show users an image and request that they conduct a recognition task.
Since its introduction, the text Captcha has been the most widely deployed Captcha scheme [4]. However, current research threatens the security of existing text Captchas, e.g., [3], [5], [6]. A large range of early simple text Captcha schemes deployed in the wild have been broken, such as those of Google, Yahoo!, and Microsoft, among many others. Captcha designers typically learn from previous failures to design Captchas with increased security and usability. Kumar suggested that the robustness of text Captchas should be measured by the difficulty of segmenting characters from Captcha images rather than recognizing what each character is, because it is easy for powerful classifiers, e.g., convolutional neural networks (CNNs), K-nearest neighbor (KNN) and support vector machine (SVM), to recognize rotated or warped characters [7]. This has been referred to as the segmentation-resistance principle, and it has become the cornerstone for designing text Captchas. Current text Captchas are much more sophisticated than previous models. To enhance security, Captcha designers tried to add various novel resistance mechanisms to existing text Captchas, e.g., crowding characters together (CCT), noise arcs, complicated backgrounds, hollow schemes and two-layer structures. However, all of these resistance mechanisms seem to be ineffective, as studies [8]–[11] have demonstrated. More recently, some researchers have claimed that they can break a large group of Captchas with a variety of design features in a single step [12], [13].
Clearly, these attacks pose a realistic threat to current text Captchas. There is now a general agreement that they are no longer secure. However, text Captchas are still widely used because the attacks are slow, typically taking two to five seconds per image. In this paper, we propose a simple, fast and effective attack with deep learning techniques. It contains three main steps: pre-processing converts a color Captcha image to black-and-white and removes noise arcs or complicated backgrounds; color filling segmentation (CFS, introduced in [6]) is used to select single characters or to simply divide a Captcha into equally distributed segments according to the number of
characters it contains; and recognition utilizes a deep CNN to determine what each single character is and combines the recognition results as the final result.
We tested our attack on the text Captchas deployed by the top 50 most popular websites according to the Alexa ranking to evaluate their security. These real-world Captchas cover all resistance mechanisms, including Google’s reCAPTCHA, which is derived from Google Street View, character isolated schemes, hollow schemes, CCT schemes and other schemes with noise arcs or complicated backgrounds. Our attack has achieved success rates ranging from 10.1% to 90%, with an average attack speed of 0.03 to 0.65 seconds. Judged by the commonly used criteria presented in [3], this attack has successfully broken all these targeted Captcha schemes at a speed that allows us to claim it as a real-time attack. Although various anti-segmentation mechanisms were adopted by these schemes and our rough segmentation method cannot precisely segment a Captcha image into individual characters, our attack still achieved high success rates. This suggests that the segmentation-resistance principle may no longer be applicable.
The most widely used text Captchas are usually based on English letters and Arabic numerals, which are limited to 62 character categories (26 uppercase letters, 26 lowercase letters and 10 digits). An additional type of text Captcha was developed from large-alphabet languages such as Chinese, Japanese and Korean. For example, Chinese Captcha, which has billions of users, is much more complicated than the commonly used Roman character-based Captchas. There are approximately 3755 commonly used Chinese characters, which makes the solution space larger than that of traditional text Captchas using English letters and Arabic numerals. This paper also analyzes the security of such large-alphabet Captchas. We collected three Chinese Captchas from Baidu and QQ as representatives and tested our attack on them. The success rates we obtained vary from 28.6% to 93.0%, indicating that large-alphabet Captchas are equally insecure.
We propose an image-based Captcha named Style Area Captcha (SACaptcha) that is based on the neural style transfer technique. To pass the test, users are required to click foreground style-transferred regions in an image based on a brief description. Unlike earlier image-based Captchas, SACaptcha relies on human understanding of semantic information and pixel-level segmentation, which seems to be more difficult for machines to solve. With neural style transfer, labels for images will be unnecessary, and any image can serve as the input to automatically generate a Captcha. To test its usability and robustness, we conduct a user study to evaluate human performance and try to attack it using three state-of-the-art techniques. We think it is a positive attempt at applying deep learning techniques to Captcha design.
Unlike previous work, which only uses deep learning as a recognition engine for individual characters, we also utilize it to enhance the security of the image-based Captcha. Our work provides evidence that deep learning techniques can be used not just as an attack tool to break Captchas but can also be leveraged to improve their security.
This paper provides a comprehensive analysis of text Captchas. It analyzes all resistance mechanisms used in text Captchas, presents the methods to break them and evaluates the security of both traditional Roman character-based Captchas and the newly emerged large-alphabet Captchas. Finally, based on this thorough analysis, we conclude that existing text Captchas are not robust enough to resist automatic attacks using optimized image pre-processing algorithms, advanced deep learning techniques and improved hardware equipment. It is necessary to develop new Captchas with increased security to resist deep learning-based attacks.
The remainder of this paper is organized as follows. Section II provides a review and an analysis of real-world text Captchas, and Section III evaluates the robustness of the most commonly used Roman character-based Captchas. In Section IV, we conduct a further study analyzing the security of Captchas using large character sets, such as Chinese Captchas, we verify the effect of network depth on our attack, and we compare it with prior attacks. A novel image-based Captcha that we call SACaptcha is proposed in Section V, and Section VI concludes the paper.
II. A SURVEY ABOUT TEXT CAPTCHAS
From its inception, the text Captcha has played an important role in enhancing Internet security. It is the earliest and the most popularly deployed Captcha scheme, especially those schemes based on English letters and Arabic numerals. This widespread usage may be due to several factors [7], [14]: from the user’s point of view, a Captcha is only a text recognition problem, and it is intuitive to users worldwide; for a Captcha deployer, the cost of generating a text Captcha is lower than that of other Captcha alternatives, e.g., image-based and audio-based Captchas.
The original text Captcha forms are relatively simple. For instance, an early Captcha scheme deployed by Yahoo! for its free email services just asks users to read a distorted word. Current Captchas are much more sophisticated, with various resistance mechanisms to enhance their security. We summarize the most commonly used resistance mechanisms as follows:
Character Isolated: The character isolated Captcha scheme might be the simplest; see Figure 1 (a). CFS [6] is able to extract single characters easily, and powerful classifiers work well in recognizing these extracted characters.
Rotation and Warping: Rotation and warping are the earliest and most widely used resistance mechanisms. Rotation draws characters at a random angle, whereas warping waves the local characters or the whole Captcha string. Figure 1 (b) provides a sample Captcha. Essentially, rotation and warping increase the variations of characters, making the recognition task more difficult. However, it has already been shown that modern classifiers are able to easily recognize these rotated and warped single characters [7].
Overlapping: Overlapping removes the space between adjacent characters to crowd characters together, making it more difficult for computers to detect where each character is, as Figure 1 (c) shows. It is an effective application of the segmentation-resistance principle. However, several attacks have been proposed to solve this Captcha scheme,
2524 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 13, NO. 10, OCTOBER 2018
TABLE I
RESISTANCE MECHANISMS ADOPTED BY TARGETED CAPTCHA SCHEMES
Fig. 2. Sample images of targeted schemes: (a) reCAPTCHA, (b) Baidu, (c) Wikipedia, (d) QQ, (e) Microsoft’s single-layer scheme, (f) Microsoft’s two-layer scheme, (g) Sina, (h) Weibo, (i) Yandex, (j) PayPal, (k) Apple.
method [17]. For other Captcha schemes, the adoption of rotation, noise arcs, a complicated background or a two-layer structure will lead to an unclear location of the text area and will have a negative effect on later segmentation. Therefore, we treat them during pre-processing as follows:
Rotation: Microsoft’s single-layer Captcha scheme is heavily rotated, making the segments created by later segmentation too thin to contain enough information for each character. Therefore, we rotate the original Captcha images to upright. The pixels at the very bottom and at the right end in a Captcha image are identified in order to determine the angle, from which we can determine the rotation required to create the upright characters.
Noise arcs: The Captchas deployed by Yandex and Sina contain noise arcs. The main difference between the schemes is that the noise arcs in the Yandex scheme are within the text area, whereas Sina’s extend past the text area. We remove the noise arcs in the latter case, since they would otherwise leave noise in the resulting segments. The noise arcs in the Sina scheme are the same color, so we detect the color of the pixels in each column, and if a column only contains one color, we set the non-white pixels in this column to white.
Complicated Background: The reCAPTCHA, QQ, PayPal and Apple schemes all use a complicated background mechanism. reCAPTCHA is the most difficult scheme to break, as it uses real-world street views. We binarize the image and remove the noise blocks connected with the edge of the image. For the QQ scheme, we do not remove all the background information, but we locate the text area according to the projection method introduced in [10]. This requires counting the number of non-white pixels in each column and each row and then removing all non-white pixels in the columns and rows whose number of non-white pixels is smaller than a certain threshold. This threshold is determined by analyzing a small sample set of data. The backgrounds of PayPal and Apple can be easily deleted by removing tiny noise blocks and single-direction lines.
Two-Layer Structure: The second version of Microsoft’s Captcha is two-layered. The attack for this scheme is to first segment the two-layer Captcha into two single-layer Captchas by using the envelope algorithm proposed by Gao et al. [11].
The pre-processing results of each scheme are listed in Table II. Existing advanced image pre-processing techniques are robust enough to remove the interference in all cases.
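The column rule described for the Sina noise arcs can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments (which were written in C#). The function name `remove_single_color_columns`, the list-of-rows image layout and the reading of "one color" as "one distinct non-white color" are assumptions made for illustration. Under this literal reading, a column holding only single-colored text would also be cleared, so the sketch presumes that text columns carry more than one color.

```python
WHITE = (255, 255, 255)


def remove_single_color_columns(img):
    """Return a copy of img with single-colored columns whitened.

    img is a list of rows; each row is a list of (r, g, b) tuples.
    Assumption: "one color" means one distinct non-white color.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = [list(row) for row in img]  # copy so the input is left untouched
    for x in range(width):
        # Distinct non-white colors that occur in this column.
        colors = {img[y][x] for y in range(height)} - {WHITE}
        if len(colors) == 1:
            # Only one ink color here: treat it as a noise arc and erase it.
            for y in range(height):
                out[y][x] = WHITE
    return out
```

Columns that mix two or more ink colors are left unchanged, which is how the sketch keeps the text area intact.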
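The projection step used to locate the QQ text area can be sketched as follows, assuming a binarized image. The function name `projection_filter`, the 0/1 pixel encoding and the use of one shared threshold for rows and columns are illustrative assumptions; the text only states that the threshold is chosen by analyzing a small sample set.

```python
def projection_filter(img, threshold):
    """Clear pixels lying in sparsely populated rows or columns.

    img is a list of rows of ints, where 1 marks a non-white pixel and
    0 marks a white pixel. A pixel survives only if both its row and its
    column contain at least `threshold` non-white pixels.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    # Horizontal and vertical projections, computed once on the input.
    row_counts = [sum(row) for row in img]
    col_counts = [sum(img[y][x] for y in range(height)) for x in range(width)]
    return [
        [
            img[y][x]
            if row_counts[y] >= threshold and col_counts[x] >= threshold
            else 0
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Computing both projections before clearing anything keeps the result independent of the order in which rows and columns are processed.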
TABLE II
ATTACK PROCESS
2) Segmentation: This step divides a Captcha image into mini segments that each contain a single character. According to the relationship between adjacent characters, we classify the Captchas into two categories here: the character isolated scheme and the CCT scheme. The characters in the character isolated scheme are separated, whereas the characters of the CCT scheme are connected to each other. We leverage different approaches to obtain mini segments for these two schemes.
a) Character isolated scheme: For character isolated schemes, in which all characters are single connected areas, e.g., reCAPTCHA and Wikipedia, we simply utilize the CFS algorithm [6] to identify each individual character. Note that for characters like ‘i’ and ‘j’ in Wikipedia that consist of two connected areas, we combine them into a single character according to their location relationship. That is, if one connected area is entirely on top of another connected area, we combine them.
b) CCT scheme: For schemes with connected characters, we divide the Captcha image into equally distributed segments according to the number of characters it contains. However, some Captcha schemes use a varied Captcha string length, and we cannot know how many characters each Captcha image contains in advance. Using Microsoft’s single-layer Captcha as an example, the string length ranges from 5 to 7. Thus, the first challenge of segmentation is to determine the number of characters in a Captcha image.
We regard this as a classification problem, and we utilize a CNN model to conduct this classification task. For example,
determining the number of characters in Microsoft’s single-layer Captcha is a three-class classification problem, that is, its Captcha images can be classified into three categories: images with 5 characters, images with 6 characters and images with 7 characters. The width and height of the text area provide significant information for estimating Captcha string length. To maintain the information and guarantee a standard input to the network, we remove the surrounding blank space in each Captcha image and relocate it to the center of a relatively larger blank image as the input of the network. The output is the estimated Captcha string length of this image. The CNN model we utilized here is the same as the one we used for recognition and will be illustrated in detail later.
For the CCT schemes with a varied Captcha string length, our method has achieved an average accuracy of 91.5% for estimating how many characters each image contains. Compared to the methods in [8] and [13], which determine the number of characters by dynamically selecting a path according to the recognition results, our method is much simpler and more effective.
Then, we vertically segment each Captcha image into c equally distributed segments, where c is the number of characters it contains, as estimated by the CNN. To guarantee that each segment contains the main information of a character, we slightly overlap them, since the width of each character is varied. Table II lists the segmentation results of each Captcha scheme. Note that we only use the CNN model to estimate the Captcha string length of the Captcha images in the test set. For the training set, we divide a Captcha image according to its real number of characters.
3) Recognition: Finally, we modify the most fundamental CNN architecture, LeNet-5 [18], which was originally designed for hand-written and machine-printed character recognition, and use it as our recognition engine to recognize mini segments produced by segmentation. The recognition results of all segments are combined as the final result (see the last column of Table II).
LeNet-5 uses a feed-forward architecture. It takes a character image as input, continues with a linear succession of layers and then outputs the category this character image belongs to. The original LeNet-5 contains three convolutional layers, two subsampling layers and two fully connected layers. To extract higher-level features, we modify LeNet-5 by adding an extra convolutional layer. Figure 3 illustrates this.
Fig. 3. Our improved LeNet-5, where C denotes a convolution layer, MP denotes a max-pooling layer, and n@a×a denotes that the layer generates n feature maps with a size of a×a; Flatten: 1024 denotes that the flatten layer generates a vector with 1024 elements, and F7: 512 denotes that the 7th layer is a fully connected layer containing 512 hidden neurons.
The mini segments are normalized to 28 × 28 pixels as the inputs of the network architecture. The network was trained using stochastic gradient descent (SGD), described in [19]. The initial learning rate was set to 0.001. In total, we trained for 50 epochs.
Although there have been many deeper network architectures for image classification, and although our CNN model is probably not optimal, we did not optimize the network, for one simple reason: our network turned out to be effective enough at recognizing the segments extracted from Captcha images. We will investigate the effect of network depth on our attack results in Section IV.
B. Experiment Results
We implemented our attack and tested it on all targeted schemes. The Captcha images were processed in C# on a desktop computer with a 3.3 GHz Intel Core i3 CPU and 2 GB RAM. Our modified LeNet-5 is based on a deep learning framework called Caffe [20], which was developed by a UC Berkeley team. To speed up the training process, we trained it with an NVIDIA TITAN X GPU and 64 GB RAM.
1) Data Collection: For each scheme listed in Table II, we collected 3400 random Captcha images from the corresponding websites, 2000 of which were used as the training set, 400 were used as the validation set, and the remaining 1000 were used as the test set. The collection of our data was carried out from 2016 to 2017.
2) Success Rate: Our attack evaluation process follows previous practice. Table III summarizes our attack results for each Captcha scheme. Our success rates range from 10.1% to 90.0%. A standard goal for Captcha robustness is to prevent automated attacks from achieving a success rate higher than 1% [3]. Therefore, our simple attack broke all of the targeted schemes. Apart from reCAPTCHA, the success rates achieved by our attack are higher than 47%, far beyond the standard criterion. It turns out that all text Captchas deployed by the top 50 most popular websites in the world are not as secure as expected.
The success rate against reCAPTCHA is the lowest, since it uses street views containing house numbers as the Captcha. These complicated street view scenarios make the extraction of a house number extremely difficult, and there is a wide variation in the choice of fonts in real-world house numbers. Both of these reasons explain the lower success rate of our attack.
3) Attack Speed: We ran all test images ten times and calculated the average attack speed of each scheme. As shown in Table III, our average attack speed ranges from
0.03 to 0.65 seconds, indicating that this is a real-time attack. Our data shows that humans identify the Captcha more slowly than our attack method. Prior work [21] reported an average solving time for some real-world Captchas of more than 46 seconds.
TABLE III
ATTACK RESULTS
To sum up, it is clear that our attack is extremely effective, fast and poses a realistic threat to all text Captchas. What is also important to mention is that our work undermines the idea presented in the segmentation-resistance principle proposed in [7]. Even though our rough segmentation method used for most Captcha schemes cannot precisely segment all characters from each other, with deep learning techniques, our attack can still work very effectively on targeted schemes. The segmentation-resistance principle used in these schemes seems to be ineffective. A powerful recognition engine and a large amount of data can be used to compensate for the failure of segmentation. The segmentation-resistance principle may not be appropriate for current Captcha design, especially in the face of deep learning-based attacks.
IV. DISCUSSION
A. The Security of Large-Alphabet Captchas
The above work has shown the vulnerability of commonly used text Captchas based on English letters and Arabic numerals. Such Roman character-based Captchas, at most, contain 62 categories of characters. In fact, most Captchas use a much smaller character set than 62, since some letters are too similar for users to distinguish. Another type of text Captcha has emerged that is derived from large-alphabet languages, e.g., Chinese, Japanese and Korean. These large-alphabet Captchas usually have a large character set, which appears to increase the solution space and promise better security. Chinese Captcha, for instance, has thousands of commonly used characters. This makes the recognition task extremely difficult, especially for traditional machine learning algorithms.
To evaluate the security of large character set Captchas, we conduct a detailed analysis of Chinese Captcha as a representative. It is generally thought that only users who know or have studied Chinese can recognize Chinese Captchas. However, previous work [15] has shown that Chinese Captchas can also be designed in such a way that they are universal, meaning that they can be used by users who have never learned Chinese. Baidu and QQ are selected as the representatives, as they are the largest search engine and the most widely used communication tool in China, respectively, with Alexa ranking them 4th and 7th among the most popular websites in the world (see Table IV). The QQ and Baidu Tieba schemes both require users to click bottom characters in order to match the sequence of upper characters, while the Baidu Registration scheme requires users to input the two characters shown in the Captcha image to pass the test.
In the QQ scheme, the characters’ pixels are black and are surrounded by white pixels. Using this characteristic, we easily extracted the upper and lower characters from the complicated background. For Baidu Tieba and Baidu Registration, the projection method [10] is used to remove the noise arcs that extend past the text area. After removing the noise arcs, the CCT sections of these two Baidu schemes are divided into equally distributed segments, as Table IV shows.
We build a new CNN architecture to serve as the recognition engine of Chinese characters, as Figure 4 shows. It contains five convolutional layers, three max-pooling layers and two fully connected layers. The convolution filters of the first two convolutional layers are 11 × 11 and 5 × 5, respectively, and each of them is followed by the rectified linear unit (ReLU) [22] non-linear activation function and by normalization. The following convolutional layers all have a filter of 3 × 3, and zero padding is applied to each. Apart from the stride of 4 in the first convolutional layer, the other four convolutional layers all have a stride of 1. Each max-pooling layer has a kernel of 3 × 3, and the stride is 2. Similar to previous experiments, the SGD algorithm [19] is used to train the network, and the initial learning rate is 0.001. In total, we finish the training after 50 epochs. The implementation framework and environment are the same as in previous experiments.
Unlike in the experiments on common text Captchas, we collected only 1000 random Captcha images from the corresponding websites, and these serve as the test set. For the training and validation sets, we mimicked the shapes of the Chinese characters contained in these Captchas to automatically generate single-character images to train and select the model. The 3755 most commonly used Chinese characters from the China National Standard GB2312 were taken to make the samples. For each character category, we generated 200 single-character images with random rotation. For the Baidu Registration scheme, we added random noise arcs to increase their variations. Of the 200 images per category, 180 were used for training, and the other 20 were used for validation. In total, the training set contained 675,900 single-character images, while the validation set contained 75,100 single-character images. All images were normalized to 108 × 108 pixels to feed to the network.
We achieved success rates of 93.0%, 32.2% and 28.6% against QQ, Baidu Tieba and Baidu Registration, with average attack speeds of 2.816 seconds, 1.408 seconds and 0.108 seconds, respectively. Our analysis demonstrates that even with a much larger solution space, Chinese Captchas are still vulnerable to attacks based on deep learning techniques. Our results are equally applicable to Captchas derived from other large-alphabet languages, such as Japanese and Korean.
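To make the layer geometry of the Chinese-character CNN concrete, the standard convolution output-size formula can be evaluated for the stated input and filter sizes. This is a sketch only: the helper name `conv_output_size` is hypothetical, and the padding values are assumptions, since the text does not give the padding of the first layer or the width of the zero padding used with the 3 × 3 filters. Pooling layers are deliberately left out.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# First convolutional layer: 108 x 108 input, 11 x 11 filter, stride 4.
# Assumption: no padding in this layer.
first = conv_output_size(108, kernel=11, stride=4)

# A 3 x 3 filter with stride 1 keeps the spatial size unchanged if the
# zero padding is one pixel per side (assumed width of the padding).
same = conv_output_size(first, kernel=3, stride=1, padding=1)

print(first, same)  # 25 25
```

Under these assumptions the first layer maps a 108 × 108 image to 25 × 25 feature maps, and the padded 3 × 3 layers preserve whatever spatial size they receive.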
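The equal-width segmentation applied to the CCT schemes in Section III, and again to the CCT sections of the two Baidu schemes, can be sketched in Python as follows. The function name `equal_width_segments` and the default overlap of two pixels are assumptions; the paper states only that neighboring segments overlap slightly.

```python
def equal_width_segments(width, count, overlap=2):
    """Split the column range [0, width) into `count` overlapping slices.

    Returns a list of (start, end) pairs with `end` exclusive. Each slice
    is widened by `overlap` pixels on both sides and clipped to the image.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    step = width / count
    segments = []
    for i in range(count):
        start = max(0, round(i * step) - overlap)
        end = min(width, round((i + 1) * step) + overlap)
        segments.append((start, end))
    return segments


# A 120-pixel-wide text area estimated to hold 6 characters:
# [(0, 22), (18, 42), (38, 62), (58, 82), (78, 102), (98, 120)]
segments = equal_width_segments(120, 6)
```

Here `count` would come from the string-length classifier for test images and from the ground-truth label for training images, as described in Section III.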
TABLE IV
ATTACK PROCESS AND ATTACK RESULTS OF CHINESE CAPTCHAS
Fig. 4. CNN model used for recognizing Chinese characters, where C denotes a convolution layer, MP denotes a max-pooling layer, and n@a×a denotes that the layer generates n feature maps with a size of a×a; Fi: m denotes that the ith layer is a fully connected layer and contains m hidden neurons.
used sophisticated object recognition algorithms to break Gimpy and EZ-Gimpy, with success rates of 33% and 99%, respectively. In 2004, Moy et al. [24] proposed distortion estimation techniques to attack EZ-Gimpy, with a success rate of 99%. In [5], Yan and Ahmad broke a number of Captchas using a pixel counting method. However, these attacks are neither generalizable nor robust in detecting slight changes in the Captchas, whereas our attack has broken a large group of text Captchas. Other ad hoc methods include [6] and [25]–[27], of which [25]–[27] reported various attacks on previous text versions of reCAPTCHA, whereas [6] presented attacks against Microsoft’s Captcha.
In 2010, Li et al. [28] reported a comprehensive analysis of e-banking Captchas. They built a set of image processing and pattern recognition tools, such as k-means clustering and morphological operations, and successfully broke three e-banking Captchas for transaction verification and 41 e-banking Captchas for login, with success rates either equal to or close to 100%. For each Captcha scheme, they combined different tools to attack. Therefore, it is actually a toolbox method. Decaptcha is another toolbox-based attack, proposed by Bursztein et al. [3]. They successfully attacked 13 of the 15 Captchas from popular websites and claimed that their method is generic. In fact, Decaptcha uses a five-stage pipeline: pre-processing, segmentation, post-segmentation, recognition and post-processing. Similar to the attack reported by Li et al., various techniques were used for different Captcha schemes in each stage of Decaptcha. Our method uses special techniques for a few complicated Captcha schemes during pre-processing, but for the segmentation and recognition processes, we used unified algorithms on all Captcha schemes. Our method is much simpler than theirs. Decaptcha also failed to break reCAPTCHA and hollow Captcha, whereas our method can.
Gao’s team first introduced a single-step attack at CCS’13 [8] that uses machine learning to solve the segmentation and recognition problems simultaneously. However, their method mainly focuses on hollow Captchas; it cannot break non-hollow schemes. The CFS algorithm they adopted to extract hollow fonts cannot extract solid fonts, so their method cannot separate non-hollow characters that connect with each other. However, our method has broken both hollow and non-hollow schemes.
Other generic methods include [12] and [13]. The attack proposed by Bursztein et al. [12] is the second attempt to address segmentation and recognition simultaneously, and it successfully broke a group of text Captchas. Their method
than ours. Although we simply used CFS to select single characters or roughly segmented a Captcha image into equally distributed segments, our method still achieved high success rates. Excluding reCAPTCHA, the attack success rates of the other schemes we tested are all higher than 47%, but only 3 of the 10 targeted schemes Gao’s team tested achieved a success rate higher than 47%. What is most important to mention is that our attack speed is extremely fast. Our slowest attack on traditional text Captchas is 0.65 seconds, which is much faster than the fastest attack reported in [12] and [13]. To sum up, our attack is simple and generic, but effective and fast. We claim that it is a real-time attack.
Our work also tested the robustness of a two-layer Captcha deployed by Microsoft, similar to the scheme analyzed by Gao et al. in [11]. Again, the success rate achieved by our attack is much higher (65.8% vs. 28.0%), and it is also much faster (0.65 seconds vs. 12.56 seconds). We analyze the reasons as follows. Their method used a DP algorithm to select a path with the largest average confidence level as the final result, but this leads to numerous failures in estimating the Captcha string length; our attack instead utilizes deep learning techniques to determine the Captcha string length, achieving an accuracy of 91.5%. Additionally, their method includes many complex processes (e.g., using CFS to convert hollow characters to solid and using Gabor filters to extract character components), whereas our attack comprises three simple steps, increasing our attack speed. However, the envelope algorithm in [11] for the two-layer Captcha is indeed effective, and it provided a good basis for our attack.
Large-alphabet Captchas, as a newly developed Captcha form, are much more difficult to solve than traditional text Captchas, since they have a larger solution space. However, using Chinese Captcha as a representative, we also determined that it is vulnerable to our attack. Yan’s team [15] also analyzed Chinese Captchas, but they only provided evidence that computers recognize individual Chinese characters well, whereas our attack has successfully broken real-world Chinese Captchas.
Deep learning techniques have become the main tools in analyzing the robustness of Captchas. Goodfellow et al. [16] proposed a method based on deep CNN to recognize multi-digit numbers from street view imagery, and with the use of a dataset containing tens of millions of transcribed street numbers, they achieved an extremely high accuracy (over 96%). In 2015, Karthik and Recasens [29] utilized a template-based method and a CNN-based method to break Microsoft’s single-layer Captcha. Their results show that the CNN-based method
consists of four components: a cut-point detector, a slicer, performs much better. In 2016, Sivakorn et al. [30] success-
a scorer and an arbiter. It analyzed all possible ways to segment fully attacked the semantic image-based Captchas deployed
a Captcha image and then used ensemble learning to identify by Google and Facebook with accuracies of 70.78% and
among each sequence of segments the best possible one as the 83.5% respectively. All of these only evaluated one or two
result. The success rates achieved by their method range from Captcha schemes using deep learning techniques, whereas ours
5.33% to 55.22%. At NDSS’ 16, Gao’s team [13] reported has carried out a more comprehensive anlysis which contains
another generic attack, which utilized Gabor filters to extract 14 Captchas. Le et al. [31] introduced a Captcha-breaking net-
character components and then tries different combinations of work combining CNNs and recurrent neural networks (RNNs).
adjacent components. Finally, when directed by a DP algo- They used synthetic data to train the network. When tesing
rithm, the most likely combination is chosen as the final result. on the synthetic Captchas, they achieved high success rates
Both Bursztein’s and Gao’s methods are more complicated ranging from 91.0% to 99.9%. But when testing on real-world
TANG et al.: RESEARCH ON DEEP LEARNING TECHNIQUES 2531
D. Analysis
Our real-time effective attack has broken all text Captchas
deployed by the top 50 most popular websites in the world
with high success rates and fast attack speeds. We also
successfully attacked several Chinese Captchas, which use a
large character set. Our results are also applicable to other
Roman character-based Captchas and Captchas derived from
other large-alphabet languages such as Japanese and Korean.
Obviously, Captcha designers have tried their best to pro-
pose various resistance mechanisms to improve the security
of text Captchas, e.g., complicated backgrounds and two-layer
structures. However, they are ineffective for the following rea-
sons: first, various image pre-processing techniques proposed
by early efforts can easily remove noise arcs, complicated
backgrounds or other types of interference; second, with the
development of deep learning techniques, advanced hardware
equipment (e.g., GPU) and large datasets, character recog-
nition has achieved extremely high accuracy, not only with
Roman characters but also with Chinese characters, which
have a large character set.
Therefore, we conclude that all previously proposed resis-
tance mechanisms are vulnerable, and existing text Captchas
are not as secure as expected. However, given Captcha’s
user friendliness, instead of discarding it completely, we are
willing to develop more-effective ways to design text Captchas
with higher security and better usability. Based on the
segmentation-resistance principle [7], Captcha designers have
focused extensively on making it more difficult for computers
to segment a text Captcha into single characters.
Our work has demonstrated that even when characters are
imprecisely segmented, our attack still achieves high success
rates. Thus, we argue that Captcha designers should pay more
attention to designing text Captchas that are more resistant to
recognition, especially for deep learning-based attacks.
Fig. 5. Other Captcha alternatives: (a) ASIRRA, (b) Semantic image Captcha deployed by Google, (c) Semantic image Captcha deployed by Facebook, (d) Avatar Captcha, (e) FR-CAPTCHA, (f) FaceDCAPTCHA, (g) IMAGINATION, (h) What’s up Captcha?, (i) DCG Captcha, (j) Drawing Captcha.
V. SACAPTCHA: A NOVEL IMAGE-BASED CAPTCHA
A. Other Captcha Alternatives
asks users to click all foreground regions with circle, heart and pentagram shapes. The image on the right shows the answer by surrounding the shapes with red outlines. A disturbed foreground region is marked with a green outline. Essentially, solving this Captcha requires users to understand the semantic information (shapes of regions) contained in the image and to conduct a rough pixel-level segmentation.
We limit the number of foreground style-transferred regions in each Captcha image to 4 to 7, and we keep the maximum width and the maximum height of each region to less than 80 pixels. The shape of each region is randomly selected: it can be a rectangle, a triangle, a circle or an irregular shape such as a heart, a leaf or a moon. We also randomly choose the targeted shapes used to generate the description from the shapes in the corresponding Captcha image. The number of targeted shapes varies, but we expect that users will be required to click at least four regions.
2) Generation Process: Figure 7 illustrates the generation process of SACaptcha, which includes four main steps:
• Generate style-transferred images. The network introduced in [57] has been improved to create style-transferred images. Several style transfer networks are pre-trained, and each is provided with a style. To generate a Captcha, we select an image as the content image and send it into some randomly selected pre-trained style transfer networks to generate style-transferred images, where one generates the synthetic background image and the others generate 4 to 7 foreground images.
• Synthesize the background. For the background, one of these style-transferred images is synthesized with the original image at a ratio of α, as equation 1 shows, where $y$ denotes the style-transferred image, $y_c$ denotes the original image (content image) and $y_b$ denotes the final synthesized background image. Note that α is a random value with a range from 0.1 to 1.0 to make the background more varied. The larger α is, the more style information the synthetic background contains.
$$y_b = \alpha y + (1 - \alpha) y_c \tag{1}$$
• Generate the Captcha. We randomly crop regions with different shapes from other style-transferred images and relocate them in the synthetic background to generate a Captcha. To resist attacks based on edge detection, the edge of each region is blurred.
• Generate a description. Finally, we generate a brief description to guide users on how to pass the test.
With pre-trained style transfer networks, it takes 0.688 seconds on average to automatically generate a Captcha image. Neural style transfer changes the features of the original image. Therefore, even if the attackers find the original image, the security of the SACaptcha is not compromised, because they do not know which style was used in a Captcha. Moreover, any image can serve as the source for generating a SACaptcha, so we do not need to manually add labels to source images. In summary, the cost of generating a SACaptcha is very low.
3) Usability Study: Usability and security are the most significant factors in Captcha design. Thus, we first carry out experiments to evaluate the usability of SACaptcha. The study involved 100 participants aged 18 to 42. To investigate how the numbers of foreground shapes and styles of SACaptcha affect human performance, we test three versions of SACaptcha: the first uses Captchas containing two simple foreground shapes (circle and rectangle) and 11 styles; the second uses Captchas with 25 foreground shapes but only two styles (one for the background and another for the foreground); the last uses 25 foreground shapes and 11 styles. For each test, every participant is required to complete at least 10 challenges, and we record the solving rate and the response time.
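As an illustration of the background synthesis in equation (1) and of the region constraints stated above (4 to 7 foreground regions, each side shorter than 80 pixels, randomly chosen shapes), the following is a minimal Python sketch using NumPy. It is not the authors' implementation: the function names `synthesize_background` and `sample_layout`, the shape list `SHAPES`, the 8-pixel minimum side and the 8-bit image representation are all assumptions made for the example, and the style transfer networks themselves are not modeled.

```python
import numpy as np

# Hypothetical shape vocabulary; the paper names these as examples only.
SHAPES = ("rectangle", "triangle", "circle", "heart", "leaf", "moon")


def synthesize_background(styled, content, alpha):
    """Blend per equation (1): y_b = alpha * y + (1 - alpha) * y_c.

    `styled` is y, `content` is y_c; both are assumed to be uint8 arrays
    of identical shape (an assumption of this sketch).
    """
    if styled.shape != content.shape:
        raise ValueError("styled and content images must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    blended = (alpha * styled.astype(np.float64)
               + (1.0 - alpha) * content.astype(np.float64))
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


def sample_layout(rng, max_side=80, min_side=8):
    """Sample 4 to 7 foreground regions, each side shorter than max_side.

    min_side is a hypothetical lower bound; the paper states only the
    upper limit of 80 pixels.
    """
    n_regions = int(rng.integers(4, 8))  # upper bound is exclusive: 4..7
    return [
        {
            "shape": str(rng.choice(SHAPES)),
            "width": int(rng.integers(min_side, max_side)),
            "height": int(rng.integers(min_side, max_side)),
        }
        for _ in range(n_regions)
    ]


rng = np.random.default_rng(0)
alpha = float(rng.uniform(0.1, 1.0))  # random ratio drawn from [0.1, 1.0)
content = np.full((4, 4, 3), 100, dtype=np.uint8)  # stand-in content image
styled = np.full((4, 4, 3), 200, dtype=np.uint8)   # stand-in styled image
background = synthesize_background(styled, content, alpha)
layout = sample_layout(rng)
```

With α = 1 the background is the style-transferred image itself, and smaller α values let more of the content image show through, which matches the remark that a larger α carries more style information.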
2534 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 13, NO. 10, OCTOBER 2018
Fig. 8. Attack results of the sample SACaptcha: (a) Edge detection, (b) Object detection, (c) Pixel-level segmentation.
TABLE VII
ATTACK RESULTS OF SACAPTCHA
each Captcha indicating the predicted result of each pixel. Figure 8 (c) depicts the attack result of the sample Captcha. To make sure that readers can easily understand the attack result, we also mark targeted and disturbed regions using red lines and green lines in Figure 8 (c). The figure shows that FCN correctly detected two foreground regions with a circle shape and one with a rectangle shape, but it incorrectly predicted a heart-shaped foreground region as a circular one. It also wrongly predicted a background region as a foreground region and omitted a pentagram-shaped foreground region as well.
We follow the same evaluation principle as for the object detection attack. As shown in Table VII, the success rates achieved by the pixel-level segmentation attack range from 4.5% to 43.6%. The average attack speeds vary from 2.16 to 2.21 seconds, which is slower than the object detection attack. Again, the more foreground shapes and styles are applied, the lower the success rate. Therefore, given acceptable usability, it is better to use more foreground shapes and styles in SACaptcha. FCN is effective in predicting the style region pixels, but it consistently misclassifies a large number of background pixels as foreground pixels. While FCN works well on guessing standard shapes, e.g., circles and rectangles, it fails to predict special shapes such as hearts and pentagrams.
e) Further study: To investigate how the size of the training set affects the success rates, we take the third version of SACaptcha as a representative and enlarge the training sets from 4000 to 10,000 to retrain the deep learning models. With the enlargement of the training set, the success rates achieved by the object detection attack and the pixel-level segmentation attack increase to 7.2% and 11.8%, respectively. But from the point of view of attackers, the cost of manually adding labels to images has also increased dramatically. The attack results above are based on the accurate labels we specifically generated with the SACaptcha production process, e.g., the location information for object detection and the mask image for pixel-level segmentation. Compared with text Captchas and early image-based Captchas based on image classification, it is more difficult to manually add labels to SACaptcha, especially when attacking with pixel-level segmentation techniques.
The above results indicate that we can further increase the categories of foreground shapes and styles to enhance the robustness of SACaptcha under the premise of ensuring usability. However, it is hard to strike the right balance between security and usability in Captcha design [4]. We also cannot exhaust all possible attacks here. Thus, we leave the more comprehensive security analysis as an open problem and share it with all research communities.
Obviously, designing a Captcha that is completely resistant to all known attacks is extremely difficult. We proposed a novel method of designing image-based Captcha using deep learning techniques, although its security does not seem ideal. Overall, this work is a positive attempt, and our use of deep learning techniques to enhance the security of Captchas is a promising direction.
VI. SUMMARY AND CONCLUSION
We have systematically provided a comprehensive analysis of text Captchas. To evaluate their security, we proposed a simple, effective and fast attack on text Captchas. Using deep learning techniques, we have successfully attacked all Roman character-based text Captchas deployed by the top 50 most popular websites in the world and achieved state-of-the-art results. Our success rates range from 10.1% to 90.0%. The average speed of our attack is much faster than earlier reported attacks. We also used Chinese Captcha as a representative to analyze the security of Captchas using a large character set. It was still vulnerable to our attack, despite the larger solution space. We have successfully broken three Chinese Captchas collected from QQ and Baidu with success rates of 93.0%, 32.2% and 28.6%.
The attack presented in this paper has provided powerful evidence that existing text Captchas are not robust enough, neither for traditional text Captchas based on English letters and Arabic numerals nor for Captchas originating from large-alphabet languages. Our work counters the concept of the segmentation-resistance principle introduced in [7]. Even though various mechanisms are adopted to increase the difficulty of determining where each character is in a Captcha image, and although the rough segmentation method we used cannot precisely segment characters from each other, our attack was effective on all targeted schemes. The segmentation-resistance principle may no longer be appropriate in current Captcha design.
Text Captcha, as the most widely used Captcha scheme, has played an important role in distinguishing humans from computers for a long time. Although previous work has shown that it is not as secure as expected, text Captcha is still widely used, as previous attacks have been slow. However, our attack has created a real-time threat to real-world text Captchas despite the various resistance mechanisms used. It is therefore natural to ask: is the era of text Captcha at an end? Our attack might suggest that the current common use of text Captcha designs is doomed, and yet we hesitate to pronounce a death sentence on text Captcha altogether. It is highly likely the new generation of text Captchas will be designed to be resistant to
deep learning attacks. We expect that our work will promote more-effective ways to enhance the security of text Captchas.
We also proposed a novel image-based Captcha named SACaptcha using neural style transfer techniques. Most early image-based Captchas are based on the problem of image classification, whereas SACaptcha relies on problems of semantic information understanding and pixel-level segmentation. This is a positive attempt to improve the security of Captchas by utilizing deep learning techniques.
In this paper, deep learning techniques play two roles: as a character recognition engine to recognize individual characters and as a powerful means to enhance the security of the image-based Captcha we proposed. This work seems to provide evidence that deep learning is a double-edged sword. It can be used either to attack Captchas or to improve the security of Captchas.
We hope our work has indicated a direction for future Captcha study: existing text Captchas are no longer secure, and other Captcha alternatives are worthy of investigation as substitutes. The questions of whether other Captcha alternatives are robust and whether the designs of new Captchas can be simultaneously secure and usable are still open problems and are part of our ongoing work.
REFERENCES
[1] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford, “CAPTCHA: Using hard AI problems for security,” in Advances in Cryptology-EUROCRYPT. Berlin, Germany: Springer, 2003, pp. 294–311.
[2] A. L. Von, M. Blum, and J. Langford, “Telling humans and computers apart automatically,” Commun. ACM, vol. 47, no. 2, pp. 56–60, 2004.
[3] E. Bursztein, M. Martin, and J. C. Mitchell, “Text-based CAPTCHA strengths and weaknesses,” in Proc. 18th ACM Conf. Comput. Commun. Security, 2011, pp. 125–138.
[4] J. Yan and A. S. El Ahmad, “Usability of CAPTCHAs or usability issues in CAPTCHA design,” in Proc. 4th Symp. Usable Privacy Security, 2008, pp. 44–52.
[5] J. Yan and A. S. E. Ahmad, “Breaking visual CAPTCHAs with naive pattern recognition algorithms,” in Proc. 23rd Annu. Comput. Security Appl. Conf. (ACSAC), 2007, pp. 279–291.
[6] J. Yan and A. S. El Ahmad, “A low-cost attack on a Microsoft CAPTCHA,” in Proc. 15th ACM Conf. Comput. Commun. Security, 2008, pp. 543–554.
[7] C. Kumar, L. Kevin, Y. S. Patrice, and C. Mary, “Computers beat humans at single character recognition in reading based human interaction proofs (HIPs),” in Proc. 2nd Conf. Email Anti-Spam (CEAS), Stanford, CA, USA, 2005, pp. 1–8.
[8] H. Gao, W. Wang, J. Qi, X. Wang, X. Liu, and J. Yan, “The robustness of hollow CAPTCHAs,” in Proc. ACM SIGSAC Conf. Comput. Commun. Security, 2013, pp. 1075–1086.
[9] A. S. El Ahmad, Y. Jeff, and L. Marshall, “The robustness of a new CAPTCHA,” in Proc. EUROSEC, 2011, pp. 36–41.
[10] H. Gao, W. Wang, Y. Fan, J. Qi, and X. Liu, “The robustness of ‘connecting characters together’ CAPTCHAs,” J. Inf. Sci. Eng., vol. 30, no. 2, pp. 347–369, 2014.
[11] H. Gao, M. Tang, Y. Liu, P. Zhang, and X. Liu, “Research on the security of Microsoft’s two-layer Captcha,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 7, pp. 1671–1685, Jul. 2017.
[12] E. Bursztein, J. Aigrain, A. Moscicki, and J. C. Mitchell, “The end is nigh: Generic solving of text-based CAPTCHAs,” in Proc. 8th USENIX Workshop Offensive Technol. (WOOT), 2014, pp. 1–15.
[13] H. Gao et al., “A simple generic attack on text CAPTCHAs,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS), San Diego, CA, USA, 2016, pp. 1–14.
[14] K. Chellapilla, K. Larson, P. Y. Simard, and M. Czerwinski, “Building segmentation based human-friendly human interaction proofs (HIPs),” in Human Interactive Proofs. Berlin, Germany: Springer, 2005, pp. 1–26.
[15] A. Algwil, D. Ciresan, B. Liu, and J. Yan, “A security analysis of automated Chinese turing tests,” in Proc. 32nd Annu. Conf. Comput. Security Appl., 2016, pp. 520–532.
[16] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet. (2013). “Multi-digit number recognition from street view imagery using deep convolutional neural networks.” [Online]. Available: [Link]
[17] N. Otsu, “A threshold selection method from gray-level histograms,” Automatica, vol. 11, nos. 285–296, pp. 23–27, 1975.
[18] Y. LeCun et al. (2015). Lenet-5, Convolutional Neural Networks. [Online]. Available: [Link] lecun.com/exdb/lenet
[19] D. C. Ciresan et al., “Flexible, high performance convolutional neural networks for image classification,” in Proc.-Int. Joint Conf. Artif. Intell. (IJCAI), vol. 22, no. 1, 2011, p. 1237.
[20] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” in Proc. 22nd ACM Int. Conf. Multimedia, Nov. 2014, pp. 675–678.
[21] D. D’Souza, P. C. Polina, and R. V. Yampolskiy, “Avatar CAPTCHA: Telling computers and humans apart via face classification,” in Proc. IEEE Int. Conf. Electro/Inf. Technol. (EIT), May 2012, pp. 1–6.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[23] G. Mori and J. Malik, “Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1, Jun. 2003, pp. I-134–I-141.
[24] G. Moy, N. Jones, C. Harkless, and R. Potter, “Distortion estimation techniques in solving visual CAPTCHAs,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, Jun. 2004, pp. II-23–II-28.
[25] C. Cruz-Perez, O. Starostenko, F. Uceda-Ponga, V. Alarcon-Aquino, and L. Reyes-Cabrera, “Breaking reCAPTCHAs with unpredictable collapse: Heuristic character segmentation and recognition,” in Pattern Recognition. Berlin, Germany: Springer, 2012, pp. 155–165.
[26] P. Baecher, N. Büscher, M. Fischlin, and B. Milde, “Breaking reCAPTCHA: A holistic approach via shape recognition,” in Future Challenges in Security and Privacy for Academia and Industry, vol. 354. Berlin, Germany: Springer, 2011, pp. 56–67.
[27] O. Starostenko, C. Cruz-Perez, F. Uceda-Ponga, and V. Alarcon-Aquino, “Breaking text-based CAPTCHAs with variable word and character orientation,” Pattern Recognit., vol. 48, no. 4, pp. 1101–1112, 2015.
[28] S. Li, S. Shah, M. Khan, S. A. Khayam, A.-R. Sadeghi, and R. Schmitz, “Breaking e-banking CAPTCHAs,” in Proc. 26th Annu. Comput. Security Appl. Conf., 2010, pp. 171–180.
[29] C. H. B. L.-P. Karthik and R. A. Recasens, “Breaking Microsoft’s CAPTCHA,” MIT, Cambridge, MA, USA, Tech. Rep., 2015.
[30] S. Sivakorn, I. Polakis, and A. D. Keromytis, “I am robot: (Deep) learning to break semantic image CAPTCHAs,” in Proc. IEEE Eur. Symp. Security Privacy (EuroSP), Mar. 2016, pp. 388–403.
[31] T. A. Le, A. G. Baydin, R. Zinkov, and F. Wood, “Using synthetic data to train neural networks is model-based reasoning,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2017, pp. 3514–3521.
[32] D. George et al., “A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs,” Science, vol. 358, no. 6368, p. eaag2612, 2017.
[33] J. Elson, J. R. Douceur, J. Howell, and J. Saul, “Asirra: A CAPTCHA that exploits interest-aligned manual image categorization,” in Proc. ACM Conf. Comput. Commun. Secur. CCS, Alexandria, VA, USA, Oct. 2007, pp. 366–374.
[34] P. Golle, “Machine learning attacks against the Asirra CAPTCHA,” in Proc. 15th ACM Conf. Comput. Commun. Security, 2008, pp. 535–542.
[35] M. Chew and J. D. Tygar, “Image recognition CAPTCHAs,” in Information Security (Lecture Notes in Computer Science), vol. 3225. Berlin, Germany: Springer, 2004, pp. 268–279.
[36] B. Cheung, “Convolutional neural networks applied to human face classification,” in Proc. 11th Int. Conf. Mach. Learn. Appl. (ICMLA), vol. 2, Dec. 2012, pp. 580–583.
[37] G. Goswami, B. M. Powell, M. Vatsa, R. Singh, and A. Noore, “FR-CAPTCHA: Captcha based on recognizing human faces,” PLoS ONE, vol. 9, no. 4, p. e91708, 2014.
[38] G. Goswami, B. M. Powell, M. Vatsa, R. Singh, and A. Noore, “FaceDCAPTCHA: Face detection based color image CAPTCHA,” Future Generat. Comput. Syst., vol. 31, no. 1, pp. 59–68, 2014.
[39] H. Gao, L. Lei, X. Zhou, J. Li, and X. Liu, “The robustness of face-based CAPTCHAs,” in Proc. IEEE Int. Conf. Comput. Inf. Technol., Ubiquitous Comput. Commun., Dependable, Auton. Secure Comput.; Pervasive Intell. Comput. (CIT/IUCC/DASC/PICOM), Oct. 2015, pp. 2248–2255.
[40] Y. Rui and Z. Liu, “Artifacial: Automated reverse turing test using facial features,” Multimedia Syst., vol. 9, no. 6, pp. 493–502, 2004.
[41] B. B. Zhu et al., “Attacks and design of image recognition CAPTCHAs,” in Proc. 17th ACM Conf. Comput. Commun. Security, 2010, pp. 187–200.
[42] R. Datta, J. Li, and J. Z. Wang, “Imagination: A robust image-based CAPTCHA generation system,” in Proc. 13th Annu. ACM Int. Conf. Multimedia, 2005, pp. 331–334.
[43] R. Gossweiler, M. Kamvar, and S. Baluja, “What’s up CAPTCHA? A CAPTCHA based on image orientation,” in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 841–850.
[44] M. Mohamed et al., “A three-way investigation of a game-CAPTCHA: Automated attacks, relay attacks and usability,” in Proc. 9th ACM Symp. Inf., Comput. Commun. Secur., 2014, pp. 195–206.
[45] M. Shirali-Shahreza and S. Shirali-Shahreza, “Drawing CAPTCHA,” in Proc. 28th Int. Conf. Inf. Technol. Interfaces, 2006, pp. 475–480.
[46] R. Lin, S.-Y. Huang, G. B. Bell, and Y.-K. Lee, “A new CAPTCHA interface design for mobile devices,” in Proc. 12th Austral. User Interface Conf., vol. 117, 2011, pp. 3–8.
[47] M. Osadchy, J. Hernandez-Castro, S. Gibson, O. Dunkelman, and D. Pérez-Cabo, “No BOT expects the DeepCAPTCHA! introducing immutable adversarial examples, with applications to CAPTCHA generation,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 11, pp. 2640–2653, Nov. 2017.
[48] I. J. Goodfellow, J. Shlens, and C. Szegedy. (2014). “Explaining and harnessing adversarial examples.” [Online]. Available: [Link]
[49] C. Szegedy et al. (2013). “Intriguing properties of neural networks.” [Online]. Available: [Link]
[50] S. Gu and L. Rigazio. (2014). “Towards deep neural network architectures robust to adversarial examples.” [Online]. Available: [Link]
[51] NUCAPTCHA. Nucaptcha & Traditional CAPTCHA. Accessed: May 2016. [Online]. Available: [Link]
[52] Y. Xu, G. Reynaga, S. Chiasson, J.-M. Frahm, F. Monrose, and P. C. van Oorschot, “Security and usability challenges of moving-object CAPTCHAs: Decoding codewords in motion,” in Proc. USENIX Security Symp., 2012, pp. 49–64.
[53] E. Bursztein. How we Broke the NuCAPTCHA Video Scheme and What we Proposed to fix it. Accessed: Mar. 2016. [Online]. Available: [Link] and-what-we-propose-to-fix-it/
[54] J. Tam, J. Simsa, S. Hyde, and L. V. Ahn, “Breaking audio CAPTCHAs,” in Proc. Adv. Neural Inf. Process. Syst., 2009, pp. 1625–1632.
[55] L. Gatys, A. S. Ecker, and M. Bethge, “Texture synthesis using convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 262–270.
[56] L. A. Gatys, A. S. Ecker, and M. Bethge. (2015). “A neural algorithm of artistic style.” [Online]. Available: [Link]
[57] J. Johnson, A. Alahi, and L. Fei-Fei. (2016). “Perceptual losses for real-time style transfer and super-resolution.” [Online]. Available: [Link]
[58] E. Bursztein, S. Bethard, C. Fabry, J. C. Mitchell, and D. Jurafsky, “How good are humans at solving CAPTCHAs? A large scale evaluation,” in Proc. IEEE Symp. Security Privacy (SP), May 2010, pp. 399–413.
[59] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.
[60] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 3431–3440.
Mengyun Tang is currently pursuing the master’s degree in computer science with Xidian University. Her current research interest is Captcha.
Haichang Gao (M’07) is currently a Professor with the Institute of Software Engineering, Xidian University. He is in charge of a project of the National Natural Science Foundation of China. He has published more than 30 papers. His current research interests include Captcha, computer security, and machine learning.
Yang Zhang is currently pursuing the master’s degree in computer science with Xidian University. Her current research interest is Captcha.
Yi Liu is currently pursuing the master’s degree in computer science with Xidian University. His current research interest is Captcha.
Ping Zhang is currently pursuing the master’s degree in computer science with Xidian University. His current research interest is Captcha.
Ping Wang is currently pursuing the Ph.D. degree in software engineering with Xidian University. Her current research interest is Captcha.