
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com

Volume 3, Issue 2, March-April 2014 ISSN 2278-6856

A More Robust Text Based CAPTCHA For
Security in Web Applications

Mumtaz M. Ali AL-Mukhtar¹ and Rana Riad K. AL-Taie²

¹ AL-Nahrain University, Information Engineering College,
Baghdad, Iraq

² AL-Nahrain University, Information Engineering College,
Baghdad, Iraq

Abstract: Web application developers commonly use Captcha tools
to prevent several kinds of security attack. To improve the ability
of a character image to resist many breaking attacks, we propose a
text-based Captcha algorithm in which each challenge is associated
with a set of main features that enhance the Captcha's security.
New and different features are generated randomly at each
challenge. The randomness reduces the chance of predicting the next
challenge, so automated Captcha-breaking techniques cannot use
these features as attack vectors when they are extracted from the
text Captcha. The main intent of this work is to provide an
effective and efficient method for generating a robust text-based
Captcha that thwarts many breaking attacks while remaining a
reliable and usable way to distinguish between human users and
computers.

Keywords: Text-based Captcha, web application security,
robust Captcha, automatic attacks thwarting.

1. INTRODUCTION
The Internet has changed many dimensions of daily life, including
the ways people connect to each other, advertising, shopping and
education. Consequently, system security has become the most
important issue for any website, since many methods are used to
intrude into systems over the Internet [1]. People have developed
techniques, programs and software systems that can replace a human
being in doing a job; such jobs include entering data into
systems, generating data automatically, and handling events that
occur on or within a system [2]. Web sites must therefore ensure
that their services are supplied to legitimate human users rather
than bots, in order to prevent service abuse [3]. To thwart
automated attacks, services often ask users to solve a puzzle
before being given access to a service. Human Interactive Proofs
(HIPs) focus on automated tests that virtually all humans can pass
but current computer programs fail. Completely Automated Public
Turing test to tell Computers and Humans Apart (CAPTCHA) is an
acronym that was coined in 2000. It is a type of challenge-response
test that is intended to be completed successfully only by a human
[4]. CAPTCHAs are designed to be simple problems that can be
quickly solved by humans but are difficult for computers to solve.
Using Captchas, services can distinguish legitimate users from
computer bots while requiring minimal effort from the human user
[5]. In the procedure, a computer or a program creates a test for
its user, who is expected to be a human. The test is meant to be
solvable only by humans and not by any machine, system or program.
The user is required to provide a correct response to the test and
is then permitted to access the service. When a correct response is
received, it is presumed that the response came from a human user.

CAPTCHA techniques have been classified into four
categories [2]:
- Text based Captcha.
- Audio based Captcha.
- Image based Captcha.
- Video based Captcha.
Each type is suited to a different group of users.

2. LITERATURE REVIEW
CAPTCHA has a number of applications on the web, such as:
1) Registration through web forms: many websites offer free
registration for their services. Unfortunately, these may be
susceptible to a web robot attack, that is, a script capable of
registering thousands of email accounts on the Internet, wasting
precious web space. 2) Online polling: the result of any online
poll can only be trusted if it is ensured that only humans have
voted [6]. 3) Web crawlers: CAPTCHA provides a reasonable solution
when one does not want web pages to be crawled for indexing by
search engines [7]. Huang et al. [8] have exploited text-based
CAPTCHA as one of the measures used to mitigate DDoS (Distributed
Denial of Service) attacks in an IaaS cloud service. The
challenge-response procedure can mitigate the traffic and decide
whether the packets come from a human or a machine (program). Much
effort has also been made to provide new Captcha formats or more
robust Captchas. For example, Thomas and Kaur [2] proposed a
CAPTCHA technique that utilizes images from custom mouse cursors
and outperforms some of the most popular CAPTCHA techniques, such
as text-based CAPTCHAs and previous image-based CAPTCHAs. The
authors in [9] aimed to provide a method for generating text-based
Captchas that are resilient against segmentation attacks by
proposing an empirical algorithm, supported by the Taguchi method,
to guarantee the quality of the chosen colour schemes. The authors
in [10] tried to balance the effectiveness and human success rates
of text-based CAPTCHAs by proposing a text-based CAPTCHA mechanism
in which each challenge is associated with a tip that gives humans
enough information to recognize the letters in a highly distorted
text image. Rahman et al. [7] proposed a dynamic image-based,
three-tier CAPTCHA system to make cracking a difficult task for a
bot. They observed that the probability of a bot breaking the
proposed prototype system is considerably low. In [11], a CAPTCHA
implementation was proposed in the form of a 3D animation, based on
a weak point of computer vision. The derived method could prevent
attacks based on both image recognition and the recognition of
moving objects in videos.

We aim to provide a secure scheme that is resilient to
state-of-the-art attackers by applying secure design principles at
all layers. We propose a new mechanism for text-based Captcha
systems that balances two goals: robustness against automated
programs and ease of solving for people. Each challenge in our
mechanism consists of random and varied significant features, which
produce a highly distorted text image that is effective against
automated attacks but harmless for users.

3. PROPOSED SCHEME
During the design stage we addressed the main principles that play
an important role in providing a more robust CAPTCHA. In our
proposed scheme, we apply multiple security features that are
intended to obfuscate challenges for breaking attacks while
remaining easy for users to solve. Our processing disallows replay
of a previously submitted challenge (a challenge is never reused);
thus, every time the page is generated (or refreshed), the
principal features change as follows:

- The CAPTCHA code is a series of characters (uppercase and
  lowercase) and numbers.
- Multiple randomizing functions are used to generate a random code
  (a stream of characters and numbers) in each challenge, so that it
  is not susceptible to a dictionary attack.
- The length of the code is varied (the minimum length is 6
  characters/numbers).
- Multiple font types are used, to prevent the intrusion by image
  processing techniques that is possible when a consistent font is
  used.
- The string/code is rotated at various angles.
- Lines are used to prevent segmentation. The number and length of
  the lines and their positions are varied each time, in order to
  distort the text image randomly before it is presented to the
  user.
- The text image is blurred, in order to make the CAPTCHA difficult
  for malicious software.
- The image dimensions are varied in accordance with all the
  features above.
- The CAPTCHA code and line colours are kept in gray scale, but at
  a different level each time.

The Captcha is generated by creating an image from text in PHP
using the GD library. GD is an open-source code library, used in
this work for the dynamic creation of images, which can be
customized extensively and output in different formats. The Captcha
image code is created before the HTML code that contains the input
fields for displaying the Captcha image and submitting the
code/string. We use two files: the first holds the form that
contains the Captcha image, and the second generates the Captcha
image.

When a user accesses a website that is protected with a Captcha to
prevent abuse by automated programs, the user requests a secure
form from the web server. The web server automatically generates a
random Captcha image by calling a PHP file (let's say image.php),
and then sends a file (let's say captcha.php) that holds the form
associated with the random Captcha image. The steps of the devised
algorithm are depicted in Figure 1 and stated as follows:

1- On the form page, rand() PHP functions are used to generate
   random numbers (which determine the font size, code length and
   font type), and other PHP functions (str_shuffle() and substr())
   are used to generate a code (string) consisting of numbers,
   uppercase and lowercase letters.
2- The random values (for font type and font size) and the string
   (code) are stored in secure session variables, so that they can
   be used for generating the CAPTCHA image.
3- At the server side, the image.php file is called. Since the GD
   library offers many ways to create PNG, JPEG or GIF files
   dynamically and output them directly to the browser, several GD
   library functions are used to convert the text elements into an
   image, creating the Captcha image from the session values set by
   the form page.
4- The CAPTCHA image is retrieved and displayed on the form page as
   a challenge for the user.
5- The user reads the code, fills in the correct letters and
   numbers, and submits the form.
6- The code stored in the session is verified against the solution
   provided by the user. If the user does not provide a valid code,
   or if the user refreshes the page, a newly generated session
   (step 2) is asserted. If verification is successful, the user is
   directed to the next logical step (such as the website's
   services).





Figure 1 Captcha framework

4. IMPLEMENTATION PHASE
To provide a Captcha with a different font type on every new page,
we need TrueType font (TTF) files. We deploy twenty complex TTF
files taken from the fonts folder of the Windows OS, with .ttf as
their extension. In the first step we change the names of these
files to numeric values instead of their original names. For
example, (Curlz MT Regular.ttf) is changed to (1.ttf). This step
helps to provide a Captcha with a different font type on every
refreshed page. These files are placed in the same folder as our
PHP script. The steps of implementation are shown in Figure 2. In
the captcha.php file we start by setting the security random
numbers, making use of the rand() PHP function to:

- Select a random number between 6 and 12, which represents the
  length of the code (string), so that the length differs each time
  the page is refreshed.
- Select a random number between 1 and 20, which represents the
  name of a TrueType font file, providing a different font type
  each time the page is refreshed.
- Select a random number between 30 and 40, which represents the
  font size, providing a different font size each time the page is
  refreshed.
- The last two randomly generated numbers are stored in session
  variables:
  $_SESSION['font'] = $font_type . ".ttf";
  $_SESSION['size'] = $font_size;
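
The selection of the three per-challenge values can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and array keys are hypothetical, and random_int() is substituted for the rand() function named in the text because it draws from a cryptographically secure source.

```php
<?php
// Hypothetical helper: draws the three per-challenge values described above.
// The ranges (6-12, 1-20, 30-40) come from the text; the function name and
// the array keys are illustrative assumptions.
function pick_challenge_parameters(): array
{
    return [
        'length' => random_int(6, 12),           // length of the code (string)
        'font'   => random_int(1, 20) . '.ttf',  // renamed font file, e.g. "7.ttf"
        'size'   => random_int(30, 40),          // font size
    ];
}

// Usage on the form page, after session_start():
//   $params = pick_challenge_parameters();
//   $_SESSION['font'] = $params['font'];
//   $_SESSION['size'] = $params['size'];
```

Returning the values from a function, instead of writing to the session directly, keeps the random selection easy to check in isolation.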
To avoid dictionary attacks, we generate a random string (code)
consisting of characters (uppercase/lowercase) and numbers by the
following steps:

- A string consisting of the digits (0-9) and the letters (a-z) in
  uppercase and lowercase is shuffled randomly using the
  str_shuffle() PHP function.
- A portion of the shuffled string is returned using the substr()
  PHP function. This portion is specified by the start and length
  parameters (the length parameter equals the code length value
  generated randomly beforehand by rand(6,12)). Thus the length of
  the code/string varies each time.
- The string/code is stored in a PHP session variable, let's say
  $_SESSION['secure'], so that it can be accessed later for
  verification and used by the image.php file to create the image
  from that string/code. All the previous session variables must be
  set in order to create the image in the image.php file.
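
A minimal sketch of the shuffle-and-cut step is given below. The function name is hypothetical, and the start parameter of substr() is assumed to be 0, since the text does not state its value.

```php
<?php
// Sketch of the code generation described above (not the authors' code).
function generate_code(int $length): string
{
    $alphabet = '0123456789'
              . 'abcdefghijklmnopqrstuvwxyz'
              . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    // str_shuffle() permutes the 62 characters; substr() keeps the first
    // $length of them (start = 0 is an assumption).
    return substr(str_shuffle($alphabet), 0, $length);
}

// Usage on the form page, after session_start():
//   $_SESSION['secure'] = generate_code(rand(6, 12));
```

Because the code is a prefix of a permutation of the alphabet, no character can appear twice within one code.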
The image.php file creates the Captcha image using GD functions for
rendering text in image format. It starts by reading the session
variables that were set up in the previous file: $_SESSION['font'],
$_SESSION['size'] and $_SESSION['secure']. First we specify the
height and the width of the image. To make the width of the image
accommodate the varying length of the string/code dynamically, we:

- Count the number of characters constituting the secure
  code/string stored in $_SESSION['secure'].
- Multiply the result by the font size (stored in
  $_SESSION['size']).
- Increment the final result by an estimated value until there is
  adequate space between the text/code and the image edge when
  applying the maximum font size (40) and the maximum string/code
  length (12).
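
The width calculation amounts to a single expression, sketched here. The function name and the default padding of 40 pixels are assumptions; the text only says that the increment was estimated by trial for the largest font size and code length.

```php
<?php
// Sketch of the dynamic image width: characters x font size + padding.
// The default padding of 40 px is an assumed placeholder value.
function captcha_width(string $code, int $fontSize, int $padding = 40): int
{
    return strlen($code) * $fontSize + $padding;
}

// Largest case from the text: 12 characters at font size 40.
// captcha_width('abcdefghijkl', 40) returns 12 * 40 + 40 = 520
```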
The foreground and the background of the image are specified by:
- Setting the image background colour to white.
- Setting the text colour (image foreground) to a gray-scale level
  that differs each time the page is refreshed. The $red, $green
  and $blue variables hold the random gray-scale colour values.
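
One way to set the two colours with GD is sketched below. The function name and the gray range of 0 to 120 are assumptions chosen so that the text stays dark against the white background; the text does not give the range used.

```php
<?php
// Sketch: white background and a random gray foreground (not the authors' code).
function allocate_colours($image): array
{
    // A new truecolor image is black, so flood-fill it with white.
    $background = imagecolorallocate($image, 255, 255, 255);
    imagefill($image, 0, 0, $background);

    // Equal red, green and blue components give a gray level.
    $level = random_int(0, 120);   // assumed range
    $red = $green = $blue = $level;
    $foreground = imagecolorallocate($image, $red, $green, $blue);

    return [$background, $foreground, $level];
}

// Usage:
//   $image = imagecreatetruecolor(520, 80);
//   [$bg, $fg, $level] = allocate_colours($image);
```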

The angle for rotating the text and the x- and y-coordinates
required when writing the code/string to the image are determined,
and the image is blurred, by the following steps:



Figure 2 Implementation steps flowchart

- Randomly select the angle used to rotate the string/code, so that
  the angle differs each time the page is refreshed.
- Determine the bounding box of the text/code (the exact dimensions
  of the text string). These dimensions change with each change of
  angle, font size, font type and string/code value.
- Get the width and the height of the current code/string from
  these dimensions (which vary with every new challenge, based on
  the other varied items).
- Get the x-coordinate that centres the text horizontally, using
  the width of the image and the width of the text.
- Get the y-coordinate that centres the text vertically, using the
  height of the image and the height of the text.
- Apply the GD function imageconvolution() to blur the image by
  applying a convolution matrix to it, using a given divisor and
  offset.
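
The centring and blurring steps can be sketched with the GD functions imagettfbbox(), imagettftext() and imageconvolution(). The function names, the angle range of -10 to 10 degrees and the 3x3 kernel are assumptions, since the text does not state the values used.

```php
<?php
// Pure helper: given the 8-value array returned by imagettfbbox(), return
// the [x, y] base point that centres the (possibly rotated) text.
function centered_origin(array $box, int $imageWidth, int $imageHeight): array
{
    // Even indices hold x-coordinates, odd indices hold y-coordinates.
    $xs = [$box[0], $box[2], $box[4], $box[6]];
    $ys = [$box[1], $box[3], $box[5], $box[7]];
    $textWidth  = max($xs) - min($xs);
    $textHeight = max($ys) - min($ys);

    // Box coordinates are relative to the base point, so subtracting the
    // minimum places the top-left corner of the box at the centred position.
    $x = intdiv($imageWidth - $textWidth, 2) - min($xs);
    $y = intdiv($imageHeight - $textHeight, 2) - min($ys);
    return [$x, $y];
}

// Draws the code at a random angle, centred on the image.
function draw_code($image, string $code, string $fontFile, int $size, int $color): void
{
    $angle = random_int(-10, 10);   // assumed range, in degrees
    $box = imagettfbbox($size, $angle, $fontFile, $code);
    [$x, $y] = centered_origin($box, imagesx($image), imagesy($image));
    imagettftext($image, $size, $angle, $x, $y, $color, $fontFile, $code);
}

// Blurs the whole image in place.
function blur_image($image): bool
{
    // Assumed 3x3 Gaussian-style kernel; its weights sum to 16, hence the divisor.
    $kernel = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]];
    return imageconvolution($image, $kernel, 16, 0);
}
```

For a text box 100 pixels wide that extends 25 pixels above and 5 pixels below the baseline, centered_origin([0, 5, 100, 5, 100, -25, 0, -25], 200, 60) returns [50, 40], which leaves 15 pixels above and below the text in a 200 x 60 image.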
To mask this string/code so that it remains readable by humans but
becomes hard for computers to read, we use the imageline() GD
function, which adds lines to the image in order to generate some
noise, as follows:
- Apply this function inside a for loop; the number of loop
  iterations represents the number of lines added and is selected
  randomly within a limited range.
- Draw the lines in the same colour as the text.
- Create the lines at completely random positions, so that they can
  cross the image.
- Adjust the length of the lines to be consistent with the image
  width (because the width of the image varies).
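
The noise step can be sketched as a loop around imageline(). The function name and the range of 3 to 7 lines are assumptions; the text only says that the count is drawn from a limited range.

```php
<?php
// Sketch: draws a random number of lines across the image (not the authors' code).
function add_noise_lines($image, int $color, int $minLines = 3, int $maxLines = 7): int
{
    $width  = imagesx($image);
    $height = imagesy($image);
    $count  = random_int($minLines, $maxLines);   // assumed range

    for ($i = 0; $i < $count; $i++) {
        // Endpoints are drawn from the current image size, so the possible
        // line length follows the varying image width.
        imageline(
            $image,
            random_int(0, $width - 1), random_int(0, $height - 1),
            random_int(0, $width - 1), random_int(0, $height - 1),
            $color
        );
    }
    return $count;
}

// Usage: add_noise_lines($image, $foreground);  // same colour as the text
```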
Once the Captcha has been created, the user's input must be
verified against the Captcha value. Recall that the secret
string/code has been stored in a session variable called
$_SESSION['secure']. Once the form is submitted, the user's value
is checked against the one stored in $_SESSION['secure']. If they
match, the user is directed to the next step; otherwise an error
message is generated and another image with a new string/code is
produced.
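
A minimal sketch of the check is shown below. The function name and the form field name 'code' are hypothetical. Two choices here are assumptions rather than details from the text: the comparison is case-sensitive, and hash_equals() is used in place of a plain comparison so that the check runs in constant time.

```php
<?php
// Sketch of one-shot verification (not the authors' code).
function verify_captcha(array &$session, string $submitted): bool
{
    $expected = $session['secure'] ?? '';
    // Remove the stored code so that the same challenge cannot be replayed.
    unset($session['secure']);
    return $expected !== '' && hash_equals($expected, $submitted);
}

// Usage after the form is submitted (and after session_start()):
//   if (verify_captcha($_SESSION, $_POST['code'] ?? '')) {
//       // direct the user to the next step
//   } else {
//       // show an error message and generate a new challenge
//   }
```

Clearing the stored code on every attempt, successful or not, matches the requirement that a challenge is never reused.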

5. RESULTS
Figures 3, 4, 5 and 6 show some samples of the proposed robust
text-based Captcha.

Figure 3 Font size = 34, font file name = 19.ttf, string
length = 9

Figure 4 Font size = 33, font file name = 18.ttf, string
length = 12

Figure 5 Font size = 40, font file name = 8.ttf, string
length = 7

Figure 6 Font size = 30, font file name = 5.ttf, string
length = 9

Every time the secure form page (the captcha.php file) is
refreshed, a new CAPTCHA image is produced that varies in: width
and height, font type, font size, string position, number of lines,
random line positions, font and line colour (varying in gray-scale
level), and the length of the lines in conjunction with the image
dimensions. The string contains a mixed combination of numbers,
uppercase and lowercase characters.

6. CONCLUSION
As a contribution toward improving web security in the field of
automated challenge-response tests against attacks issued by
automated programs, we proposed a more robust text-based CAPTCHA.
Two main goals were considered: the simplicity of solving the
challenge for a human, and the time that a human actually needs to
find the solution. Since a weak CAPTCHA implementation can only
provide a false sense of security, we addressed the principal
features that contribute effectively to a more secure challenge,
and combined these features to construct a generic text-based
Captcha. To increase the difficulty of segmentation and recognition
attacks on Captchas, we varied these significant features at each
challenge within ranges potentially acceptable to human users. Our
mechanism provides a solution that aims to maximize the robustness
and usability of text-based CAPTCHAs simultaneously.


References
[1] Tamang T., and Bhattarakosol P., "Uncover impact factors of
text-based CAPTCHA identification", 7th International Conference
on Computing and Convergence Technology (ICCCT), pp. 556-560,
Dec. 2012.
[2] Thomas V.A., and Kaur K., "Cursor CAPTCHA - Implementing
CAPTCHA Using Mouse Cursor", Tenth International Conference on
Wireless and Optical Communications Networks (WOCN), pp. 1-5,
July 2013.
[3] Haichang G., Honggang L., Dan Y., Xiyang L., and Uwe A., "An
Audio CAPTCHA to Distinguish Humans from Computers", Third
International Symposium on Electronic Commerce and Security
(ISECS), pp. 265-269, July 2010.
[4] Sushma Yalamanchili, and M. Kameswara Rao, "A Framework for
Devanagari Script-based Captcha", International Journal of
Advanced Information Technology (IJAIT), Vol. 1, No. 4,
September 2011.
[5] Gossweiler R., Kamvar M., and Baluja S., "What's Up CAPTCHA? A
CAPTCHA Based On Image Orientation", WWW '09: Proceedings of the
18th International Conference on World Wide Web, pp. 841-850,
April 20-24, 2009.
[6] Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John
Langford, "CAPTCHA: Using Hard AI Problems for Security",
EUROCRYPT, pp. 294-311, 2003.
[7] Rahman UR., Tomar D.S., and Das S., "Dynamic Image Based
CAPTCHA", International Conference on Communication Systems and
Network Technologies (CSNT), pp. 90-94, May 2012.
[8] Huang V.S., Huang R., and Ming Chiang, "A DDoS Mitigation
System with Multi-Stage Detection and Text-Based Turing Testing
in Cloud Computing", 27th International Conference on Advanced
Information Networking and Applications Workshops (WAINA),
pp. 655-662, March 2013.
[9] Pan Lei, and Zhou Yan, "Developing an Empirical Algorithm for
Protecting Text-based CAPTCHAs against Segmentation Attacks",
12th IEEE International Conference on Trust, Security and
Privacy in Computing and Communications (TrustCom), pp. 636-643,
July 2013.
[10] Wei-Bin Lee, Che-Wei Fan, Kevin Ho, and Chyi-Ren Dow, "A
CAPTCHA with Tips Related to Alphabets Upper or Lower Case",
Seventh International Conference on Broadband, Wireless
Computing, Communication and Applications (BWCCA), pp. 458-461,
Nov. 2012.
[11] Jing-Song Cui, Jing-Ting Mei, Wu-Zhou Zhang, Xia Wang, and Da
Zhang, "A CAPTCHA Implementation Based on Moving Objects
Recognition Problem", International Conference on E-Business and
E-Government (ICEE), pp. 1277-1280, May 2010.

AUTHOR
Mumtaz AL-Mukhtar is an associate professor at the Information
Engineering College of AL-Nahrain University, Iraq. He received an
M.Sc. degree in Technical Cybernetics from the Czech Technical
University, Prague, in 1979. He received an M.Sc. in computer
engineering in 1989, and in 2001 he earned his Ph.D. in the same
field from the University of Technology, Iraq. He has advised tens
of Ph.D. dissertations and Master's theses, and has published
widely in journals and conferences. His research interests include
distributed systems, cloud computing, wireless sensor networks,
social networks, pervasive computing, and mobile learning. He also
holds an adjunct position in electrical & computer engineering at
Michigan State University, USA.

Rana Riad received the B.Sc. degree in Computer and Software
Engineering from AL-Mustansiriya University in 2005. This paper is
part of her M.Sc. degree, which she is currently pursuing.
