
IMPROVING STRENGTH OF CAPTCHA

BY-
Alok Nandan Jha
(7503895)

Nishith V Oze
(7503906)
What is CAPTCHA
•CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
•Used to protect resources such as email accounts from attacks by bots.
•A CAPTCHA should be automated, usable and secure.
•CAPTCHAs are generally of three types:
• VISUAL
• AUDIO
• VIDEO

•Breaking a CAPTCHA gives a measure of its strength and helps in developing more robust CAPTCHAs.
NEED OF BREAKING CAPTCHA

•We want to improve the current strength of the network, and to do that we first have to break the existing CAPTCHAs.
•Once a CAPTCHA is broken, this would also help genuine downloaders, who waste hundreds of hours every year entering CAPTCHA values.
Breaking a Visual CAPTCHA
•The most widely used CAPTCHA is the visual CAPTCHA.
•Breaking it involves two main steps:
• Segmentation
• Identification
•Before segmentation we also preprocess the image to remove the background and some of the noise.
CAPTCHA Extraction

•We use the Beautiful Soup library in Python to parse the web page and extract all the images.
•Of all the images, only the one with a random name or one named "captcha" is saved.
•If the file has no extension, it is saved with .gif as the default extension.
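The extraction steps above can be sketched roughly as follows. The slides use Beautiful Soup; this sketch swaps in the standard-library HTMLParser so that it runs without extra packages. The names ImageCollector, looks_like_captcha, local_filename and captcha_images are hypothetical, and the rule for what counts as a "random" name (eight or more letters and digits, with at least one digit) is an assumption, not something the slides specify.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def looks_like_captcha(src):
    # Assumed heuristic: the name contains "captcha", or the stem is a long
    # run of letters and digits with at least one digit (a "random" name).
    name = os.path.basename(urlparse(src).path)
    stem = os.path.splitext(name)[0].lower()
    if "captcha" in stem:
        return True
    is_alnum_run = re.fullmatch(r"[a-z0-9]{8,}", stem) is not None
    return is_alnum_run and any(c.isdigit() for c in stem)


def local_filename(src, default_ext=".gif"):
    # Fall back to .gif when the URL carries no file extension.
    name = os.path.basename(urlparse(src).path)
    stem, ext = os.path.splitext(name)
    return name if ext else stem + default_ext


def captcha_images(html):
    # Return the local file names of the images that look like CAPTCHAs.
    parser = ImageCollector()
    parser.feed(html)
    return [local_filename(s) for s in parser.sources if looks_like_captcha(s)]
```

Downloading the selected images is left out; only the parsing, filtering and naming rules are shown.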
Preprocessing

Preprocessing prepares the image for segmentation. After preprocessing, the image is sent to the segmenter as input.
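As an illustration of background and noise removal, here is a minimal sketch that assumes the image is a grayscale grid (a list of rows with values 0 to 255) and that the text is darker than the background. The slides do not say which filters were actually used; the fixed threshold and the isolated-pixel rule are assumptions, and binarize, remove_specks and preprocess are hypothetical names.

```python
def binarize(gray, threshold=128):
    # 1 = ink (dark pixel), 0 = background (light pixel).
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_specks(binary, min_neighbours=1):
    # Clear ink pixels that have fewer than min_neighbours ink pixels
    # among their 8 neighbours; isolated dots are treated as noise.
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            count = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if count < min_neighbours:
                out[y][x] = 0
    return out


def preprocess(gray, threshold=128):
    # Background removal followed by speck removal.
    return remove_specks(binarize(gray, threshold))
```

The result is a 0/1 grid, which is a convenient input form for a segmenter.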
Segmentation:

•The strength of a CAPTCHA depends on the strength of its segmentation problem.
•Segmentation means locating the characters in the right order. Computational power has increased over time, but this has hardly made segmentation easier.
•The segmentation problem is made hard by introducing clutter into the image and/or by rotating, scaling or warping the text.
Segmentation is a difficult problem because
•Every position has to be tested as a potential solution.
•The input becomes large, as it also includes invalid characters.
•Our approach is to use the sliding window method.
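One way to read the sliding window method is sketched below: a fixed-width window moves across the binarized image, every position is scored, and the best non-overlapping positions are kept as character locations. The scoring function here (a plain ink count) is only a placeholder assumption; in practice a recognizer's confidence would probably take its place. The names windows, ink and segment are hypothetical.

```python
def windows(binary, width, step=1):
    # Yield (x, sub_image) for every window position, left to right.
    total = len(binary[0])
    for x in range(0, total - width + 1, step):
        yield x, [row[x:x + width] for row in binary]


def ink(sub_image):
    # Placeholder score: number of ink pixels inside the window.
    return sum(map(sum, sub_image))


def segment(binary, width, score=ink, min_score=1):
    # Greedy selection: take the best-scoring windows first and
    # skip any window that overlaps one already chosen.
    scored = sorted(
        ((score(sub), x) for x, sub in windows(binary, width)),
        key=lambda t: (-t[0], t[1]),
    )
    chosen = []
    for s, x in scored:
        if s >= min_score and all(abs(x - c) >= width for c in chosen):
            chosen.append(x)
    return sorted(chosen)
```

Because every window position is scored, the cost grows with the image width, which matches the point that every position has to be tested.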
Identification/Recognition

•This is the final step in breaking a CAPTCHA. Once segmentation is done, it is the job of OCR to recognize the text.
•However, OCR is nowadays easily fooled by distorted text. reCAPTCHA (a CAPTCHA service) builds its CAPTCHAs from text that fools OCR. Hence any advance in OCR could be used to strengthen CAPTCHAs.
•Instead of OCR, we could also use machine learning for identification: a system trained on a large data set can recognize the input passed to it.
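To make the machine learning option concrete, here is a minimal sketch of one very simple learner: a nearest-neighbour classifier that compares a segmented glyph with labelled training glyphs by counting differing pixels. The slides do not name a learning method, so this choice, the 0/1 grid representation, and the names hamming and classify are all assumptions.

```python
def hamming(a, b):
    # Number of pixels that differ between two equally sized glyphs.
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, training_set):
    # training_set is a list of (label, glyph) pairs; return the label
    # of the training glyph closest to the input.
    return min(training_set, key=lambda item: hamming(glyph, item[1]))[0]
```

With a larger training set the same idea still applies, although a real system would more likely use a trained model than a raw pixel comparison.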
RECOGNITION ALGORITHM
•We used the Levenshtein algorithm for recognition. Computing the Levenshtein distance uses an (n + 1) × (m + 1) matrix, where n and m are the lengths of the two strings.
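The matrix computation can be sketched as below. The function levenshtein fills the (n + 1) × (m + 1) table in the standard way; the helper closest is a hypothetical illustration of how the distance might be used for recognition, assuming a list of known candidate strings, which the slides do not describe.

```python
def levenshtein(a, b):
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[n][m]


def closest(candidate, vocabulary):
    # Assumed usage: pick the known string nearest to the raw reading.
    return min(vocabulary, key=lambda word: levenshtein(candidate, word))
```

For example, levenshtein("kitten", "sitting") is 3: two substitutions and one insertion.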
CAPTCHA Breaking:
•A CAPTCHA is considered broken if a computer can recognize it correctly 0.01% of the time. This 0.01% threshold is based on the cost involved in breaking the CAPTCHA.
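To see why such a low rate can still count as a break, the short sketch below works out the arithmetic, assuming each attempt is independent and succeeds with the same probability p. The function names are hypothetical.

```python
def expected_attempts(p):
    # Mean number of tries until the first success (geometric distribution).
    return 1.0 / p


def chance_of_success(p, attempts):
    # Probability of at least one success in the given number of tries.
    return 1.0 - (1.0 - p) ** attempts
```

At p = 0.0001 (0.01%), a bot needs about 10,000 attempts on average per solved CAPTCHA, which is cheap when attempts are automated.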
RESULTS:

•PREPROCESSING

•NOISE REMOVAL
•ROTATION AND SMOOTHING

•SEGMENTATION
•RECOGNITION
Future prospect
•Once a CAPTCHA is broken, more robust CAPTCHAs need to be developed. CAPTCHA design is basically a tradeoff between human readability and computer readability.
•The ideal real-life CAPTCHA should be readable by humans more than 80% of the time and by machines less than 0.01% of the time.
•Apart from visual CAPTCHAs, audio CAPTCHAs are also gaining popularity, so there is scope for study in that area as well.
THANK YOU
