

• Definition
• Background
• Types
• Applications
• Constructing CAPTCHAs
• Breaking CAPTCHAs
• Issues with CAPTCHAs
• Conclusion
 CAPTCHA Completely Automated Public
Turing test to tell Computers and Humans

 Invented at CMU by Luis von Ahn, Manuel Blum, et al.

 A program that runs a challenge-response test to separate humans from computer programs
 Generic CAPTCHAs distort letters and

 Distorted characters are presented to user

 User has to recognize the distorted letters

 If the guessed letters are correct, the user is inferred to be a human and allowed access

 Otherwise, the user is presumed to be a bot and denied access

• Humans can read the distorted and noisy text

• Current OCR programs cannot read it

 Why was CAPTCHA needed?

 Sabotage of online polls

 Spam emails
 Abusing free online accounts
 Tampering with rankings on recommendation systems (like eBay, Amazon)
 AltaVista was the first to use a crude CAPTCHA on its site

 Resulted in 95% spam reduction

 Yahoo partnered with CMU to counter these threats in its Messenger chat service

 Luis von Ahn and Manuel Blum of CMU trademarked CAPTCHA in 2000
Turing Test
 What is a Turing test?
o Proposed by Alan Turing
o Tests a machine’s level of intelligence
o A human judge puts questions to two participants, one of which is a machine; the judge does not know which is which
o If the judge cannot tell which is the machine, the machine passes the test
o CAPTCHA employs a reverse Turing test:
 judge = CAPTCHA program
 participant = user
 if the user passes the CAPTCHA, the user is taken to be human
 if the user fails, the user is taken to be a machine

• Text Based: Gimpy, EZ-Gimpy, MSN Passport Service
• Audio Based
• Graphics Based: Bongo, PIX
Text Based CAPTCHAs
• Simple questions in ordinary language, for example:
• What is the sum of three and thirty-five?

• If today is Saturday, what is the day after tomorrow?

• Which of mango, table, water is a fruit?

• Very effective, but needs a large question bank.

• Users with cognitive disabilities may find them hard.
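One way to get a large question bank is to generate questions rather than write them by hand. The following is a minimal sketch of that idea for the arithmetic example only; the helper names (`to_words`, `make_sum_question`, `check_answer`) and the 1 to 49 operand range are assumptions made for illustration, not part of any existing CAPTCHA library.

```python
import random
from typing import Tuple

# Word lists for spelling out 0..99 (illustrative; larger numbers are not handled).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def make_sum_question(rng: random.Random) -> Tuple[str, int]:
    """Return a (question text, expected answer) pair."""
    a = rng.randint(1, 49)
    b = rng.randint(1, 49)
    return f"What is the sum of {to_words(a)} and {to_words(b)}?", a + b


def check_answer(expected: int, reply: str) -> bool:
    """Accept the reply only if it parses to the expected integer."""
    try:
        return int(reply.strip()) == expected
    except ValueError:
        return False
```

Spelling the numbers out in words, as in the slide's example, keeps the question from being a plain arithmetic expression that a trivial script could evaluate.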
 Gimpy:
o Designed by Yahoo and CMU
o Picks 10 random words from a dictionary, distorts them, and adds noise
o The user has to recognize at least 3 of the words
o If the user is correct, the user is admitted
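The Gimpy acceptance rule (at least 3 of the displayed words recognized) can be sketched as a simple set comparison. This is an illustration only: the function name `gimpy_accepts` is hypothetical, and treating the comparison as case-insensitive is an assumption, not something the slides specify.

```python
def gimpy_accepts(shown_words, typed_words, required=3):
    """Return True if at least `required` distinct displayed words were typed.

    Assumption: matching ignores case and surrounding whitespace.
    """
    shown = {word.lower() for word in shown_words}
    typed = {word.strip().lower() for word in typed_words}
    # Using sets means typing the same word several times counts only once.
    return len(shown & typed) >= required
```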
 EZ-Gimpy:
o A modified version of Gimpy
o Yahoo used this version in Messenger
o Uses only one random string of characters
o The string is not a dictionary word, so it is not prone to dictionary attacks
o Not a good implementation, already broken by
 MSN’s Passport service CAPTCHAs:

o Provided for Microsoft’s MSN services

o Use 8 characters
o Warping is used to distort
o Very strong implementation, hasn’t been broken
o It is segmentation-resistant
 Graphics based CAPTCHAs:

 Bongo:
o Named after M. M. Bongard, a pattern recognition expert
o The user has to solve a pattern recognition problem
o The user must identify the characteristic that distinguishes two sets of figures
o The user must then say which set a given figure belongs to
 PIX:
o Uses a large database of labelled images
o It shows a set of images, and the user has to recognize the feature they have in common
o E.g., "Pick the common characteristic among the following four pictures"; the answer is "Aeroplane"
 Audio CAPTCHAs:
o Consist of a downloadable audio clip
o The user listens and enters the spoken word
o Help visually disabled users
o Example: Google’s audio-enabled CAPTCHA
o Not popular
 Protect online polls

 Prevent Web registration abuse, protect passwords from brute-force attacks

 Prevent comment spam and spam emails

 E-Ticketing, prevent scalping

 Verify digitized books: reCAPTCHA
o Used in the Google Books project.
o Two words are shown; the program knows only the first word.
o If the user enters the first word correctly, the program assumes that the second, unknown word has also been entered correctly.
o The second word then becomes “known”.
 Help advance AI knowledge

 CAPTCHAs are based on hard AI problems

 A win-win scenario:
o If a CAPTCHA is broken by a bot, a hard AI problem is solved
o If it is not yet broken, the current implementation is able to withstand attacks

 Thus AI knowledge is advanced if CAPTCHAs are broken
Constructing CAPTCHAs
 Things to keep in mind:
o Don’t store CAPTCHA solution in Web page’s

o A CAPTCHA is no good if it doesn't distort

o Need a large database of different CAPTCHA


o Avoid repetition of questions

 CAPTCHA Logic:

 Generate the question

 Persist the correct answer

 Present the question to user

 Evaluate the answer; if it is incorrect, start again by generating a different CAPTCHA

 If it is correct, allow access to the user
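The logic above (generate, persist, present, evaluate) can be sketched as follows. This is a minimal in-memory illustration, not a production design: the class name `CaptchaService`, the arithmetic question, and the dictionary used as a store are all assumptions. The point it shows is that the answer stays on the server and only an opaque token travels to the page.

```python
import hmac
import random
import secrets


class CaptchaService:
    """Hypothetical in-memory service; a real site would use its session store."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._answers = {}  # token -> correct answer, never sent to the browser

    def generate(self):
        """Generate a question, persist its answer, and return (token, question)."""
        a = self._rng.randint(1, 20)
        b = self._rng.randint(1, 20)
        token = secrets.token_urlsafe(16)
        self._answers[token] = str(a + b)
        return token, f"What is {a} + {b}?"

    def evaluate(self, token, reply):
        """Check a reply against the persisted answer.

        The token is removed on every attempt, so a wrong reply forces the
        caller to generate a different CAPTCHA before trying again.
        """
        expected = self._answers.pop(token, None)
        if expected is None:
            return False
        # Constant-time comparison on bytes avoids leaking partial matches.
        return hmac.compare_digest(expected.encode(), reply.strip().encode())
```

A page handler would call `generate()`, present the question together with the token, and on submission call `evaluate()`; when it returns `False`, the handler starts over with a fresh `generate()` call.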

 Embeddable CAPTCHAs:
o Available freely; just embed the code into the Web page’s HTML, from e.g.,
o No maintenance

 Custom CAPTCHAs:
o Fit the theme of the page
o Better protected from spammers

 Can be written in any language: Perl, .NET, ASP, JavaScript
 Guidelines:
o Accessibility

o Image security

o Script security

o Security after widespread adoption

o Custom implementation or a general
reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.
reCAPTCHA improves the process of digitizing books by sending words that computers cannot read to the Web in the form of CAPTCHAs for humans to decipher.
Each word that OCR cannot read correctly is placed on an image and used as a CAPTCHA.
This is possible because most OCR programs flag a word when they cannot read it correctly.
Working of reCAPTCHA

 Two words are shown: one is the control word, whose answer is known, and the other is the questionable word.
 The system assumes that if a human types the control word correctly, the questionable word is also typed correctly.
 The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point.
 Once a given identification reaches 2.5 points, the word is considered called, that is, settled.
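The scoring rule described above can be sketched as a small vote tally. The weights (0.5 per OCR reading, 1 per human reading) and the 2.5 threshold come from the text; the class name `QuestionableWord`, its method names, and the case-insensitive check of the control word are assumptions made for illustration, not the actual reCAPTCHA implementation.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5     # value of each OCR program's identification
HUMAN_WEIGHT = 1.0   # value of each human interpretation
THRESHOLD = 2.5      # score at which a reading is considered called


class QuestionableWord:
    """Tallies candidate readings for one word that OCR could not resolve."""

    def __init__(self, ocr_guesses):
        self.votes = defaultdict(float)
        for guess in ocr_guesses:
            self.votes[guess] += OCR_WEIGHT

    def add_human_reply(self, control_expected, control_typed, questionable_typed):
        """Count the reply only if the control word was typed correctly."""
        if control_typed.strip().lower() != control_expected.lower():
            return False
        self.votes[questionable_typed.strip()] += HUMAN_WEIGHT
        return True

    def called(self):
        """Return the reading that reached the threshold, or None."""
        for reading, score in self.votes.items():
            if score >= THRESHOLD:
                return reading
        return None
```

With one OCR program voting for a reading, two agreeing humans are enough to reach 2.5 points; with no OCR support, three humans are needed.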
 Social engineering to break CAPTCHAs:
o Spammer encounters a CAPTCHA
o That CAPTCHA is copied to another site
o Humans are baited, e.g., free MP3s
o To get those MP3s, users are told to solve the
copied CAPTCHA
o Solution is routed to the spammer
 Solution: Fix a time-to-live period for a CAPTCHA
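A time-to-live limits how long a copied challenge stays useful to a spammer who relays it to another site. The sketch below is an illustration under assumptions: the class name `ExpiringChallenge`, the 120-second default, and the injectable clock are not taken from the slides.

```python
import time


class ExpiringChallenge:
    """A challenge whose answer is accepted only within a fixed time window."""

    def __init__(self, answer, ttl_seconds=120.0, now=time.monotonic):
        # `now` is injectable so the expiry logic can be exercised without waiting.
        self._answer = answer
        self._now = now
        self._expires_at = now() + ttl_seconds

    def is_expired(self):
        return self._now() >= self._expires_at

    def check(self, reply):
        """Reject every reply, even a correct one, once the window has closed."""
        return (not self.is_expired()) and reply.strip() == self._answer
```

The window has to be long enough for a legitimate user to read and type the answer, which is why a short TTL reduces relay attacks but cannot eliminate them.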

 CAPTCHA cracking as a business:

o Firms offer CAPTCHA cracking service in
exchange for money
 Usability issues:
o W3C guidelines call for the Web to be accessible to all
o Some CAPTCHAs are inaccessible to users with visual or cognitive impairments

 Compatibility issues:
o JavaScript may need to be activated in browsers
o Some may need Adobe Flash plugin installed
 CAPTCHAs are an effective way to counter bots and reduce spam
 They serve a dual purpose: they also help advance AI
 Applications are varied, from stopping bots to character recognition and pattern matching
 Some issues with current implementations remain as challenges for the future
