
Understanding CAPTCHA in Web Security

1. CAPTCHAs are challenge-response tests used to distinguish humans from bots by having users solve tests that are easy for humans but difficult for computers. 2. They were developed in the late 1990s and early 2000s to prevent bots from engaging in automated online activities like submitting web forms, voting in polls, and using chatrooms. 3. Common types of CAPTCHAs include text-based, graphic-based, audio-based, and book digitization challenges. However, with advances in machine learning and optical character recognition, no CAPTCHA is completely secure; any of them can be broken given enough time and resources.

Uploaded by

Abhishek Sharma
Copyright
© Attribution Non-Commercial (BY-NC)

CAPTCHA In Web Security : Secure or Not ?

Presented By Abhishek Sharma (08CE04)

What Does a CAPTCHA Look Like ?

CAPTCHA Used By Google

CAPTCHA : The Acronym


Completely Automated Public Turing Test to Tell Computers and Humans Apart

CAPTCHA : Literal Meaning


Completely : Whole
Automated : Made by Machine
Public : Universally Known
Turing Test to Tell Computers and Humans Apart : Test Presented by Alan Turing

Contents
Introduction
History
The Need of CAPTCHA
Basic Terminologies
Earlier CAPTCHAs
How does a CAPTCHA work?
Types of CAPTCHA
Implementation of CAPTCHA
Can CAPTCHA be broken?
CAPTCHA Guidelines
Applications
Benefits of CAPTCHA
Limitations of CAPTCHA
Conclusion

Introduction
A CAPTCHA is a type of challenge-response test used in computing to ensure that a response is generated by a person and not by a computer.

It is needed because activities such as online commerce transactions, search engine submissions, Web polls, Web registrations, free e-mail service registration and other automated services are subject to abuse by software programs, or bots.

CAPTCHA : History
1997: Andrei Broder at AltaVista wanted to prevent bots from automatically submitting sites for indexing.
He decided to add a test to the submission page. To design it, he reversed the OCR optimization techniques recommended for a Brother scanner, so that the text would be hard for OCR software to read.

2000: Luis von Ahn, Manuel Blum & John Langford at CMU coined the term CAPTCHA.
Yahoo! partnered with CMU to counter bot threats in its Messenger chat service.

CAPTCHA : The Basic Needs


In 1999, [Link] issued an online poll asking users to pick the best computer science school in the US. Students at MIT and Carnegie Mellon University created voting bots to vote for their school multiple times.

MIT finished with 21,156 votes and Carnegie Mellon finished with 21,032 votes.
All other schools finished with fewer than 1,000 votes. This proved that online polls could not be trusted unless they ensured that only humans could vote.

In September 2000, Yahoo! reported that bots were entering their online chat rooms and pointing legitimate users to advertising sites.

CAPTCHA : The Basic Needs


Yahoo! turned to CMU to help them solve their problem. Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford developed CAPTCHA. They determined that CAPTCHAs should :
1. Present challenges that are automatically generated and graded.
2. Be simple enough to be taken quickly and easily by humans.
3. Accept virtually all human users and reject few.
4. Reject virtually all machine users.
5. Resist automatic attacks for many years to come.

CAPTCHA : Terminologies
Bots
Turing Test

Challenge Response Test


Spam

Terminologies : BOTS
A bot is a software program on the Internet. It is a software agent that interacts with network services intended for people as if it were a real person.
Types of Bot :
[Link] Bots
[Link] Account Registration Bots
[Link] Spam Bots

Terminologies : Turing Test


Alan Turing, a mathematician, imagined a game with three players. One is the interrogator, who has to find out which of the other two is the machine.

What is a Turing test?
It tests a machine's level of intelligence.
A human judge asks questions of two participants. One is a machine, and the judge doesn't know which is which.
If the judge can't tell which is the machine, the machine passes the test.

CAPTCHA employs a reverse Turing test: judge = CAPTCHA program, participant = user.
If the user passes the CAPTCHA, the user is treated as human; if the user fails, the user is treated as a machine.

Terminologies : Challenge Response Test & Spam

What is Challenge Response Test ?
A challenge-response test is a test involving a set of questions (or "challenges") that the person or other entity has to answer in order to pass the test. If the person or entity provides an adequate response to the challenges, then it is deemed that this person or entity has passed the test.

What is SPAM ?
Spamming is the act of sending unwanted electronic messages in bulk. In the popular eye, the most common form of spam is that delivered in e-mail as a form of commercial advertising. Sending bulk messages in this fashion, to recipients who have not asked for them, has come to be known as spamming, and the messages themselves as spam.

CAPTCHA : Earlier Design


Gimpy: A puzzle consists of a display of ten distorted and overlapping words chosen at random from a dictionary of simple words. Solving the puzzle requires the user to identify only three of the ten words and type them into the box provided. (Figure: example Gimpy puzzle.)

CAPTCHA : How Does It Work ?


A CAPTCHA image is chosen at random from a stored database and shown on the web page. Each database entry has two attributes: the image, and the key associated with that image. When the user has entered the letters in the textbox provided, these letters are matched against the secret key. If they match, the user is redirected to the next page; otherwise a new CAPTCHA image is displayed and the same process is repeated.
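
The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CHALLENGES` table, the image file names and the keys are made up, and the functions `new_challenge`, `check_response` and `handle_submission` are hypothetical names, not part of any real CAPTCHA library.

```python
import secrets

# Hypothetical store of pre-generated challenges: image file -> secret key.
CHALLENGES = {
    "img_001.png": "x7kq2",
    "img_002.png": "m9ptz",
    "img_003.png": "h4wbn",
}


def new_challenge() -> str:
    """Pick one stored image at random and return its identifier."""
    return secrets.choice(list(CHALLENGES))


def check_response(image_id: str, typed: str) -> bool:
    """Compare the typed letters with the secret key for this image."""
    expected = CHALLENGES.get(image_id)
    if expected is None:
        return False
    # Constant-time comparison; case and surrounding spaces are ignored.
    return secrets.compare_digest(typed.strip().lower().encode(), expected.encode())


def handle_submission(image_id: str, typed: str) -> tuple[str, str]:
    """Return ("next_page", "") on a match, else ("retry", new_image_id)."""
    if check_response(image_id, typed):
        return ("next_page", "")
    return ("retry", new_challenge())
```

A page handler would call `new_challenge()` when rendering the form and `handle_submission()` when the form is posted. Ignoring case is a design choice of this sketch, not something the slides specify.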

CAPTCHA : Different Types of CAPTCHAs


Text Based CAPTCHA
Graphic Based CAPTCHA
Gimpy CAPTCHA
E-Z Gimpy CAPTCHA
Audio Based CAPTCHA
reCAPTCHA and book digitization

Text Based CAPTCHA


Simple, normal language questions:
What is the sum of three and thirty-five?
If today is Saturday, what is the day after tomorrow?
Which of mango, table, water is a fruit?
Very effective, but needs a large question bank.
Cognitively challenged users find it hard.
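
As a rough illustration of how such questions could be generated rather than written by hand, the sketch below builds the sum and day-of-week questions from small tables. The tables `NUMBER_WORDS` and `DAYS` and all function names are assumptions made for this example; a real question bank would need to be far larger, as noted above.

```python
import random

# Hypothetical number-word table; a real question bank would be far larger.
NUMBER_WORDS = {3: "three", 7: "seven", 12: "twelve", 20: "twenty", 35: "thirty-five"}
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def sum_question(rng: random.Random) -> tuple[str, str]:
    """Return a (question, expected answer) pair asking for a sum."""
    a, b = rng.sample(sorted(NUMBER_WORDS), 2)
    question = f"What is the sum of {NUMBER_WORDS[a]} and {NUMBER_WORDS[b]}?"
    return question, str(a + b)


def day_question(rng: random.Random) -> tuple[str, str]:
    """Return a (question, expected answer) pair about the day after tomorrow."""
    i = rng.randrange(7)
    question = f"If today is {DAYS[i]}, what is the day after tomorrow?"
    return question, DAYS[(i + 2) % 7]


def is_correct(answer: str, expected: str) -> bool:
    """Accept the answer regardless of case and surrounding spaces."""
    return answer.strip().lower() == expected.lower()
```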

CAPTCHA : Types of Text Based CAPTCHA
Printed CAPTCHA
H-CAPTCHA

Text Based CAPTCHA : Printed CAPTCHA
Printed CAPTCHA is difficult to break.
Lots of algorithms are available to generate these.
Humans cannot identify these very easily.
There are two major types, viz. Baffle Text and Pessimal Print.

Baffle Text Based CAPTCHA


Developed by Monica Chew and Henry Baird.
Uses pronounceable, English-like character strings that are not present in the English dictionary, rendered with masking.
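
To make the idea of pronounceable non-dictionary strings concrete, here is a minimal sketch. It is not the authors' generator: it swaps in simple consonant-vowel alternation, and the `DICTIONARY` set is a tiny stand-in for a real word list. Masking and rendering of the image are left out.

```python
import secrets

CONSONANTS = "bcdfghjklmnprstvw"
VOWELS = "aeiou"
# Tiny stand-in for an English dictionary; a real check would load a full word list.
DICTIONARY = {"banana", "table", "water", "mango", "color"}


def pronounceable_word(syllables: int = 3) -> str:
    """Build a consonant-vowel string that is easy to pronounce."""
    return "".join(
        secrets.choice(CONSONANTS) + secrets.choice(VOWELS) for _ in range(syllables)
    )


def nonsense_word(syllables: int = 3) -> str:
    """Draw words until one is not in the dictionary."""
    while True:
        word = pronounceable_word(syllables)
        if word not in DICTIONARY:
            return word
```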

Pessimal Print Image CAPTCHA


Developed by Allison Coates, Henry Baird and Richard Fateman.
Uses a degradation model simulating the physical defects caused by printing and scanning of printed text.

CAPTCHA : Graphic Based CAPTCHA

BONGO
1. A visual recognition problem.
2. Two sets of shapes with a distinguishing characteristic.
3. Must choose which set the shape belongs to.

PIX
A database of labeled images of recognizable objects.
Randomly chooses an object and displays N pictures of it.
Must correctly identify the object.
Pictures are distorted.
Image based CAPTCHA.

Gimpy CAPTCHA :
Designed by Yahoo and CMU.
Picks 10 random words from a dictionary, distorts them and fills the image with noise.
User has to recognize at least 3 words.
If the user is correct, he is admitted.
(Figure: example of Gimpy.)
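
The admission rule above (recognize at least 3 of the 10 words) can be sketched as follows. The word list `WORDS` and the function names are assumptions for illustration, and the distortion and noise steps, which need an imaging library, are left out.

```python
import random

# Hypothetical dictionary of simple words.
WORDS = [
    "apple", "river", "stone", "cloud", "chair", "light",
    "bread", "horse", "green", "paper", "music", "table",
]
REQUIRED = 3  # words the user must recognize, as stated above


def make_puzzle(rng: random.Random, count: int = 10) -> list[str]:
    """Choose `count` distinct words at random for one puzzle."""
    return rng.sample(WORDS, count)


def is_admitted(puzzle: list[str], typed: list[str]) -> bool:
    """Admit the user if at least REQUIRED distinct puzzle words were typed."""
    matched = {w.strip().lower() for w in typed} & set(puzzle)
    return len(matched) >= REQUIRED
```

Using a set means that typing the same word three times counts only once.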

CAPTCHA : E-Z Gimpy CAPTCHA
A modified version of Gimpy.
Yahoo used this version in Messenger.
Has only 1 random string of characters.
Not a dictionary word, so not prone to dictionary attack.
Not a good implementation, already broken by OCR.

CAPTCHA : Audio Based CAPTCHA
Consists of a downloadable audio clip.
User listens and enters the spoken word.
Helps visually disabled users.
(Figure: Google's audio-enabled CAPTCHA.)
Not popular.

CAPTCHA : reCAPTCHA & Book Digitization
Used to verify digitized books; used in the Google Books project.
Two words are shown; the program knows the first word.
If the user enters the first word correctly, it assumes that the second, unknown word has also been entered correctly.
The second word becomes known.
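
The two-word logic above might look roughly like this in code. The slides do not say how many matching answers are needed before the second word counts as known, so the `VOTES_NEEDED` threshold is purely an assumption of this sketch, as are the class and method names.

```python
from collections import Counter

VOTES_NEEDED = 3  # assumption: the slides give no threshold


class WordPair:
    """One known (control) word shown together with one unknown scanned word."""

    def __init__(self, known_word: str):
        self.known_word = known_word
        self.votes = Counter()   # guesses collected for the unknown word
        self.resolved = None     # set once enough users agree

    def submit(self, typed_known: str, typed_unknown: str) -> bool:
        """Return True if the user passes; record a vote for the unknown word."""
        if typed_known.strip().lower() != self.known_word.lower():
            return False  # control word wrong: reject and ignore the guess
        guess = typed_unknown.strip().lower()
        self.votes[guess] += 1
        if self.votes[guess] >= VOTES_NEEDED:
            self.resolved = guess
        return True
```

The user is graded only on the known word; the answer for the unknown word is trusted and collected.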

CAPTCHA : Implementation & Creation

Creating CAPTCHA in Different Fashion
1. One way to create a CAPTCHA is to pre-determine the images and solutions it will use. This approach requires a database that includes all the CAPTCHA solutions, which can compromise the reliability of the test.
2. A CAPTCHA can be created from an image and some characters by applying effects such as blurring and distortion to them.
3. One can make one's own CAPTCHA for a web forum by using a randomizing function that generates strings at random.
4. A CAPTCHA might include a series of shapes and ask the user which shape among several choices would logically come next. The problem with this approach is that not all humans are good with these kinds of problems and the success rate for a human user can go below 80 percent.
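
For the third approach above (randomly generated strings), a minimal sketch could use Python's `secrets` module. Dropping look-alike characters such as `0` and `O` is an illustrative choice of this example, not something the slides prescribe, and the names `AMBIGUOUS`, `ALPHABET` and `random_captcha_text` are hypothetical.

```python
import secrets
import string

# Drop characters that are easy to confuse once distorted (an illustrative choice).
AMBIGUOUS = set("0O1lI")
ALPHABET = [c for c in string.ascii_letters + string.digits if c not in AMBIGUOUS]


def random_captcha_text(length: int = 6) -> str:
    """Return a random string to be rendered as the CAPTCHA image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The resulting string would then be drawn onto an image and distorted, as in the second approach.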

CAPTCHA : Implementation

There are two basic implementations of CAPTCHA for a Website or Web Forum.
1. Embeddable CAPTCHAs : The easiest implementation of a CAPTCHA on a Website is to insert a few lines of CAPTCHA code into the Website's HTML code, from an open source CAPTCHA builder, which will provide the authentication services remotely. Most such services are free. Popular among them is the service provided by [Link]'s reCAPTCHA project.

2. Custom CAPTCHAs : These are less popular because of the extra work needed to create a secure implementation. However, they are popular among researchers who verify existing CAPTCHAs and suggest alternative implementations.

Can CAPTCHA be broken ?

CAPTCHA :

The answer to this question is: YES! Given enough effort, absolutely every CAPTCHA algorithm can be broken.

CAPTCHA : Breaking A CAPTCHA

A very popular method used for breaking a CAPTCHA is OCR (Optical Character Recognition). Most text based CAPTCHAs have been broken by Computer Character Recognition software. Other CAPTCHAs were broken by streaming the tests to unsuspecting users to solve.

Breaking A CAPTCHA : Computer Character Recognition

A number of research projects have attempted (often with success) to beat visual CAPTCHAs by creating programs that contain the following functionality:
1. Pre-processing
2. Segmentation
3. Character recognition

Computer Character Recognition : Step By Step Process

Pre-processing
Application of algorithms to remove the effects of distortion, blurring, clutter, background noise, etc. Easy problem for computers to solve.

Segmentation
Splitting the image into regions which each contain a single character. Complex and computationally expensive.

Character Recognition
OCR software is used to identify the characters.
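
The first two steps can be illustrated on a toy image held as a list of pixel rows. This is a deliberately simplified sketch, assuming grayscale values from 0 to 255 and characters separated by blank columns; the function names are hypothetical, and the recognition step itself is left to OCR software.

```python
def binarize(gray, threshold=128):
    """Pre-processing: map grayscale pixels (0-255) to 1 for ink, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Segmentation: split on blank columns, returning (start, end) column ranges."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a character ends at a blank column
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments
```

Splitting on blank columns only works when characters do not touch, which is one reason crowded or blended letters make segmentation the hard step.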

CAPTCHA : Guidelines For CAPTCHA

Accessibility
All users need to have access to the protected site. For example, visually-impaired users need audio CAPTCHAs.

Image Security
Images must be secure enough to prevent OCR-based attacks. Use random and thorough distortion techniques.

Script Security
Programs must be secure as well. Pass passwords as encrypted text. Destroy sessions after a CAPTCHA is solved.

Security After Widespread Adoption
Use a large pool of dictionary words or images. Use phonetic generators and nonsense words.
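
The Script Security guideline above can be sketched as a single-use session. Several things here are assumptions of the example rather than anything the slides specify: the in-memory `SESSIONS` store, the function names, the use of a salted SHA-256 digest in place of "encrypted text", and destroying the session after any attempt rather than only after a correct one.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory session store: session id -> (salt, answer digest).
SESSIONS = {}


def _digest(salt: bytes, answer: str) -> bytes:
    """Hash the normalized answer so the plain text is never stored."""
    return hashlib.sha256(salt + answer.strip().lower().encode()).digest()


def open_session(answer: str) -> str:
    """Create a session for one challenge and return its identifier."""
    session_id = secrets.token_urlsafe(16)
    salt = secrets.token_bytes(16)
    SESSIONS[session_id] = (salt, _digest(salt, answer))
    return session_id


def verify_once(session_id: str, typed: str) -> bool:
    """Check the answer and destroy the session, whether or not it matched."""
    entry = SESSIONS.pop(session_id, None)
    if entry is None:
        return False
    salt, expected = entry
    return hmac.compare_digest(_digest(salt, typed), expected)
```

Because the session is removed on the first attempt, a solved challenge cannot be replayed by a bot.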

CAPTCHA : Guidelines For CAPTCHA

Security from OCR is achieved by randomness:
Making the letters wiggly
Adding noise or lines
Using a messy background
Crowding or blending letters
Segmenting characters
Varying font thickness, color

CAPTCHA : Applications Of CAPTCHA

1. Online polls
2. Protecting Web registration
3. Preventing comment spam
4. Search engine bots
5. E-Ticketing
6. Email spam
7. Preventing dictionary attacks
8. As a tool to verify digitized books
9. Improving Artificial Intelligence (AI) technology

Benefits of CAPTCHA

CAPTCHA :

Using a CAPTCHA significantly narrows the number of potential attackers on your website. CAPTCHA images ensure that not every beginner hacker can attack your web forms.
You can always change the algorithm used if the previous one is broken. It's highly unlikely that a hacker will spend his entire time trying to break new algorithms as you change them.

Limitations of CAPTCHA

CAPTCHA :

CAPTCHA is not a 100% solution for all problems such as bots and spam. CAPTCHA can be broken:
1. Using Computer Character Recognition software.
2. Using cheap human labor to process the test.

CAPTCHA :

Conclusion

As with all security solutions, risk can only be decreased; there is no such thing as a single security measure that is 100% safe. But the presence of a CAPTCHA helps whenever you need to enhance the stability and security of a web service or application. A CAPTCHA is a technique that generates and grades a test that a human can pass very easily but that is not so easy for a computer or software program.

! QUERIES !
