CAPTCHA:
Telling Humans and Computers Apart Automatically
A CAPTCHA is a program
that protects websites against bots by generating and grading tests that humans
can pass but current computer programs cannot. For example, humans can read
distorted text, but current computer programs cannot.
The term CAPTCHA (for
Completely Automated Public Turing Test To Tell Computers and Humans Apart) was
coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford
of Carnegie Mellon University.
Get a Free CAPTCHA For Your Site
A free, secure and
accessible CAPTCHA implementation is available from the reCAPTCHA
project. Easy-to-install plugins and controls are available for WordPress, MediaWiki, PHP, ASP.NET, Perl, Python, Java, and many other
environments. reCAPTCHA also comes with an audio test to ensure that blind
users can freely navigate your site. reCAPTCHA is our officially recommended
CAPTCHA implementation.
Test Drive a CAPTCHA
- reCAPTCHA. Stop spam and help
digitize books at the same time! The words shown come directly from old
books that are being digitized.
- SQUIGL-PIX. Our newest CAPTCHA!
- ESP-PIX. A CAPTCHA script that's
close to our hearts. Instead of typing letters, you authenticate yourself
as a human by recognizing what object is common in a set of images. This
was the first example of a CAPTCHA based on image recognition.
Applications of CAPTCHAs
CAPTCHAs have several
applications for practical security, including (but not limited to):
- Preventing Comment Spam
in Blogs. Most
bloggers are familiar with programs that submit bogus comments, usually for the
purpose of raising search engine ranks of some website (e.g., "buy penny
stocks here"). This is called comment spam. By using a CAPTCHA, only
humans can enter comments on a blog. There is no need to make users sign up
before they enter a comment, and no legitimate comments are ever lost!
- Protecting Website
Registration. Several
companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a few
years ago, most of these services suffered from a specific type of attack:
"bots" that would sign up for thousands of email accounts every
minute. The solution to this problem was to use CAPTCHAs to ensure that only
humans obtain free accounts. In general, free services should be protected with
a CAPTCHA in order to prevent abuse by automated scripts.
- Protecting Email
Addresses From Scrapers. Spammers
crawl the Web in search of email addresses posted in clear text. CAPTCHAs
provide an effective mechanism to hide your email address from Web scrapers.
The idea is to require users to solve a CAPTCHA before showing your email
address. A free and secure implementation that uses CAPTCHAs to obfuscate an
email address can be found at reCAPTCHA
MailHide.
- Online Polls. In November 1999, http://www.slashdot.org released an online poll asking which
was the best graduate school in computer science (a dangerous question to ask
over the web!). As is the case with most online polls, IP addresses of voters
were recorded in order to prevent single users from voting more than once.
However, students at Carnegie Mellon found a way to stuff the ballots using
programs that voted for CMU thousands of times. CMU's score started growing
rapidly. The next day, students at MIT wrote their own program and the poll
became a contest between voting "bots." MIT finished with 21,156
votes, Carnegie Mellon with 21,032 and every other school with fewer than 1,000.
Can the result of any online poll be trusted? Not unless the poll ensures that
only humans can vote.
- Preventing Dictionary
Attacks. CAPTCHAs
can also be used to prevent dictionary attacks in password systems. The idea is
simple: prevent a computer from being able to iterate through the entire space
of passwords by requiring it to solve a CAPTCHA after a certain number of
unsuccessful logins. This is better than the classic approach of locking an
account after a sequence of unsuccessful logins, since doing so allows an
attacker to lock accounts at will.
- Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent
others from finding them easily. There is an HTML tag to prevent search engine
bots from reading web pages. The tag, however, doesn't guarantee that bots
won't read a web page; it only serves to say "no bots, please."
Search engine bots, since they usually belong to large companies, respect web
pages that don't want to allow them in. However, in order to truly guarantee
that bots won't enter a web site, CAPTCHAs are needed.
- Worms and Spam. CAPTCHAs also offer a plausible solution against email worms and
spam: "I will only accept an email if I know there is a human behind the
other computer." A few companies are already marketing this idea.
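The dictionary-attack defense described in the list above can be sketched roughly as follows. This is a minimal illustration, not a production login system: the class name LoginGate, the threshold of three failures, and the captcha_passed flag (which a real site would obtain from whatever CAPTCHA service it uses) are all assumptions made for this example. Passwords are kept in plain text only to keep the sketch short.

```python
import hmac
from collections import defaultdict

# Hypothetical threshold: failed logins allowed before a CAPTCHA is required.
MAX_FREE_FAILURES = 3


class LoginGate:
    """Counts failed logins per account and demands a CAPTCHA past a threshold."""

    def __init__(self, passwords, max_free_failures=MAX_FREE_FAILURES):
        self._passwords = passwords        # account -> password (illustration only)
        self._failures = defaultdict(int)  # account -> consecutive failed logins
        self._max_free = max_free_failures

    def captcha_required(self, account):
        return self._failures[account] >= self._max_free

    def attempt(self, account, password, captcha_passed=False):
        """Return 'ok', 'bad_password', or 'captcha_required'."""
        # Past the threshold, refuse to even check the password without a CAPTCHA.
        # The account is never locked, so an attacker cannot lock out its owner.
        if self.captcha_required(account) and not captcha_passed:
            return "captcha_required"
        expected = self._passwords.get(account, "")
        if hmac.compare_digest(expected.encode(), password.encode()):
            self._failures[account] = 0
            return "ok"
        self._failures[account] += 1
        return "bad_password"
```

With this shape, a guessing program gets only a few free attempts per account; every further guess costs one solved CAPTCHA, while the legitimate owner can still log in by solving one.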
Guidelines
If your website needs
protection from abuse, it is recommended that you use a CAPTCHA. There are many
CAPTCHA implementations, some better than others. The following guidelines are
strongly recommended for any CAPTCHA code:
- Accessibility. CAPTCHAs must be accessible. CAPTCHAs based solely on reading text
— or other visual-perception tasks — prevent visually impaired users from
accessing the protected resource. Such CAPTCHAs may make a site incompatible
with Section 508 in the United States. Any implementation of a CAPTCHA should
allow blind users to get around the barrier, for example, by permitting users
to opt for an audio CAPTCHA.
- Image Security. CAPTCHA images of text should be distorted randomly before being
presented to the user. Many implementations of CAPTCHAs use undistorted text,
or text with only minor distortions. These implementations are vulnerable to
simple automated attacks.
- Script Security. Writing secure CAPTCHA code is not easy. In addition to making
the images unreadable by computers, the system should ensure that there are no
easy ways around it at the script level. Common examples of insecurities in
this respect include: (1) Systems that pass the answer to the CAPTCHA in plain
text as part of the web form. (2) Systems where a solution to the same CAPTCHA
can be used multiple times (this makes the CAPTCHA vulnerable to so-called
"replay attacks"). Most CAPTCHA scripts found freely on the Web are
vulnerable to these types of attacks.
- Security Even After Widespread Adoption. There
are various "CAPTCHAs" that would be insecure if a significant number
of sites started using them. An example of such a puzzle is asking text-based
questions, such as a mathematical question ("what is 1+1"). Since a
parser could easily be written that would allow bots to bypass this test, such
"CAPTCHAs" rely on the fact that few sites use them, and thus that a
bot author has no incentive to program their bot to solve that challenge. True
CAPTCHAs should be secure even after a significant number of websites adopt
them.
- Should I Make My Own
CAPTCHA? In
general, making your own CAPTCHA script (e.g., using PHP, Perl or .NET) is a
bad idea, as there are many failure modes. We recommend that you use a
well-tested implementation such as reCAPTCHA.
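To make the two script-level weaknesses named under Script Security concrete, here is a small sketch of the server-side bookkeeping that avoids them. It is an illustration under stated assumptions, not the method used by reCAPTCHA or any particular library: the class ChallengeStore and its method names are hypothetical. The answer never leaves the server (the web form carries only a random token), and each token is discarded on first use, so a solved challenge cannot be replayed.

```python
import hmac
import secrets


class ChallengeStore:
    """Keeps CAPTCHA answers on the server, keyed by a random one-time token."""

    def __init__(self):
        self._answers = {}  # token -> expected answer

    def issue(self, answer):
        """Store the answer and return an opaque token to embed in the form."""
        token = secrets.token_urlsafe(16)
        self._answers[token] = answer
        return token

    def verify(self, token, response):
        """Check a response; the token is consumed whether or not it matches."""
        expected = self._answers.pop(token, None)
        if expected is None:
            return False  # unknown or already-used token: reject replays
        return hmac.compare_digest(
            expected.lower().encode(), response.strip().lower().encode()
        )
```

A real deployment would also expire unused tokens and tie them to a session. Those details are part of why a well-tested implementation remains the safer choice.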
The "Pornography Attack" is Not a Concern
It is sometimes rumored that
spammers are using pornographic sites to solve CAPTCHAs: the CAPTCHA images are
sent to a porn site, and the porn site users are asked to solve the CAPTCHA
before being able to see a pornographic image. This is not a security concern for
CAPTCHAs. While it might be the case that some spammers use porn sites to
attack CAPTCHAs, the amount of damage this can inflict is tiny (so tiny that we
haven't even noticed a dent!). Whereas it is trivial to write a bot that abuses
an unprotected site millions of times a day, redirecting CAPTCHAs to be solved
by humans viewing pornography would only allow spammers to abuse systems a few
thousand times per day. The economics of this attack just don't add up: every
time a porn site shows a CAPTCHA before a porn image, they risk losing a
customer to another site that doesn't do this.
Advancing Artificial Intelligence
CAPTCHA tests are based on
open problems in artificial intelligence (AI): decoding images of distorted
text, for instance, is well beyond the capabilities of modern computers.
Therefore, CAPTCHAs also offer well-defined challenges for the AI community,
and induce security researchers, as well as otherwise malicious programmers, to
work on advancing the field of AI. CAPTCHAs are thus a win-win situation:
either a CAPTCHA is not broken and there is a way to differentiate humans from
computers, or the CAPTCHA is broken and an AI problem is solved.