175

Have any programming methods been used to defeat reCAPTCHA?

I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely automated, humanless methods.

To clarify, I'm not looking for reCAPTCHA-cheating solutions that involve humans in any way, whether teams tasked with filling out CAPTCHAs, porn-seekers, or Mechanical Turk.

I'm also not looking for alternatives to reCAPTCHA, like picking the type of animal, background fields, or JavaScript trickery.

15
  • 20
    The amount of misinformation in these answers is ASTONISHING. If reCAPTCHA has been "broken", then someone better tell Facebook, Craigslist, and TicketMaster, stat! :p
    Commented Jun 26, 2009 at 9:16
  • 17
    Jeff, they HAVE been told, and the only misinformation is referring to CAPTCHA as a valid security mechanism. It has been empirically broken, both in common implementations AND in theory (not just reCAPTCHA, but the very concept of CAPTCHA). On the other hand, it's not COMPLETELY valueless. I've actually referred to this very site as a valid use case for CAPTCHA: in addition to the many other mechanisms, it can work together with them to cost the "attackers" just a little bit more.
    – AviD
    Commented Jun 26, 2009 at 10:54
  • 15
    I'm disappointed that the subject doesn't have pwned in it
    – skaffman
    Commented Jan 30, 2010 at 22:01
  • 2
    Some more research on the topic: schneier.com/blog/archives/2010/10/analyzing_captc.html. Actually I found the comments more interesting than the post or research itself...
    – AviD
    Commented Oct 20, 2010 at 13:35
  • 10
    Oo! Best CAPTCHA ever! xkcd.com/810
    – AviD
    Commented Oct 25, 2010 at 7:06

14 Answers

94
+50

I notice that almost all the answers here relate to the ineffectiveness of the concept of CAPTCHA, in principle - and while I very much agree with them, and in fact gave a talk at OWASP a few months ago explaining just that - the question is very specific, so I will provide a demonstration.
But first, I will reiterate: demonstration aside, re-read the other comments, since it is true that CAPTCHA is pointless and unhelpful, irrespective of implementation.

But really, check out CAPTCHA Killer. You can upload a CAPTCHA image, and it will automatically, if not immediately, provide the OCR'd answer. It also provides an API (REST, I think, but maybe also SOAP). I personally tried numerous reCAPTCHA images, and they were actually some of the easiest (or at least quickest) to break.

UPDATE: CAPTCHA Killer's website is now taken down, apparently under legal pressure. See http://captcha.org/ for a complete overview of the topic.

And yeah, OCR is not the best way to break a CAPTCHA-protected site - there are many better ways.

9
  • 3
    I wonder how CAPTCHA Killer works. Somehow it looks to me like it's using cheap labour and making money with the advertisement on the website. (And merchandising.)
    Commented Feb 19, 2009 at 23:40
  • 3
    Useful answer about captchas in general, but the question was about reCAPTCHA specifically.
    – Mike
    Commented Oct 3, 2009 at 2:40
  • 2
    Just tried Captcha Killer with three reCAPTCHAs. All three expired without returning an answer.
    – lfaraone
    Commented Oct 3, 2009 at 20:17
  • 21
    CAPTCHA Killer seems to have been killed: it has been violently destroyed by multinational corporations seeking to spread their overlord dominion and eliminate the freedom of creative expression! Such a beautiful killer, such an early death!
    – Kiril
    Commented Feb 24, 2011 at 22:46
  • 5
    I think it's just a change of domain, and the version has become paid now; check bypasscaptcha.com/captchakiller.php
    – MarmiK
    Commented Sep 6, 2013 at 10:38
54

You might be interested in this detailed report on how 4chan defeated reCAPTCHA, and used it to manipulate Time.com's annual TIME 100 Poll results.

Hacking Recaptcha (aka ‘The Penis Flood’)

The next tactic used was to see if they could find a flaw in the reCAPTCHA implementation. One thing they discovered about reCAPTCHA was that it always presents two words to a user for decoding - one word is a control word known by the reCAPTCHA system, while the other is an unknown word (reCAPTCHA uses the humans to help correct OCR errors). Wikipedia describes the process: “Scanned text is subjected to analysis by two different optical character recognition programs; in cases where the programs disagree, the questionable word is converted into a CAPTCHA. The word is displayed along with a control word already known and is labeled by the human. Those words that are consistently given a single label by human judges are recycled as control words”.

What Anonymous realized was that if they always labeled the unknown scanned text with the same word - and if they did this thousands and thousands of times - eventually a large percentage of the unknown words would be mislabeled with their word. All they had to do was look at the two words in the captcha, enter the proper label for the ‘easy’ one (presumably that would be the one that the two optical scanners would agree upon) and enter the word “penis” for the hard one. If they did this often enough, then soon a significant percentage of the images would be labeled as ‘penis’ and the ability to autovote would be restored (one side effect, that was not lost on Anonymous, was the notion that for years to come there would be a number of digital books with the word ‘penis’ randomly inserted throughout the text).

Update: I asked Ben Maurer, chief engineer of reCAPTCHA, about this ‘penis flood’ attack. Ben says that they’ve anticipated this type of attack and they have numerous protections that will keep the penises from penetrating the reCAPTCHA barrier.
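The two-word scheme described in the quote can be sketched roughly as below. This is only a toy model to make the mechanism concrete, not reCAPTCHA's actual code: the function names, the `unknown_votes` store, and the `PROMOTION_THRESHOLD` value are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy model of the control-word / unknown-word scheme quoted above.
# Every name and the threshold are hypothetical, not reCAPTCHA's real code.
PROMOTION_THRESHOLD = 3  # assumed number of agreeing labels needed

unknown_votes = defaultdict(Counter)  # unknown-word image id -> label counts


def grade(control_answer, control_guess, unknown_id, unknown_guess):
    """Accept the response if the control word matches; record the other label."""
    if control_guess.strip().lower() != control_answer.lower():
        return False
    # The unknown word cannot be checked, so its label is only recorded.
    unknown_votes[unknown_id][unknown_guess.strip().lower()] += 1
    return True


def consensus_label(unknown_id):
    """Return the label once enough responders agree on it, else None."""
    if not unknown_votes[unknown_id]:
        return None
    label, count = unknown_votes[unknown_id].most_common(1)[0]
    return label if count >= PROMOTION_THRESHOLD else None
```

In this simplified model only the control word gates acceptance, which is why the flood described in the quote targets the unchecked word. The protections Ben Maurer mentions are not modelled here.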

Optimizing reCAPTCHA

As appealing as the notion of sprinkling the word ‘penis’ into texts was, the Anonymous team knew that the clock was ticking, and if they were going to restore the Message they didn’t have time to wait for the autovoters to come back online - they were going to have to vote manually, many, many times. And so they needed to be able to enter captchas as fast as they could. They developed a set of guidelines that allowed them to quickly decide which reCAPTCHA words they could skip. For example:

You will be given 2 words: 1 real, 1 fake.

For [REAL FAKE] or [FAKE REAL], you can just type in REAL and it should be accepted.

If it’s [LOOKSREAL LOOKSREAL] or [LOOKSFAKE LOOKSFAKE], it’s usually just quicker to just type in both words. Don’t waste precious time deciding which one of them is real.

Use both the appearance and the type of word to identify a fake word. Don’t rely on just one of them.

The whole ruleset is here: fake captcha.

4
  • 4
    But is not the point of that story that they did not break reCAPTCHA? They instead succeeded by streamlining the manual voting process to allow determined volunteers to vote thousands of times each.
    – pdc
    Commented Jan 19, 2010 at 10:24
  • 4
    @pdc, just because they didn't OCR the images (though this could also have been done) doesn't mean they didn't break reCAPTCHA. Think about it like this: Is the purpose of reCAPTCHA to present undecipherable images? Or is it to prevent automated flooding? If it's the first, you might be able to argue that it was not broken (arguable, but I would not agree with you), but if it's the second, then you have empirical proof that reCAPTCHA does not work. I also think it should be quite clear that, aside from entertainment value, the SECOND purpose is the real one, and the only one that counts.
    – AviD
    Commented Jan 28, 2010 at 10:50
  • @AviD Huh? According to the article, automated flooding was no longer possible. Rather, dedicated people were able to vote several times faster than they otherwise could (and various non-captcha-related techniques were used to thwart ineffective measures against such heavy voting by humans). Basically equivalent to using cheap human labor - which reCAPTCHA of course doesn't claim to stop.
    Commented Apr 18, 2019 at 0:08
  • @ToolmakerSteve that's exactly the issue, reCAPTCHA doesn't try to stop the real problem. CAPTCHA tries to solve the wrong problem, badly.
    – AviD
    Commented Apr 18, 2019 at 6:44
32

The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

Not much you can do about that really.

It's also far cheaper than trying to do image recognition, OCR, etc on the actual image (you may get a response for under $0.01 the other way).

13
  • 63
    Or even better, they grab the captcha off your site, and show it to some wanker (literally) as a requirement to showing them some porn.
    Commented Jan 15, 2009 at 23:57
  • 2
    Man... that's clever (credit where credit is due).
    – cletus
    Commented Jan 16, 2009 at 0:04
  • 7
    Note that this doesn't make it an ineffective tool. It merely means that if your site is popular enough then this might happen. For the other 99.99% of the websites in the world, a simple captcha will do.
    – Robert P
    Commented Jan 16, 2009 at 0:13
  • 1
    Hell, CodingHorror's captcha doesn't even change, nor is it obfuscated, and it manages to do the job all right!
    – Robert P
    Commented Jan 16, 2009 at 0:14
  • 6
    Actually, that's not entirely true. Although there are examples of this, it is FAR cheaper to OCR-crack a CAPTCHA. Using sweatshops is usually NOT economically feasible for the spammers.
    Commented Feb 15, 2009 at 15:07
21

Before giving in to the pressure of using a CAPTCHA, consider creative workarounds such as having a field labeled "Your Comments" that is hidden by CSS. If the field is filled in, the request is dropped by the server. Most bots will fall for it, even though there is still no good way to defeat the room full of underpaid laborers, which CAPTCHA does not help with anyway.
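The server-side half of this hidden-field check might look something like the sketch below. It is framework-agnostic and deliberately minimal; the field name `your_comments` and both function names are made up for illustration, and the form is assumed to arrive as a plain dictionary of strings.

```python
# Hypothetical name of the input that CSS hides from human visitors.
HONEYPOT_FIELD = "your_comments"


def is_probably_bot(form_data):
    """Return True if the hidden honeypot field came back non-empty."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())


def handle_submission(form_data):
    """Drop submissions that filled the honeypot; accept everything else."""
    if is_probably_bot(form_data):
        return "dropped"
    return "accepted"
```

A human never sees the field, so it stays empty; a naive bot that fills every input gives itself away. A bot written specifically for the site can simply skip the field, which is the limitation the comments below point out.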

UPDATE: Just read a case study where removing CAPTCHA increased conversion rates by almost 10%. That would indicate to me that it is rather broken if you are losing 10% of your leads just to filter out bots. Imagine what 10% means to most businesses.

6
  • 2
    This is very smart but doesn't work if you're sufficiently popular. Yahoo or Google, for example, could never use this.
    – dreeves
    Commented Feb 15, 2009 at 18:40
  • 2
    The question here is whether your site is valuable enough to attack specifically. Most aren't, and having little idiosyncrasies will do some good.
    Commented Feb 17, 2009 at 20:08
  • 3
    I would +1 for the update re 10% loss - VERY important point. (but I can't +1 cuz of the hidden field suggestion - this is less than useless.)
    – AviD
    Commented Feb 18, 2009 at 7:06
  • 2
    There are 2 problems: "targeted attack" and "random spam". Your solution might save your ass from random spam, but a targeted attack will flood your system within a day.
    – dr. evil
    Commented Feb 20, 2009 at 12:14
  • 1
    @dreeves: didn't Google just acquire reCAPTCHA?
    – Prabu
    Commented Nov 19, 2009 at 2:31
18

My favorite captcha is from Microsoft: http://research.microsoft.com/en-us/um/redmond/projects/asirra/

Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but our user studies have shown that people can accomplish it quickly and accurately. Many even think it's fun!

It is a free service and they have example code to get you started.

I wonder how long it will be before it is cracked.

7
  • 1
    Unfortunately cletus's answer above shows how such a service will be ineffectual in the greater fight against spam.
    Commented Jan 15, 2009 at 23:46
  • 1
    I failed that one 2 out of 4 times; a badly lit picture of a Pomeranian can look like a cat :(
    Commented Jan 16, 2009 at 0:05
  • 3
    I took the test and it feels good to know that I am a human. :)
    – BoltBait
    Commented Jan 16, 2009 at 0:29
  • 5
    Actually the best captcha used to be HotCaptcha - but it was offline last time I checked. Based on HotOrNot.com, it wasn't horribly effective, but VERY popular with the users :-)
    – AviD
    Commented Feb 14, 2009 at 21:31
  • 2
    The issue here is that it would be very easy to brute force due to a small key space. If you start adding more objects to name, then you get into ambiguity in naming (for example, is it a Kangaroo, a Joey, or a Baby Kangaroo?). You would need to make sure you had a one-to-many relation between objects to be named and their possible names.
    – Oorang
    Commented Dec 30, 2009 at 3:23
11

reCAPTCHA isn't broken and it won't be for a very long time. The thing is, if you implement your own CAPTCHA and it gets broken, it will probably take a long time to fix.

This is taken from the page about reCAPTCHA security:

reCAPTCHA is a Web service. That means that all the images are generated and graded by our servers. (…) this also provides an extra level of protection: our CAPTCHAs can be automatically updated whenever a security vulnerability is found.

For example, if somebody writes a program that can read our distorted images, we can add more distortions in very little time, and without Web masters having to change anything on their side.

I believe that, as they specialize in CAPTCHAs, they have improved versions stored, ready to be deployed in little time if needed. (Why should they create stronger security when the weaker isn't broken yet?)

0
9

Not only has it been defeated, but a useful application has also been successfully built on top of that, becoming the most amazing tool to defeat all kinds of free-account protections on a big list of direct download sites (not only megaupload and rapidshare).

Jdownloader is open source and written in Java so a peek at the source code can answer not only if it is broken but also how.

Edit: Most direct download sites do not use reCaptcha, but a simpler Captcha method (3 capital letters colored in different colors). Nonetheless, Jdownloader and Cryptload (a program similar to Jdownloader) are the only working implementations I know of that have effectively broken a Captcha method. I have not heard of any implementation to crack reCaptcha.

Update: It seems that at least one implementation of reCaptcha (not whole reCaptcha itself) has been cracked too.

Update Dec 2010: Jdownloader seems at last to be defeating reCaptcha. The plugin is still experimental and works only on Windows versions of Jdownloader, but, as I have been told by a mate who tried it, it does work.

2
  • 2
    Do you know which of those filehosters use reCAPTCHA? Rapidshare and megaupload don't.
    – dr. evil
    Commented Feb 16, 2009 at 8:04
  • @dr.evil It covered a long list of hosters, almost all of them we could say, including many we might never have heard of. The program was smart enough to break most CAPTCHAs, and when it couldn't, it prompted the user to solve them, which is useful, isn't it? I have used it personally in the past. It was one of the best downloaders, in some cases better than IDM. Please note: I am not a promoter of jDownloader. Thank you.
    – MarmiK
    Commented Sep 5, 2013 at 18:04
8

There was a speech at Defcon last year that went into the problems with CAPTCHAs in general. One of the things they did was to use multiple free OCR engines and have them vote on the best words. Doing this, they were able to achieve a somewhat decent chance of succeeding. For one kind, it was 40% or so; I don't think it was reCAPTCHA, though.
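The voting step could be sketched like this. It assumes each OCR engine has already produced a string for the same word image; the function name `vote_on_word` is hypothetical, and nothing here reflects the actual tooling shown in the Defcon talk.

```python
from collections import Counter


def vote_on_word(candidates):
    """Return the most common reading among the engines and its vote share."""
    # Normalise case and whitespace, and ignore engines that returned nothing.
    cleaned = [c.strip().lower() for c in candidates if c and c.strip()]
    if not cleaned:
        return None, 0.0
    word, count = Counter(cleaned).most_common(1)[0]
    return word, count / len(cleaned)
```

The returned share can serve as a crude confidence score: a bot could submit only when most engines agree and request a fresh image otherwise.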

1
  • 3
    That's an important point: a spam bot doesn't have to break all CAPTCHAs - 1% would do if it can keep trying.
    Commented Feb 20, 2009 at 16:23
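To put a number on the "keep trying" point, here is a small sketch of the arithmetic. It assumes each attempt is independent and succeeds with the same probability, which is a simplification; the function name is made up for illustration.

```python
def chance_of_success(per_try_rate, attempts):
    """Probability of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - per_try_rate) ** attempts
```

Under that assumption, a bot with a 1% per-attempt success rate that is allowed 500 attempts gets through at least once with a probability of over 99%, unless the site limits retries.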
8
  • "In fact, it [reCAPTCHA] became pretty useless on 4 January [2011] when spammers apparently got their collective hands on a piece of software that circumvents reCAPTCHA and allows for a fully automated registration process. The bots have been busy, very busy indeed, ever since" [ 1 ]

Two to three years ago, the text-typing CAPTCHA approach crossed the line at which it lost its battle: further complications only make these tests relatively easier for machines (since computing power keeps increasing while human ability does not) and more repugnant and repelling, if not completely impossible, for humans. This contradicts the original paradigm of CAPTCHA as a test to ensure that the response is not generated by a computer.

Update:
Note that reCAPTCHA is owned by Google Inc., but Google Inc. does not use it for its own services.
Here is a link to a webpage with the CAPTCHA that Google itself uses internally, for example for Gmail registration:

[image: CAPTCHA used by Google for Gmail registration]

Note that Google's reCAPTCHA always has 2 words.
Here is the link to an image of Google's reCAPTCHA, which is offered for use by others.

And reCAPTCHA's screenshot:

[image: reCAPTCHA screenshot]

I leave it to the reader to draw the obvious conclusions.

Cited: [ 1 ]
vBulletin forums hit by reCAPTCHA cracking spam bot | PC Pro blog
Posted on January 12th, 2011 by Davey Winder

5

I'm seeing blog comments on a system protected by reCAPTCHA where the page loads and, 1 second later, the post is made successfully. The User-Agent was nonsense (in this particular case it claimed to be running Ubuntu 9.25/Firefox 3.8), and the referrer was a completely unrelated site with no link to us.

This is clearly automated.

3

reCAPTCHA has not been defeated. If it had been, then why did Google just buy it and announce they will be applying the technology within Google to increase fraud and spam protection for Google products?

From "Google Acquires reCAPTCHA", posted to the Google Blog on 9/16/09:

In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.

3

The easiest way to defeat Captchas is Amazon Mechanical Turk. There's a guy named Kermit Welda who pays people a nickel each to register Hotmail, AOL and Gmail accounts. That's 6,000 fake email accounts at 5 cents = $300 a day. The cost of doing business is pretty cheap when you have other people do the dirty work for you. No wonder our server's spam filters want to reject anything from Hotmail.

3
  • Is this really an answer...?
    Commented Nov 30, 2012 at 3:54
  • Makes sense; it's a similar concept to Death By Captcha.
    – kenorb
    Commented Mar 16, 2015 at 13:14
  • OP has clearly stated this is not what he is looking for.
    Commented Mar 16, 2015 at 15:42
2

AFAIK, in practice there is no tool to crack the reCAPTCHA implementation; however, I assume someone will eventually manage it.

Funnily enough, if someone does manage it, the whole reCAPTCHA project becomes pointless, because reCAPTCHA is designed to digitize books that can't be digitized in an automated way.

BTW :

The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

You can't secure a system thinking like that. This is like saying "your web application is not secure enough if your host is not in an old military bunker, because now people can steal your machine".

2
  • 3
    Your sentiment is spot on, but the application of it is misplaced: The thinking (of the comment you quoted) is that CAPTCHA does not solve the problem it intends to. Or as I often say, "CAPTCHA (in general) is a bad solution to the wrong problem." The problem CAPTCHA tries to solve (by definition) is: How do I know that the user is a person, not a computer? Whether or not CAPTCHA solves this (it doesn't), the REAL problem is: How can I prevent mass flooding of my service? CAPTCHA farms and proxies show the exact difference. It's why any security solution should start with the threats.
    – AviD
    Commented Jan 28, 2010 at 10:57
  • 1
    You're right, it all comes down to "Why are you using CAPTCHA?". For some systems it's just enough security; for others it's not even close. But just as key size in crypto helps protect something by making brute forcing take years (eventually they are going to crack it, but not in this lifetime, or at least not in the next 10 years), CAPTCHA can provide enough security for some systems in the very same way. So, as you said, it all comes down to what you are using CAPTCHA for.
    – dr. evil
    Commented Jan 28, 2010 at 16:05
2

There are lots of methods that are used to crack reCAPTCHA. While it's hard to use neural-network-enabled programs to solve them automatically, it's possible to grab the image and have Amazon's Mechanical Turk or some equivalent service solve them.

http://codemagician.wordpress.com/2010/01/22/solving-recaptcha/
