7

There are lots of non-image-based CAPTCHA ideas floating around. But what about the old-fashioned way?

What are the elements of a good image CAPTCHA? What visual elements are hard for computers, but easier for humans? What about mistakes, elements that are easier for computers than they are for humans? What are good techniques for increasing the speed of a CAPTCHA generator?

Here's an example of a CAPTCHA I've been working on. It generates the functions for two sine waves, then stretches the text between them. It then lays the result over a background drawn from a pool of images. (Image: image-based CAPTCHA)
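To make the description concrete, here is a minimal sketch of the "stretch text between two sine waves" step. It is written in dependency-free Python rather than PHP GD and works on a tiny character bitmap instead of a real image. The names `stretch_between_waves` and `make_wave`, the `#`/`.` bitmap format, and the wave parameters are all assumptions made for illustration, not the actual generator.

```python
import math

def make_wave(baseline, amplitude, wavelength, phase):
    """Return y(x) = baseline + amplitude * sin(2*pi*x/wavelength + phase)."""
    return lambda x: baseline + amplitude * math.sin(2 * math.pi * x / wavelength + phase)

def stretch_between_waves(bitmap, out_height, top_wave, bottom_wave):
    """Stretch a text bitmap vertically so it fills the gap between two curves.

    bitmap is a list of equal-length strings ('#' = ink, '.' = empty).
    top_wave and bottom_wave map a column index to a y coordinate.
    """
    src_h, width = len(bitmap), len(bitmap[0])
    out = [["."] * width for _ in range(out_height)]
    for x in range(width):
        top, bottom = top_wave(x), bottom_wave(x)
        span = bottom - top
        if span <= 0:
            continue  # the curves crossed here; leave the column empty
        for y in range(out_height):
            if top <= y < bottom:
                # Map the output row back to a source row (nearest neighbour).
                src_y = min(src_h - 1, int((y - top) / span * src_h))
                out[y][x] = bitmap[src_y][x]
    return ["".join(row) for row in out]

# Hypothetical stand-in for rendered text; a real generator would rasterize a font.
text = ["#.#.###", "###..#.", "#.#.###"]
top = make_wave(baseline=3, amplitude=2, wavelength=7, phase=0.0)
bottom = make_wave(baseline=12, amplitude=2, wavelength=5, phase=1.0)
warped = stretch_between_waves(text, 16, top, bottom)
print("\n".join(warped))
```

Giving the two waves different wavelengths and phases means the character height varies from column to column, which is what produces the stretched look.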

How could this be improved? (Specifically, I'm using PHP GD.) Things that come to mind are:

  • Change the color of the text, possibly making it multicolored.
  • Add "scratches" or marks that mildly obscure the text.
  • Add to the distortion so that it's affected by sine waves horizontally as well.

What goes into a superb image CAPTCHA?


Edit: I know that there are some very worthy third-party CAPTCHA resources. I'm looking for attributes that make them good. I'd like to use my own CAPTCHAs, just for the purpose of self-improvement. So, you can talk about reCAPTCHA, but it's not exactly what I'm looking for.

Also, it has been brought up that not only the image, but also the experience matters, so feel free to comment on that.

10 Answers

5

Make each letter/number out of a pattern, i.e. unconnected dots. That way the computer has no way of knowing that a dot is part of a letter other than pattern recognition (which they don't have yet). Then add the usual distortions and random lines.

How you do this is the challenge.

EDIT: Also, bonus points for patterns of different shapes, and try alpha transparency on the characters (on the edges or the whole character), so they merge with the background.
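One possible way to approach the dot idea, sketched in plain Python: start from a solid glyph mask and keep each ink pixel only with some probability, so the letter becomes a scatter of dots. The function name `dotted_glyph`, the `density` parameter, and the `#`/`.` mask format are assumptions for this example. Note that random sampling alone does not guarantee the surviving dots are unconnected; a real generator would likely place dots on a jittered grid.

```python
import random

def dotted_glyph(mask, density=0.5, rng=None):
    """Keep each ink pixel of `mask` with probability `density`.

    mask is a list of equal-length strings ('#' = ink, '.' = empty).
    """
    rng = rng or random.Random()
    return [
        "".join("#" if ch == "#" and rng.random() < density else "." for ch in row)
        for row in mask
    ]

# Hypothetical 5x5 mask for the letter "A".
LETTER_A = [
    ".###.",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]

dotted = dotted_glyph(LETTER_A, density=0.6, rng=random.Random(42))
print("\n".join(dotted))
```

Lower `density` values make segmentation harder for a program but also make the letter harder for a human to read, so the value needs tuning against real users.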

1
  • This is just the type of idea I'm looking for. Commented Oct 13, 2008 at 3:08
4

Make letters difficult to separate. Use a handwriting-like font or add lines that join letters. Decrease and randomize the spacing between letters.

Add wave distortion in the other axis too. Distortion in one axis only can be relatively easily analyzed and reversed.
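A minimal sketch of distortion in both axes, in dependency-free Python on a character bitmap. Each output pixel is sampled from a source position shifted horizontally by a sine of its row and vertically by a sine of its column. The name `wave_warp` and its parameters are assumptions for illustration.

```python
import math

def wave_warp(bitmap, amp_x, period_x, amp_y, period_y):
    """Warp a bitmap with sine displacement along both axes.

    amp_x / period_x control the horizontal shift, which varies with the row.
    amp_y / period_y control the vertical shift, which varies with the column.
    """
    h, w = len(bitmap), len(bitmap[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = x + round(amp_x * math.sin(2 * math.pi * y / period_x))
            sy = y + round(amp_y * math.sin(2 * math.pi * x / period_y))
            # Pixels sampled from outside the source are treated as empty.
            row.append(bitmap[sy][sx] if 0 <= sx < w and 0 <= sy < h else ".")
        out.append("".join(row))
    return out

# A vertical bar makes the horizontal displacement easy to see.
bar = ["....#...."] * 8
warped = wave_warp(bar, amp_x=2, period_x=8, amp_y=0, period_y=8)
print("\n".join(warped))
```

With both amplitudes non-zero, an attacker has to estimate two independent waves instead of one before the text can be straightened.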

Don't bother with a color background at all. It's super-easy to automatically filter black from other colors. Your background hinders only humans.
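To illustrate how little work that filtering takes, here is a sketch of the attacker's side in plain Python: a pixel counts as text only if all three of its channels are dark. The name `isolate_dark_text` and the threshold of 60 are assumptions chosen for the example.

```python
def isolate_dark_text(pixels, threshold=60):
    """Return a mask that is True where a pixel is near-black.

    pixels is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    """
    return [[max(r, g, b) < threshold for (r, g, b) in row] for row in pixels]

# Tiny hypothetical image: black text pixels on a colorful background.
image = [
    [(200, 30, 30), (10, 10, 10)],
    [(0, 0, 0), (30, 180, 220)],
]
mask = isolate_dark_text(image)
print(mask)
```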

Don't add scratches or other noise unless it has the same thickness as letters. Noise-removal algorithms can easily remove things that are thinner than letters.
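One common noise-removal approach is a morphological opening (erosion followed by dilation). The sketch below, in plain Python, shows it deleting a one-pixel-thick scratch while leaving a three-pixel-thick stroke intact. The function names and the 3x3 neighbourhood are assumptions for this example; real attacks may use other filters.

```python
NEIGHBOURHOOD = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def _ink(grid, y, x):
    """True if (y, x) lies inside the grid and is set."""
    return 0 <= y < len(grid) and 0 <= x < len(grid[0]) and grid[y][x]

def erode(grid):
    """Keep a pixel only if its whole 3x3 neighbourhood is ink."""
    return [[all(_ink(grid, y + dy, x + dx) for dy, dx in NEIGHBOURHOOD)
             for x in range(len(grid[0]))] for y in range(len(grid))]

def dilate(grid):
    """Set a pixel if any pixel in its 3x3 neighbourhood is ink."""
    return [[any(_ink(grid, y + dy, x + dx) for dy, dx in NEIGHBOURHOOD)
             for x in range(len(grid[0]))] for y in range(len(grid))]

def remove_thin_marks(grid):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(grid))

rows = [
    ".........",
    ".###.....",   # a thick 3x3 "stroke"
    ".###.....",
    ".###.....",
    ".........",
    "#########",   # a 1-pixel-thick scratch
    ".........",
]
grid = [[ch == "#" for ch in row] for row in rows]
cleaned = remove_thin_marks(grid)
as_text = ["".join("#" if v else "." for v in row) for row in cleaned]
print("\n".join(as_text))
```

A scratch as thick as the letter strokes would survive the erosion step, which is why matching the thickness matters.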

1

What if the color of the letters faded into other colors... for instance the 5 can start off as yellow on top and fade into blue or something. The colors chosen should be random.

With the multicolored background, it might be hard for the computer to pick up where the background ends and the character begins, and hopefully it would not be too difficult for the human to actually pick up the pattern.
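A small sketch of the fading idea in plain Python: pick two random endpoint colors and blend linearly from one to the other down the height of the character. The names `random_color` and `gradient_rows` are assumptions for this example; each returned color would be used to paint one pixel row of the glyph.

```python
import random

def random_color(rng):
    """Return a random (r, g, b) tuple with channels in 0..255."""
    return tuple(rng.randrange(256) for _ in range(3))

def gradient_rows(top, bottom, height):
    """Return one RGB color per row, blended linearly from top to bottom."""
    if height == 1:
        return [top]
    return [
        tuple(round(t + (b - t) * y / (height - 1)) for t, b in zip(top, bottom))
        for y in range(height)
    ]

rng = random.Random()
colors = gradient_rows(random_color(rng), random_color(rng), height=24)
print(colors[0], colors[-1])
```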

1

Instead of generating CAPTCHAs, you can create a CAPTCHA table in your database and populate it yourself by searching Google for good CAPTCHA images.

So no need to worry "Will this generation method work?"

0

I really hate CAPTCHAs on sites; they just annoy me. But if you want to try to make a robust one, try the following:

  • Ability to get a new image without submitting
  • Spoken version for the visually impaired
  • Non-uniform characters

I've used reCAPTCHA on a few sites; it's a nice and robust solution.

Or if you want to be really funky about it check out this: http://research.microsoft.com/asirra/

0

Algorithms that try to break CAPTCHAs are pattern matchers that work in a few different ways: scaling and skewing the symbols that they already know about, finding and tracing edges, and counting interior holes to help. If you can break the letters up into pieces, vary the letter quality, or add strong lines or “scratches” along the letters, you will hinder these techniques. However, all of this is fairly moot considering we have reCAPTCHA for this purpose, and it’s a wonderful third-party app for this. Additionally, a CAPTCHA will help the security of your site, but will not stop those who are truly determined.
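To show what "counting interior holes" can look like, here is a sketch in plain Python: flood-fill every background region and count those that never touch the image border. The name `count_holes`, the use of 4-connectivity, and the tiny glyphs are assumptions for this example. It also shows why breaking a letter into pieces helps: a single gap in the outline changes the hole count.

```python
from collections import deque

def count_holes(glyph):
    """Count background regions fully enclosed by ink ('#')."""
    h, w = len(glyph), len(glyph[0])
    seen = [[False] * w for _ in range(h)]

    def region_touches_border(sy, sx):
        touches = False
        queue = deque([(sy, sx)])
        seen[sy][sx] = True
        while queue:
            y, x = queue.popleft()
            if y in (0, h - 1) or x in (0, w - 1):
                touches = True
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and glyph[ny][nx] != "#":
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        return touches

    holes = 0
    for y in range(h):
        for x in range(w):
            if glyph[y][x] != "#" and not seen[y][x]:
                if not region_touches_border(y, x):
                    holes += 1
    return holes

EIGHT = [".###.", "#...#", ".###.", "#...#", ".###."]
ZERO = [".###.", "#...#", "#...#", "#...#", ".###."]
ONE = ["..#..", ".##..", "..#..", "..#..", ".###."]
BROKEN_ZERO = [".###.", "#...#", "#....", "#...#", ".###."]  # gap in the outline

for name, glyph in [("8", EIGHT), ("0", ZERO), ("1", ONE), ("broken 0", BROKEN_ZERO)]:
    print(name, count_holes(glyph))
```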

0

I like the idea of KittenAuth and Microsoft's Asirra project. The idea is that, while OCR will eventually evolve to break your traditional captcha, the ability to distinguish a kitten from a dog is many orders of magnitude more complex a problem, while absolutely trivial for humans.

This solution, while probably the sexiest captcha idea ever, has the limitation of not being easily portable to an audio alternative for visually impaired users.

2
  • "Identify all the cats: MeowMeowWoofMeowWoofWoofMeowMeow*" Okay, you're right about the audio... Commented Oct 13, 2008 at 12:09
  • It also suffers from a limited set of pictures - the spammers can just cycle through the 100 or so images, categorise them manually and then just ID the picture. I suspect the set is already available categorised on the net. Commented Feb 9, 2009 at 14:52
0

What about shearing and shuffling bands to mangle the display, combined with mouse-only input?

Start by taking your sine-wave morphed text and dividing it into horizontal bands or maybe even a grid.

That makes optical recognition harder and might allow you to avoid the kind of nasty background games that make some captchas hard for humans.

For a site where you can rely on local drag in the browser, instead of typing in an entry use shuffling requiring the user to re-order pieces (just in sloppy order, not like one of those puzzles). Or, if you wanted to use clicks alone, the classic sliding tile puzzle.
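A rough sketch of the band-shuffling step in plain Python, treating the image as a list of pixel rows. The server keeps the permutation so it can check whether the user put the pieces back in order. The names `shuffle_bands` and `restore_bands` are assumptions for this example, and for simplicity it requires the image height to be a multiple of the band height.

```python
import random

def split_bands(rows, band_height):
    """Split rows into consecutive bands of band_height rows each."""
    if len(rows) % band_height != 0:
        raise ValueError("image height must be a multiple of band_height")
    return [rows[i:i + band_height] for i in range(0, len(rows), band_height)]

def shuffle_bands(rows, band_height, rng=None):
    """Return (shuffled_rows, order); order[i] is the original index of band i."""
    rng = rng or random.Random()
    bands = split_bands(rows, band_height)
    order = list(range(len(bands)))
    rng.shuffle(order)
    shuffled = [row for index in order for row in bands[index]]
    return shuffled, order

def restore_bands(shuffled, band_height, order):
    """Undo shuffle_bands using the stored permutation."""
    bands = split_bands(shuffled, band_height)
    original = [None] * len(bands)
    for position, index in enumerate(order):
        original[index] = bands[position]
    return [row for band in original for row in band]

image_rows = ["row0", "row1", "row2", "row3", "row4", "row5"]
shuffled, order = shuffle_bands(image_rows, band_height=2, rng=random.Random(1))
print(shuffled, order)
```

A drag-to-reorder front end would submit the user's arrangement, and the server would accept it when it matches the stored `order`.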

Note: I've run into a captcha where you had to identify which of N cartoons had an animal in it, and it succeeded in blocking me!

Wellington Grey sums up the AI CAPTCHA race nicely.

0

You could add a random array of fonts so that GD renders each character using a different one.
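The selection logic for this is small. Below is a sketch in plain Python that pairs each character with a randomly chosen font; the font paths are hypothetical placeholders and `assign_fonts` is a name made up for this example. In PHP GD, each pair would then be drawn with its own call to a TrueType text function such as `imagettftext`.

```python
import random

# Hypothetical font files; replace with fonts that exist on your server.
FONTS = ["fonts/serif.ttf", "fonts/script.ttf", "fonts/mono.ttf"]

def assign_fonts(text, fonts, rng=None):
    """Pair each character of text with a randomly chosen font path."""
    rng = rng or random.Random()
    return [(ch, rng.choice(fonts)) for ch in text]

pairs = assign_fonts("X7KQ2", FONTS)
for ch, font in pairs:
    print(ch, font)
```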

0

Be wary of suggestions of reCAPTCHA. I have submitted incorrect input into it a few dozen times, and have had success each time. Several of those times I have submitted incorrect input for both words rather than just the most obscured word; the success rate, as I said, has been 100%.

I also think that image-based CAPTCHAs are user-hostile and should be avoided wherever possible. The advantage of text-based solutions is that you can tailor them to your site's audience, adding a level of obscurity that may trip up machines as they become more savvy with text-based solutions.

At the very least, don't use this all the time: orange
(source: codinghorror.com)

2
  • I'll shoot for "green". Or, perhaps, "lemon"? Commented Oct 13, 2008 at 3:21
  • I would say this is because of a poor implementation. Most often with reCAPTCHA, this happens because the 'programmer' forgot to check the legit word; he/she thought that reCAPTCHA would do that for them, and it doesn't.
    – UnkwnTech
    Commented Oct 13, 2008 at 12:16
