Along with the site redesign, I’ve asked our owners, NAPCO, to implement reCAPTCHA. You have undoubtedly seen this on other sites you’ve visited.

Since May we have received over 172,000 spam comments. Why is this important? Because all those spam comments are held by our filter for review before being deleted. That means that Chris and I have reviewed over 172,000 spam comments to be sure that they do not contain false positives. We regularly find false positives and have to approve them. This isn’t the end, though, because an “approved” spam comment then becomes, in WordPress’s wisdom, a pending comment, and we have to go into the pending comments area to approve it for posting – a two-step process.

Quite honestly, this is taking too much time. So we’re going to try to see how well reCAPTCHA works. At least it will contribute to TeleRead’s mission. Here’s the scoop on it, in case you don’t know:

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

Currently, we are helping to digitize old editions of the New York Times and books from Google Books.
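For the curious, the word-pairing idea described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not reCAPTCHA's actual code: the class name `WordPairCaptcha`, the `votes_needed` threshold, and the lowercase/whitespace normalization are all hypothetical choices of ours, since the description gives no such details.

```python
from collections import Counter, defaultdict


class WordPairCaptcha:
    """Toy model of the control-word / unknown-word scheme (hypothetical)."""

    def __init__(self, votes_needed=3):
        # Assumed threshold: agreeing human readings required before a
        # transcription is trusted. The quoted description gives no number.
        self.votes_needed = votes_needed
        # Maps an unknown word's image id to a tally of human guesses.
        self.votes = defaultdict(Counter)

    @staticmethod
    def _normalize(text):
        # Assumption: answers are compared ignoring case and outer spaces.
        return text.strip().lower()

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes, judged only on the known word."""
        if self._normalize(control_guess) != self._normalize(control_answer):
            return False  # failed the known word, so discard the other guess
        # The user read the known word correctly, so count their reading
        # of the unknown word as one vote.
        self.votes[unknown_id][self._normalize(unknown_guess)] += 1
        return True

    def resolved(self, unknown_id):
        """Return the accepted transcription, or None if still undecided."""
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        word, count = tally.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

In this sketch a commenter is only ever graded on the word the system already knows. Their reading of the second word is just a vote, and the word counts as digitized once enough independent readers agree.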

So don’t be surprised when it shows up. Let’s try it for a while and see how it goes.

8 COMMENTS

  1. Will this mean going through the recaptcha process EVERY time we post?? I hope not because that seems a little excessive. I post to a large range of forums and have to do the recaptcha thing when registering … but thereafter their filter system works well. I also often have to refresh recaptcha three or four times to get something I can read 🙁

  2. I just thought I would post something positive in response. I have really liked reCAPTCHA, and it has been a big improvement over the CAPTCHA systems that came before. (I would guess that natural language questions are more effective, but they don’t “contribute to Teleread’s mission” and tend to be a filter for non-native language speakers.)

    Also, if I had to choose between typing a CAPTCHA for every post and registering, I would prefer to type the CAPTCHA.

  3. reCaptcha sounds kind of noble in digitizing old books, etc., but the system is actually a pain. Very often I find myself retyping the Captcha words because they weren’t recognized (or rather, I didn’t recognize them ;-)). I would prefer to register as a user to post a comment.

  4. Bob: So have I, but when that happens I just click the “give me another choice” icon (the top of those three red buttons) and a second later get a completely new captcha.

    Howard: Is it really that hard to type two extra words, and know that you just helped digitize part of some old document?

    Thinking about it, though, perhaps it would be best if the ReCAPTCHA was just required of users who aren’t “recognized” yet—the ones whose posts are, if not determined to be spam, passed by us for approval. Since if you’re being passed through automatically, you already have passed a Turing test once.

  5. Chris, please don’t take it as a major complaint .. just discussion. I contribute and have contributed to many, many forums currently and over the years. Usually registration works very well for regular posters and removes the need to verify with reCaptcha, allowing auto login with the browser. If someone prefers not to register, then a reCaptcha verification seems fair in order to avoid the spammers. In my own personal opinion, this forum here would appear to be very much a registration-style forum. It is not a passing-whim type of yelling forum. But that’s just my opinion.

  6. You might try using the WordPress plugin called “SI CAPTCHA Anti-Spam”. You could have it up and running in about 2 minutes. I use it on my blog and it seems to work well, and it is much easier to read than the characters on the type of reCAPTCHA that is shown in the picture for this post. And there are usually only 4 characters to enter, so it’s much quicker.
