
Along with the site redesign, I’ve asked our owners, NAPCO, to implement reCAPTCHA. You have undoubtedly seen this on other sites you’ve visited.

Since May we have received over 172,000 spam comments. Why does this matter? Because our filter holds every one of those comments for review before it is deleted. That means Chris and I have reviewed over 172,000 spam comments to make sure none of them are false positives. We regularly find false positives and have to approve them. That isn’t the end of it, though: in WordPress’s wisdom, an “approved” spam comment becomes a pending comment, so we then have to go into the pending comments area and approve it a second time before it posts. It’s a two-step process.

Quite honestly, this is taking too much time. So we’re going to try to see how well reCAPTCHA works. At least it will contribute to TeleRead’s mission. Here’s the scoop on it, in case you don’t know:

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here’s how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
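For the curious, here is a rough Python sketch of the two-word check described above. It is only an illustration of the idea, not reCAPTCHA’s actual code: the function names, the case-insensitive matching, and the three-vote threshold are all my own assumptions.

```python
from collections import Counter


def check_submission(known_answer, known_guess, unknown_guess, votes):
    """Count a vote for the unknown word only if the known word was solved.

    `votes` is a Counter of readings proposed so far for the unknown word.
    Matching here ignores case and surrounding spaces (an assumption).
    """
    if known_guess.strip().lower() != known_answer.strip().lower():
        return False  # control word failed, so the other guess is ignored
    votes[unknown_guess.strip().lower()] += 1
    return True


def accepted_reading(votes, min_votes=3):
    """Return the most common reading once enough people agree, else None.

    min_votes=3 is a made-up threshold for illustration.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Example: three readers solve the known word and agree on the unknown one.
votes = Counter()
for _ in range(3):
    check_submission("morning", "morning", "upon", votes)
print(accepted_reading(votes))
```

The point of the design is that a guess at the unreadable word is only trusted when the same person gets the known word right, and even then only after several people give the same reading.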

Currently, we are helping to digitize old editions of the New York Times and books from Google Books.

So don’t be surprised when it shows up. Let’s try it for a while and see how it goes.

 