A “captcha” is one of those little boxes that have squiggly text in them that you’re supposed to type so that the form you’re trying to submit knows that you’re a real person and not one of those stupid BOTS that plague our networks with spam. You’ve probably seen these when you’ve tried to post a comment on someone’s blog or signed up for an online account of almost any type.
David Warlick posted this morning on his 2 Cents Worth blog about how books are being digitized and thousands of people are helping with these projects by using “captchas.” His article is Re-Capturing Books through Captcha…
A CAPTCHA is a program that can tell whether its user is a human or a computer. You’ve probably seen them — colorful images with distorted text at the bottom of Web registration forms. CAPTCHAs are used by many websites to prevent abuse from “bots,” or automated programs usually written to generate spam. No computer program can read distorted text as well as humans can, so bots cannot navigate sites protected by CAPTCHAs.
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.
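As a quick sanity check on those numbers, here is a small sketch of the arithmetic. It uses only the two figures quoted above (60 million CAPTCHAs a day, roughly ten seconds each); the variable names are my own.

```python
# Figures taken from the quoted reCAPTCHA description; names are illustrative.
captchas_per_day = 60_000_000
seconds_per_captcha = 10  # "roughly ten seconds", so this is an approximation

total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600  # 3600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

So the "more than 150,000 hours" claim holds up, with a bit of room to spare.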
To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then, to make them searchable, transformed into text using “Optical Character Recognition” (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect.
Since the human eye (and mind) is more accurate than OCR programs, this kind of user-involved digitization project results in a much more accurate transcription. Plus, it's a kind of distributed computing, only of a more organic nature: the work is spread across people instead of machines. What a cool way to take a new-ish technology that sometimes seems like a pain and turn it into something useful!
Be sure to visit the reCAPTCHA site to learn more about how the system works (it’s ingenious) and to help by typing in a few words.
Current music: Soundtrack from The Time Machine



