
Help wanted from some fellow sleuth-archeologists:

Recent reports suggest that a Russian OCR tool called Cuneiform has been released as Free and Open Source Software (FOSS). The unfortunate part, for me, is that all the news seems to sit on the Russian side of the web, and I don’t speak Russian.

The matter becomes extra confusing when you notice that there is an American site, Cognitive Enterprises, that presents itself as the manufacturer of Cuneiform OCR, that still sells the package (albeit a much earlier version), and that keeps remarkably mum about the open sourcing of its flagship product. Does anyone know what’s going on here? Is this open source release legit?

Easily beats two other FOSS OCR offerings

Why is this at all important? Well, I took a gamble and downloaded the software, and so far my test results with Cuneiform are easily superior to those of the other two FOSS OCR offerings, Tesseract and GOCR/JOCR. Without me telling it that it had to recognize Dutch (remember, I don’t know how to tell it that, as I don’t speak Russian), it managed to OCR several pages almost perfectly, leaving only 3 or 4 errors per page. The other two averaged more than one error per line, admittedly mostly because of their inability to recognize where a line started and ended. (Language recognition software, be it speech recognition or OCR, tends to pass the annoyance test if it leaves in fewer than one error per sentence.) Good OCR software is hard to produce, and is therefore invariably expensive. A cheap (read: FOSS) version of a quality OCR tool has the potential to emancipate the long tail of printed text.
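For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how errors per line could be counted. It is not how I arrived at the numbers above (those were counted by eye). It assumes you have a hand-corrected reference text and the raw OCR output, both as plain strings, and it counts character-level edits (Levenshtein distance) line by line. The function names `edit_distance` and `errors_per_line` are made up for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def errors_per_line(reference: str, ocr_output: str) -> float:
    """Average number of character edits per line.

    Assumption: line N of the OCR output corresponds to line N of the
    reference. Missing lines on either side are compared against "".
    """
    ref_lines = reference.splitlines()
    out_lines = ocr_output.splitlines()
    n = max(len(ref_lines), len(out_lines))
    if n == 0:
        return 0.0
    ref_lines += [""] * (n - len(ref_lines))
    out_lines += [""] * (n - len(out_lines))
    total = sum(edit_distance(r, o) for r, o in zip(ref_lines, out_lines))
    return total / n


if __name__ == "__main__":
    reference = "de kat\nzit hier"
    ocr_output = "de kat\nzit bier"
    print(errors_per_line(reference, ocr_output))
```

Note that the line-by-line pairing is exactly where a tool that misjudges line boundaries will be punished hardest, so a score computed this way mixes segmentation mistakes with character mistakes.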

 