Check out Dorothea Salo’s complaints about Google’s digitization project. “Let’s be blunt. The digital files that Google is going to produce are crap, not ebooks. They’re not one-tenth as readable as even a bare-bones Project Gutenberg ASCII. They substitute for a properly-designed ebook (never mind a print book) in roughly the same way that a moldy bottle of off-brand spaghetti sauce substitutes for a six-course meal at a five-star Italian restaurant.”

Branko Collin Says:
February 18th, 2006 at 3:01 pm

I am afraid I am missing the point here. Who said that the aim of Google Books is creating ebooks, when and where did they say that, and what recreational drugs were they consuming at the time?

David Rothman Says:
February 18th, 2006 at 3:40 pm

Hey, Branko, what Google is doing will lead to crap. While the big thing now may be search, folks ideally could do other things with the files. If nothing else, the searches themselves could be less than accurate. Meanwhile here’s a summary of some issues:

“Clancy mentioned that Google was NOT going for archival quality (indeed COULD not) in their scans and were ok with skipped pages, missing content and less than perfect OCR — he mentioned that the OCR process AVERAGED one word error per page of every book scanned.”

Source material here.

Perhaps I’m missing something, but it would appear there could be problems for people who in the future do want to make books from this. You never know what Google and partners will do for commercial reasons, and meanwhile, alas, the media are associating Google with books. Google could be pre-empting some valuable efforts by actual libraries, not to mention certain public-domain-related organizations. Ugh, aren’t books meant to be read, not just searched? Like it or not, in the public perception, Google is in digital book territory.

Still, in terms of technical accuracy, I do appreciate the distinction between books and files–so, even though the sloppy scanning at Google could eventually affect books, I’ve changed the headline. Thanks for the feedback. Always appreciated.

-David

Branko Collin Says:
February 18th, 2006 at 6:06 pm

What Google is doing will lead to an index of books which, hey!, just happens to be Google’s goal. The argument that there is not some sort of windfall for passers-by is exactly the argument the Authors Guild is using when suing Google. They noticed that Google is a lot more business savvy than your average author, and figured they would sue just to get on the gravy train.

If what Google is doing is crap because the by-product of their index-building cannot be repurposed by passers-by, then everything everyone in the world is doing is crap. You’re diluting the meaning of the word by applying it in such a loose fashion.

But again, I may totally misunderstand what this is about.

David Rothman Says:
February 18th, 2006 at 6:21 pm

Branko, I was thinking especially but not exclusively of public domain books that Google is scanning. It would help if they did a good job and made life easier for public domain folks who wanted to piggyback–in a good sense–on their efforts.

As for copyrighted books, I entirely agree with you that building indexes is entirely different from books! Google actually will be helping authors.

Thanks,
David

Branko Collin Says:
February 18th, 2006 at 8:14 pm

“It would help if they did a good job”

As far as I am aware, they are doing a good job. It could be argued, and I believe it has been argued, that for an index you do not need the sort of perfection that you would for readable books. Google is producing to a quality level that is required for a Google-quality index. I would hardly call that “crap”.

Yes, it’s true, Google’s bundles of scans do not good ebooks make. But to blame Google for that would be as insincere as blaming me for not speaking Chinese.

David Rothman Says:
February 18th, 2006 at 9:29 pm

OK, Branko, we’ll agree to disagree. But if Google ever gets sold and greedsters prevail, be prepared for crap to squirt out–in the form of books. Meanwhile, from the perspective of public domain boosters, of which I’m one, it is tragic that Google cares so little about the quality of its scans–at least of the more popular public domain works. Imagine all the OCRing that DP could do if Google cared about giving it the right fodder to work from.

Beyond that, I’ll repeat what I said earlier. Google’s well-publicized efforts are pre-empting others by people more conscious about quality.

Will let you have the last word.

Thanks,
David

(Who enjoyed your just-posted item on Dumas père)

David Rothman Says:
February 20th, 2006 at 2:07 am

Actually I’ll cheat and add one more word, since facts are involved. See this press release. Clearly Google is headed in the direction of full-text display of public domain books. – David

================

NB: I deleted a trackback on September 1, 2009 (from a post that Paul Biba and I made), and accidentally zapped the original item from Feb. 18, 2006. Luckily I found it in Google’s cache and have reproduced it here with the original date. Branko certainly can verify the item’s authenticity. – D.R., 9/1/09.
