Project Gutenberg is looking for volunteers to help correct errors in books that have already been proofread. Here is Michael Hart’s request for assistance. It makes interesting reading. I note, by the way, that Project Gutenberg calls them eBooks – without the hyphen, but with a capital B. I can’t think of a more authoritative source for the spelling. David? Disagree?

Error Correction of Project Gutenberg eBooks

As many of you know, I like to do something around this time every year to take a new step forward in Project Gutenberg.

As luck would have it, I recently received an email reminder from one of our volunteers who reads our eBooks out loud for those who need or want audio eBook versions of our library.

This volunteer was kind enough to keep a log of errors found while recording one of our classic eBooks out loud, sent us that list of errors, and was now following up.

Because we receive more error messages than we have volunteers to handle, these errors had not been corrected, which prompted me to write a request for help on this in a recent Project Gutenberg Newsletter.

The results were immediate, effective, and continuing.

The new edition, complete with ~23 corrections, is online and has been for a couple of days already, and we are still getting more volunteers for error correction.

This is a great and wonderful thing because the one thing in the history of eBooks that sets Project Gutenberg apart is an everlasting, continuing process of improvement.

Hundreds of our eBooks are reissued each year with a variety of improvements, some technical, some in format and/or style of presentation, many with various error corrections.

How Good Can An eBook Get?

If we keep this process going for as many more years as it has been going on already, there is no reason average eBooks should not be as accurate as, or even more accurate than, books being published on paper.

Some people like to pretend Project Gutenberg eBooks that we run through certain processes are “perfect,” but I think our own sensibilities tell us this is not the case.

The recent new edition mentioned above is a perfect example, as it had been through just about all the processes we have, and yet reading it out loud revealed ~23 more errors.

I would certainly hesitate to bet that our average 250-page book would not have ~23 errors still in it.

After all, 25 errors in 250 pages at only 1,000 characters a page, would mean the book had 1 error per 10,000 characters, or that it was 99.99% perfect.

I won’t bore you all with numerical details, other than just a quick mention that the earliest eBook standards were 99.9%, then the Library of Congress upped that to 99.95%, and a few years later Project Gutenberg raised it to 99.975%. I would certainly bet our average eBook that has completed all our standard processes is at least that good.
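For readers who want to check the arithmetic, here is a small sketch in Python. It uses only the figures quoted above (250 pages, 1,000 characters a page, 25 errors, and the three accuracy standards); the function names are made up for illustration and are not part of any Project Gutenberg tool.

```python
from fractions import Fraction

# Figures taken from the message above.
PAGES = 250
CHARS_PER_PAGE = 1_000
TOTAL_CHARS = PAGES * CHARS_PER_PAGE  # 250,000 characters in the book


def accuracy_percent(errors, total_chars=TOTAL_CHARS):
    """Percentage of characters that are correct, as an exact fraction."""
    return (1 - Fraction(errors, total_chars)) * 100


def allowed_errors(accuracy_pct, total_chars=TOTAL_CHARS):
    """Number of wrong characters a given accuracy standard tolerates.

    accuracy_pct is passed as a string such as "99.975" so that the
    conversion to a fraction is exact rather than a float approximation.
    """
    return (1 - Fraction(accuracy_pct) / 100) * total_chars


if __name__ == "__main__":
    print("25 errors in the book:", float(accuracy_percent(25)), "% accurate")
    for standard in ("99.9", "99.95", "99.975"):
        print(standard, "% allows", float(allowed_errors(standard)), "errors")
```

On these assumptions, 25 errors works out to 99.99%, and the 99.975% standard would tolerate 62.5 errors in a book of this size, so a book with ~23 errors left in it comfortably beats the standard while still having room to improve.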

However, there is always room for improvement, and that’s an awfully touchy subject for some, but not for CEO Greg Newby, or for myself, or for a few others who are willing to create a new Project Gutenberg Error Correction Team.

Believe it or not, we have received perhaps 10,000 messages, over 37 years, encouraging us to check certain parts of book files for errors.

10,000 error messages!!!

We should expect to receive many more in the coming years as we will have many more readers.

What Makes A Project Gutenberg eBook?

As I said earlier, the greatest difference between Gutenberg eBooks and all others is in the proofreading.

No one spends as much time and effort on accuracy as we do.

In the end, after virtually all the easy-to-find eBooks have been created, there will only be error correction to do, and translations into other languages, the rest grinding slowly but assuredly to a halt, unless copyright trends reverse.

There is a reason that Project Gutenberg is used so heavily, particularly when compared to the millions of other eBooks, and that is because we work harder to make them better.

It takes an hour to work over the average book to correct an already existing list of errors. You have to get the book, then you have to open it in a program that won’t leave a trace behind (the various “artifacts” you often see when eBooks have been pumped through ill-mannered programs), and then make a final pass to make sure all the margins still fit, etc.

Even then, one of our “Whitewashers” has to go over the book with a final fine-tooth comb that pops out every character – every single character, even a comma – that changed from what was in the previous edition, and make sure each one of those changes was intentional.
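To give a feel for what such a character-by-character comparison involves, here is a minimal sketch using Python’s standard difflib module. It is only an illustration of the idea; the function name is invented, and nothing here is meant to describe the tools the Whitewashers actually use.

```python
from difflib import SequenceMatcher


def changed_characters(old, new):
    """List every stretch of characters that differs between two editions.

    Each entry is (kind, position_in_old, old_text, new_text), where kind
    is "replace", "delete" or "insert" as reported by difflib.
    """
    # autojunk=False keeps difflib from skipping very common characters
    # in long texts, so that even a single comma is reported.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            changes.append((kind, i1, old[i1:i2], new[j1:j2]))
    return changes


if __name__ == "__main__":
    previous = "a cat sat"
    corrected = "a cat, sat"
    for change in changed_characters(previous, corrected):
        print(change)
```

A person still has to look at each reported change and decide whether it was intentional; the comparison only guarantees that nothing slips by unseen.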

It’s really not terribly easy to be the last person to work on an eBook, and to know that any errors you leave behind or accidentally create will be there for millions of readers in the world until, hopefully, the next error checker finds and corrects them.

It is a great responsibility, but it also carries the greatest sense of achievement, as you realize all the future readers, who could number in the billions, will benefit from your work.

So, I thank each and every one of our Error Checking Team in great sincerity for their efforts, and at the same time I am asking for new members for this team to step forward to make yet one more level of contribution towards creating the best library humanity has ever seen.

Please be encouraged to forward this message to everyone and anyone you know who might be interested.

Again my HUGE thanks to you all!!!!!!!

Michael S. Hart
Project Gutenberg


  1. Are you saying that Michael Hart, the man who more or less single-handedly invented the e-book (or eBook) is not authoritative on how the thing he invented should be spelled? Rather, you listen to some old newspaper instead? :)

  2. I’ll cast a vote for ebook. As for the alternatives:

    eBook looks like a trademarked product (e.g., iBook), not a generic term. Also, capitalizations inside words drive spelling checkers mad and are a headache for anyone not in the know. It’s the sort of headache Adobe created by having products named InDesign and Photoshop. Look at all the people who type the latter PhotoShop. Make the term eBook, and millions of people will still type it ebook while others will create nonsense by analogy like eMail. Keep it simple and conventional to start with, and you’ll not have those problems.

    e-books. I’ve never understood the rationale behind hyphenating single characters at the start of words. Something like “work-study programs” I can see. You have two words that are being joined in a new way. It makes sense to add a separator. But “e” isn’t a word. It’s an abbreviation. If you’re going to abbreviate that radically, you might as well drop the hyphen at the same time. Also, as with hyphens, this special way of doing things is just one more thing to remember. (“Now is it ebook or e-book?”) Finally, over time hyphens tend to go away, so why start with them in the first place?

    So my vote is for ebook, for much the same reason that I use email rather than e-mail. Do a Google search on email and e-mail to see how much confusion there is in that area. In the long run, the simpler form will win.

    And yes, if the term following the “e” were something that created pronunciation difficulties, say “e-elections” for elections held electronically, I would vote for keeping a hyphen. There, a reason exists. But email or ebook don’t have that rationale. They’re easily pronounced.

    This reminds me of a writer, I believe it was Tolkien, who has one of his characters say that it’s better to give kids simple names rather than one so long and complicated you have to shorten it with a nickname.

    –Michael W. (Mike) Perry, Inkling Books, Seattle

    P.S. And I wouldn’t put much stock in what the NY Times does. I once found myself trapped in a debate among Microsoft editors over “data is” versus “data are.” Those sorts of in-house debates are often dominated by those who like to complicate the world with rules needing enforcement.

  3. Given the New York Times and Publishers Weekly’s almost complete lack of internet savvy I don’t see how they can be an authority here :-)

  4. Chris, I have to respectfully disagree with you that Michael Hart is the “inventor” of the ebook (or e-book). Despite Michael Hart claiming this “title”, the facts don’t support his contention. Michael Hart has made significant historic contributions to our digital content future, but “inventor of the ebook” is not one of them, and I wish he would not claim the title since it makes him look like a self-promoter.

    Reference to the archive of the now inactive “ebook-history” Yahoo group provides more than enough evidence that Michael Hart (whose PG project never *really* got going until very late 1989, contemporaneous with a couple of other text digitization projects that did not reach critical mass) was not the first to think of the idea of digital distribution and reading of book-like content. But of course Michael Hart played a significant role in promoting the idea by actually digitizing content and doggedly promoting the idea (many others digitized content, though, in the mid-1980s, distributing the texts on various BBSes).

    A very strong argument can be made that the invention of the idea of the ebook in the modern sense was by Alan Kay and his Dynabook (1968). A component of the Dynabook idea was that book content would be distributed and read on the Dynabook tablet. There is no doubt that Kay’s idea of the “ebook” virally injected itself into the creative consciousness of the computer community.

    More thoughts on the history of the ebook can be given by Bill Janssen, if he wishes to do so. I consider Bill to be the #1 authority on ebook history.

    Now, as regards the spelling of “ebook” (hyphen or not) and its case variants, the “ebook-community” Yahoo group (started in early 1996 as “ebook-list”) uses “ebook”/“eBook”.

    An interesting exercise is to do a Google Groups search in the time frame of 1981 to 1992, which mostly covers Usenet. References to “ebook” and “e-book” (and all case variants) start in 1991. There was a company called “EBook Inc.” of San Leandro. No doubt the term “ebook” dates earlier than this, but I have no reference to it. I strongly believe Michael Hart, who didn’t even use the term until more recently, did not originate it. The first public use of the term “ebook/e-book” may never be determined.

  5. Paul, the NYT and PW might accuse more than a few Net folks of lack of PUBLISHING savvy, LOL. Meanwhile see Jon’s note. Michael is truly one of the great people of eBooks, er, e-books, but I’d hardly regard him as the ultimate authority on publishing.


  6. I’ll rephrase it then: Given the New York Times and Publishers Weekly’s almost complete lack of electronic publishing savvy, I don’t see how they can be an authority here :-)
