Diane Duane has posted an update to her blog on the error correction issue with Young Wizards e-books. She contacted her editor, who contacted the digital editions department at her publisher. They responded that they have developed a new error-correction process that looks specifically for commonly occurring OCR errors and eliminates them at the XML level (so that corrected e-books can be generated in multiple formats from the new source material). They would like to run the books through this process. Then Diane can go back through and see what errors still exist, which will help them make their error-checking script better.

Naturally, Duane said yes. The process should take about two months to complete. Though the post doesn’t mention this, I hope and expect it will eventually result in the versions of the books that reside in the cloud at Amazon, Barnes & Noble, Kobo, and other places being replaced with corrected editions that those who’ve bought them can re-download at no extra charge.

One other thing that Duane mentions:

> I gather from the interchange of emails on which I was “copied-in” that customer feedback is one of the factors that has been driving the development of these extra layers of error-checking. So, those of you who have been writing directly to publishers about these issues — keep doing it! It’s having an effect.

I’d also like to make the suggestion that Diane might want to have some other readers double-check for errors, just to make sure they all get caught. I know it can be hard to reread your own work looking for that sort of thing.

Anyway, kudos to Diane Duane and the publisher for going ahead to fix the errors, and the readers for writing in to complain!
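
For readers curious what an XML-level pass for common OCR errors might look like, here is a minimal sketch in Python. It is not the publisher's script, whose details the post does not give. The rule table, the function name `check_and_fix`, and the sample markup are all assumptions made for illustration. The sketch walks every text node, applies a short list of known misreadings, and keeps a report so that a human can review each change.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule table: pattern for a suspicious OCR reading -> suggested fix.
# These entries are illustrative guesses, not the publisher's actual list.
OCR_RULES = [
    (re.compile(r"\barroz con polio\b"), "arroz con pollo"),  # "ll" read as "li"
    (re.compile(r"\btbe\b"), "the"),                          # "h" read as "b"
    (re.compile(r"\bwbich\b"), "which"),
]


def check_and_fix(xml_source):
    """Return (corrected_xml, report) for one XML document given as a string."""
    root = ET.fromstring(xml_source)
    report = []
    for elem in root.iter():
        # Text can sit in .text (before any children) or .tail (after the element).
        for attr in ("text", "tail"):
            value = getattr(elem, attr)
            if not value:
                continue
            for pattern, fix in OCR_RULES:
                for match in pattern.finditer(value):
                    report.append((elem.tag, match.group(0), fix))
                value = pattern.sub(fix, value)
            setattr(elem, attr, value)
    return ET.tostring(root, encoding="unicode"), report


sample = "<chapter><p>She ordered tbe arroz con polio.</p></chapter>"
fixed, report = check_and_fix(sample)
print(fixed)
for tag, found, suggestion in report:
    print(f"<{tag}>: {found!r} -> {suggestion!r}")
```

Because the fixes are applied to the XML source rather than to a finished file, every output format generated from it would pick up the same corrections. A table like this only catches errors somebody has already noticed, which is why a human read-through afterward still matters.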


  1. Apologize for what, DensityDuck? I am glad for Diane Duane’s sake that the errors are fixed since, as hers is the name on the cover, it reflects on her. And I’ll probably buy the books once I know that the corrected versions are available. But I still maintain that the only job of the customer should be to hand over their cash in exchange for the product. We really shouldn’t have an obligation beyond that.

    I was dissatisfied with Ms. Duane’s response more in the sense of how the sequence of things played out: book is released in Market A, has errors which they know of, and they expand into Market B without those errors having been corrected (i.e. knowingly selling a defective product). And THEN getting snippy with customers who knew about the errors and perfectly reasonably queried to ask if they were still there. That’s where my issue was. And as for why I made the issue with Ms. Duane and not her publisher, it’s because she was the one with the blog post.

  2. Heaven forbid that anyone in a publishing house should actually read a book that’s about to be issued under its imprint. Reading is so Twentieth Century — really, what do the customers see in it?

    If ebooks are doomed to be shovelware, let’s do what we can to improve the shovels. May I suggest that a book be scanned twice, using different machinery and software each time? A standard computer utility called “diff” would report discrepancies between the two scans, and a human being could consult the book to see which version is correct.

  3. Joanna: I don’t get why you seem to have this idea that it shouldn’t be a reader’s responsibility to complain. I know you used to be willing to complain about issues like lack of availability; I’ve read about your posts in which you said you did it and then complained people weren’t listening. If you get poor service at a restaurant, do you stay silent on the matter but then tell its food suppliers that it’s their responsibility to see the restaurant serves their food properly?

    Allan: Well, that’s what they’re doing: improving the shovels. I gather that the script they’ll be running the books through does essentially that: it looks for common OCR errors and fixes them. And the process of running Diane’s books through it will help improve it. For instance, perhaps the script doesn’t know about “arroz con pollo” getting rendered “arroz con polio” yet, and if the script doesn’t catch that, Diane will inform them of it and they’ll add it to the list.

  4. Let’s see. The publisher does a crappy OCR scan with minimal error checking and obviously nobody even reads it before it’s sold. Customers complain to the author but the author says they should be complaining to the publisher instead. The author goes back through her editor and the publisher offers to run it through another scan to check for common OCR errors but doesn’t offer to proofread it. The author is supposed to do that? Meanwhile the customer who bought the defective product is supposed to wait two months… and Joanna is supposed to apologize.

    Got it.

  5. @Bob W: To clarify: The author takes the issue to her publisher, and also suggests to the consumer base that, if moved to do so, they might complain to the publisher **as well**, to help concentrate the publisher’s mind on the problem… as the consumers are the people who’re paying their good money for the product. …Meanwhile, for the time being, of *course* I’ll be proofing the books after the secondary scan. It was always my intention to do this, extra scan or not, once I started hearing reports that there were problems.

    I absolutely agree that in a perfect world, consumers should not / would not have to complain about this kind of thing. However, a perfect world isn’t where we are as yet. Right now publishers (I would guess) are still evolving mechanisms for dealing with this kind of problem. I don’t see how added pressure from consumers would do any harm to the long-range goal of getting ebook production to the point where it’s more carefully managed than it is now.

    Meanwhile, I don’t see why Joanna should owe me any apologies for anything.

  6. @Dan: Teleread sometimes screws up on the fine details. The email refers to an error _checking_ script (where Chris wrote “correcting”). Given that the process is expected to take two months, I think we can confidently assume that this involves manual verification.

  7. A simple way to handle this for all ebooks is to include an email address somewhere, ‘If you find any errors in this ebook, please email us at pub-at-pub.com. Corrections that are confirmed will win their spotters a free ebook from us.’

    I wish Project Gutenberg would make it easier to post corrections to their books, too. In fact I wish PG would set up a subsidiary (or somebody else would) that would use MediaWiki software to put up editions of the books that everybody could edit.

  8. Joanna wrote:
    “But I still maintain that the only job of the customer should be to hand over their cash in exchange for the product. We really shouldn’t have an obligation beyond that.”

    I did disagree with Joanna’s previous article but I find it hard to disagree with this statement, if I am reading it right.
    Joanna is saying that eBooks should be ‘fixed’ by the time they get anywhere near the public, and publishers should not be relying on readers to flag up errors. She is 100% correct in this!

    Why isn’t the Publisher giving the eBook to three readers on a Monday, to read and submit corrections by Sunday? Either they all read it and flag errors, or they read it through out loud in joint sessions and compile a list of errors.
    Corrections to be completed by Friday. All done and dusted. Problems solved.

    Next?

  9. “Why isn’t the Publisher giving the eBook to three readers on a Monday, to read and submit corrections by Sunday? Either they all read it and flag errors, or they read it through out loud in joint sessions and compile a list of errors.
    Corrections to be completed by Friday. All done and dusted. Problems solved.”

    There used to be a step in the editing process, following copyediting, called proofreading (this does not simply mean a spell check), but apparently ebooks don’t merit that step even though we’re expected to pay the same prices as print customers. Print books must still get this step to some extent: while I’ve seen errors in print books, they’re nothing like the errors (in amount and type) found in a lot of ebooks.

  10. As someone who regularly works with documents generated through OCR, I do not think that further automation will fix the problem. Oh, there is a good bit of automation that you can do to make the process faster, but you still need a human to read the document. Several humans would be best! One of the problems I have run into is errors in the original, which just compound the problems in the OCR version.

    I have to admit I do not understand why the big publishers seem to have problems releasing updated copies. I do not know whether they just do not care or whether their digital departments are just too overwhelmed.

    I have received the incorrect ebook on several occasions! Luckily I was able to get it fixed or a refund!

  11. What I am finding appalling in this process is that there is even OCR having to occur in the first place. If a book is new, there should be a digital file that was sent to the printing press. Why are we not using this as the basis for the ebook version? In an efficient workflow, there should be one digital master that gets all the editing; then one branch is sent to layout for printing and the other is sent off to the ebook seller. Having to do the same work multiple times is just tossing money on the fire to watch it burn.
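
The double-scan idea raised in comment 2 can be sketched in a few lines. This is only an illustration under stated assumptions: Python's standard `difflib` module stands in for the “diff” utility, the function name `scan_discrepancies` is made up for this example, and the two “scans” are invented sample sentences rather than real OCR output. The comparison runs word by word so that each disagreement points at a specific reading.

```python
import difflib


def scan_discrepancies(scan_a, scan_b):
    """Yield (words_from_a, words_from_b) wherever two OCR passes disagree."""
    words_a = scan_a.split()
    words_b = scan_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op != "equal":
            # Either side may be empty if one scan dropped or added words.
            yield " ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])


# Invented sample text: each "scan" contains one different misreading.
scan_a = "She ordered the arroz con polio and sat down"
scan_b = "She ordered tbe arroz con pollo and sat down"

for from_a, from_b in scan_discrepancies(scan_a, scan_b):
    print(f"scan A: {from_a!r}   scan B: {from_b!r}")
```

The comparison only shows where the two scans differ. It cannot say which reading is right, and it misses any error that both scans make in the same way, so a person still has to check the printed book.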
