I posted earlier about some problems I have been having recently with error-filled ebooks—I am not talking about major editing-process errors, rather, I am talking about typos and formatting glitches resulting from unproofed conversions. People used to complain about these at times before ebooks ‘hit it big,’ but now that we e-reading customers are a more mainstream group, the complaints are getting increasingly vociferous and this has been the first year where I have really noticed a widespread problem myself.

But just how widespread is this problem? Is my feeling that I am becoming more of a copy-editor than an actual reader these days just an inflated sense of irritation at paying full price for a seemingly inferior product? Or is it really the case that many new releases are hitting my Kindle and Kobo error-filled? For a little scientific investigation, I took a look at the last ten ‘big six’ commercial books I read and noted which ones I had tagged in Calibre as being problematic. Here are the results.

Book 1: A new-release non-fiction book which came out in conjunction with its eponymous documentary film. Purchased from Amazon. Overall, it was readable. However, my Kindle bookmarks show at least one random line break in the middle of a sentence, and the book had a problem with proper names not always being capitalized. This occurred more than ten times.

Book 2: A self-help book circa 2009, which I purchased from Kobo books. I only noted two errors in this one, but they were stupid errors: the letter ‘p’ being rendered as ‘bl’ (e.g. ‘blurchasing’) due probably to an OCR error that was not caught in whatever proofread may or may not have been done. This is one of the reasons I am in favour of removing DRM for personal use from books I purchase legally! If one does so, such small errors are easily fixed and make the books much more pleasant on re-read.

Book 3: A book in the ‘for dummies’ series, purchased from Amazon. I am halfway through, and it’s flawless so far. The graphics, sidebars etc. all come out beautifully in my Kindle for iPad app. Kudos to the ‘for dummies’ people! The book is super-long and must have been a monster to put together.

Book 4: A blockbuster new-release memoir, purchased from Kobo. I recall a few non-capitalized first words of sentences, and a handful of random line breaks, but nothing egregious. Irksome, at full retail price, but the book was still readable.

Book 5: A poetry anthology, circa 2011. Downloaded from the public library. No errors I could recall. Huzzah, a clean book!

Books 6 and 7: Two YA novels, both backlist (1990s or so) and downloaded from the public library. Both were readable, but had the usual ‘we didn’t proof it’ problems (random line breaks, things which should be capitalized but were not, words with no spaces between them) often enough that I noted it. Boo.

Book 8: Another blockbuster new-release memoir (I am in a memoir-reading phase right now) purchased from Kobo. In addition to random line breaks (often enough to be irksome) the book also had random periods in places periods should not have been. They were frequent enough that I took screenshots of this one, sent them to Kobo and got a refund.

Book 9: A 2010 non-fiction release, downloaded from the library. I noticed a few small errors (random line breaks, I think) but fewer than five overall. Not bad! But still, if they are charging people money for this at mainstream stores, not acceptable either.

Book 10: Yet another blockbuster new-release memoir, purchased from Amazon. This one was in Topaz format, if that matters, and it was full of errors. The most common error was that the book contained drop caps which did not display properly, so the first words of many paragraphs looked as if they were missing a letter. There were also words which had no spaces between them, and words which had too many spaces between them. I would have asked for a refund if this had been a Kobo purchase, since they take anything back if you have proof. But I have heard Amazon is stingy about how many refunds they will process for one account, and the book was under $3, so I didn’t complain.

So, my overall experience? Of the ten books read…

2 of 10 were totally clean and error-free as far as I could determine
8 of 10 had errors of some kind
6 of those had errors which were irksome, but minor overall
and 2 of them had errors severe enough that they merited a refund

That is absolutely abysmal. No wonder my ebook spending has fallen by about 50% this year! I don’t want to play Russian roulette with either my reading time or my reading budget. If I have to deal with errors, at least I shouldn’t have to pay to do so. I will be spending my August ebook budget on renewing my Philadelphia library membership, which brings the total number of ebook-lending public libraries I can access up to 5. If the book isn’t available at any of them, it will have to be something I badly want to read before I will shell out cash for it. No way am I paying full retail price for the privilege of copy-editing a book that isn’t ready to be sold!

  1. Yes, the Topaz format matters. Topaz is a scan of a print book; the scans are broken into what can be identified as separate character and punctuation glyphs, the glyphs are matched up to collapse them into a small re-usable set, the glyphs are converted into vector form to allow scaling, and the pages are re-rendered as a series of glyph references. There’s also an OCR scan embedded in the file for search purposes.

  2. I think the majority of the ebooks I own have errors and many are significant. The last four books I’ve read all have annoying and consistent formatting errors. The two by China Miéville have the first five or six words of every new section in small caps. It isn’t as if the first sentence of each section is formatted differently, just the first few words of the sentence. The error is at the beginning of every new section in both books. I’m currently reading my second Charles Stross novel. In both, the first letter of every section is not capitalized. In the current ebook, in addition to the capitalization errors, the first three or four words of each section are in a larger font. Every time I see it, it annoys me. With the start of every new section, I’m pulled out of the story by a formatting error! I’ve been avoiding using the library, because I’m not a fan of the Overdrive app on the iPad, but I may head in that direction as well. I’m not happy paying for what are essentially defective products.

  3. I have had no problems with getting refunds from Amazon for badly formatted books. I’ve done it twice. Both times they were friendly and apologized for my inconvenience. I was asked to provide examples of the errors, which I did. The first time this happened, I asked for the refund and got it right away. The second time, I asked to be notified when the errors were corrected. About six weeks later I received notification that the re-formatted book was available to replace my error-filled book. I have heard that B&N will not refund your purchase price for any reason.

  4. Not to make any excuses for these, but some notes about the errors you found:

    1. Ebooks that start as PDFs are a problem. Oftentimes, the built-in OCR inside Adobe Acrobat reads the end of a line of text as a carriage return. This explains the odd breaks in the middle of a sentence. If the OCR from PDF is especially bad, there are hundreds of these to clean up… while also making sure the legitimate carriage returns are maintained. It’s possible to catch most if not all of these using targeted regular expressions, but over the span of a 400-page book a few might still be missed.

    2. OCR errors from a scan can vary wildly from book to book. What was a problem in one isn’t a problem in another. Some of the OCR mistakes are subtle: lower-case letters often look alike and you end up with the word “seat” getting rendered as “scat” or “ear” becoming “car.” A human proofreader will probably catch this. Then again, maybe not. At the extreme, I’ve seen scanned books in which the OCR ignored just about every period and comma, but was otherwise flawless! With this, you have two choices, really. Do you have the book rescanned and deal with other unknown needle-in-the-haystack errors that will pop up? Or do you deal with the enemy you know and try to replace the missing characters (periods are easier to handle en masse)? Either way, deadlines are tight and people are working on multiple projects simultaneously, both of which necessarily increase the chances for error.

    3. The use of small caps is generally the cause of words or proper names not being properly capitalized. To accomplish small caps in word processing software (or in the various book-making programs) you have to set the characters as lower-case and then apply the small-caps formatting. Unfortunately, when the OCR or conversion to XHTML occurs, the small caps revert to their basic formatting: lower-case letters. If the person doing the conversion to ebook is unaware of the small caps in the first place, they generally won’t know to look for them. This is, more than anything, a disconnect in the workflow rather than the sign of sloppy work.

    @jgrnt1: What you’re seeing in the China Miéville books isn’t an error. That’s a deliberate design choice. I’d bet that, if you opened the print versions, the same words would be set in small caps as well. This may not be your preference, but I think there’s a legitimate case to be made for having ebook interiors match print book interiors whenever possible and to the degree that it’s possible. In the Stross novel, the lower-case first letter IS an error: those are probably drop caps AND small caps in the source, and the person who worked on the ebook didn’t catch them. This is odd given that you also report that the words that follow are in a larger font. Understanding what usually happens to small caps when the source converts to XHTML, it means the person who worked on the ebook re-capitalized those characters and missed the first letter. What’s worse is that the publisher didn’t have a review process in the workflow that would’ve caught these before they published the ebook.
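For anyone curious what a "targeted regular expression" for those mid-sentence line breaks might look like, here is a minimal sketch in Python. It is only an illustration, not what any publisher actually runs: the function name `rejoin_broken_lines` is made up, and the rule it applies (a break is spurious if the text before it does not end in closing punctuation and the text after it starts with a lower-case letter) is an assumption. A break that falls just before a capitalized name slips through, which is consistent with a few being missed over a long book.

```python
import re

# Assumed heuristic: a single line break is spurious when the character
# before it is not sentence-closing punctuation (or whitespace) and the
# next line starts with a lower-case letter. Blank-line paragraph breaks
# are left alone because the lookahead requires a letter right after it.
SPURIOUS_BREAK = re.compile(r'(?<![.!?:;"”’)\s])[ \t]*\n[ \t]*(?=[a-z])')


def rejoin_broken_lines(text: str) -> str:
    """Replace each break judged spurious with a single space."""
    return SPURIOUS_BREAK.sub(" ", text)


if __name__ == "__main__":
    sample = (
        "The quick brown\n"
        "fox jumped over the lazy dog.\n"
        "A new paragraph starts here."
    )
    print(rejoin_broken_lines(sample))
```

The break after "brown" is joined, while the one after "dog." is kept because the preceding line ends in a period.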

  5. Errors in ebooks are frustrating to be sure, and there are lots of reasons these errors creep in as listed below. The biggest problem, though, is lack of care on the part of those building the ebooks. For example, I know from experience that ebooks made from PDFs have a certain set of common errors. It’s my job as the ebook creator to be aware of those problems and account for them. It’s also my job to be aware that different reading devices support different presentations. Small caps work in some readers and not in others. Again, the ebook builder has to be aware of this and to edit the code accordingly so it will come across properly wherever it’s viewed. An ebook is another presentation of a book and it should be produced with just as much care.
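As a rough illustration of "editing the code accordingly" for small caps, the Python sketch below upper-cases the text inside small-caps spans so that it still reads as capitals on a reader that ignores the styling. Everything specific here is an assumption: the `smallcaps` class name, the function name `uppercase_small_caps`, and the idea that the span holds plain text with no nested tags or entities. It is one possible fallback, not a standard fix, and on readers that do support small caps the text will then show as full-size capitals.

```python
import re

# Hypothetical markup: assumes the conversion left small caps as
# <span class="smallcaps">...</span> around lower-case plain text.
# Real files will use other class names or inline styles.
SMALLCAPS_SPAN = re.compile(r'<span class="smallcaps">(.*?)</span>', re.DOTALL)


def uppercase_small_caps(xhtml: str) -> str:
    """Upper-case the contents of each small-caps span, keeping the span."""
    def replace(match: re.Match) -> str:
        # Assumes plain text inside the span; nested tags or entities
        # such as &amp; would be mangled by upper().
        return f'<span class="smallcaps">{match.group(1).upper()}</span>'

    return SMALLCAPS_SPAN.sub(replace, xhtml)


if __name__ == "__main__":
    page = '<p><span class="smallcaps">once upon a time</span> there was a book.</p>'
    print(uppercase_small_caps(page))
```

Keeping the span rather than stripping it leaves the styling hook in place, so the choice can be revisited per device.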

  6. I agree, Tony. There are lots of reasons why errors might occur in the original text, but it’s the responsibility of the author and publisher to make sure they’re fixed. If they’re going to sell an unproofread book, charge a lot less for it.

  7. All of the books I’ve read have been downloads from Amazon, many of them free. I also have noticed errors in most of the books…punctuation, spelling, words run together, formatting (skips in line from one word to the next). Since the books have been free it’s hard for me to complain much, but if I paid full price and found these errors I would definitely be contacting the author and publisher.

  8. Since I download books for free, I can hardly complain when they are riddled with errors. If the errors are not egregious and are as simple as a missing letter, an incorrect punctuation mark, or one misspelled word, I will not be bothered. However, I’ve recently been reading Isaac Asimov’s The Complete Robot and it is a horrifying read. There are no chapter breaks, much less paragraph breaks. Everything runs together. What’s worse is that it must’ve been run through an OCR program that was nearing its death. Every few words were riddled with symbols instead of letters, like an A becoming a small triangle or an O a heart and a g a spade. While I could still decipher what the text meant, it was extremely annoying and slowed down my reading. I’ve been severely tempted to edit the text as HTML and reformat it again in EPUB to make my reading easier. But then, I realized that as soon as I finished editing the text, I would effectively have finished reading the book, making my effort moot.

  9. “I have heard that B&N will not refund your purchase price for any reason.”

    Not so. I personally have gotten refunds for 2 badly-proofed (I should say not proofed at all) Harper Collins books from B&N.

    As I’ve described in this blog post and elsewhere, most of the OCR errors could be fixed by doing a proper high-resolution scan up front, preferably using enlarged text, which can be obtained by simply putting your book through a photocopier with an enlarging setting. It is the slap-dash nature of the low-bid third-party scan shops that causes the bulk of the initial problems, compounded by the publishers’ apparent reluctance to pay for a proofer (much less a decent scanning shop).

    In short, there is no excuse for this.

