Why (and How) I Scan Old Books
September 24, 2012 | 9:34 am
By Jon Jermey
I have The Best of Myles (1968), Flann O’Brien’s selection of his Myles na gCopaleen newspaper columns, in a Picador paperback I found second-hand at a church fête some weeks ago. A quick search of e-book sites reveals that there is, as yet, no other way to read it than on paper. All yez Kindlers, Koboists and Androghedans, as O’Brien might have described you, will have to find some other book. But not me. In a few minutes I will be readin’ Myles on me own blessed tablet, as happy as Larry. Because I am about to scan the book to PDF.
The history of my book-scanning attempts would fill a small book of its own. Flatbed scanners, cameras on tripods, cameras in plywood frames, and combination printer/scanners all played a part, but last year I bit the bullet and bought a dedicated Epson GT-S50 double-sided sheet-fed scanner. My scanning speed went up by a factor of four, and my error rate fell by the same factor. Scanning a typical book went from a laborious, complex chore of over an hour to a twenty-minute job. At the time I didn’t understand why a dedicated sheet-fed scanner should cost three times as much as a printer with a scanner attached. Now I do.
The book is 400 pages long, so my first act will be to take a sharp knife, a steel rule and a cutting board, and divide it into three sections of about 130 pages each. Aligning the page edges on the unbound side of each section, I will then trim off the binding about four millimetres from the spine, leaving a stack of loose sheets. Before scanning I will leaf through these one at a time to make sure none are still attached; then I will stack them into piles of about sixty sheets (120 pages). In the Epson Scan program that came with the scanner, I will enter a name for the output file, a size for the scanned pages, and an output format of PDF. Text enhancement and automatic OCR are already switched on.
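The arithmetic behind the split is simple: each loose sheet carries a page on each side, so a 400-page paperback comes apart into 200 sheets. As a minimal sketch (split_evenly is a hypothetical helper, and the code assumes an even page count), dividing those sheets as evenly as possible into three sections looks like this:

```python
def split_evenly(total: int, parts: int) -> list[int]:
    """Divide `total` items into `parts` groups whose sizes differ by at most one."""
    base, extra = divmod(total, parts)
    # The first `extra` groups each take one of the leftover items.
    return [base + 1 if i < extra else base for i in range(parts)]


pages = 400
sheets = pages // 2  # assumes an even page count: two printed pages per sheet
sections = split_evenly(sheets, 3)
print("sheets per section:", sections)                   # [67, 67, 66]
print("pages per section:", [2 * s for s in sections])  # [134, 134, 132]
```

That works out to roughly 130 pages, or a feeder-load of sixty-odd sheets, per section.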
I will load the first pile into the sheet feeder and top it up with the others as it runs low. At roughly thirty sheets a minute, the book’s 200 sheets will take six or seven minutes to feed through. Orienting the pages and performing OCR takes a couple of minutes more, and assembling them into a PDF file another minute or so. At this point I will have a perfectly legible PDF version of my paperback.
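To see where the time goes, here is a rough back-of-the-envelope estimate using the rates quoted above: about thirty sheets a minute, a couple of minutes for orientation and OCR, and a minute to assemble the PDF. The function name and its default values are illustrative assumptions taken from those rough figures, not settings read from the Epson software.

```python
def estimate_minutes(pages: int, sheets_per_minute: float = 30,
                     ocr_minutes: float = 2, assembly_minutes: float = 1) -> float:
    """Rough machine time for a duplex sheet-fed scan (hypothetical helper)."""
    sheets = (pages + 1) // 2              # two pages per sheet, rounding up
    feed_minutes = sheets / sheets_per_minute
    return feed_minutes + ocr_minutes + assembly_minutes


total = estimate_minutes(400)
print(f"about {total:.1f} minutes of machine time")  # about 9.7 minutes of machine time
```

The rest of the half-hour goes on the manual steps (cutting, loading, and tidying up the PDF), which this sketch does not try to model.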
But, being a perfectionist, I will open it in a PDF editor and tweak it a little: straightening the pages, cropping some of the page margins, and re-scanning any pages that were missed or went badly askew. Newer, more expensively made books rarely cause scanning problems, but older, cheaper ones sometimes do. I also run off a compact RTF copy of the text alone. The book, in PDF and RTF formats, is then ready to move into my Calibre collection. (Next week’s article: Accessing Calibre on PC from an Android tablet.) Sensitive bibliophiles may want to skip the next part, where I throw the used pages in the recycling bin, and pretend instead that I preserve them for posterity.
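The article doesn’t say which tool produces the RTF copy. For the curious, a “compact RTF copy of the text alone” can be very small indeed. Here is a minimal sketch, assuming the plain text has already been extracted from the OCR layer with newlines between paragraphs. The helper to_rtf is hypothetical: it escapes RTF’s special characters (backslash and braces), turns each newline into a \par paragraph break, and writes non-ASCII characters such as O’Brien’s curly apostrophes as \uN? escapes.

```python
def to_rtf(text: str) -> str:
    """Wrap plain text in a minimal RTF document (hypothetical helper)."""
    body = []
    for ch in text:
        if ch in "\\{}":
            body.append("\\" + ch)              # escape RTF's control characters
        elif ch == "\n":
            body.append("\\par\n")              # each newline starts a new paragraph
        elif ord(ch) > 127:
            # \uN takes a signed 16-bit number, so encode as UTF-16 code units;
            # characters outside the BMP become two escapes (a surrogate pair).
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little", signed=True)
                body.append(f"\\u{n}?")         # '?' is a fallback for non-Unicode readers
        else:
            body.append(ch)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + "".join(body) + "}"


sample = "Do you mind the cuteness of me?\nMyles na gCopaleen\u2019s column"
print(to_rtf(sample))
```

Because it carries no images or page layout, a file like this stays a tiny fraction of the size of the scanned PDF.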
If all goes well, the process will be over within half an hour, and The Best of Myles will be sitting on my Android tablet instead of on my desk—portable, searchable, back-up-able, and still highly readable. As O’Brien himself crowed in similar circumstances: ‘Do you mind the cuteness of me?’