From Project Gutenberg e-text to PDF-based e-book for your Iliad
September 17, 2006 | 11:19 am
Modern E-ink based reading devices try to simulate the printed book. Where older devices lacked the resolution to make typography worth worrying about, devices such as the Iliad are sharp enough to make readers crave well laid out e-books. Luckily, with Project Gutenberg’s HTML versions and OpenOffice.org Writer, good-looking PDFs are a matter of minutes.
The following tutorial may also be useful for those who wish to turn an e-book into a p-book.
Over the years it has been suggested that Project Gutenberg should preserve the page numbers of the books it scans. This, coupled with preserving edition statements, would allow students of texts to determine exactly which artifact they are citing.
But preserving page numbers in e-books somehow feels wrong. E-books are liquid; they are not tied to archaic concepts of pages. And so Project Gutenberg stubbornly kept discarding page numbers for a long time.
Enter e-paper based reading devices. Irex in particular wants its E-ink based reader, the Iliad, to emulate the printed page. The page flipping bar is designed so that reading a text on the Iliad feels like flipping pages.
In a delicious twist of irony, the Distributed Proofreaders recently started to post e-books to Project Gutenberg that retain page numbers. For our purposes that is unfortunate: the pages of your new PDF version will typically not map exactly to the original pages, so you will want to remove the page numbers.
In order to turn an HTML-based e-text from Project Gutenberg into a PDF file for your Iliad, you will need the following tools:
- A web browser with which to download the book you’d like to read
- OpenOffice.org Writer
Whew! Writer is part of the OpenOffice.org package. I do not believe you can download it separately.
You may be able to use other word processors. In this tutorial I will be using the following functionality:
- the spell-checker, with dictionaries for the relevant languages
- settings for page sizes, page margins and page numbers
- “Search for Styles”
- style editor that retains the custom styles from the HTML file
- PDF export
Step 1: acquire the e-book
Go to Project Gutenberg (PG) at www.gutenberg.org and download your e-book in HTML format. Not all of PG’s e-books are stored as HTML; about 60% aren’t. For the sake of this tutorial, I’ll assume that the book you want is a PG e-text with an HTML version available.
PG has a search engine that will let you find books stored in a certain format. However, it is slow, and with thousands of books available in glorious HTML, there’s not much chance you will find what you are looking for this way.
For this tutorial I will use H. Beam Piper’s “Little Fuzzy” as an example. The book is fairly simple: it contains no illustrations, page numbers, footnotes and so forth. However, most of the techniques described below can also be used for more complex books. For instance, you can remove page numbers by using the Search for Styles function.
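If you would rather strip page numbers from the HTML file before Writer ever sees it, a few lines of Python can do that. What follows is a minimal sketch, not a tested tool. It assumes the page markers are wrapped in `<span class="pagenum">…</span>`, which is a class name I am guessing at; look at the source of your own file and adjust. The function name `strip_page_numbers` is made up for this example.

```python
import re

# Assumption: page markers look like <span class="pagenum">...</span>.
# The class name may differ in your file; check the HTML source first.
PAGENUM_SPAN = re.compile(
    r'<span\b[^>]*\bclass\s*=\s*["\']pagenum["\'][^>]*>.*?</span>',
    re.IGNORECASE | re.DOTALL,
)


def strip_page_numbers(html: str) -> str:
    """Remove page-number spans from an HTML string.

    Limitation: a regular expression cannot follow nested <span> tags,
    so a marker that itself contains another span is only partly removed.
    """
    return PAGENUM_SPAN.sub("", html)


if __name__ == "__main__":
    sample = (
        '<p>Some text<span class="pagenum"><a name="Page_7">[7]</a></span>'
        " that continues.</p>"
    )
    print(strip_page_numbers(sample))
```

Work on a copy of the file, and compare the result with the original before you throw anything away.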
Sometimes PG’s e-books have an external stylesheet. In those cases, be sure to save the page as “Complete web page” instead of “Just HTML”.
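If you are not sure whether a downloaded file depends on an external stylesheet, you can check for `<link rel="stylesheet">` tags. The sketch below uses only Python's standard library; the helper names `StylesheetFinder` and `external_stylesheets` are my own invention for this example.

```python
from html.parser import HTMLParser


class StylesheetFinder(HTMLParser):
    """Collect the href of every <link rel="stylesheet"> tag."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rel = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rel and attr.get("href"):
            self.stylesheets.append(attr["href"])


def external_stylesheets(html: str) -> list:
    """Return the stylesheet files that the HTML document links to."""
    finder = StylesheetFinder()
    finder.feed(html)
    return finder.stylesheets


if __name__ == "__main__":
    page = '<html><head><link rel="stylesheet" href="book.css"></head></html>'
    print(external_stylesheets(page))
```

An empty list means the styles, if any, live inside the HTML file itself, and saving just the HTML should be enough.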
Start OpenOffice.org Writer and use the File / Open menu to load the e-book. Writer can edit HTML files.
It is possible to copy an e-book from your browser window and paste it into a new Writer document, but that has the disadvantage of discarding custom styles.
Upon loading the file, immediately use File / Export… to save it as a Writer document (.SXW). If you don’t do this, OpenOffice.org will keep treating the file as a web document, and some of Writer’s functionality will be unavailable to you.
Close the HTML document and open the Writer version.
Step 2: remove the PG header and footer
Any given PG e-text contains a lot of legalese, both at the front and at the back. These sections are known as the PG Header and the PG Footer; they outline the usage you may make of the PG trademark. You will likely want to remove them, and since they are clearly marked as such they are easy to find. You may also wish to remove the information about which volunteers worked on a book, although PG typically considers that part of the book.
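For the plain-text edition of a book, the same cut can be scripted. The sketch below assumes that the header ends at a line containing `*** START OF` and the footer begins at a line containing `*** END OF`. Not every PG file uses these markers (older ones do not), so check your own file before trusting it. The function name `strip_pg_boilerplate` is hypothetical. In the HTML edition the markers sit inside markup, so there I would still cut by hand in Writer.

```python
def strip_pg_boilerplate(text: str) -> str:
    """Return the text between the PG start and end marker lines.

    Assumption: marker lines contain '*** START OF' and '*** END OF'.
    If a marker is missing, that end of the text is left untouched.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        upper = line.upper()
        if "*** START OF" in upper:
            start = i + 1  # the book begins after the marker line
        elif "*** END OF" in upper:
            end = i  # the book stops before the marker line
            break
    return "\n".join(lines[start:end]).strip()


if __name__ == "__main__":
    sample = (
        "Legal notice\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "\n"
        "The actual book.\n"
        "\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "More legal notice\n"
    )
    print(strip_pg_boilerplate(sample))
```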
Step 3: set the language
Go to Tools / Options …. Select the Language Settings / Languages item. Select the default document language, and check the For the Current Document Only box.
Step 4: get the basic lay-out right
Whatever you do in this step, don’t start applying font changes and so on just yet. Use this and the next step first to make sure all the styling of the HTML document is preserved, but in a form you prefer.
Illustration: Little Fuzzy’s title page at this point.
Step 5: fixing styles
You now have an e-book suited more or less for the physical format of the Iliad screen. However, you may not like the way the default styles look. Here are some suggestions for changing styles.
Since I imported this text from an HTML original, all textual elements are associated with a certain style. You can add, edit and delete styles by opening the Styles and Formatting dialog: Format / Styles and Formatting…, or press F11. A document can contain many styles, and Writer makes it easier to navigate through them by organizing styles in categories. The two categories you will be concerned with are Custom Styles (the styles the PG volunteers added to the document) and Applied Styles (the styles that are actually used in the document).
The style catalog selector is hidden all the way at the bottom of the Styles and Formatting dialog.
Writer’s Edit / Find & Replace… has a Search for Styles option under More Options. Check the associated checkbox and the search field will turn into a style selector.
Search for each available custom style to see what it looks like. Make notes or screenshots, in case you edit a style later on and want to undo the change.
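Before clicking through the styles one by one, it can help to know which ones the HTML file actually uses and how often. The custom styles appear to come from the `class` attributes in the HTML, so counting those gives a rough inventory. This is a sketch under that assumption; the exact names Writer derives from the classes may differ, and `ClassCounter` and `count_classes` are names I made up.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each tag.class combination occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One attribute may list several space-separated classes.
                for cls in value.split():
                    self.counts[f"{tag}.{cls}"] += 1


def count_classes(html: str) -> Counter:
    """Return a Counter mapping 'tag.class' to its number of uses."""
    counter = ClassCounter()
    counter.feed(html)
    return counter.counts


if __name__ == "__main__":
    page = (
        '<p class="spacedTop">One</p><p>Two</p>'
        '<p class="spacedTop">Three</p><span class="pagenum">4</span>'
    )
    for name, uses in count_classes(page).most_common():
        print(f"{uses:5d}  {name}")
```

A class that is used only once or twice is often a title page or a similar one-off; a class that is used hundreds of times deserves a careful look before you change it.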
The example document is optimized for the web. Paragraphs are in large letters and are separated by empty lines. These features are not necessary for a printed book or for an e-book to be read on an e-paper device. So let’s change these.
Leave the Find & Replace dialog and conjure up the Format / Styles and Formatting… dialog once more. Choose the Applied Styles catalog. Select the style for regular paragraph text, right-click on it, and in the resulting context menu choose Modify….
The paragraph style for PG’s Little Fuzzy is called “Text Body”. First change the font to one that is less web optimized. In this tutorial I will pick that old chestnut Times New Roman, at a size of 10.5 points. PG’s Little Fuzzy uses Georgia, which is a pretty typeface but optimized for a relatively low-quality screen.
Next move to the Indents & Spacing tab. Set the Spacing Below Paragraph to 0.00cm. Also set Indent First Line to 0.97cm. With a margin-hugging document like this, you may also wish to increase the line spacing a tad, or use a larger font. Experiment until it feels right.
Changing the “Text Body” style has also changed all its dependent styles. This is unfortunate, because longer pauses, signified by more whitespace, have now disappeared. The next step is therefore to reintroduce a vertical space in the “Text body.spacedTop” style.
Open the Modify dialog for “Text body.spacedTop”, select the Indents & Spacing tab, and set the Spacing Below Paragraph to 0.50cm.
With every modification, check and make sure the results are as you want them.
Step 6: page numbers
Go to the top of your document. Select Insert / Footer / All. The cursor should jump to the bottom of the page. Now select Insert / Field / Page Number. As you’ll notice, all pages now have page numbers.
Select the first page number, and click the Align Right icon in the Formatting toolbar. All page numbers will now rest against the right margin.
Give it the once-over
Check the document for problems. The spell-checker’s red squigglies may help you locate trouble spots. You can manually adjust the style of just one paragraph, word or even character by selecting the phrase you want to edit, then choosing the relevant Format menu. Make sure all non-ASCII characters, such as accented letters, em-dashes and curly quotes, are displayed correctly.
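To know which non-ASCII characters to look for, you can list the ones that occur in the source text. The sketch below counts every character above code point 127 and prints its Unicode name; the function name `non_ascii_report` is hypothetical. It only tells you what is in the file, not whether Writer or your reader displays it correctly, so you still have to look at the pages yourself.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> list:
    """Return (character, Unicode name, count) for each non-ASCII character."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Escapes are used so the sample survives any source encoding.
    sample = "caf\u00e9 \u2014 \u201chello\u201d"
    for ch, name, n in non_ascii_report(sample):
        print(f"U+{ord(ch):04X}  {name}  x{n}")
```

Names you did not expect in the list, such as a run of accented capitals in an English novel, can be a sign that the file was read with the wrong character encoding.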
Export as PDF
Select File / Export as PDF.
Step last: you are done!
Illustration: Little Fuzzy’s title page at the end of the road.
Go read the book! Here’s an example PDF.
This tutorial is generic enough to be used for other e-readers, or even to produce a PDF for a printed book.
The boring tail
Disclaimer 1: I do not own an Iliad or any other reader that requires paging. (I own a Palm Zire, and am quite content with the lack of “pages”.)
Disclaimer 2: I would not know “pretty” if it hit me in the face. If you want pretty e-books, I suggest you apply your own good taste. With the above tutorial I hope to have handed you the tools to do just that. At the Mobileread forums there are people with very strong opinions about what looks “good”.
Disclaimer 3: This tutorial is probably woefully incomplete. Please add your own tips in the Comments section or on the Mobileread Wiki. Also, Lulu’s forums might provide you with typesetting hints for p-books that apply equally to e-books for the Iliad.
Tip: acquaint yourself with your tools. Writer is a program of huge complexity that lets you do all kinds of book-like things with your document.