The Internet Archive has opened up to everyone its automated function for deriving multiple formats from a single source. If you post a movie to archive.org in, say, MPEG2 format, a whole slew of copies in other formats will be created for you, such as MPEG4 videos, a Flash video, and an animated GIF.

Perhaps this function has been available to movie and music fans all along, but it is relatively new for e-book producers. Until recently, if I posted a set of scans of a public domain book at the site, the best I could do was to use DJVU Libre, a crippled alternative to the commercial DJVU archiving software, to create a browsable version of the scan set. Now, archive.org’s software will use professional versions of conversion tools to create the following from a set of TIFFs, JPEGs or JPEG2000s:

  • OCR text
  • OCR XML (text with markers that correspond with the exact position of the word in the original scan)
  • FlipBook (Javascript web interface to the book)
  • DJVU (sort of JPEG on acid, requires a dedicated viewer or browser plug-in)
  • Animated GIF (used to illustrate the book’s entry on the site)

This must be excellent news to a group like DistScan (Distributed Scanners), because it adds immediate usability to the fruits of their efforts. If only Jon Noring’s group managed to actually produce scans.

Here’s how you can post books to the Internet Archive…

1. Make sure you are allowed to distribute copies of the item (book, map, postcard, poster, …) you are going to post.

2. Scan all pages of the item. 300 dpi is usually sufficient for plain text; illustrations are typically scanned at 600 dpi. (Make sure the work is complete, or at least that you know what is missing.)

3. Clean up, deskew, and crop your scans, so that the image on your screen looks like a page, rather than a picture of a page. You should end up with a complete set of scans in either TIFF, JPEG or JPEG2000 format.

4. Go to www.archive.org.

5. Log in to your account, or create an account and then log in.

6. Click on the Contributions link at the top of the page.

7. On the Contributions page, click on the New movie, audio recording, live concert recording, or book link.

8. Follow the instructions. If the instructions are unclear, you may find some help here.

9. If all goes well, the derivative works should be created automatically. This will take some time (up to several hours), so don’t immediately assume things went wrong. Use the links on the Contributions page to see which tasks have not yet completed. If things did go wrong, however, you might be able to restart the derivation process via the Item Manager; as the item’s owner, you can find it by clicking Edit Item on the item homepage, then Item Manager.

10. Your book will end up being part of the so-called Open Source Books collection. It will have its own web page where readers can leave reviews, and where all the different formats can be downloaded. Of course you can also publish a true e-book at the Internet Archive, in HTML or any other format, but you will get different types of derivatives.
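
Before uploading, it may help to sanity-check the scan set from steps 2 and 3. What follows is only a rough sketch in Python, not anything archive.org provides: the function names, the list of file extensions, and the idea that a set should stick to a single format are all my own assumptions. It works out how many pixels a page should measure at a given dpi, and flags files that are not TIFF, JPEG or JPEG2000.

```python
from pathlib import PurePath

# Extensions for the three formats named above (TIFF, JPEG, JPEG2000).
# The exact extension list is an assumption, not taken from archive.org.
ACCEPTED = {
    ".tif": "TIFF", ".tiff": "TIFF",
    ".jpg": "JPEG", ".jpeg": "JPEG",
    ".jp2": "JPEG2000",
}


def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size in inches, scanned at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def check_scan_set(filenames):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    formats = set()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in ACCEPTED:
            formats.add(ACCEPTED[ext])
        else:
            problems.append(f"{name}: not a TIFF, JPEG or JPEG2000 file")
    # Assumption: a scan set should use one format throughout.
    if len(formats) > 1:
        problems.append("mixed formats: " + ", ".join(sorted(formats)))
    return problems


if __name__ == "__main__":
    # A 6 x 9 inch page scanned at 300 dpi (hypothetical page size).
    print(pixel_size(6, 9, 300))
    # Hypothetical file names; a stray text file should be reported.
    print(check_scan_set(["page_0001.tif", "page_0002.tif", "notes.txt"]))
```

If a scan comes out much smaller than pixel_size suggests, it was probably made at a lower resolution than intended. The check looks only at file names, so it will not catch a damaged or mislabeled image.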
