Do you remember when Amazon introduced both “Look Inside” and “Search Inside” functionality for books?

They were such simple yet revolutionary features at the time. Before Look/Search Inside, it was impossible to do a simple flip test as you could at a brick-and-mortar store.

Fast-forward to today, when we take Look/Search Inside features for granted—so much so that there’s been virtually no innovation on this front.

In my opinion, we can better help consumers find what they’re looking for, as well as significantly improve the overall content discovery and evaluation process.

Let’s start with a simple question: Why are Search and Look Inside both limited to individual books? What if my first problem is to figure out which book has the most in-depth coverage of topic xyz? Let’s say I want to do some research on the Pittsburgh Pirates, specifically looking for coverage of a former player named Dave Parker. How do I find the book with the most in-depth coverage of Parker?

The typical approach is to search on Amazon. The search results there are initially sorted by relevance, and you might think that’s the end of the story. But all Amazon is really doing is searching the metadata associated with each book; it’s not searching the actual contents of the books to push titles with higher relevance to the top of the results. That means books with that name or phrase in the title often get pushed to the top.

Take a closer look at those search results and you’ll quickly appreciate just how ineffective the current Amazon solution is. You’ll need to skip past the first four results, as they’re not books at all. (I requested “books” only, but the results reflect the challenges Amazon has with internal product types and definitions.) Those are followed by a couple of titles that have nothing to do with Dave Parker the former baseball player; they just happen to be authored by another guy named Dave Parker.

This shows how much Amazon’s search prioritizes a book’s metadata; there are probably very few references to “Dave Parker” inside those books, but these titles float toward the top of the results simply because of the author’s name.

Next is a book about Dave Winfield, another former baseball player, which looks promising. The problem here is that it made it to the first page of results because the book’s co-author is Tom Parker. So when Amazon sees “Dave Winfield” and “Tom Parker” next to each other, it thinks there’s a hit because of the former’s first name plus the latter’s last name. Ugh.
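
To make that failure mode concrete, here is a minimal Python sketch of the difference between loose word matching and true phrase matching. It is a toy illustration of the behavior described above, not Amazon’s actual algorithm, which isn’t public; the function names and the sample record are hypothetical.

```python
import re

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def loose_match(query, record):
    """True if every query word appears somewhere in any metadata field."""
    bag = set()
    for value in record.values():
        bag.update(tokens(value))
    return all(word in bag for word in tokens(query))

def phrase_match(query, record):
    """True only if the query words appear adjacent and in order within one field."""
    q = tokens(query)
    for value in record.values():
        t = tokens(value)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            return True
    return False

# Hypothetical metadata record modelled on the Winfield example (assumption).
record = {"authors": "Dave Winfield, Tom Parker"}

print(loose_match("Dave Parker", record))   # "dave" and "parker" both occur somewhere
print(phrase_match("Dave Parker", record))  # but never side by side
```

With loose matching, the Winfield book counts as a hit for “Dave Parker”; with phrase matching it doesn’t, while a search for “Tom Parker” still would.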

At this point you might think the solution is to go to Google Book Search. Take a look at Google’s results, and I think you’ll agree I’m no closer to finding the right book than I was at the start. To be fair, Google Book Search is a better solution than Amazon’s search, but there are still some enormous holes. For example, although Google’s service is searching the book contents, it’s still highly biased by the metadata. Just look at the author names of the first several titles in those search results and you’ll see what I mean. Also, Google is severely limited because its solution is tightly connected to its book preview service. That means Google will only show you some of the pages with hits, hiding many others and then completely cutting off your view once you reach a certain threshold.

What we really need is something like Google Book Search across an entire library, with full visibility into all the content, featuring an algorithm that’s smart enough to focus on true relevance and isn’t thrown off simply by metadata. The results would show two or three lines of the text surrounding each hit so the reader can appreciate the context throughout.
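
Here is a rough Python sketch of what such a search might look like under some simplifying assumptions: each book’s full text is available as a plain string, relevance is measured as phrase hits per 1,000 words of body text (one possible measure among many), and the “two or three lines” of context are approximated by a fixed character window. The function names and the tiny placeholder library are invented for illustration only.

```python
import re

def find_hits(text, phrase, context=60):
    """Return a snippet of surrounding text for each occurrence of the phrase."""
    snippets = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        snippets.append(text[start:end].strip())
    return snippets

def search_library(library, phrase):
    """Rank books by phrase hits per 1,000 words of body text; metadata is ignored."""
    results = []
    for title, text in library.items():
        hits = find_hits(text, phrase)
        if hits:
            words = max(1, len(text.split()))
            density = 1000 * len(hits) / words
            results.append((density, title, hits))
    results.sort(key=lambda r: r[0], reverse=True)
    return results

# Placeholder texts, invented purely for illustration (assumption).
library = {
    "Book A": "Dave Parker played. Dave Parker batted. Dave Parker fielded.",
    "Book B": "A long chapter about other players, with one passing mention "
              "of Dave Parker near the end.",
    "Book C": "No mention of the player at all.",
}

for density, title, hits in search_library(library, "Dave Parker"):
    print(f"{title}: {len(hits)} hit(s), {density:.1f} per 1,000 words")
    for snippet in hits:
        print("   ...", snippet, "...")
```

Because only the body text is scored, an author or co-author name can’t push a title up the list. A real service would need proper indexing rather than scanning every book on each query, but the ranking idea is the same.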

This uber-search would be powerful for some types of books and totally useless for others. For example, there’s absolutely no need for it in the fiction space, but think about how useful it would be in non-fiction areas like business, science, technology, biography, cooking, etc. I see this as a service a publisher could place on its website, dramatically improving the current metadata-only search results you typically find.

In fact, this uber-search vision is a service my OSV colleagues and I are currently exploring with a third-party developer. Before we get too far along with it, we wanted to describe it for the publishing community to see if anyone knows of a better solution that already exists. We haven’t found one yet, but as we roll it out we’ll be sure to describe the process here so other publishers can learn from our experience and potentially embrace our solution as well.

Republished by permission from Joe Wikert’s Digital Content Strategies.


  1. Librarians have been developing book cataloguing systems for hundreds of years. Good, effective cataloguing is big business and vitally important in the library community. And the majority of books now come with a bar code that can be scanned in a few seconds, providing online access to pre-established catalogues like that of the Library of Congress. But as far as I know no booksellers — paper-based or electronic — have ever attempted to provide the same level of access to their material that even the smallest public libraries now supply as a matter of course.

    Do we really have to wait for someone to re-invent the wheel on this one?

  2. “Why are Search and Look Inside both limited to individual books?”

    In fact Search isn’t so restricted, but the interface isn’t what you expect.

    In Amazon’s Books search box, type, say, ‘candlemark’.

    You will be shown not only all books with candlemark in the title, author or metadata, but also books that contain candlemark in the text, provided they have a full-text Look Inside.

    Mind you, the text may be Copyright : … United States by Candlemark & Gleam LLC, Bennington, but it works.

    A useful example: Redoubt: A Valdemar Novel by Mercedes Lackey

    Page 61 : … of candlemarks before he began to get so tired he was having a little …See a random page in this book.

    For other examples, try Kerfuffle or Cloaking Device

    This can’t be done with Advanced Search, however.

  3. Mike D., do you feel Amazon’s search provides the most relevant results? IOW, are the books at the top of the results list the ones with the highest density of that phrase? My experiments with their search show they strongly favor the metadata over the contents. The “Dave Parker” results link I provided in the article is just one example of this, but I found the same to be true for every search I conducted.
