A random comment on an IRC channel today drew my attention back to Booklamp and the Book Genome Project, which we’ve mentioned a few times in years gone by. It’s been a couple of years since Booklamp was introduced, so I decided to take another look at the site and see where it stands now.

Booklamp is a means of tackling the discoverability problem: how do you find a book you might want to read in the modern world, when you might not often visit a library or even bestir yourself away from your computer? I certainly can’t fault it for that. It’s a problem that needs solving, and plenty of different efforts are under way to solve it. But Booklamp has a couple of major problems that seem to limit its usefulness to me.

The system under which Booklamp works is intended to be similar to the system the streaming music service Pandora uses: it breaks a work down into specific “genetic” patterns, and then cross-references all books in its library for books that use those same genetic patterns.

The Da Vinci Code has high ratings for “History/Academics/Culture” and “Catholic Institutions/Religious Hierarchy,” for example. When you look for a list of suggestions based on it, you get three other Dan Brown novels, as well as Unholy Grail by D. L. Wilson and The Fire: A Novel by Katherine Neville, which use many similar elements to those from Da Vinci Code. I haven’t read those books, so I can’t tell how well they match DVC, but it at least seems plausible they might. So far, so good.
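To make the “genome” idea concrete, here is a minimal sketch of how this kind of matching could work. It is not Booklamp’s actual code or algorithm. I’m assuming each book is reduced to a set of theme scores and that books are compared with cosine similarity, a common choice for this sort of thing. The placeholder titles, the scores, and every theme name other than the two quoted above are invented for illustration.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Compare two theme-score dicts; 1.0 means same mix of themes, 0.0 means nothing shared."""
    shared = set(a) & set(b)
    dot = sum(a[theme] * b[theme] for theme in shared)
    norm_a = sqrt(sum(score * score for score in a.values()))
    norm_b = sqrt(sum(score * score for score in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(title, library, top_n=3):
    """Rank every other book in the library by similarity to the given title."""
    target = library[title]
    scored = [
        (cosine_similarity(target, genome), other)
        for other, genome in library.items()
        if other != title
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]

# Hypothetical library: placeholder titles and made-up scores.
LIBRARY = {
    "Thriller A": {
        "History/Academics/Culture": 0.8,
        "Catholic Institutions/Religious Hierarchy": 0.7,
    },
    "Thriller B": {
        "History/Academics/Culture": 0.7,
        "Catholic Institutions/Religious Hierarchy": 0.6,
        "Puzzles": 0.2,  # invented theme name
    },
    "Space Opera C": {
        "Starship Combat": 0.9,  # invented theme name
        "Politics": 0.4,         # invented theme name
    },
}

print(recommend("Thriller A", LIBRARY, top_n=1))
```

Note that a book can only be recommended if it is in the library to begin with, which matters for the first problem below.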

The two major problems are:

1. Lack of selection. Booklamp relies on publishers opting in and allowing it to index their content. So far, only a few major publishers have. For example, Baen hasn’t, so there’s nothing by my favorite authors Sharon Lee and Steve Miller or P. C. Hodgell, and only relatively few titles by prolific writing machines David Weber and David Drake. So while it’s possible you might absolutely love Agent of Change or On Basilisk Station based on similarities to other titles that you like, Booklamp isn’t going to tell you about them unless Baen should decide to opt in.

And even then, that’s not going to provide the catalogs of the other major publishers, which must all be added one at a time. If you stick with Booklamp, you’re only going to be recommended books from a small slice of a much wider pool, which may well hold titles you would enjoy more. I believe that Pandora is able to take advantage of the same compulsory licensing that radio stations get to use, which is why it can have such a huge library of songs to play. But there isn’t compulsory licensing for books.

This is the same problem that Google faced when it wanted to digitize books for its Google Books index. If it limited itself to just those publishers and authors who opted in, it would be nowhere near comprehensive enough to be useful. So Google gambled that a complete index would be useful enough to win a fair use case on the merits and went ahead and scanned everybody’s—and so far, Google does seem to be winning.

You would think that categorizing books based on the elements they included would be even more likely to be a fair use than scanning all of them, wouldn’t you? After all, that’s precisely what unauthorized guides do, and those are generally considered fair use unless they appropriate too much from their source material (as did the Harry Potter Lexicon).

But apparently to come up with its categorizations, Booklamp does have to scan them, or at least run the text of their e-book versions through analysis software. It doesn’t rely on volunteers or employees to categorize them manually like Pandora does. And running the books through analysis software could be considered a copyright violation—and Booklamp/Book Genome don’t have deep enough pockets to fight a legal battle like Google did. (But you never know: if Google Books holds up in SCOTUS, perhaps the precedent might help Booklamp too.)

This mechanical nature of the analysis is what leads to the second problem:

2. No Sense of Humor. Not in the sense that the site itself is dry and humorless, but much like Lieutenant Commander Data from Star Trek, it cannot understand humor. Nowhere in the listings of the available book genomes is anything akin to humor, parody, satire, or comedy, because those aren’t things you can detect mechanically. If you look up Terry Pratchett’s Guards! Guards! you find, apart from a zillion other Terry Pratchett books (Pratchett is another writing machine), titles like Brothers in Arms by Don Perrin, The Redemption of Althalus by David and Leigh Eddings, and Dragon’s Pearl by Devin Jordan. Looking closer at Brothers in Arms, we find this description:

The innocence of youth lost in war… In the fiery siege of the city of Hope’s End the young mage Raistlin must leave behind his ideals to save himself and his brother. Yet as Raistlin and Caramon train as mercenaries, far away another soul is forged in the heat of battle. Another path is chosen, and a future dragon highlord begins her rise to power. She is Kitiara Uth Matar, the twins’ half sister.

So a Discworld novel, from one of the best-known humorous fantasy epics ever…brings up as its first non-Pratchett listing a recommendation for a Dragonlance novel, from one of the best-known angsty fantasy epics ever. (That’s, like, the opposite of humor.) The other two books are similarly serious fantasy stories. What if I specifically want to read another funny fantasy story? Booklamp isn’t going to help me find it.

Pandora actually had a similar problem early on. When I would search on songs by “Weird Al” Yankovic, I would get not other parody songs, but other songs that had the same musical sound as the songs Weird Al parodied. However, Pandora has since remedied this in a big way, by adding an entire system of comedy classifications that covers not only parody songs but also sketches and shows. I’m not sure I see any way for Booklamp to do the same, given that Pandora categorizes with human beings who can recognize humor when they hear it.

A better bet for a “book genetics” site that is both more inclusive and more understanding of all aspects of a work’s composition would be TV Tropes. Because it’s made up of user contributions, it isn’t limited to those books whose publishers permit their inclusion, and its contributors fully understand humor and any other quality of a book that cannot be so easily detected by a soulless machine. (Of course, it can also be ruthlessly addictive.)

And a better bet for a site that can help you find other books to read based on your tastes, rather than on individual elements of the book, is something like Alexandria Digital Literature’s Hypatia (or at least, it would be if it were working right now). Hypatia worked by cross-referencing your opinions of your favorite and least favorite books with those of other people. It then guessed what you might enjoy based on what the people most like you also liked that you hadn’t read yet.
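That taste-matching approach is usually called collaborative filtering, and a rough sketch of it fits in a few lines. I don’t know how Hypatia actually computed its guesses, so everything here is an assumption: the rating scale of -1 (disliked) to +1 (loved), the similarity measure, the reader names, and the placeholder titles are all invented for illustration.

```python
def similarity(mine, theirs):
    """Average agreement on books both readers rated: +1 is identical taste, -1 is opposite."""
    shared = set(mine) & set(theirs)
    if not shared:
        return 0.0
    return sum(mine[book] * theirs[book] for book in shared) / len(shared)

def suggest(reader, ratings, top_n=3):
    """Suggest unread books, weighting each opinion by how closely that reader matches you."""
    mine = ratings[reader]
    totals = {}
    for other, theirs in ratings.items():
        if other == reader:
            continue
        weight = similarity(mine, theirs)
        if weight <= 0:
            continue  # ignore readers whose tastes don't match yours
        for book, score in theirs.items():
            if book not in mine:
                totals[book] = totals.get(book, 0.0) + weight * score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [book for book, total in ranked[:top_n] if total > 0]

# Hypothetical readers and ratings: +1 means loved it, -1 means disliked it.
RATINGS = {
    "me":   {"Funny Fantasy 1": 1, "Grim Epic 1": -1},
    "alex": {"Funny Fantasy 1": 1, "Grim Epic 1": -1,
             "Funny Fantasy 2": 1, "Grim Epic 2": -1},
    "sam":  {"Funny Fantasy 1": -1, "Grim Epic 1": 1, "Grim Epic 2": 1},
}

print(suggest("me", RATINGS))
```

Nothing in this sketch needs to know that a book is funny. It only needs a few readers who share your taste for funny books, which is how this approach sidesteps the humor problem.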

Regardless, the Book Genome Project’s Booklamp is a nice try, but I can’t really see any easy solution to the two major problems that keep it from being entirely useful.

 