O’Reilly the victim of a Google security hole? And what about millions of other books?
December 28, 2005 | 5:27 am
By David Rothman
Is this true and new? A somewhat well-known blogger has found “a significant” flaw in the security of Google’s book searching service, and supposedly millions of books are in jeopardy. He tested his books-for-free hack on some titles from none other than O’Reilly, whose owner, Tim O’Reilly, just happens to be one of the main advocates of the Google service.
I myself think the service is cool. But could it put in jeopardy the books that are done in short sections–not just programming guides, but also cookbooks and travel books? Or is the hack too much trouble to bother with, and thus no risk at all? What’s more, is this information really new? It certainly jibes with previous concerns expressed by publishers and others. No, I’ll not provide a link to the blog’s how-to, but I will pass the facts on to a friend at O’Reilly and see what the people there have to say–and whether the hole is plugged up yet. O’Reilly is among the most consumer-friendly of publishers and the least deserving of rip-offs against it (not that piracy is justified against anyone).
Meanwhile I remain upbeat on the Google book-searching service, which should increase book sales. This is just a matter of getting a fix in.
Update, 1:01 p.m. EST: Yep, this so-what’s-new? angle keeps gnawing at me. Perhaps O’Reilly even knows what’s going on, but has concluded that the so-called hole would be good for book sales. Tim O’Reilly is on the road, so a definitive comment will have to wait, but my friend at O’Reilly says that “An editor here also caught it pretty early. I think the comments added to the end by readers are very good and lay out the issues.” Like me, my friend believes strongly in fair use. Existing laws would undoubtedly apply–it isn’t just the length of a quote but also such factors as the importance of the material. My friend speculates that “perhaps we could say, ‘OK, you can read two pages of DNS & BIND, but only five lines of Google Hacks.’ That would match technology to what we consider fair use (although it still puts us in control).”
(Time stamp at the top changed, so this item is more visible in the TeleBlog. – D.R.)



Comments:
this “significant flaw” is that you can
read a few pages consecutively, which
might be “enough” in some situations,
such as in a cookbook or a travel-guide,
or (in this case) to learn a computer hack.
big deal! here’s the comment i left
at the blog, which probably will _not_
be allowed, because the blogger has to
“approve all comments before posting”.
[deleted first name] said:
[I deleted the quote to prevent cheapskates from tracking down the blog via Feedster, in case the blogger does let Bowerbird's comments get through. - David.]
um, no. are you joking?
i know lots of idiots who
figured this out long ago.
if you’re too cheap to buy
the book in the first place,
as you [the blogger with the how-to - D.R.]
freely admit, then
o’reilly lost no money off
your tactic, so you “beat”
nobody, you little worm.
enjoy the free hacks!
and spread the word,
because you’re helping
tim escape obscurity…
my goodness, the things
that can pass for “clever”
these days are astounding…
-bowerbird
For the most part, I agree with Bowerbird. My original post raised the issue of whether the hack was really new, and I’ve tweaked the lead of the post to emphasize this. As for how much money is at stake for O’Reilly and other publishers as a percentage of total revenue, I don’t know. Hopefully not much, if such moochers wouldn’t buy the books anyway. Still, I’d feel uncomfortable spreading around the how-to, so I won’t.
What’s interesting is that O’Reilly itself has published tips that interfere with others’ business models. Not much–but some. I’d side with O’Reilly here, but I can appreciate the complexities. What happens if you tell how to turn a disposable video recorder into a reusable one, for example? Are you playing Fagin?
There is a difference between modifying a piece of hardware that you have *bought and paid for* (emphasis important) to put it to other uses, and engaging in copyright infringement. One is legal, the other is not.
[First name deleted] (the author of the blog where this hack is discussed) mentions spending a lot of time reading different O’Reilly books in bookstores without making a purchase, an example of “real world” behavior that really can’t be prevented but pretty much accomplishes the same thing as this hack. Are bookstore browsing and the Google book search hack examples of behavior most likely engaged in by individuals who don’t have any intention of buying books in the first place?
Google’s stated purpose of Google book search is to help people find books, not to circumvent the process of obtaining a legitimate copy (real or electronic).
“What is Google Book Search?
Search the full text of books to find ones that interest you and learn where to buy or borrow them.”
Simply put, Google book search makes books findable. Are the benefits of making books findable outweighed by a small number of people who are going to take advantage of the system to browse without buying? To use the bookstore browsing analogy: Should brick and mortar retailers put all of their books “behind the counter” to foil the browsers who don’t buy?
On the other hand, does indexing the complete text of published works to make them more findable on the web necessarily have to include making excerpts visible in the search results? Is this what the Authors Guild objects to the most, or is it the possibility of Google somehow monetizing content without compensating authors and publishers?
Synchronicity and/or irony: I’m currently reading the O’Reilly title Ambient Findability by Peter Morville where he discusses the importance of findability on the web, and I’d be interested to hear the author’s thoughts on this hack so I’ll send him an email.
Thanks for your thoughts, Brian. I myself hope there’ll be a way to maintain the search service and allow for options such as viewing of some of the short sections. Of course, if a publisher feels that all sections should be viewable, then so much the better!
It appears that the author of the “hack” has updated his post. A number of readers have commented that this really isn’t a hack per se, but a function of how Google Book Search works. It should also be pointed out that O’Reilly is a partner in the Google Book Search program (which is opt-in, by the way) and provides the books to Google while controlling how much of each book can be displayed: http://books.google.com/googlebooks/author_faq.html#4
It sounds to me that if O’Reilly were really concerned about this, they’d limit the number of page views available, which is pretty liberal. While the author of the “hack” implies that entire books can be read using his technique, the viewing restrictions (again, controlled by the publishers who opt in) make this very unlikely, or easily remedied while maintaining the value of the service, so this becomes a non-issue.
geez, david, why all the secrecy?
it’s not like a determined person
can’t easily find this themselves…
and your cutting of my quote of
what that blogger actually said
makes my comment unintelligible.
(he thought he had uncovered a
“hole” that was “significant” and
thus had “beat” google’s system.)
for anyone who wants to know
the u.r.l. of this silly blog post,
e-mail me at bowerbird at aol dot com
or — since the blogger _did_ post
my comments, you can probably
just google them. (ironic, isn’t it?)
by the way, a whole flock of people
have now commented that this “bug”
is actually the way the system works,
with “protection” against a person who
tries to read the whole book this way
in the form of “restricted pages”…
-bowerbird
by the way, david, it’s kind of _unsettling_
– to me anyway — the way you so casually
rewrite history and call it “tweaking the lead”.
it seems it would be easy enough to instead
append updates in a clearly-marked section,
thus leaving the dialog-trail free of distortion.
it’s only fair to own up to what you really said.
your blog, your call, but that’s my opinion…
-bowerbird
Bowerbird, I may be POed about Draconian copyright, but I’m gonna run this blog as best I can with both legal and ethical concerns in mind. In O’Reilly’s case, the folks there have treated Net folks well over the years. Maybe by now this thing has been Slashdotted, but until something like that happens–and I trust you won’t be the one doing it–I’ll withhold info that people could use to rip off O’Reilly. That is why I’ve been so gung ho on not providing links to the offending blog or clues to its identity.
As for your concern with the new lead, what a hoot! Does this mean you’ll go after PG in a big way for not disclosing trifles like the origins of their electronic editions? OK. Happy holidays and don’t overdo the recreational drugs.
David
P.S. Later today I’ll post O’Reilly editor Andy Oram’s comments, and people can see that O’Reilly may indeed want to make things tougher for moochers. May. Tim O. himself was out of town. Andy’s just speculating.
Within the first six months after Google Print (Publisher) was launched, there were web discussions on how Google Print could be gamed to display entire books, depending on the percentage of the book the publisher allowed to be displayed. If I recall correctly, some of the hacks may have allowed one to read the entire book even if the publisher had set the limit to less than 100%. I thought perhaps this was a resurgence of one of these discussions, but in reality this is merely someone rediscovering the complaint that many authors have about Google Book Search (Publishers), which is that the publishers set the percentage of the book that may be viewed high enough that entire chapters may be read. In some cases, this means that searchers will read the one chapter they really wanted at Google Book Search instead of buying the book. So if publishers consider this to be a problem, they’ll either decrease the percentage of the books available for viewing, or try to lobby Google to modify the book search on the publisher side to be more like the library side when a book is not in the public domain.
i really don’t understand why you give
any credence at all to this whole matter.
there’s nothing new in this “revelation”.
and there’s no need to “keep it secret”;
a bloody idiot can figure out how to do it.
cheapskates i know realized this option
right off the bat. it’s not rocket science.
and if the people over at o’reilly didn’t
see this supposed “hole” before, then
maybe they ain’t that smart after all.
(andy oram, are you listening?)
but i assure you the cookbook people
and the travel-guide people, and most
publishers that put out books that are
purchased on the basis of “a few pages”
sprinkled throughout are well-acquainted
with the need to “restrict” those pages.
of course, such books are the ones that
have always exploited this “necessity”
for buying the whole book when all you
_really_ wanted was those few pages, so
maybe it’s time to cut off those publishers.
(in the same way that itunes has helped to
hobble the record companies that put out
“albums” with just one or two good songs,
knowing we’d have to buy the whole thing.)
it’s not up to google to “plug” this “leak”.
the publisher just needs to specify that
all these “special” pages be “restricted”,
up to and even including the whole book.
hiding the totality of your content online
is like shrinkwrapping your paper-books –
it’s a strategy that is doomed to failure…
so i _encourage_ all publishers to opt-out!
it’s an excellent i.q. test for the new world,
and the fastest/best way to thin the herd.
-bowerbird