Data Guy speaks at DBW16. Image by Porter Anderson, Publishing Perspectives

NEW YORK CITY — So what’s “Data Guy” like on stage in person at Digital Book World March 9? And can he win over at least some of the skeptics?

First, a little catch-up for latecomers. Teamed up with best-selling novelist Hugh Howey but still hiding himself behind the nickname, Data Guy has ferreted out byte after byte of provocative statistics about the publishing industry.

One big theme of this pair’s Author Earnings Web site is that self-published writers claim far more of U.S. book sales than the usual suspects say they do in an era of e-books and print on demand. And that’s roiled the waters, along with Data Guy and Howey’s conviction that big publishers underpay writers.

If authors can successfully bypass publishers and sell directly to readers through Amazon or otherwise, then will many houses go out of business? What happens if even independent writers can search for a Max Perkins – well, maybe not an editor of that brilliance, but at least a pro, given all the talented people whom major U.S. publishers have shed over the years amid corporate downsizing?

Data Guy himself has self-published – under his own name, I learn – so he is walking the walk. But why the anonymity with his DG cape on? He’d rather the focus be on the data and not the guy. In the past he has used a picture of a spider to depict himself visually – a play on the expression “spider,” as in to index and analyze a Web site. Amazon is the one Data Guy most cares about. While Amazon can be shy about the raw numbers, it does offer rankings. And Data Guy’s friends say he can work magic with what is available.

Now I finally see Data Guy on the stage. He is of average height and build, with olive-colored skin and dark hair. Data Guy speaks in an authoritative way in a baritone voice. He has said he worked as CTO of a major video game publisher, and I believe him.

Here’s his DBW presentation, reflecting the change-is-in-the-air message of the Author Earnings Web site. Anyone who listened to his keynote will not forget him anytime soon. He leaves the audience wanting more, even if he has not converted everyone there.

I interview him afterwards for TeleRead, and he goes on at length about the publishing industry, where the inspiration to get the data came from, and what opportunities lie ahead for publishers and writers.


When Kobo says that – and I am speculating here – they’re talking about a worldwide basis. The majority of their sales are outside the United States, as I understand it. As we saw in the market share breakdown, they account for about 4 percent of U.S. sales. The U.S. take is a significant share of indies. I don’t think that is reflective outside of the United States.


The way I look at criticism is: all for the good. Some of it is reactionary, and there is really no basis to it, and you ignore that. Some of it has been excellent, saying, “Well, hey, have you considered this factor?”

I will get a lot of contacts – back channel contacts – saying, “You’re not factoring in the fact that under agency the discounts were a lot steeper than your model and your spreadsheets.” So I will go back and look and say, “Is this true?”


“You’re missing this vast number of traditionally published sales in the form of pre-orders and not included in your data.” That kind of criticism I absolutely love. It’s data-driven, actionable. If I can go in and make our process better as a result of it, I love that.

If you’re coming from a traditional publishing background, you’re thinking of pre-orders all stacking up on the day of release. Therefore you go, “That number on ranking can’t possibly be 5000 sales.” It’s hundreds of thousands of sales on a Big Five pre-order. In the Amazon world, those hundreds of thousands of sales are amortized day by day over a period of months. The spiders collect those and capture those. So it’s easy to understand that someone looking at it from a New York Times bestseller perspective could say: “Oh no, all of the sales stack up on that day, because that’s how the New York Times and USA Today calculate it.”


I think people make too big of a deal about the guy part and not enough about the data. I can understand that. Numbers are boring. Many of us can look at the numbers and cross our eyes and go to sleep. Focusing on the personality involved in publishing is much more exciting. I can see why so much of the focus has been on who is doing the data and not the data itself. Data Guy could be anybody. I know at least three other people in this industry who are capable of doing this kind of thing.


Amazon’s product pages. It’s all publicly visible. The spider reads the product pages and records every piece of metadata associated with those titles. It’s something we could do with a notepad and a pencil, but when you are doing 200,000 of them in an hour you would need an army. Instead we let the computing power take care of it. And then we look at what comes out of it from a spreadsheet standpoint.
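For readers curious what "reading a page and recording its metadata" can look like in code, here is a minimal sketch in Python. It is not Data Guy's software, which the interview does not describe in any detail. The sample page, the `data-field` attribute, and the names `MetadataSpider` and `scrape` are all invented for illustration; real product pages use different and far messier markup, and a real spider would also have to fetch the pages.

```python
from html.parser import HTMLParser


class MetadataSpider(HTMLParser):
    """Collects text from elements marked with a hypothetical data-field attribute."""

    def __init__(self):
        super().__init__()
        self.record = {}      # field name -> text found on the page
        self._field = None    # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        # Assumption: each piece of metadata sits in an element
        # carrying data-field="<name>". This markup is invented.
        self._field = dict(attrs).get("data-field")

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self.record[self._field] = text

    def handle_endtag(self, tag):
        self._field = None


def scrape(page_html):
    """Return a dict of metadata fields found in one page."""
    spider = MetadataSpider()
    spider.feed(page_html)
    return spider.record


# Invented sample page standing in for a publicly visible product page.
SAMPLE_PAGE = """
<div>
  <span data-field="title">Example Novel</span>
  <span data-field="publisher">Example Press</span>
  <span data-field="price">3.99</span>
  <span data-field="rank">1,234</span>
</div>
"""

record = scrape(SAMPLE_PAGE)
print(record)
```

Each page yields one flat record, so a run over many pages produces rows that drop naturally into the kind of spreadsheet Data Guy mentions.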


Interpreting the data is just Hugh and me. We will have some conversations about “Hey, it would be interesting to look at this effect” or “What are print sales doing?” or “I wonder what happens with international.” Then I will do a first pass at it. Hugh will come back and look at it and say, “You know what? I don’t think you accounted for this,” or “That’s maybe too nerdy and will bore our audience.”

Then we will both write the report together. He will write some sections. I will write some sections. We will edit each other’s work, and at some point we will say “You know what, that’s good enough,” and put it out there.


The data collection part takes under an hour now. We are throwing a lot of hardware at it. Then it’s crunching and analysis – if we are doing just what we have done before, pie charts and that kind of thing, it’s quick. Maybe three, four, five hours of number crunching. If we are trying to do something new, I might spend a couple of days on it, trying to figure out what I am seeing, looking at it in different ways. The writing up depends on Hugh’s schedule and my schedule. We will bounce back and forth sometimes for a week or two.


I did. I didn’t actually bring the software over. I had written similar software for the video game industry for mobile game app sales on Apple and Android’s Google Play Store. I worked for a few publishers in the video game world, and that’s how we did our market analysis. We looked at what competitors were selling, what genres there were opportunities in, and this was exactly the technique we used.

When I became an author and started looking at the Amazon market, I was going, “You know what? No one has any data on this.” The solution to going and getting that data was second nature.


There was so much emphasis on data that it wasn’t a case of me trying to convince anyone. I was CTO of one of the big video game publishers and they were pushing me, saying we need insight into this. Figure out a way.

The minute we had it, we were having meetings at the executive level, breaking down the market, saying there are opportunities in tower defense, in real-time strategy, opportunities in casual runner, but only one that monetizes via advertisement, not via in-app purchase. It’s a very different industry; it’s one that has had to react in real time to a lot of changes. So perhaps they were better equipped than publishing to use that data right away.


It’s very polarized. Right now, the people who feel the most strongly negative about the data (it’s all wrong, it’s agenda-driven and therefore inaccurate, the data is being massaged to tell a certain story), those people are generally fairly silent because they have no incentive to contact us. In their minds, they have a conclusion about what this is, and so what are they going to do? Tell us, “I know you are lying”? But the kind of contacts I have gotten that are extremely valuable are the ones that say “I think you are missing this. I think you are not accounting for that. Have you tried looking at this because your numbers are wrong for x, y and z?”


Let’s say I am a big publisher and I am choosing what mix of genres to publish next year. I’ve got a publishing schedule that probably goes out a year or two in advance. I know indies with relatively modest production schedules, compared to those of large publishers, that already have their next 20-30 books mapped, planned with release dates. So being able to take real-time data, [the publisher can conclude:] “You know what? I think we are overinvesting in historical romance or paranormal, or, you know what, this particular series is petering out. Am I really going to go for the bucks in continuation or start a new series in a market that is ripe?”

A great example of how this real-time data is being used is on kboards right now, where a bunch of different authors in the romance space are slicing and dicing our data and going, “You know what? There is a marketing opportunity here. You need to get some titles out there for” – I forget what subsection of romance – but they are sharp and sophisticated in how they use this data.

So when your data comes in four months late and you have already got your pipeline planned, you cannot react to it.


You have to do both. You can only be a trendsetter if people follow.


Time. This is primarily a curiosity-driven volunteer thing. Hugh and I both feel passionate about the industry. We love writing, we love writers, we are writers, and so this is our way of giving back to the industry, but there are only so many hours in the day that we can dedicate to this kind of thing. And I will look at a lot of questions, and boy, curiosity compels me to dig into a lot of these things, but honestly I don’t have the time. So that is one of the hidden agendas in putting the data out there: the hope that people will pick it up and continue the work and look for things that we wouldn’t even have the insight to go look for. And then some of them hopefully contribute that knowledge back. We have seen some of that. Time is the biggest limitation.


I think a lot of it goes back to those new authors and those midlisters and making sure they have the opportunity to grow into tomorrow’s best sellers. Their plight is kind of hidden in that big set of purple bars that represents the Big Five at the higher price points. That’s all primarily the big names. We are not going to see that effect now. We are going to see that negative impact two to three years from now, when the next crop of sellers and the ones after that fail to materialize in the publishing world. When the big sellers today start to slow down their production or retire – old writers never retire – where is the next crop coming up to replace them? Today, they are all indies. So you get the Andy Weir career path: start out in indie, grow an audience from the grassroots level with reasonable prices, and then make the transition over to traditional publishing to get the casual market and print distribution on a broad scale.

Andy was probably a lot less expensive than an Andy Weir will be two to three years from now, when this model is a little more out there and this internal pipeline of those midlisters and debut authors has failed to materialize in the traditional publishing realm because no one is buying them. They’re not gaining their audiences. Then all of a sudden an Andy Weir two years from now is gonna say, “That million and a half – I got five offers from five big publishers. I think the floor for the bidding starts at 3.”

The profit margin publishers can eke out with these external acquisitions like YouTube stars, etc., will go away. That’s why they need to have this healthy, robust pipeline of authors who are being nurtured coming up in the system, given a chance to find their own audience at a lower price point and then graduate in the same way other authors used to graduate from mass market paperback to trade paperback to hardcover.

Now it’s from $3.99 e-book to $7.99 e-book to eventually 12- and 15-dollar e-books.

I look at my own habits. I will still buy the next Lee Child that comes out, the next Stephen King that comes out. But if I see a fifteen-dollar book from an author I don’t know, the chances of me buying it are basically zero unless I get an author recommendation. In the meantime, I am plowing through indie e-books and small publisher books, and Amazon imprints, not because they are indie or Amazon. I don’t check.

It’s always dangerous to take your own experience and impose it on others in the world. I know avid readers. They think like I do.

Fifteen years ago, I was walking through Waldenbooks, Borders, several times a week going “That one, that one.” I read them all here, and there are no new books for me. Mass market paperbacks, I would buy online. Now it’s all E. I think that’s a very typical behavior pattern for avid readers who were avid readers in the past, and now are still avid readers. They are more likely to have made the shift from physical to digital. More options, more immediacy. You finish a book at two in the morning, you’re not tired, you download another.


I am always a little careful of [causality]. Undoubtedly, we have seen changes in the marketplace that are Kindle Unlimited driven, but to be able to sort out what percentages of those changes are due to the introduction of Kindle Unlimited and due to the higher prices on the trade side; it’s a tangled mess. How to untangle, I just don’t know. Having said that, a relatively large portion of the paid downloads today of indie books and the smaller sides of micro presses is Kindle Unlimited. I would say, 25-27 percent, which is significant. About 70-some odd percent that are indies are Kindle-Unlimited participating. From our data, look at page reads, about 54 percent are paid Kindle Unlimited full-read equivalents. Doing that we can see that about half of three-quarters of indie sales are in fact Kindle Unlimited downloads. That is not a small effect on the market.
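The "about half of three-quarters" estimate can be worked through as a quick calculation. This is one assumed reading of the figures quoted above, not a formula Data Guy gives: it takes "70-some odd percent" to mean somewhere between 70 and 75 percent of indie titles enrolled in Kindle Unlimited, and treats the 54 percent as the share of those titles' paid downloads that are full-read equivalents. The function and variable names are invented for illustration.

```python
def ku_share_of_indie(participation, ku_share_of_enrolled):
    """Share of all indie paid downloads that are Kindle Unlimited reads.

    Assumed reading: (fraction of indie titles enrolled in KU)
    times (fraction of enrolled titles' paid downloads that are
    KU full-read equivalents).
    """
    return participation * ku_share_of_enrolled


# Figures quoted in the interview; the 0.70-0.75 range is an
# interpretation of "70-some odd percent" and "three-quarters".
KU_SHARE_OF_ENROLLED = 0.54

low = ku_share_of_indie(0.70, KU_SHARE_OF_ENROLLED)
high = ku_share_of_indie(0.75, KU_SHARE_OF_ENROLLED)

print(f"Estimated KU share of indie paid downloads: {low:.1%} to {high:.1%}")
```

Under these assumptions the result lands a little under to a little over 40 percent, in the neighborhood of "half of three-quarters." How that squares with the 25-27 percent figure mentioned earlier in the answer is not clear from the interview; the two numbers may describe different bases.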


They had business models that made no sense. Their hope was that they would get enough of a critical mass that they could go back to the publishers and renegotiate, rather than paying a full whack for a borrow: “It doesn’t make sense economically, so how about this?” But they never got to that point. They never got that critical mass. Amazon never had to face that issue because they had their supply to drive that. And today probably three-quarters of Kindle Unlimited downloads and reads are non-traditional reads. That’s pretty substantial.

Related: Publishing Perspectives on Data Guy.


