Scientists have analyzed what goes into a best-selling or poorly-performing novel, and come up with an algorithm that predicts a book’s commercial success with an 84% success rate. Oddly enough, the criteria for commercial success seem to be the same sorts of advice you get from writing coaches and workshops:

They found several trends that were often found in successful books, including heavy use of conjunctions such as “and” and “but” and large numbers of nouns and adjectives.

Less successful work tended to include more verbs and adverbs and relied on words that explicitly describe actions and emotions such as “wanted”, “took” or “promised”, while more successful books favoured verbs that describe thought processes such as “recognised” or “remembered”.

So, there are more nouns and adjectives, fewer verbs and adverbs…and even “show, don’t tell.” Funny how that works. The article also mentions using Dan Brown’s The Lost Symbol as an example of a “less successful” book, because despite its commercial success the critics didn’t like it. And besides, what article on literature is complete without a Dan Brown potshot?
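For the curious, here is a rough idea of what counting those word-level signals might look like. This is only a sketch: the word lists come straight from the examples quoted above, the function name `style_signals` is made up for illustration, and the study's real feature set and model are not described in the article, so nothing here should be read as the researchers' method.

```python
import re
from collections import Counter

# Word lists taken only from the examples quoted in the article; the
# study's actual features are unknown, so these are illustrative stand-ins.
CONJUNCTIONS = {"and", "but"}
ACTION_EMOTION_VERBS = {"wanted", "took", "promised"}
THOUGHT_VERBS = {"recognised", "remembered"}


def style_signals(text):
    """Return the share of words in `text` that fall in each word group."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty input
    counts = Counter(words)

    def rate(group):
        return sum(counts[w] for w in group) / total

    return {
        "conjunctions": rate(CONJUNCTIONS),
        "action_emotion_verbs": rate(ACTION_EMOTION_VERBS),
        "thought_verbs": rate(THOUGHT_VERBS),
    }


sample = "She remembered the house and the garden, but he wanted to leave."
signals = style_signals(sample)
print(signals)
```

A real classifier would presumably use part-of-speech tagging and far larger vocabularies rather than three tiny word lists, but the basic move of turning prose into frequencies is the same.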

I was going to wonder what it might mean for the world if the publishing industry started using this algorithm as a predictor of success for books in their slushpile—but then I realized, they essentially already do. It’s just that they’re called “people,” who use their own internal algorithms to decide whether to accept or reject a particular book. I wonder whether the human slushpile pickers are more or less successful than 84% at picking winners?

But thinking about it further, most major publishers reject slushpile entries without even a glance these days just because nobody has time to go through them and figure out which ones are any good. Perhaps feeding the slushpile into a computer using this algorithm to separate the sheep from the goats could serve as a sort of pre-reading filter, allowing publishers to consider as many “good” works as possible when they would previously have had to reject everything.

Of course, with only an 84% success rate some sheep would be rejected and some goats would make it through, but that’s better than turning away all the potentially good manuscripts, or having to sift through all the dreck in search of the few good submissions. Could this help publishers compete with self-publishing by turning away fewer of the writers who would otherwise go on to publish themselves?
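To put rough numbers on the sheep and goats, here is a back-of-the-envelope sketch. Only the 84% figure comes from the article. The pile size of 1,000 and the guess that 2% of submissions are "good" are hypothetical, and so is the assumption that the algorithm is right 84% of the time for good and bad manuscripts alike.

```python
def filter_outcomes(pile_size, good_fraction, accuracy):
    """Estimate what a pre-reading filter passes along.

    Assumes (hypothetically) that `accuracy` applies equally to good
    and bad manuscripts.
    """
    good = pile_size * good_fraction
    bad = pile_size - good
    good_kept = good * accuracy        # sheep correctly let through
    good_lost = good - good_kept       # sheep wrongly rejected
    bad_kept = bad * (1 - accuracy)    # goats that slip through
    kept = good_kept + bad_kept
    return {
        "good_kept": good_kept,
        "good_lost": good_lost,
        "bad_kept": bad_kept,
        "kept_total": kept,
        "share_of_kept_that_is_good": good_kept / kept if kept else 0.0,
    }


# Hypothetical slushpile: 1,000 manuscripts, 2% of them good.
result = filter_outcomes(pile_size=1000, good_fraction=0.02, accuracy=0.84)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under those made-up inputs, an editor would be reading a pile of roughly 174 manuscripts instead of 1,000, with about 17 of the 20 good ones still in it. Most of what gets through would still be goats, but it is a much smaller pile to sift.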

For that matter, in readers’ hands this algorithm might assist in picking through the “Internet slushpile” of self-published titles. It wouldn’t necessarily tell them whether they would like a book, but at least would let them know whether it was badly-written before they bought it. (Though, granted, sample chapters can already do that now at least to some extent.)

I also confess to wishing I could run some of my own Internet fiction through it to see how well I scored…

(Found via The Digital Reader.)


  1. There is one giant sucking hole in this analysis. Narrative has changed over the years. Most of the books that were successful a hundred years ago would drop like a stone in the current market if they were published today. Heck, popular novels of ten or twenty years ago wouldn’t make it by current tastes.

    Narrative has dwindled in importance since the first novels. Compare a novel of a hundred years ago to one today, and you’ll see what I mean.

    What you’ll find is that descriptions, dialogue, sentences, and narrative have all simplified.

    Descriptions aren’t as detailed, and you certainly won’t find long pages of descriptions of the countryside, the houses, or the clothes.

    The narrative has become more intimate, with the author less intrusive. The reader is put dead center into the character’s head and thoughts, and the intimacy tends to only be for one or two characters, not every character in the novel.

    Instead of omniscient, the current standard in fiction is third and first person. Most fiction is written in warm third person with occasional forays into cold third person. Hot third person tends to be used only in romance, which is about emotions.

    The paragraphs are also shorter.

    The dialogue carries more story weight because it must give the reader more information about what the characters are thinking and seeing as well as advancing the plot.

    In other words, much of the fat of the novel has been trimmed because modern readers want only the meat and bone of the story.

    Paul’s comments section is closed again, by the way.

  2. “There is one giant sucking hole in this analysis. Narrative has changed over the years.”

    That’s what immediately came to mind when I saw the first article. In fact, it was so obvious that I wondered how anyone could take the research seriously. Yet, that’s just what seems to be happening.

  3. While I agree the sample input is questionable, the concept, I think, is sound, if applied to a more modern selection of works.

    Suggesting fewer adverbs is sound advice, although their sample bias definitely shows in the suggestion of more nouns and fewer verbs.

    Did anyone read the study in detail to see if they got into point of view?

    Curious, Marilynn, I’ve not seen your “warm”, “cold” and “hot” third person terms before. Quick definitions? Google was stumped too. Thanks!

  4. It does not matter whether publishers already sort properly through their own slushpiles. Sometimes they succeed, sometimes they fail. As a wild but educated guess, I would say that any machine learning algorithm would probably perform better than any human reviewer. Why? Because human judgment has many unwritten and unreliable rules, while a data-driven learning algorithm can learn formally that a choice was bad, and mark it clearly as such.
    But well, wild guesses make good surprises too, which would immediately exclude conservative publishers…
    To add a final nail to the human coffin, the algorithm could be programmed to include those eccentric wild guesses (it’s damn easy, actually), and to learn from these, just like Amazon does for its pricing algorithms.
