How Big Data Can Tell You Which Book To Read Next

JP George

If you enjoy reading but still haven’t found your next book to cozy up with, your smartphone might be able to suggest one. Artificial intelligence (AI) can now rank literature to predict the next bestseller: a kind of recommendation system based not on metadata, but on the patterns and themes found in the books themselves.

Publishers around the globe are mining all kinds of data, including what’s in the books themselves, in search of the magic formula for evaluating a book’s market potential. With more informed marketing, publishers hope to better target their customers.

Recommending the popular novel

So, how does AI determine what we want to read? It turns out that certain emotional patterns keep us engaged and interested while reading a novel. Kurt Vonnegut first described the curves of emotional plotlines in 1995. Now, with the help of AI sentiment and emotion analysis, such plotlines can be extracted quantitatively. By combining these plotline curves, researchers from the Stanford Literary Lab claim to be able to detect the next blockbuster novel.
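To make the idea of an emotional plotline concrete, here is a minimal sketch that scores successive chunks of a text with a tiny, made-up word list. The lexicon, the function name sentiment_curve, and the window sizes are all assumptions for illustration. Real systems rely on far larger lexicons or trained models, and this is not the method used by any particular research group.

```python
# Hypothetical toy lexicon; real sentiment lexicons contain thousands of entries.
POSITIVE = {"happy", "love", "joy", "hope", "win"}
NEGATIVE = {"sad", "fear", "loss", "death", "cry"}


def sentiment_curve(text, window=4, step=2):
    """Return one score per sliding window of words: +1 per positive word, -1 per negative word."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    scores = []
    # Always evaluate at least one window, even for very short texts.
    for start in range(0, max(len(words) - window + 1, 1), step):
        chunk = words[start:start + window]
        scores.append(sum((w in POSITIVE) - (w in NEGATIVE) for w in chunk))
    return scores


# A story that starts happily and ends badly produces a falling curve.
curve = sentiment_curve("love joy hope win sad fear loss death", window=4, step=4)
print(curve)  # [4, -4]
```

Plotting such scores against position in the book gives the kind of rising and falling curve the plotline research describes.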

Machines think from data

Under the hood of such an AI sit Big Data and machine learning (ML). Big Data doesn’t just mean lots of data; it also means that the data comes from many different sources and in many types (e.g., audio, video, images, and text) that are often unstructured, unlike traditional databases with well-defined fields. ML involves statistical algorithms that use such multi-type, unstructured data to make predictions, for example to predict class membership. This is possible either by knowing ahead of time which classes exist and training the ML algorithm by example (supervised learning) or by letting the algorithm discover the underlying patterns on its own (unsupervised learning).
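The difference between the two learning styles can be sketched with a toy example on one-dimensional numbers. The function names and data below are hypothetical, and both routines are deliberately simplistic stand-ins (a nearest-neighbor lookup and a two-cluster k-means) for the much richer algorithms used in practice.

```python
def nearest_neighbor_predict(train, query):
    """Supervised: train is a list of (value, label) pairs; return the label of the closest value."""
    return min(train, key=lambda pair: abs(pair[0] - query))[1]


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters without being given any labels."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Supervised: the labels are known in advance.
labeled = [(1.0, "short"), (2.0, "short"), (9.0, "long"), (10.0, "long")]
print(nearest_neighbor_predict(labeled, 8.5))

# Unsupervised: the algorithm discovers the two groups by itself.
print(two_means([1.0, 2.0, 9.0, 10.0]))
```

In the first call the classes are handed to the algorithm as labels; in the second, the same grouping emerges from the data alone.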

ML methods include embedded vector space techniques (principal component analysis, K-nearest neighbor, and support vector machine), decision-tree based techniques (classification and regression tree, random forest), gradient and Bayesian-based methods, artificial neural networks (ANN), and others. Many tutorials on machine learning methods can be found here.

ANNs were among the first algorithms to be applied to problems in AI, beginning as long ago as the 1940s. For many reasons, their use has waxed and waned over the years, yet interest has recently resurged along with the unprecedented advance of deep learning. This growth in deep learning has led to what the New York Times calls the great awakening, given Google’s ability to translate text into more than 100 languages.

How AI uncovers sentiment and emotions from text

Imagine automatically extracting the sentiment or emotional impact of a literary work. For a computer to understand a text, a task known as natural language processing (NLP), AI algorithms must first find a mathematical representation that a machine can work with and that retains as much information about the text as possible. A simple representation called “bag-of-words” treats a text, as the name implies, as an unordered collection of words, from which the frequency of each word or word group can be counted. This may provide enough information for classifying themes, but it fails when word order matters to the meaning of a sentence.
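A bag-of-words representation takes only a few lines to sketch. The helper name bag_of_words and the simple tokenizing rule are assumptions for illustration. The example also shows the weakness mentioned above: two sentences with opposite meanings end up with identical bags.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring case, punctuation, and word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")

print(a["the"])  # 2
print(a == b)    # True: word order is lost, so the two sentences look the same
```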

Two representations that capture information about the context in which words appear are Word2Vec and GloVe. More about NLP representations can be learned from this tutorial, while a tutorial from TensorFlow on Word2Vec is found here.
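The underlying intuition, that a word can be represented as a vector built from its neighbors, can be sketched with plain co-occurrence counts. This is not Word2Vec or GloVe themselves, which learn dense vectors from very large corpora; the function names, window size, and three-sentence corpus below are assumptions chosen to keep the example tiny.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chased the ball",
]
vecs = cooccurrence_vectors(corpus)

# Words used in similar contexts end up with more similar vectors.
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["dog"]))  # True
```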

Once sentences are converted to a meaningful representation, a language model is needed that discerns positive emotions from negative emotions. One method would be to use a supervised learning procedure with deep neural networks, as has been done to understand movie reviews. Another way is to allow the deep neural network to discover the emotional patterns by itself. This is the true power behind deep learning: its ability to teach itself, and with more Big Data, to learn more.
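As a rough illustration of the supervised route, the sketch below trains a perceptron, a single-layer ancestor of today's deep networks, on four made-up review snippets. It stands in for the deep neural networks described above and is far simpler than them; the training sentences, labels, and function names are all hypothetical.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words features: word -> count."""
    return Counter(text.lower().split())


def train_perceptron(examples, epochs=10):
    """examples: list of (text, label) pairs, label +1 for positive and -1 for negative."""
    weights = Counter()
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            score = sum(weights[w] * c for w, c in feats.items())
            if score * label <= 0:  # misclassified or undecided: nudge the weights
                for w, c in feats.items():
                    weights[w] += label * c
    return weights


def predict(weights, text):
    """Return +1 for a positive prediction, -1 otherwise."""
    score = sum(weights[w] * c for w, c in featurize(text).items())
    return 1 if score > 0 else -1


reviews = [
    ("a wonderful moving story", 1),
    ("a dull tedious story", -1),
    ("wonderful characters and moving prose", 1),
    ("tedious plot and dull prose", -1),
]
weights = train_perceptron(reviews)

print(predict(weights, "a moving and wonderful plot"))  # 1
print(predict(weights, "dull story"))                   # -1
```

After training, words such as "wonderful" carry positive weight and "dull" negative weight, while neutral words like "story" stay at zero.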

Through this process, an ML system can identify a text’s major themes (from the word groupings) and its emotional content. These factors are the fundamental ingredients for an AI application that will recommend a novel.

From creating Animal Farm summaries to discovering who will be the next Danielle Steel, AI is revolutionizing what and how we will read in the future.

For more on using ML to upend the competition, see Why Machine Learning and Why Now?


About JP George

JP George grew up in a small town in Washington. After receiving a Master's degree in Public Relations, JP has worked in a variety of positions, from agencies to corporations all across the globe. Experience has made JP an expert in topics relating to leadership, talent management, and organizational business.