Can We Use Machine Learning To Predict Box Office Success?

Surya Kunju

Machine learning, sentient artificial intelligence, humanoid robotics—all of a sudden these terms don’t feel as strictly “sci-fi” as they once did. Films like Her and Ex Machina offered visions of a digital future that felt almost close enough to touch, in the sense that the very same technology could feasibly be in our own hands soon.

Machine learning in particular has seen strong progress in recent years, with the likes of Google, Amazon and SAP breaking new ground in creating algorithms that learn from data. In the spirit of the Toronto International Film Festival (TIFF) and SAP’s Our Digital Future film series, why don’t we get down and dirty with a little machine learning to help us predict the success of a soon-to-be-released movie?

Let’s use La La Land, a comedy drama in which a jazz pianist falls for an aspiring actress in Los Angeles, as our test case. Ahead of the film’s Canadian premiere at TIFF ’16 and its full release in December, how can we determine the biggest factors in how it will perform at the box office?

The answer lies in predictive analytics, an application of machine learning that depends greatly on historical data. Today, we can pull historical data about movies from sources such as IMDb (an online movie database). Some of the key data points for our test include the starring cast, genre, the film’s MPAA rating (in this case PG-13), production budget, country of origin, and runtime.

Another factor is the film’s critical reception, both from the media and from movie database users. There are also technical data points such as sound mix, aspect ratio, camera, laboratory, negative format, cinematographic process, and printed film format. The target variable is movie revenue.
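
As a rough sketch of how these data points might be organized before modeling, the Python below defines a record type and flattens it into numeric features. The `MovieRecord` class, its field names, the score scales, and the `encode` helper are all illustrative assumptions, not the schema used for the model in this article, and the sample values are placeholders rather than real figures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MovieRecord:
    """Hypothetical record layout; field names are assumptions, not an IMDb schema."""
    title: str
    cast: List[str]
    genre: str
    mpaa_rating: str
    budget_usd: float
    country: str
    runtime_min: int
    critic_score: float                  # media reviews, assumed 0-100 scale
    user_score: float                    # database user rating, assumed 0-10 scale
    revenue_usd: Optional[float] = None  # target variable; unknown before release


def encode(movie: MovieRecord, genres: List[str], ratings: List[str]) -> Dict[str, float]:
    """Flatten a record into numeric features, one-hot encoding the categories."""
    features = {
        "budget_usd": float(movie.budget_usd),
        "runtime_min": float(movie.runtime_min),
        "critic_score": float(movie.critic_score),
        "user_score": float(movie.user_score),
    }
    for g in genres:
        features["genre=" + g] = 1.0 if movie.genre == g else 0.0
    for r in ratings:
        features["mpaa=" + r] = 1.0 if movie.mpaa_rating == r else 0.0
    return features


# Placeholder values for illustration only; these are not real figures.
example = MovieRecord(
    title="Example Film", cast=["Lead A", "Lead B"], genre="Comedy",
    mpaa_rating="PG-13", budget_usd=1_000_000.0, country="USA",
    runtime_min=120, critic_score=80.0, user_score=7.5,
)
row = encode(example, genres=["Comedy", "Drama"], ratings=["PG-13", "R"])
```

The cast, country, and technical fields (sound mix, aspect ratio, and so on) would need a similar encoding; they are left out here to keep the sketch short.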

Using the above data, I created a classification model using predictive analytics. Here is one of the most crucial outputs of the model—contributing variables:
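
As a minimal sketch of how a ranking of contributing variables could be computed, the Python below scores each input by its information gain with respect to the outcome. This is a stand-in technique chosen for illustration, not the contribution measure of the predictive analytics tool used here. It also assumes revenue has been bucketed into "hit" and "flop" classes, and the eight rows are invented toy data, built so that star power comes out on top as it does in the article.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Reduction in target entropy after splitting the rows on one feature."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target):
    """Return (feature, gain) pairs, most informative first."""
    scores = {f: information_gain(rows, f, target) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy data; "outcome" is an assumed bucketing of revenue.
columns = ("star_power", "genre", "mpaa_rating", "outcome")
rows = [dict(zip(columns, values)) for values in [
    ("high", "drama", "PG-13", "hit"),
    ("high", "comedy", "R", "hit"),
    ("high", "drama", "R", "hit"),
    ("high", "drama", "PG-13", "hit"),
    ("low", "comedy", "PG-13", "flop"),
    ("low", "comedy", "R", "flop"),
    ("low", "drama", "R", "flop"),
    ("low", "comedy", "PG-13", "flop"),
]]

ranking = rank_features(rows, ["genre", "mpaa_rating", "star_power"], "outcome")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

On this toy data, star power separates hits from flops completely, genre carries a little information, and MPAA rating carries none. Real data would rarely be this clean.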


La La Land is written and directed by Damien Chazelle of Whiplash fame; it shares that film’s genre, carries a similar MPAA rating, and has some of the same technical data points. With star power being the most important variable, however, we are left with the burning question: Has the director put together the right cast, and was he right to pair Emma Stone with Ryan Gosling?

To find out, I used a technique called social network analysis. I began by using publicly available Twitter APIs to collect tweets mentioning #EmmaStone. I then filtered the data to keep only tweets in which a male lead actor was mentioned alongside the hashtag. Below is the graph I created to show the strongest recommendations to play Emma Stone’s love interest.
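
The filtering and weighting step can be sketched in a few lines of Python, under some loud assumptions: the tweets are invented sample strings (no Twitter API is called), the candidate list is hypothetical with `ActorB` and `ActorC` as placeholders, and `costar_edge_weights` is an illustrative helper rather than the method actually used for the graph. Each count plays the role of an edge weight, which a graph would draw as line width.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def costar_edge_weights(tweets, anchor, candidates):
    """Count how often each candidate hashtag co-occurs with the anchor hashtag."""
    anchor = anchor.lower()
    wanted = {c.lower(): c for c in candidates}
    weights = Counter()
    for text in tweets:
        # Use a set so a hashtag repeated within one tweet is counted once.
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if anchor not in tags:
            continue  # skip tweets that do not mention the anchor
        for tag in tags:
            if tag in wanted:
                weights[wanted[tag]] += 1
    return weights


# Invented sample tweets for illustration only.
tweets = [
    "#EmmaStone and #RyanGosling again, please",
    "Would love #EmmaStone opposite #ActorB",
    "#emmastone + #ryangosling = perfect",
    "#RyanGosling was great last night",
    "#EmmaStone #RyanGosling #ActorB dream cast",
]

weights = costar_edge_weights(tweets, "EmmaStone", ["RyanGosling", "ActorB", "ActorC"])
for actor, count in weights.most_common():
    print(f"EmmaStone -- {actor}: weight {count}")
```

Matching is case-insensitive, and the fourth tweet is ignored because it never mentions the anchor hashtag.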

The width of the line from Emma Stone to Ryan Gosling doesn’t lie: it’s a perfect match. Is it simply that the casting director is a genius, or has he been making use of machine learning himself? Either way, our little glimpse into the world of machine learning has resulted in a prediction of success for La La Land. Now we just have to wait for the film’s release to test this theory out.

For more on SAP’s Our Digital Future film series at TIFF ’16, including the film screenings and related events, click here. For more on machine learning, download this white paper.

About Surya Kunju

Surya has over 13 years of international experience helping organizations adapt to cutting-edge machine learning, analytics, and data warehousing technologies to gain competitive advantage. Currently, he oversees SAP's strategic field enablement programs, with responsibility for driving the company's sales initiatives for the enterprise-class Predictive Analytics platform. He often engages in projects where he executes end-to-end proofs of concept for analytical solutions, from ETL and model building to deployment into operational environments, to drive optimal business outcomes.