Maine, Margarine, And Mozzarella: Correlation Is Not Causality

Bryan Lapidus

My daughter said something recently that was totally logical to her and unintentionally funny to me. Earlier that day we had seen a former NBA player who stood about 7 feet, 6 inches tall, and afterward she concluded that playing basketball must make you tall. At this point, her overly eager and analytic father decided this was a teachable moment and explained that “correlation is not causality.” Playing basketball does not make you tall, and being tall does not make you a basketball player, even if those two traits often go together, i.e., are highly correlated. She nodded her head. She is nine years old.

A correlation coefficient measures how closely a set of paired data points follows a straight line. Imagine a line drawn through a scatter plot, or cloud, of data points. The line is a simplification: a model that represents the data to make it easier to study. Correlation is measured by the “r value,” on a scale from positive one (perfectly positively correlated) through zero (no linear relationship) to negative one (perfectly negatively correlated). As a common rule of thumb, a value above 0.5 (or below -0.5) is read as a strong relationship.
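For readers who want to see where the r value comes from, here is a minimal sketch of the standard Pearson formula in plain Python: the covariance of the two series divided by the product of their spreads. The function name `pearson_r` and the sample numbers are illustrative, not from any particular tool. The last example shows why "no linear relationship" is not the same as "no relationship": a perfect U-shape scores zero.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Cross-products and squared deviations from each series' mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either series is constant")
    return sxy / sqrt(sxx * syy)


# Points on a rising straight line
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))         # 1.0
# Points on a falling straight line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))         # -1.0
# A perfect U-shaped (nonlinear) relationship; r only measures straight-line fit
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```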

Sometimes, things that appear to be highly correlated are in fact unrelated events with no real-world connection. I explained this to my daughter and then, for fun, showed her a website that lists funny, spurious correlations. For example, the number of people who drowned by falling into a pool correlates with the number of films that Nicolas Cage appeared in (r=.666). The divorce rate in Maine correlates with per-capita consumption of margarine (r=.993). And per-capita consumption of mozzarella cheese correlates with civil engineering doctorates awarded (r=.959). She nodded her head again. She is still nine.
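One frequent source of spurious correlation is a shared trend: two series that both rise (or both fall) over time can show a high r even when nothing connects them. The sketch below uses made-up numbers, not the data behind the examples above, and the names `series_a`, `series_b`, and `changes` are hypothetical. A common sanity check is to correlate period-over-period changes instead of raw levels; here the strong correlation of the levels disappears entirely once the shared upward drift is removed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def changes(series):
    """Period-over-period differences; these remove a steady shared trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up yearly figures for two unrelated quantities. Apart from both
# drifting upward, their year-to-year movements are uncorrelated.
series_a = [0, 2, 2, 4, 4, 6, 6, 8, 8]
series_b = [0, 4, 8, 10, 12, 16, 20, 22, 24]

levels_r = pearson_r(series_a, series_b)
changes_r = pearson_r(changes(series_a), changes(series_b))
print(f"r of levels:  {levels_r:.2f}")   # 0.98
print(f"r of changes: {changes_r:.2f}")  # 0.00
```

Differencing is only one check, and it will not catch every spurious relationship, but it is a quick way to ask whether two series move together or merely share a direction.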

How correlations can help data analysis

Correlations can be applied in several ways to help you sort through data and find useful, predictive signals. One common application is to relate marketing activities, such as campaigns, promotions, or advertising, to sales at various levels: by product line, channel, or promotion. Correlations are also helpful in comparing performance against benchmarks or peers. A standard regression in a spreadsheet can help determine the correlation coefficient between data sets. In a supply chain, a business might look for a relationship between a specific part and product defects.
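To make the spreadsheet regression concrete, here is a minimal sketch of an ordinary least-squares fit of sales on advertising spend, using invented monthly figures (`ad_spend` and `sales` are hypothetical). With a single explanatory variable, the R-squared a regression reports is simply the square of the correlation coefficient r, which is why the regression output and the correlation tell the same story.

```python
from math import sqrt


def simple_regression(xs, ys):
    """Least-squares line y = intercept + slope * x, with Pearson r and R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # With a single explanatory variable, R-squared is simply r squared
    return slope, intercept, r, r ** 2


# Hypothetical monthly advertising spend and sales, both in $ thousands
ad_spend = [10, 12, 15, 11, 18, 20]
sales = [110, 118, 131, 112, 150, 158]

slope, intercept, r, r_squared = simple_regression(ad_spend, sales)
print(f"sales ~= {intercept:.1f} + {slope:.2f} * ad_spend  (r = {r:.3f}, R^2 = {r_squared:.3f})")
```

In Excel or Google Sheets, the CORREL, SLOPE, INTERCEPT, and RSQ functions return the same four quantities. A high r here still says nothing on its own about whether the advertising caused the sales.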

More advanced business intelligence tools offer more sophisticated forms of correlation analysis. For example, correlation clustering looks at sets of data, such as customers or product attributes, and herds them into groups based on similar characteristics. Members of each group, or cluster, are highly correlated with one another, which allows marketing to analyze them as a group and determine the best way to serve their needs based on shared attributes.
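The clustering methods inside business intelligence tools are considerably more involved, so the sketch below is a deliberately simplified greedy stand-in, not the formal correlation-clustering algorithm. It assumes hypothetical weekly unit sales for five products and an arbitrary 0.8 threshold: each product joins the first group whose founding member it correlates with above the threshold, and otherwise starts a new group.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def group_by_correlation(series: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Greedy grouping: join the first group whose founding member correlates above
    the threshold; otherwise start a new group."""
    groups: list[list[str]] = []
    for name, values in series.items():
        for group in groups:
            if pearson_r(values, series[group[0]]) > threshold:
                group.append(name)
                break
        else:  # no existing group was similar enough
            groups.append([name])
    return groups


# Hypothetical weekly unit sales for five products
weekly_units = {
    "sunscreen": [5, 9, 14, 20, 24],
    "ice_cream": [7, 10, 16, 21, 27],
    "umbrellas": [30, 25, 18, 12, 6],
    "raincoats": [22, 20, 13, 9, 4],
    "batteries": [10, 12, 9, 11, 10],
}
print(group_by_correlation(weekly_units))
# [['sunscreen', 'ice_cream'], ['umbrellas', 'raincoats'], ['batteries']]
```

Because the grouping is greedy, the result can depend on the order in which items are examined; real clustering methods are designed to avoid that kind of sensitivity.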

Establishing correlations is hard work, and our understanding often changes over time. In each example above, other factors could cloud the correlation. For example, sales may spike while several ad campaigns overlap, making it hard to tease out which one was the true driver and deserves additional advertising dollars. Similarly, company performance may track a benchmark, such as GDP or a commodity price, for a period and then suddenly diverge because of hedging or company-specific reasons.
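One way to catch a relationship that holds for a while and then breaks down is to recompute r over a moving window instead of the whole history. The sketch below uses invented quarterly growth rates; the four-quarter window and the names `benchmark`, `company`, and `rolling_correlation` are assumptions for illustration. The company tracks the benchmark closely at first, and the rolling r turns negative once the two diverge.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; assumes neither series is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rolling_correlation(xs, ys, window):
    """r computed over each run of `window` consecutive periods, oldest first."""
    return [pearson_r(xs[end - window:end], ys[end - window:end])
            for end in range(window, len(xs) + 1)]


# Hypothetical quarterly growth rates (%): the company tracks the benchmark
# for the first four quarters, then diverges (say, after putting on a hedge).
benchmark = [1.0, 2.0, 1.5, 3.0, 2.5, 3.5, 3.0, 4.0]
company = [1.2, 2.1, 1.6, 3.2, 2.0, 1.0, 2.5, 0.5]

for quarter, r in enumerate(rolling_correlation(benchmark, company, window=4), start=4):
    print(f"quarters {quarter - 3}-{quarter}: r = {r:+.2f}")
```

A full-history r would blend both regimes into one middling number; the rolling view shows when the relationship changed, which is the cue to go looking for the reason.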

Cause and effect? You decide

Here are a few examples of events that companies claim have impacted their sales. You can decide if they were true factors or spurious correlations: weather (too hot, too cold, snow); congestion in shipping lanes; the Pope’s visit to the United States (a restaurant); the inability to create the perfect cardigan (a retailer of casual clothing); and the superstitious fear of the number 13 limiting the number of weddings in 2013 (a retailer of formal clothing).

This type of data mining is an immensely powerful tool, and enterprises need to use it well. We need to look for meaningful correlations while simultaneously not falling prey to false correlations. This is difficult in a world of imperfect and imprecise data. To be at our best, we must understand our business drivers and review analyses with a critical, or even skeptical, eye to determine whether relationships exist, how strong they are, and whether they are useful in predicting outcomes. After all, as my daughter now realizes, not all tall people play basketball.

Want to see how FP&A can work better with other lines of business? Check out AFP’s “How FP&A Can Become a Better Business Partner” guide for more information.


About Bryan Lapidus

Bryan Lapidus is the director of the FP&A Practice at the Association for Financial Professionals. He has more than 20 years of experience in the corporate FP&A and treasury space at organizations like American Express, Fannie Mae, and private equity-owned companies. At AFP, he is the staff subject-matter expert on FP&A, which includes designing content to meet the needs of the profession and helping keep members current on developing topics. Bryan also manages the FP&A Advisory Council that acts as a voice to align AFP with the needs of the profession.