Data Science “Paint By Numbers” With The Hypothesis Development Canvas

Bill Schmarzo

When I was a kid, I loved Paint by Numbers sets. They make anyone who can paint or color between the lines feel like Rembrandt or Leonardo da Vinci (we can talk later about the long-term impact of forcing kids to “stay between the lines”).

Now the design world is applying the “paint by numbers” concept using design canvases. A design canvas outlines what’s important given the subject area and then allows the “painter” to color in the right information. A design canvas is a one-page operational template designed to capture all of the different perspectives necessary for successful execution depending upon the problem being solved.

A great example of a canvas is the Business Model Canvas, courtesy of Strategyzer (see Figure 1).

Figure 1:  Business Model Canvas, courtesy of Strategyzer

The Business Model Canvas forces you to “paint in” your business’s value proposition, cost structures, revenue streams, supplier network, and customer segments. Overnight, everyone can become Jack Welch!

Now you are ready to take the next step from a Big Data MBA perspective by building on the Business Model Canvas to flesh out the business use cases – or hypotheses. This is where you can become more effective at leveraging data and analytics to optimize the business. That next step involves the newly created Hypothesis Development Canvas.

Introducing data science paint by the numbers

The one area of under-investment in most data science projects is the thorough and comprehensive development of the hypothesis or use case that is being tested. That is, what are we trying to prove with our data science engagement, and how do we measure progress and success?

To address these requirements, we developed the Hypothesis Development Canvas, a “paint by numbers” template that you can populate before executing a data science engagement. The aim is to ensure that you thoroughly understand what you’re trying to accomplish, the business value, how you will measure progress and success, and the impediments and potential risks associated with the hypothesis. The Hypothesis Development Canvas is designed to facilitate the business stakeholder–data science collaboration (see Figure 2).

Figure 2:  Hypothesis Development Canvas

The Hypothesis Development Canvas includes the following:

  • Hypothesis description and objectives – what the organization is trying to predict and its associated goals (e.g., reduce unplanned operational downtime by X%, improve customer retention by X%, reduce obsolete and excessive inventory by X%, improve on-time delivery by X%).
  • Hypothesis business value from the financial, customer, and operational perspectives – the rough-order ROI from successfully addressing the hypothesis.
  • KPIs – measurement of success and progress, and the exploration of the risks associated with potential second- and third-order ramifications of KPIs. See the blog “Unintended Consequences of the Wrong Measures” for more details on second- and third-order ramifications of KPIs.
  • Decisions – the what, when, where, who, etc. – that need to be made to support and drive actions and automation in support of the hypothesis’s business, customer, and operational objectives.
  • Potential data sources – a list to explore, including a brief description of each source and why the business stakeholders think it might be useful.
  • Risks – potential issues associated with false positives and false negatives (Type I and Type II errors); risks associated with those scenarios where the analytic model is wrong.
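
For teams that want to keep the canvas next to their code, the panels above can be captured in a small data structure. The following is only a minimal sketch in Python: the class name, the field names, and the `missing_panels` helper are illustrative assumptions, not part of the canvas itself, and the example values are placeholders.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class HypothesisCanvas:
    """One field per canvas panel (field names are hypothetical)."""
    hypothesis: str = ""                  # what we are trying to predict, and the goal
    business_value: str = ""              # rough-order ROI
    kpis: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    data_sources: Dict[str, str] = field(default_factory=dict)  # source -> why it might help
    risks: List[str] = field(default_factory=list)              # false positive / false negative issues

    def missing_panels(self) -> List[str]:
        """Return the names of the panels that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder content for a partially completed canvas.
canvas = HypothesisCanvas(
    hypothesis="Reduce unplanned operational downtime by X%",
    kpis=["Unplanned downtime hours per month"],
    data_sources={"Maintenance logs": "May reveal failure patterns"},
)
print(canvas.missing_panels())
```

A check like this gives the team a quick way to see which panels still need attention before the data science engagement starts.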

A vision workshop accelerates the collaboration between the business stakeholders and the data science team to identify the hypothesis requirements that underpin the success of the data science engagement.

The Machine Learning Canvas (Big Data MBA version)

To complete the loop, I introduce the machine learning canvas. I stumbled upon the Machine Learning Canvas v0.4 from Louis Dorard at the website “Machine Learning Canvas.” Louis has made his canvas freely available, and I will do likewise with the additions that we made to his canvas based upon our unique data science requirements (see Figure 3).

Figure 3:  Machine Learning Canvas (Big Data MBA version)

For purposes of our data science work, we needed to add two panels:

  • Prescription: Once we have a prediction, what do we do with that prediction?
  • Automation: How do we automate standard procedures with the prescriptive insights?
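
To make the two added panels concrete, here is a rough sketch in Python of how a prediction might flow into a prescription and then into an automated procedure. Everything in it is an assumption made for illustration: the thresholds, the action names, the `PROCEDURES` table, and the asset name are hypothetical and do not come from the canvas.

```python
def prescribe(failure_probability: float) -> str:
    """Prescription: turn a prediction into a recommended action.

    The 0.8 and 0.5 thresholds are made-up values for illustration.
    """
    if failure_probability >= 0.8:
        return "schedule_maintenance"
    if failure_probability >= 0.5:
        return "inspect"
    return "no_action"


# Automation: map each recommended action to a standard procedure.
# Here each "procedure" just returns a message; a real one would call
# whatever operational system the business uses.
PROCEDURES = {
    "schedule_maintenance": lambda asset: f"Work order opened for {asset}",
    "inspect": lambda asset: f"Inspection requested for {asset}",
    "no_action": lambda asset: f"No action needed for {asset}",
}


def automate(asset: str, failure_probability: float) -> str:
    """Run the standard procedure for the prescribed action."""
    action = prescribe(failure_probability)
    return PROCEDURES[action](asset)


print(automate("pump-7", 0.86))
```

Keeping the prescription rules separate from the procedures means the business stakeholders can adjust the thresholds without touching the automation.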


A successful data science engagement requires close collaboration with the business stakeholders throughout the development process to:

  • Understand and quantify the sources of financial, operational, and customer value creation (it’s an economics thing).
  • Gain a thorough understanding of the KPIs and metrics against which to measure progress and success, and in particular, the potential second- and third-order ramifications of those KPIs and metrics.
  • Brainstorm the variables and metrics (data sources) that might yield better predictors of business and operational performance.
  • Codify the rewards/benefits and the costs/risks associated with the hypothesis (including the risks and costs associated with false positives and false negatives).
  • Understand when “good enough” is actually “good enough” from a predictive analytics perspective.
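
One way to codify the costs of false positives and false negatives, and to test whether a model is “good enough,” is a simple cost comparison. The sketch below is in Python, and every number in it is invented for illustration: the error counts, the cost per error, and the required savings would all come from the business stakeholders.

```python
def expected_error_cost(false_positives: int, false_negatives: int,
                        cost_per_fp: int, cost_per_fn: int) -> int:
    """Total cost of a model's mistakes over some period."""
    return false_positives * cost_per_fp + false_negatives * cost_per_fn


def is_good_enough(model_cost: int, baseline_cost: int, required_savings: int) -> bool:
    """The model is 'good enough' once it saves at least the required amount."""
    return baseline_cost - model_cost >= required_savings


# Hypothetical costs: a false alarm wastes an inspection,
# while a missed failure causes unplanned downtime.
COST_PER_FP = 500
COST_PER_FN = 10_000

# Baseline: no model, so no false alarms but every failure is missed.
baseline = expected_error_cost(0, 40, COST_PER_FP, COST_PER_FN)
# Candidate model: some false alarms, far fewer missed failures.
model = expected_error_cost(30, 10, COST_PER_FP, COST_PER_FN)

print(baseline, model, is_good_enough(model, baseline, required_savings=200_000))
```

With these made-up numbers the model's errors cost 115,000 against a baseline of 400,000, so it clears a 200,000 savings bar even though it still makes mistakes.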

You now have three design canvases that link your business model to your data science and machine learning efforts, and that give the data science team a direct “line of sight” to how their work impacts the business model.

Figure 4: Linking business model to data science to machine learning

So to quote the famous American philosopher and part-time groundskeeper Carl Spackler, “Now I’ve got that goin’ for me, which is nice.”


About Bill Schmarzo

Bill Schmarzo is CTO, IoT and Analytics at Hitachi Vantara. Bill drives Hitachi Vantara’s “co-creation” efforts with select customers to leverage IoT and analytics to power digital business transformations. Bill is an avid blogger and frequent speaker on the application of big data and advanced analytics to drive an organization’s key business initiatives. Bill authored a series of articles on analytic applications, and is on the faculty of TDWI teaching a course on “Thinking Like A Data Scientist.” Bill is the author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science.” Bill is also an Executive Fellow at the University of San Francisco School of Management, and an Honorary Professor at the NUI Galway J.E. Cairnes School of Business & Economics.