Machine Learning: Is Citizen Data Science Real?

Richard Mooney

We hear a lot these days about the “citizen data scientist.” Everyone wants to use data science and machine learning to understand their business and automate tasks to improve efficiency. But we have a shortage of people with data science skills, so much so that salaries are high for properly qualified people. To chief data officers, it’s an attractive proposition to take people from within their business who understand data and have a strong mathematical background and convert them to data scientists through self-study and online courses.

We have a new generation of visual composition tools that let a business user visually assemble pipelines of algorithms, dropping into languages such as R and Python selectively to solve more complex problems. These tools can return impressive visualizations to the user and help them understand the business using statistics.

Challenges for citizen data scientists

But there are challenges with this approach. It’s not simply a matter of choosing the best algorithm:

  1. It’s very easy for a nonprofessional to misinterpret the results of a predictive model and make decisions based on poor results. It’s very difficult for a manager to recognize the mistake until it’s too late.
  2. They need to master numerous skillsets to maximize model accuracy:
    • They need to understand feature engineering: deriving new variables that expose useful signal in the data.
    • The mechanisms needed vary across data types. Date/time variables are handled very differently from ordinal and continuous variables.
    • They need to capture how these variables change over time.
    • They also need to master complex techniques to make sure the data can be handled by the chosen algorithm and that missing values are dealt with correctly.
  3. They also need to deploy these models to production to generate the needed ROI. They need to understand how to deploy a model, how to keep it current on an ongoing basis, and how to make sure it is accurate, not just on training data but also on validation, test, and new data.
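
To make the feature engineering and missing-value points above concrete, here is a minimal sketch in plain Python. The records, field names (`ordered_at`, `amount`), and the choice of median imputation are all assumptions made for illustration; a real project would pick derived variables and an imputation strategy to suit its own data and algorithm.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records; the field names are illustrative only.
orders = [
    {"ordered_at": "2017-03-04 09:15", "amount": 120.0},
    {"ordered_at": "2017-03-06 18:40", "amount": None},  # missing value
    {"ordered_at": "2017-03-11 23:05", "amount": 80.0},
]


def derive_features(record, fill_amount):
    """Turn one raw record into numeric features a model can consume."""
    ts = datetime.strptime(record["ordered_at"], "%Y-%m-%d %H:%M")
    amount = record["amount"]
    return {
        # A raw timestamp is rarely useful as-is; derive variables from it.
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),
        # Fill the gap, but keep a flag so the model can see it was missing.
        "amount": fill_amount if amount is None else amount,
        "amount_was_missing": int(amount is None),
    }


# Median imputation is one simple assumption among many possible strategies.
known_amounts = [r["amount"] for r in orders if r["amount"] is not None]
fill = median(known_amounts)

features = [derive_features(r, fill) for r in orders]
```

Even this small example involves several judgment calls (which date parts matter, how to fill gaps, whether to flag imputed values), which is exactly where an inexperienced user can go wrong without noticing.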

Automation makes it easier

With automation throughout the predictive lifecycle, it’s possible to avoid or simplify these challenges.

  • You can train people to use automated predictive tooling to get a good model quickly and enforce best practices for model accuracy and robustness.
  • You can give them clear guidance on how models perform and enable them to deploy successfully into a wide variety of environments.
  • In parallel, they can hone their skills using a pipeline editor to experiment with other approaches while enforcing the same standards of model debriefing.

Most importantly, this reduces the risk of making a bad decision through an inadvertent but costly error. And the cost of entry to successfully utilizing and deploying predictive analytics is lowered, making it much easier to scale.
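
As a rough illustration of what an enforced model debriefing might check, the sketch below compares accuracy on training data with accuracy on held-out data and returns plain-language warnings. The function names and both thresholds are hypothetical, chosen only for this example; they do not describe how any particular product works.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual labels."""
    hits = sum(p == a for p, a in zip(predictions, actuals))
    return hits / len(actuals)


def debrief(train_acc, holdout_acc, max_gap=0.10, min_holdout=0.70):
    """Return warnings for a model; the thresholds are illustrative assumptions."""
    warnings = []
    if train_acc - holdout_acc > max_gap:
        warnings.append("possible overfitting: accuracy drops on unseen data")
    if holdout_acc < min_holdout:
        warnings.append("holdout accuracy is below the agreed minimum")
    return warnings


# Made-up predictions and labels: perfect on training data, weaker on holdout.
train_acc = accuracy([1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
                     [1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
holdout_acc = accuracy([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])

report = debrief(train_acc, holdout_acc)
for line in report:
    print(line)
```

The point of building a check like this into the tooling is that it runs every time, whether or not the user knows to ask for it.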

Don’t get me wrong: you still need training to take advantage of this. You need to know how to ask the question and how to get the most out of the results.

Even easier insights with an analytics cloud solution

Finally, business users can take advantage of advanced analytics for business exploration without needing to use any algorithms directly. This kind of solution can be deployed to ordinary business users: the interface gives them a simple way to frame the question, and the insights are displayed in ways that help them understand what they can and can’t infer from the data.

Is citizen data science real? 

So, to answer the original question: Yes, citizen data science is real, but we should think about the best way to enable people with different skillsets to use data science successfully in their business. This trend will only accelerate as automation techniques and helper tools advance and continue to lower the bar to entry for data science and predictive analytics.

About Richard Mooney

Richard is the lead product manager for the Predictive Analytics Product Portfolio, including Predictive Analytics, Predictive Analytics Integrator, and SAP Cloud Platform predictive services. He has 18 years of experience in the software industry, starting in development and moving into customer-facing roles including Product Management, Sales, and Marketing. Richard also spent two years working as an innovation expert, using techniques like Design Thinking, ROI Analysis, and Ideation to drive customer innovation and value.