If you have been reading my articles, you will probably have noticed that I’m fascinated with buzzwords and how they shape the perception of emerging technologies. I find it especially interesting that by the time the buzzword associated with a new technology reaches peak mentions, most people have heard of it but don’t quite understand what it does or how it works. (One example that’s hot at the moment: “blockchain.”)
I’ve also noticed that by the time a technology is better understood and its applications are seen in the real world, it has faded from the pundit blog entries and Twitter. I am guessing this is because it has ceased to be a way to appear intelligent and mysterious during cocktail parties. (Such is the fate of “Big Data.”)
First of all, a caveat: I do have some background in analytics, but only the basics of data science, so if you are interested in Big Data, what you will read here is the account of a fellow traveler who is as fascinated by this area as you are, not a leading pundit. With that said, let’s go for a deeper dive.
As recently as 2014, a very popular quote about Big Data was: “Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…” (Dan Ariely, Duke University). This was very true then. Nobody knew what to do with their vast amounts of data; they only knew that they needed to be doing something with it to get deep, magical insights, and that only a secret cabal of wizards called “data scientists” could unlock the secrets.
Flash forward just three years and let’s see where we are. What, teenage sex notwithstanding, is Big Data? I would suggest one aspect is: “Data with a volume big enough that you will not be able to get insights using just the regular slice-and-dice method.”
Slice and dice has been with us for decades, and there are great tools out there for it, but when the volume of data grows to such an extent that, no matter how much you slice and dice, you can’t be sure you’re looking at the right thing or getting the right insight, you may need help.
For example, if you are looking at the retail sales of 10 items, you and your brain should find it easy to handle. How about 16,000 items across your 6 million online customers? What if your supermarket chain has to analyze which products your customers are likely to buy in the winter season, given their purchasing history, income levels, residence postal codes, and whether they own more than one car?
Should you even be looking at car ownership in the first place? Or should you be looking at whether they subscribe to cable? This is where the predictive analytics software and data science tools come in.
What is especially fascinating to me is the concept of machine learning. The way this works is that some smart people come up with an algorithm, and then you train the algorithm by feeding it data. This may seem an abstract concept, but if you think about it, it is exactly how humans learn: by practicing a certain skill or craft many times, our brains develop the ability to detect patterns and optimize our judgment the next time we see a similar situation.
If this learning process is done right, the algorithm will be able to make educated guesses on what the outcome would be, given a particular set of circumstances. This same method can be used to recommend your next e-book purchase, predict traffic accidents, or decide whether you get jail time or a warning if you run afoul of the law.
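To make the “train the algorithm by feeding it data” idea concrete, here is a minimal sketch of one of the oldest learning algorithms, a perceptron. Everything here is invented for illustration — the feature names, the training data, and the function names are not from any particular product; the point is only to show weights being nudged toward the right answers, example by example.

```python
# A toy learning loop: a perceptron adjusts its weights each time it sees
# a labelled example, much as repeated practice tunes human judgment.

def train_perceptron(examples, epochs=20, lr=0.1):
    """examples: list of ((x1, x2), label) pairs with label 0 or 1."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            guess = 1 if (w1 * x1 + w2 * x2 + b) > 0 else 0
            error = label - guess           # 0 when the guess was right
            w1 += lr * error * x1           # nudge weights toward the answer
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b

def predict(model, x1, x2):
    w1, w2, b = model
    return 1 if (w1 * x1 + w2 * x2 + b) > 0 else 0

# Hypothetical data: (income band, store visits per month) -> bought item?
data = [((1, 1), 0), ((2, 1), 0), ((4, 5), 1), ((5, 4), 1)]
model = train_perceptron(data)
```

After training, `predict(model, 5, 5)` returns 1 and `predict(model, 1, 1)` returns 0: the algorithm has generalized from the four examples to new circumstances it never saw, which is the “educated guess” described above. Real systems use far richer models, but the training-then-predicting loop is the same.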
In the first iterations of predictive analytics, there were software tools to help you analyze the data and come up with some pretty stunning insights, but you needed the wizards of data science to prepare the data for analysis, apply some highly mathematical formulae and algorithms, and interpret the results.
The number of days this took must have correlated with the number of cocktail parties these data scientists got invited to, but times are changing.
As Big Data hits the mainstream, it cannot afford to be the domain of a select few. But not everybody has the time and inclination to get a master’s in statistics (although a PhD would also come in handy), so companies are now focusing on automating the data science part of the process so that business users can focus on the outcome instead of taking night classes in stats.
This new automated approach to predictive analytics takes a lot of the guesswork out of the process. You do not need to tell it which algorithm to use; it will apply an appropriate one. You do not need to set the parameters of the predictive model; it will derive them for you. No, we have not reached the point where data scientists are obsolete.
Rather, the automation tools will make the predictive process easier for the end user, while also allowing data scientists to perform a greater number of analyses, build more models, and be much more productive.
This can even help people who still prefer to do a lot of slice and dice (me being one of them). Say you are working with a dataset of a hundred variables, and your boss has asked you to find out which factor affects your profit margin the most.
Sure, you can spend a very happy evening testing lots of different combinations to see which variable correlates most strongly with profit. Or you can save time: load your data into a cloud-based analytics solution, have it tell you in seconds which variables affect profit the most, build a dashboard based on those variables, and still be home in time for dinner.
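For the curious, the core of that “which variable tracks profit most closely?” question can be sketched in a few lines: compute each candidate variable’s Pearson correlation with profit and rank by its absolute value. The column names and figures below are made up for illustration; a real tool would do this across hundreds of columns (and much more besides).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical dataset: four weekly observations per variable.
profit = [10, 12, 15, 19]
candidates = {
    "ad_spend":    [2, 3, 4, 6],   # rises with profit
    "returns":     [9, 7, 6, 3],   # falls as profit rises
    "store_count": [5, 5, 5, 6],   # barely moves
}

# Rank variables by the strength of their relationship with profit,
# in either direction (hence the absolute value).
ranked = sorted(candidates,
                key=lambda k: abs(pearson(candidates[k], profit)),
                reverse=True)
```

On this toy data, `ranked` puts `ad_spend` first. Note that correlation is only a starting point: it misses non-linear effects and says nothing about causation, which is exactly why the automated tools described above go well beyond it.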
So there you have it. If you have not looked at predictive tools to harness the power of your data because it has always been in the “too hard” basket, now is the time to rethink that. The data is not getting smaller, and the tools are getting easier to use by the day.
Big Data is morphing into vast data that will lead to insights and correlations that reveal new strategies, even new business models.