Collecting Too Much Big Data Has Its Risks

Dwight Davis

The emergence of new data sources – ranging from social media to the Internet of Things – and the growing sophistication and speed of data analytics tools have culminated in a Big Data love fest. Companies are racing to discover insights and opportunities submerged within the oceans of data at their fingertips. And many are operating under the same general principle: when it comes to raw data, you can never have too much.

It’s easy to understand how we got to this point. Data storage costs have dropped so quickly they now seem almost inconsequential. The average cost of a 6-TB, 7,200-rpm hard drive fell from over $700 in July 2014 to about $400 in November 2015 – a drop of more than 40 percent, to roughly $67 per terabyte – according to PCPartPicker. At the same time, major cloud infrastructure providers, including Amazon and Microsoft, offer gigabytes of free storage to customers and cut-rate pricing for larger storage volumes.

When you combine that trend with the ability to derive meaningful information from raw data that was once considered disposable, the incentive to err on the side of collecting too much data, rather than too little, escalates.

Companies that blindly follow this approach expose themselves to many risks, however – not the least of which is simple economics. “Storing information actually costs a lot of money,” notes Timo Elliott, global innovation evangelist at SAP. Even though data storage prices continue to drop, he explains, the amount of data being stored is growing at an even faster rate.

This storage dynamic is an example of Jevons’ paradox, which describes how technological progress that increases the efficiency of a resource (in this case, storage) can result in an even more dramatic increase in the consumption of that resource. “Storing everything possible, and then going back at some point to figure out if it will be useful, simply doesn’t work in economic terms,” Elliott warns.
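To see the economics concretely, consider a back-of-the-envelope model of the dynamic Elliott describes. The figures below – a 25% annual decline in cost per terabyte and 40% annual growth in stored data – are illustrative assumptions, not numbers from this article; the short Python sketch simply compounds them year over year:

    # Illustrative sketch of the Jevons-paradox dynamic: unit storage
    # costs fall, but stored volume grows faster, so total spend rises.
    # All starting figures and rates are assumptions, not sourced data.
    cost_per_tb = 50.0     # assumed cost of managed storage, $/TB/year
    data_tb = 1_000.0      # assumed current footprint, TB
    PRICE_DECLINE = 0.25   # assumed annual drop in $/TB
    DATA_GROWTH = 0.40     # assumed annual growth in stored data

    for year in range(1, 6):
        cost_per_tb *= 1 - PRICE_DECLINE
        data_tb *= 1 + DATA_GROWTH
        print(f"Year {year}: ${cost_per_tb:6.2f}/TB x {data_tb:7,.0f} TB "
              f"= ${cost_per_tb * data_tb:,.0f}")

Because 0.75 × 1.40 = 1.05, the total bill in this sketch still climbs about 5% a year even as unit prices fall – Elliott’s warning in miniature.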

Of course, the costs associated with storage go well beyond the price of the storage media itself. Updating, cleaning, reformatting, archiving, and otherwise managing massive amounts of data doesn’t come cheap. It makes little sense to incur those additional data management costs unless the expected benefits rest on more than wishful thinking.

Costs aside, there are other downsides to collecting data without first establishing a good business rationale for doing so. Among the risks:

  • The more data you’re storing, the more attractive your organization may be for hackers and cyber thieves, and the greater your liability exposure should a breach occur.
  • Indefinitely storing data may allow damaging information to surface during the “discovery” phase of legal proceedings.
  • More data inevitably means more “noise” to sort through in the quest for valuable information. As SAP’s Elliott says, “The bigger the haystack, the harder to find the needle.”

Bottom line: Big Data collection and analysis can offer incredible benefits to organizations that approach it systematically with clear business strategies and objectives in hand. But amassing huge data repositories just because you can makes little economic or business sense. Instead, you need to apply intelligence at the front end of your Big Data initiatives, not just after you vacuum up every bit of data within your reach.


This article originally appeared on CIO.com


About Dwight Davis

Dwight Davis has reported on and analyzed computer and communications industry trends, technologies, and strategies for more than 35 years. He worked as a senior editor at several leading computer and business publications from the late 1970s through the mid-1990s. Dwight then took the helm of Windows Watcher, an award-winning corporate newsletter focused on Microsoft and its ecosystem of partners and competitors. Next, Dwight spent 10 years working as a leading industry analyst, first at Summit Strategies and then at Ovum. At these market-research firms, he ran a variety of infrastructure software strategic services. Those services tracked and analyzed leading vendors (Microsoft, IBM, Oracle, SAP, HP, Sun, etc.) and innovative start-up firms, as well as cutting-edge technologies and business models. His areas of expertise include cloud computing, service-oriented architecture, cybersecurity, mobile computing, and Web services. Since 2009, Dwight has worked as an independent analyst and writer.