If the stats are to be believed, there is a veritable tsunami of data headed our way, and those unable to adopt the tools and technology to ride the wave will be left floundering. According to some, we are entering the brontobyte era of sensor data: a single self-driving car will generate approximately 1 gigabyte of sensor data per second, and airplanes are already creating 2.5 billion terabytes of data collectively. CCTV in the London Underground generates around 2 petabytes of data every day, and the Square Kilometre Array (SKA) radio telescopes, nestled away on the flat desert plains just 315km northeast of Geraldton, Western Australia, will pull down more data in a single day than the Internet has produced to date when they come online in 2020. Crunching through all of that will require an “Exaflop-capable” supercomputer.
Source: HP ‘The Brontobyte Era in the world of interconnected digital things’
Aside from confirming a growing suspicion that data scientists and mathematicians harbor a wicked sense of humor when it comes to inventing new terminology for data sizing, what does this all actually mean? And why should we even care? It's a sad realization for someone like me, working in the IT industry, to accept, but let's be honest: data is boring for most people. Just by using the term “data” we create a barrier between ourselves and the majority of the people we'd like to be communicating with, because most people will never realize, or even care, that they are producing or consuming data.
But this isn't necessarily a bad thing, and it isn't the reason why people won't be able to capitalize on a Big Data future. In truth, this is exactly how it should be with technology. The most successful technological advancements are those that become so invisible and ubiquitous in our daily lives that we forget they are there.
It's a lot like electricity. Most people don't know how electricity works or how many kilowatt-hours their house consumes, but when you walk into your house and flick the switch, the lights come on, the computer hums to life, and the TV wakes up like magic. Once the infrastructure has been built, all that end users need to do is plug in and switch on to extract value from it. Data is the same. People don't care about how much data is being produced; they only care about the valuable end-user product, service, or experience that data provides them.
Without the proper context or supporting infrastructure, data on its own is meaningless and its value is trapped. It's only when data is put into context that it becomes useful information that can be tapped into and extracted by others. Much like a stray lightning bolt cannot easily be captured, despite having the power to supply a whole household with its energy needs for a month, a brontobyte of data is meaningless if it is not inherently simple for the majority of potential users to access it, search it, put it in context, and use it productively.
In many instances, we are still at the stage today of concerning ourselves with how much data is being produced, rather than how its maximum value can be extracted. This is typified by our current approach to data sharing, in which we pursue traditional forms and publish on individual portals for specific, targeted purposes rather than treating data as a vast, interconnected web that can and must be tapped into. There are portals for weather data, healthcare data, census data, and so on. Yet data portals only provide limited value to their users because they do not connect with one another as seamlessly as they could. Often each of these portals is isolated – not only within the organization that houses it, but also from the rest of the data universe, including news, social media, blogs, and other relevant sources.
There are many ways in which we could strengthen our understanding and treatment of data in order to magnify the value contained within it. In the first instance we need to invent better ways to search the web of data. There is a network effect where data becomes more useful and its value is magnified as more people and machines use it, add to it, and maintain it. Searching the vast web of data requires being able to find it, access it, and understand its context. In the future, searching vast portals of data will need to become as simple as it is today to search for news or documents on the Internet.
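To make the idea of a searchable web of data concrete, here is a minimal sketch of what cross-portal search could look like. The catalog, dataset names, publishers, and tags are all hypothetical, and the `Dataset` class and `search` function are illustrative inventions, not any real portal's API; the point is simply that once metadata (context) travels with the data, a single query can find relevant datasets regardless of which portal houses them.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A catalog entry: the data's name plus the context needed to find and use it."""
    name: str
    publisher: str
    tags: set = field(default_factory=set)

# A hypothetical catalog spanning several otherwise-isolated portals.
catalog = [
    Dataset("monthly-rainfall", "weather-portal", {"weather", "rainfall", "regional"}),
    Dataset("hospital-admissions", "health-portal", {"health", "admissions", "regional"}),
    Dataset("census-2016", "census-portal", {"population", "demographics"}),
]

def search(catalog, *terms):
    """Return datasets whose tags contain every search term, regardless of publisher."""
    wanted = set(terms)
    return [d.name for d in catalog if wanted <= d.tags]

# One query cuts across the weather and health portals.
print(search(catalog, "regional"))  # → ['monthly-rainfall', 'hospital-admissions']
```

The design choice worth noting is that the value comes from shared, consistent metadata: the search function never needs to know which portal a dataset lives in.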
Imagine if you could join the dots between all data sources and discover hidden connections. This would put data into context in ways that were never possible before. An organization could move from questioning, for example, “Why did health outcomes in a particular rural area decrease last month?” to identifying, “Did the reduction in health outcomes in a particular rural area last year have anything to do with the storms and power outages?” In this future, the value of data would be maximized as information would flow more easily through an economy for users to derive new insights, find the data they need to design new products and services, and incrementally add more information to the growing knowledge base.
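The health-and-storms example above is, at its core, a join between two datasets on a shared key. The sketch below uses entirely made-up figures, region names, and thresholds to show the mechanics: once health outcomes and power-outage records share a common (region, month) key, the hidden connection falls out of a simple merge.

```python
# Hypothetical monthly figures keyed by (region, month); the numbers are illustrative only.
health_outcomes = {
    ("rural-west", "2023-06"): 0.91,   # fraction of positive health outcomes
    ("rural-west", "2023-07"): 0.74,   # a noticeable drop
}
power_outages = {
    ("rural-west", "2023-07"): 11,     # outage hours during severe storms
}

# Join the two sources on their shared (region, month) key.
joined = {
    key: (score, power_outages.get(key, 0))
    for key, score in health_outcomes.items()
}

# Flag months where a drop in outcomes coincides with outages (threshold is arbitrary).
for (region, month), (score, outage_hours) in sorted(joined.items()):
    if score < 0.80 and outage_hours > 0:
        print(f"{region} {month}: outcomes {score:.2f} alongside {outage_hours}h of outages")
```

A correlation surfaced this way is a starting question, not an answer, but it is exactly the kind of connection that stays invisible while the two datasets sit in separate portals.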
This is the cusp of where we find ourselves today, and the challenge facing those who wish to capitalize on the vast oceans of data. As the volume of data grows exponentially, data custodians will need to start thinking about their data as infrastructure and as an asset, contributing to public value creation in much the same manner as transportation, water, and electrical infrastructure. Doing so can provide a network of contextualized information that people, government, and business can consume to address the problems that matter and create public value.