Understanding Data: Gold Nuggets And Puzzle Pieces

Paul Lewis

I regularly use the colloquial phrase “nuggets of gold in a huge pot” when describing the value obtained from understanding and analyzing data.

It seems like an easy win. The phrase is well-known and highly digestible. Most people in the audience generally appreciate that gold has immense value, and there are whole industries that exist to mine this precious metal from a variety of mountains and streams. It’s also predictable that as you collect these precious nuggets, you won’t be able to carry them around given their collective weight, and a pot is as good as anything to store them. Plus, the whole leprechaun-esque vision it likely creates might bury the phrase in long-term memory for easy recall the next day with colleagues. Like, “I went to a seminar yesterday and this dude talked about value derived from analytics as being like nuggets of gold in a huge pot.” That’s helpful.

Occasionally, like here, I even blog about it. I find repetition to be tremendously valuable in retaining content. Additionally, I also find repetition to be tremendously valuable in retaining content. (Note: embedding subliminal messages in repetitive statements is also tremendously valuable, but I will get to that content later. Trust me, you won’t object.)

Unfortunately, as metaphors go, it’s extremely weak (especially considering pots are much more likely to hold coins than nuggets). Let me break it down so you see what I mean:

  • Data has value the instant it’s created, for as long as you hold it, until its demise
  • The final form of data could be deletion or decade-old archiving; the effect is the same
  • The value of data changes over time
  • Adding new data to existing data creates more opportunity to discover a potentially endless series of value (Potentially)
  • This potential value could be expressed as an undetermined number of “nuggets of gold” (I guess, if you must)
  • The more data you have, the more nuggets of gold you could discover, and the more necessary a pot to hold them (That’s a stretch)
  • The more data you have, the more precise your statistical and mathematical models become, and the more opportunity you will have to find nuggets (Don’t buy it, sounds complex)

Getting the picture?

The fundamental problem with the metaphor is that I’m treating value obtained as a direct representation of data collected; i.e., you are storing various elements of a client, so somewhere in the fields, rows, and columns of one or more of those elements hides a single purposeful and valuable answer:

  • Data, in the sense of a database, being a single field, in a single row, in a single column, is irrelevant. It carries no weight or value beyond the knowledge of collection. It lacks context and awareness. Whether static or variable, it tells no story and solves no problem.
  • Data, in the sense of unstructured data, bytes of binary information, carries even less value. In fact, because a single bit is only a small part of a greater whole, it is unlikely to affect the entire picture.
  • Data, as a single point in time from a stream of information, is outdated the very nanosecond it’s used, as more current data takes its place, creating a new current reality.

The concept of “nuggets of gold,” by extension, then presumes a specific and direct answer to a question; or a direct and obvious correlation to an action:

  • How many toothpicks are in the container? 173
  • What color shirt matches best with my red pants? None, don’t wear red pants
  • What’s the name of that dude with the crazy beard in that class last year? For the last time, HENRY!
  • If you were to spend $5 less, you would have an extra $5 in the bank
  • If we mix these two primary colors, you would have this one secondary
  • If I build more of this product, I will sell more of this product

Lesson learned: Individual elements of data possess little to no value

There is a reason why every company (including yours) has an enterprise information management (EIM) program and a chief data officer (CDO) responsible for stewardship of your most precious technological asset, data. As a reminder, EIM is an integrative discipline for structuring, describing, and governing information assets across organizational and technological boundaries to improve efficiency, promote transparency, and enable business insight. The program includes capabilities to store, protect, architect, manage risk and compliance, manage quality, classify, and organize data. A great EIM program focuses on how organizations derive insight and value from information, either from internal effectiveness and/or growth-oriented goals and activities.

A CDO, or VP of business intelligence, or manager of management information systems (MIS) understands that data, in its elemental form, does NOT equal value. They understand that value is derived from discovering patterns and appreciating the impact of change and time, and that data requires enrichment, not just discovery. The activity required to derive value is implemented in four capabilities:

  • Descriptive: MIS or reporting, focusing on hindsight (what has happened)
  • Diagnostic: Business intelligence or incident management, focusing on current-state insight or understanding “why” it happened
  • Predictive: Analytics combining models of previous data and application to new data, focusing on foresight (what will happen)
  • Prescriptive: Analytics and action, applying foresight algorithms to implement a business function
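
To make the four capabilities concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the monthly sales figures, the function names, and the choice of a simple least-squares trend line as the "model" are not from any particular EIM product. It shows only how each capability builds on the one before it.

```python
from statistics import mean

# Hypothetical monthly unit sales; illustrative numbers only.
monthly_sales = [100, 110, 120, 130, 140, 150]


def descriptive(sales):
    """Hindsight: summarize what has happened."""
    return {"total": sum(sales), "average": mean(sales)}


def diagnostic(sales):
    """Insight: month-over-month changes that help explain why it happened."""
    return [later - earlier for earlier, later in zip(sales, sales[1:])]


def predictive(sales):
    """Foresight: fit a least-squares trend line and extend it one month."""
    n = len(sales)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescriptive(forecast, stock_on_hand):
    """Action: turn the forecast into a business decision (units to order)."""
    return max(0, round(forecast) - stock_on_hand)


forecast = predictive(monthly_sales)
order_quantity = prescriptive(forecast, stock_on_hand=40)
```

Notice that no single number in the list does any work on its own. The value appears only once the pattern across all six months is studied and then acted on.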

The EIM program also appreciates that the effort to create value focuses far less on finding a long-lost and specific piece of data. Instead, it focuses on studying patterns in static, changing, and moving information, and on researching correlations, causations, and the theoretical application of mathematics and logic to create complex business value from data-centric components. Yes, it’s a science. It’s far less about searching for a nugget of gold, and far more about determining that you could make money from gold jewelry… all from the same mine.

So here is my NEW metaphor

And for the sake of inconsistency, I’m not even going to use precious metals. Imagine a pile of random puzzle pieces. Each piece represents a single data point, collected from a variety of sources.

Before value can be obtained, preparatory activity is needed to curate and enrich data:

  • Extraction: Identify all the puzzle pieces in the house: under beds, in vacuum cleaners, in the dog bowl, etc. For data, discover all the sources of information: internally and externally, structured and unstructured, and classify.
  • Integration: Send out all the kids and parents to grab the pieces and bring them back to the pile. For data, connect to hundreds of sources for batch or real-time integration/ETL.
  • Enhancement and cleansing: Dust off each piece, glue back down the picture side, sharpen the edges, number the backs. For data, match and qualify, and add appropriate metadata.
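
The three preparatory steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the two in-memory "sources" (crm_rows and billing_rows), the record fields, and the function names are all hypothetical, and real integration/ETL would connect to databases, files, or APIs rather than lists.

```python
# Hypothetical sources; the names and records are made up for illustration.
crm_rows = [{"id": 1, "name": " Ada Lovelace "}, {"id": 2, "name": "alan turing"}]
billing_rows = [{"id": 2, "name": "Alan Turing"}, {"id": 3, "name": "grace hopper"}]


def extract(sources):
    """Extraction: visit every source and classify each record by origin."""
    for source_name, rows in sources.items():
        for row in rows:
            yield {**row, "source": source_name}


def integrate(records):
    """Integration: bring every record for the same id back to one pile."""
    pile = {}
    for record in records:
        pile.setdefault(record["id"], []).append(record)
    return pile


def enhance(pile):
    """Enhancement and cleansing: tidy the name and attach source metadata."""
    cleaned = []
    for record_id, versions in sorted(pile.items()):
        name = " ".join(versions[0]["name"].split()).title()
        sources = sorted(version["source"] for version in versions)
        cleaned.append({"id": record_id, "name": name, "sources": sources})
    return cleaned


pieces = enhance(integrate(extract({"crm": crm_rows, "billing": billing_rows})))
```

Each resulting piece is "dusted off" (whitespace and capitalization normalized) and "numbered on the back" (tagged with the sources it came from), which is what makes it usable in the later value-creation steps.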

This effort to convert raw data to content, and indescribable fields into describable objects, requires more than just a pile. It requires a box of sorts.

A content platform (the box) allows organizations to bring together object storage (a place to put all data), data mobility (a means to abstract data from its sources), cloud gateways (ability to use multiple deployment models), and metadata (tagging and sophisticated search) to create a tightly integrated, simple, and smart data intelligence solution. You may have heard this being referred to as a “data lake.” I highly recommend this solution set, if you happen to be in the market.

For this new enhanced data set (puzzle pieces), contained in a content platform (puzzle box), the EIM value-creation activities can be described (it’s still the goal to find the Picasso):

  • Descriptive: Create a list of puzzle pieces, organized by shape/color/origin; determine which pieces closely resemble the palette of a master work of art
  • Diagnostic: Visualize the current state of completing the puzzle: how far along the process is, and which pieces are missing
  • Predictive: Given where we are in the process and the pieces remaining in the box, predict what picture we might be making, even if some pieces are missing
  • Prescriptive: After having made dozens of pictures from these same puzzle pieces, guide the creation of existing and new completed puzzles

Both predictive and prescriptive analytics would use linear and non-linear algorithms (ways of thinking out the problem), would focus equally on the puzzle pieces that exist and the ones that are missing, and combine or use pieces from hundreds of potential sources to create hundreds of different works of art.

In a nutshell: The value obtained from understanding and analyzing data is not that you will find a “nugget of gold” or an individual puzzle piece that solves the problem. The value obtained from understanding and analyzing data is the millions of dollars in your bank account from building several masterpieces from all your individual puzzle pieces.

Learn how to derive more value from Data – The Hidden Treasure Inside Your Business.



About Paul Lewis

Paul Lewis is the chief technology officer at Hitachi Vantara for the Americas, responsible for leading technology trend mastery and evangelism, client executive advocacy, and external delivery of the Hitachi vision and strategy, especially related to digital transformation and social innovation. Additionally, Paul contributes to field enablement of data intelligence and analytics; interprets and translates complex technology trends including cloud, mobility, governance, and information management; and represents the Americas region in the Global Technology Office, the Hitachi LTD R&D division. In his role of trusted advisor to the CIO community, Paul’s explicit goal is to ensure that clients’ problems are solved and opportunities realized. Paul can be found at his blog, on Twitter, and on LinkedIn.