A Data Platform For Data Sprawl

Matthew Zenus

In a digital economy, business success depends on the ability to extract insight and value from data. What usually stands in the way is data sprawl.

The fact is, data sources are proliferating beyond the enterprise. In the past, organizations could get by with data warehouses pulling data from internal enterprise resource planning (ERP) and customer relationship management (CRM) systems. Today, we need to bring in data from machines (IoT), mobile apps, social media (sentiment analysis), distributed supply chain partners, and much more.

Data, in other words, is decentralized – stored in a variety of clouds as well as on-premises, which makes it difficult to locate the right data at the right time. If you can’t locate your data, you can’t bring into focus a complete picture of your business. In the end, this leads to a counter-intuitive outcome: Despite the existence of more and more data, many people are frustrated by the inability to use it productively for better business awareness.

What’s needed is a single source of truth that supports a consistent view of operations, customers, suppliers, and partners. Achieving this in a world of data sprawl requires a new kind of data platform that supports the following capabilities.

Metadata management

To incorporate all sources of data into the fold – cloud-based or on-premises – the data management platform should include a comprehensive range of adapters to all data sources and data types. The key objective is to access, query, and process data from anywhere in an efficient manner that is also highly secure and reliable.

Equally important, metadata needs to be managed centrally to form the foundation of a unified data catalog for the entire data landscape. This approach also supports your ability to understand the multi-directional flow of data through business processes, allowing your organization to leverage data for maximum value.
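To make the idea of a centrally managed catalog concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product implements its catalog: the class names, fields, and sample datasets are all hypothetical. The point it shows is that the catalog holds descriptions of data (where it lives, which columns it exposes) while the data itself stays in the source systems.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata for one dataset; the data itself stays in the source system."""
    name: str
    source: str      # hypothetical source label, e.g. "erp", "crm", "iot"
    location: str    # "cloud" or "on-premises"
    columns: tuple   # column names exposed by the dataset


@dataclass
class Catalog:
    """Central registry of metadata across all sources (illustrative only)."""
    entries: dict = field(default_factory=dict)

    def register(self, entry: DatasetEntry) -> None:
        # Later registrations with the same name overwrite earlier ones.
        self.entries[entry.name] = entry

    def find_by_column(self, column: str) -> list:
        """Return the sorted names of datasets that expose the given column."""
        return sorted(
            e.name for e in self.entries.values() if column in e.columns
        )


# Hypothetical sample entries spanning on-premises and cloud sources.
catalog = Catalog()
catalog.register(DatasetEntry("orders", "erp", "on-premises", ("order_id", "customer_id")))
catalog.register(DatasetEntry("contacts", "crm", "cloud", ("customer_id", "email")))
catalog.register(DatasetEntry("sensor_readings", "iot", "cloud", ("device_id", "temperature")))

# Which datasets can be joined on the customer?
shared = catalog.find_by_column("customer_id")
```

A lookup like `find_by_column` is the simplest form of the question a unified catalog answers: which datasets, wherever they live, describe the same business entity.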

Modeling and tooling

Pulling in data from everywhere is not enough. You need to do it in a rationalized manner, with control and visibility. For this, your data platform should include powerful modeling tools that yield a holistic, end-to-end view of a heterogeneous enterprise data landscape.

This holistic view, in turn, facilitates statistical modeling and drives intelligent technologies such as predictive analytics and machine learning. One objective in this regard is to automate decision-making. To do this, all relevant data is combined and cleaned, then used to train a model, and the trained model is deployed to automate business processes. Organizations need to plan and orchestrate data extraction from any source, process the data, use it to train targeted models, and return the updated model to the relevant system. A data platform capable of all this can help your organization seize an advantage by gaining key insights and responding faster than anyone else.
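The combine, clean, train, and deploy steps described above can be sketched in a few lines of plain Python. This is a toy example under stated assumptions: the two "plants", the field names, the least-squares model, and the decision threshold are all invented for illustration, and a real pipeline would use an orchestration tool and a proper machine learning library.

```python
def combine(*sources):
    """Merge records from several sources into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Drop rows where either field is missing."""
    return [
        r for r in rows
        if r["temperature"] is not None and r["failures"] is not None
    ]


def train(rows):
    """Fit failures = slope * temperature + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(r["temperature"] for r in rows) / n
    mean_y = sum(r["failures"] for r in rows) / n
    sxx = sum((r["temperature"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["temperature"] - mean_x) * (r["failures"] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decide(model, temperature, threshold=3.0):
    """Automated decision: schedule maintenance if predicted failures exceed the threshold."""
    slope, intercept = model
    return slope * temperature + intercept > threshold


# Hypothetical readings from two sources; one row is incomplete.
plant_a = [{"temperature": 60, "failures": 1}, {"temperature": 80, "failures": 3}]
plant_b = [
    {"temperature": 70, "failures": 2},
    {"temperature": 90, "failures": None},
    {"temperature": 100, "failures": 5},
]

rows = clean(combine(plant_a, plant_b))
model = train(rows)                      # "return the updated model to the relevant system"
needs_maintenance = decide(model, 95)    # the deployed model drives the decision
```

Keeping each stage a separate function mirrors the orchestration the article describes: any stage can be rerun or replaced when a new source is added, without touching the others.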

Multi-cloud support

At a time when flexibility is critical to business success, no organization can afford to be locked into one vendor’s way of managing data. A sophisticated data management platform allows you to move applications and data freely between clouds and on-premises systems as needed. Support for the major players, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, is critical. No matter where your data lives – or where it moves – you need to be able to access it in a way that allows you to continue to build applications, perform analysis, and generate insight.

Security and openness

Data protection and privacy are non-negotiable when it comes to maintaining customer trust. Look for a data platform with data anonymization features that facilitate the work of data scientists building analytical scenarios – while also enabling you to adhere to data protection rules such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).
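As a rough illustration of what such a feature does for a data scientist, the sketch below replaces a direct identifier with a keyed hash using Python's standard `hmac` and `hashlib` modules. Strictly speaking, this technique is pseudonymization rather than full anonymization, and using it does not by itself make a system compliant with GDPR or HIPAA. The key, field names, and sample record are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key belongs in a secrets store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, identifier_fields: set) -> dict:
    """Replace identifier fields with pseudonyms; leave other fields unchanged."""
    return {
        k: pseudonymize(str(v)) if k in identifier_fields else v
        for k, v in record.items()
    }


record = {"email": "jane@example.com", "country": "DE", "order_total": 129.5}
masked = mask_record(record, {"email"})
```

Because the same input and key always produce the same pseudonym, records belonging to one customer can still be joined and counted for analysis without exposing the underlying email address.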

At the same time, you can’t afford to lock down your data to the extent that it limits your ability to innovate. This is why an open model that allows users to integrate with multiple open source compute engines is critical. Your platform must work with multiple technologies, libraries, and languages such as TensorFlow, Spark ML, Python, R, and Jupyter Notebooks. This will help drive easy and secure self-service data science.

Learn more

While data sprawl is a modern problem of the digital economy, modern data platforms that face the challenge head-on can help your organization seize the opportunities presented by today’s abundance of data.

For more information on these data platforms, check out the SAP HANA Data Management Suite Buyer’s Toolkit.


About Matthew Zenus

Matthew Zenus is Global Vice President, Database & Analytics Strategy at SAP. Matt has over 20 years of experience in enterprise software strategy, development, sales, and go-to-market. He is currently responsible for product strategy for database, including SAP HANA, and data warehousing. Prior to joining SAP, Matt served as a Product & Strategy Manager at the Teradata Corporation, responsible for analytics, database security, data integration, tools & utilities, supply chain intelligence, and geospatial. He also has experience as a senior consultant at PricewaterhouseCoopers, where he advised many Global 100 companies on business & operational strategies. Matt has also worked in manufacturing operations as an engineer and is a graduate of the Cooper Industries Executive Training Program.