Data is flowing and the volume is growing. With the massive amounts of information generated since the advent of the Internet and the increasing digitalization of business, there is tremendous opportunity in the new volumes and types of data collected. But this data explosion has also dramatically increased the complexity of the enterprise data landscape, which now spans multiple data lakes, data warehouses, operational applications, e-commerce systems, online interactions, and so on.
As stated by Gartner’s Ted Friedman in a recent paper*: “Organizations struggle to design a business-relevant infrastructure that is both effective and efficient at mediating differing semantics (e.g., governance) to support data sharing and integration.” You know you have a data landscape management problem when:
- Your data is kept in silos (files, Hadoop, data warehouses) across the enterprise.
- Your user groups can’t access and work with data according to their needs.
- You face organizational boundaries within IT (Big Data vs. enterprise data), as well as across lines of business.
Complex problems require complex solutions
Why not just put all the data into one massive, super data lake? That’s what the Hadoop distribution providers want you to do. But as the satirist H. L. Mencken said, “For every complex problem there is an answer that is clear, simple, and wrong.”
Big Data technologies lack enterprise governance, holistic lifecycle management, and security concepts. These providers are just coming up the curve and trying to provide the level of enterprise governance and security that enterprise data warehouse and database providers have been delivering for their offerings for years.
That means that organizations are stuck with limited tools for integrating systems and creating data pipelines. As a result, it takes a lot of effort to create a data pipeline across the enterprise:
- Addressing business needs requires a high investment in resources and in many non-integrated technologies, such as Hadoop, Spark, Kafka, MongoDB, and Cassandra.
- The integration effort often gets in the way of creating business value.
Some companies have tried to solve these issues by maintaining two sets of data: one for transactions and one for analysis. But this is not only costly and inefficient; it also leads to discrepancies because it’s hard to keep them in sync. And discrepancies lead to inaccurate analytical outcomes, with the obvious negative impact on decision-making.
Challenges of data landscape and DataOps management
To better meet business needs and keep up with the fast pace of today’s demands, the data landscape needs to overcome the following three challenges:
- Data governance
We face a lack of visibility and ask: Who changed the data? What was changed? Who is accessing it?
- Data pipeline
It is difficult to refine and enrich data across multiple systems. For example, this might involve improving the value of existing data by appending information, such as connecting sensor data with the asset ID and asset profile information, held in a different system.
- Data sharing
Unfortunately, integration is manual, point-to-point, painful, and slow. Changing an integration point usually depends on the agility and flexibility of the IT organization.
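To make the data pipeline challenge above more concrete, here is a minimal sketch in Python of the enrichment step it describes: appending an asset ID and asset profile to sensor readings when that information lives in a different system. Everything in it is an assumption made for illustration. The names `SensorReading`, `EnrichedReading`, `enrich`, `sensor_to_asset`, and `asset_profiles`, as well as the sample values, are hypothetical, and plain in-memory dictionaries stand in for the separate systems. It is not how any particular product implements enrichment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorReading:
    # Hypothetical record as it might arrive from a sensor feed.
    sensor_id: str
    temperature_c: float


@dataclass
class EnrichedReading:
    # The same reading with asset context appended.
    sensor_id: str
    temperature_c: float
    asset_id: Optional[str]
    asset_profile: Optional[dict]


def enrich(readings, sensor_to_asset, asset_profiles):
    """Attach the asset ID and profile to each reading.

    Readings from sensors with no known asset are kept, with the
    asset fields set to None, so gaps stay visible downstream.
    """
    enriched = []
    for reading in readings:
        asset_id = sensor_to_asset.get(reading.sensor_id)
        profile = asset_profiles.get(asset_id) if asset_id is not None else None
        enriched.append(
            EnrichedReading(reading.sensor_id, reading.temperature_c, asset_id, profile)
        )
    return enriched


# Hypothetical sample data standing in for two separate systems.
readings = [SensorReading("s-1", 71.5), SensorReading("s-9", 20.0)]
sensor_to_asset = {"s-1": "pump-17"}
asset_profiles = {"pump-17": {"type": "pump", "site": "Plant A"}}

for row in enrich(readings, sensor_to_asset, asset_profiles):
    print(row)
```

In this sketch the lookup itself is trivial. The effort in a real landscape comes from reaching the second system at all and keeping the mapping current, which is the difficulty the challenge describes.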
A modern data landscape management strategy
A better way to address these challenges is a data landscape and DataOps management solution that enables agile data operations across the enterprise, along with governance, pipelining, and sharing of all data in the connected landscape.
The vision should be to provide the ability to understand, connect, and drive processes across the multiple data sources and endpoints with which the enterprise struggles today. By providing visibility into the landscape of data opportunities, as well as providing an easy way to connect data sources and create powerful data pipelines that hop across the landscape, businesses would be able to better achieve the data agility and business value that they seek.
Want to learn how SAP addresses these data landscape and DataOps management challenges, and what the future of Big Data warehousing is about? Then register for the upcoming webinar on November 28 at 11:00 a.m. ET/17:00 CET. You’ll hear from Marc Hartz, product manager of SAP Data Hub. See you there!
*Gartner, Use a Data Hub Strategy to Meet Your Data and Analytics Governance and Sharing Requirements, Andrew White, Ted Friedman, 2 February 2017