Building The Big Data Warehouse: Part 2

Barbara Lewis

Part 2 in the “Big Data Warehouse” series

In the first part of this four-part discussion on the Big Data warehouse, we covered why enterprises are looking to create a Big Data warehouse that unites information from Big Data stores and enterprise data stores. Here in part 2, we’ll cover the key elements of a Big Data warehouse and which issues enterprise technology leaders should keep in mind as they evaluate options.

What is a Big Data warehouse?

A Big Data warehouse is an architecture for data management and organization that uses both traditional data warehouse architectures and modern Big Data technologies, with the goal of providing rapid analysis across a broad range of information types. While analytics can certainly be run exclusively on Big Data repositories or on enterprise data repositories, it is the combination of the two types of repositories into a unified data architecture that distinguishes a Big Data warehouse.

Forrester defines the Big Data warehouse as: “A specialized, cohesive set of data repositories and platforms used to support a broad variety of analytics running on-premises, in the cloud, or in a hybrid environment. BDW leverages both traditional and new technologies such as Hadoop, columnar and row-based data warehouses, ETL and streaming, and elastic in-memory and storage frameworks.” (Forrester, “The Next Generation EDW is the Big Data Warehouse” Yuhanna, Noel. August 29, 2016, page 6.)

Key elements of the Big Data warehouse

A Big Data warehouse architecture typically encompasses the following elements:

  • A breadth of data repositories. These include repositories for both Big Data and structured enterprise data. A Big Data warehouse typically draws from multiple data repositories, including traditional relational databases that house structured enterprise data; columnar data stores tailored for rapid enterprise data aggregation; and Big Data stores (such as Hadoop) that handle both unstructured and structured data in massive volumes.
  • Compute/processing. Fundamental processing can happen at multiple levels in a Big Data warehouse architecture. For example, Hadoop platforms contain processing capability that can deliver aggregated information to the enterprise relational database. Fast-turn analytical processing can also happen at a higher layer, for example by running the Spark engine on Big Data. Machine learning analytics can likewise be applied at a higher level in the stack.
  • Data management capabilities. The data management capabilities necessary for an effective Big Data warehouse include: data integration (tying systems together), data quality (ensuring a level of cleanliness or correctness of information), data transformation (ensuring consistency of data format), data security, and data governance (ensuring compliance with appropriate policy and regulatory rules).
  • Interactive analytics. Interactive analytical capabilities include in-memory analytics, ad hoc interactions, and the ability for analysts to do self-service analytics on the underlying data.
  • Advanced analytics. In addition to traditional data analysis techniques, organizations can apply advanced analytical engines to data managed by the Big Data warehouse architecture. Examples include predictive analytics, graph analytics, and spatial analytics.
  • A variety of data environments. Big Data warehouses typically span a variety of data environments, often combining on-premises databases, cloud data stores, and hybrid environments that have already been integrated. While some organizations do run entirely on-premises or entirely in the cloud, this is increasingly unusual.
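
To make the first two elements concrete, here is a minimal sketch of the pattern in which a Big Data store delivers aggregated information to an enterprise relational database. It is an illustration under stated assumptions, not a reference implementation: the in-memory list `RAW_EVENTS` stands in for a Big Data store such as Hadoop, SQLite stands in for the enterprise relational database, and all table, column, and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw clickstream events, standing in for a Big Data store.
RAW_EVENTS = [
    {"customer_id": 1, "page": "/home"},
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 2, "page": "/home"},
    {"customer_id": 1, "page": "/checkout"},
]


def aggregate_page_views(events):
    """Reduce raw events to one page-view count per customer."""
    counts = defaultdict(int)
    for event in events:
        counts[event["customer_id"]] += 1
    return dict(counts)


def build_warehouse(events):
    """Load structured enterprise data and the aggregate into one relational store."""
    conn = sqlite3.connect(":memory:")
    # Structured enterprise data (hypothetical customer master table).
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    # Aggregated information delivered from the Big Data side.
    conn.execute("CREATE TABLE page_views (customer_id INTEGER, views INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        aggregate_page_views(events).items(),
    )
    return conn


def views_by_customer(conn):
    """Query across both kinds of data with a single SQL join."""
    return conn.execute(
        "SELECT c.name, COALESCE(p.views, 0) "
        "FROM customers c LEFT JOIN page_views p ON p.customer_id = c.id "
        "ORDER BY c.id"
    ).fetchall()


if __name__ == "__main__":
    for name, views in views_by_customer(build_warehouse(RAW_EVENTS)):
        print(name, views)
```

The design choice illustrated here is that the high-volume data is reduced where it lives, and only the compact aggregate is moved next to the enterprise data for analysis.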

Big Data warehouse general architecture

Figure: Generic Big Data warehouse architecture. (Forrester, “The Next Generation EDW is the Big Data Warehouse” Yuhanna, Noel. August 29, 2016, page 8.)

Driving analytics and business intelligence across the organization

Generally, the goal of the Big Data warehouse is similar to the traditional goal of the enterprise data warehouse: delivering intelligence and analytics to decision-makers to drive business efficiency and effectiveness. Beyond that shared goal, a Big Data warehouse typically also aims to make analytics and reporting more broadly available across the organization.

To remain agile and respond to emerging opportunities and threats, enterprises typically cannot afford the delays that arise when decisions are made only at the top of the organization. As a result, to meet changing expectations regarding speed and responsiveness, companies are increasingly providing analytics and reporting tools to additional layers of management or to divisions that did not have this level of insight or autonomy before.

Key issues to keep in mind

Ease of integration. By definition, a Big Data warehouse requires the integration of a wide variety of data repositories, processing capabilities, and analytical capabilities. Thoroughly investigating how easily the major components of the Big Data warehouse integrate will be key not only to initial deployment success, but also to the ongoing success of the architecture.

Extensibility. Data management, data storage, and analytics are all undergoing rapid innovation at the same time. An architecture that can be easily extended to incorporate emerging technologies will be important to the ongoing relevance of the overall data architecture.

Orchestration. How easy is it to create data pipelines that cross the different elements of the Big Data warehouse? And how easy is it to manage and update those pipelines?
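
One way to think about the orchestration question is to treat a pipeline as a set of named steps plus a map of dependencies between them. The sketch below assumes that framing and uses only the Python standard library (`graphlib`, available in Python 3.9 and later); the step names, the sample rows, and the `run_pipeline` helper are hypothetical, and a production environment would more likely rely on a dedicated orchestration tool.

```python
from graphlib import TopologicalSorter


def extract(ctx):
    # Hypothetical raw values pulled from a source system.
    ctx["rows"] = [" 42 ", "17", "", "n/a", "8"]


def clean(ctx):
    # Data quality: keep only values that are numeric once trimmed.
    ctx["rows"] = [r.strip() for r in ctx["rows"] if r.strip().isdigit()]


def transform(ctx):
    # Data transformation: enforce a consistent integer format.
    ctx["rows"] = [int(r) for r in ctx["rows"]]


def load(ctx):
    # Stand-in for loading into an analytical store: keep a summary total.
    ctx["total"] = sum(ctx["rows"])


STEPS = {"extract": extract, "clean": clean, "transform": transform, "load": load}

# Each step maps to the set of steps that must finish before it runs.
DEPENDS_ON = {
    "clean": {"extract"},
    "transform": {"clean"},
    "load": {"transform"},
}


def run_pipeline(steps, depends_on):
    """Run every step in an order that respects the declared dependencies."""
    ctx = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        steps[name](ctx)
    return order, ctx


if __name__ == "__main__":
    order, ctx = run_pipeline(STEPS, DEPENDS_ON)
    print(order, ctx["total"])
```

In this framing, updating a pipeline means editing `STEPS` and `DEPENDS_ON` rather than rewriting control flow, which is one practical measure of how easy a pipeline is to manage.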

With this overview of the key elements of the Big Data warehouse architecture, the next blog will cover the challenges of implementing a Big Data warehouse architecture and how they can be overcome.

About Barbara Lewis

Barbara Lewis is the VP of Marketing for SAP Cloud Platform Big Data Services and a thought leader in SAP’s Big Data practice, with expertise in cloud, Big Data solutions, data landscape management, Internet of Things (IoT), analytics, and business intelligence. Barbara led the launch of SAP Data Hub, the latest Big Data offering from SAP, and is active in SAP’s Big Data Warehousing initiative.