In the first installment of this series, we examined how mid-sized enterprises can quickly get started on their journey to data maturity by implementing an operational reporting platform in as little as 4 to 6 weeks. The target users for that type of data service are mid-line operational managers looking for actionable, tactical insight into system operations. The next user group to reach on the data maturity journey is the decision-makers at the departmental level (finance, sales, marketing, supply chain, etc.) who require strategic insight for planning and resource management. The architecture required for this next level of data analytics is the two-tiered, subject-oriented data warehouse.
The data warehouse has existed for a long time in many forms. Quite simply, it is a designed repository of integrated data from multiple sources. The disciplines around the design and development of data warehouses have evolved over many years, but at their core are the needs for historical retention and “non-volatility” (i.e., the data is “set in stone” and cannot be changed or updated, thus creating a point-in-time “snapshot”). The most important part of creating a usable and meaningful data warehouse is the design: the planning and mapping of the data’s architecture and how it will be meaningfully presented to users. This design is also reflected in the presentation or semantic layer of the warehouse, where it becomes visible to the business. It is important to remember throughout this discussion that the data warehouse is a tool for business value, and that value can only come from its utility to business users.
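The non-volatility principle above can be sketched in a few lines. This is a minimal illustration, not a production pattern; the inventory table and column names are hypothetical. Each load appends a dated row rather than updating in place, so the warehouse retains a point-in-time snapshot of history.

```python
import sqlite3

# Minimal sketch of a "non-volatile", point-in-time load (hypothetical feed):
# rows are appended with a snapshot_date and never updated, preserving history.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory_snapshot (
        snapshot_date TEXT NOT NULL,
        sku           TEXT NOT NULL,
        qty_on_hand   INTEGER NOT NULL,
        PRIMARY KEY (snapshot_date, sku)  -- one immutable row per day per SKU
    )
""")

# Day 1 load: quantity is 100. Day 2 load: quantity drops to 40.
# Instead of overwriting, we insert a second dated row.
conn.execute("INSERT INTO inventory_snapshot VALUES ('2024-01-01', 'SKU-1', 100)")
conn.execute("INSERT INTO inventory_snapshot VALUES ('2024-01-02', 'SKU-1', 40)")

# The warehouse can still answer: "what did inventory look like on Jan 1?"
row = conn.execute(
    "SELECT qty_on_hand FROM inventory_snapshot "
    "WHERE snapshot_date = '2024-01-01' AND sku = 'SKU-1'"
).fetchone()
print(row[0])  # 100
```

Because no row is ever updated, any report run against a given snapshot date is reproducible, which is exactly what trend analysis requires.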
Much has been written about the apparent demise of the traditional data warehouse, particularly considering recent developments in the “modern data stack” (of which cloud solutions are an integral part). First there was “Big Data” (Hadoop and its ilk), with its kitchen-sink approach that shifted the burden of deriving business value to highly skilled regression coders. Then came the data lake (both on-prem and cloud), where data is retained in its native form for exploration by data scientists (or, just as often, serves as a staging area for an enterprise data warehouse; more about the data lake in Part 3 of this series). Now we have the data mesh approach to organizational data management, with decentralized, domain-focused data ownership and data-as-a-product “subscription” models. While the governance disciplines around the data mesh can be applied to any data strategy, many mid-sized businesses lack the resources for a full-blown data mesh implementation.
In the end, for a mid-sized business, none of these approaches on its own satisfies the simple need for an easily consumable, historically trended data set that the business can readily ingest. The “good old” data warehouse does just that. Structurally, a subject-oriented data warehouse can be approached in one of two ways: the Bill Inmon methodology of normalization, where tables are defined by related entities within a common category (customers, sales, finance, etc.), or the Ralph Kimball dimensional model of facts (historical measures) and dimensions (business attributes), which allows various transactional values to be related across common attributes. There is also the hybrid approach of the data vault, where numerous normalized “hubs” of unique business keys share many-to-many relationships (“links”) with each other and with business metadata (“satellites”).
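The Kimball fact/dimension structure is easiest to see with a tiny star schema. This is a minimal sketch with hypothetical tables: a fact table of sales measures keyed to a customer dimension, so transactional values can be sliced by business attributes such as region.

```python
import sqlite3

# Minimal sketch of a Kimball-style star schema (hypothetical tables):
# a fact table of measures joined to a dimension of business attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key
        name   TEXT,
        region TEXT                        -- business attribute
    );
    CREATE TABLE fact_sales (
        sale_date    TEXT,
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        amount       REAL                  -- historical measure
    );
    INSERT INTO dim_customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 1, 100.0),
        ('2024-01-06', 1, 50.0),
        ('2024-01-06', 2, 75.0);
""")

# Relate measures to attributes: total sales by region.
totals = dict(conn.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region
"""))
print(totals)  # {'East': 150.0, 'West': 75.0}
```

The same join pattern scales to many dimensions (date, product, channel), which is what lets departmental users slice the same facts along whichever attributes matter to them.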
No matter which approach you use, two elements are necessary in the planning and development of the enterprise data warehouse: data modeling and ETL (extract, transform, load). This is due to the transformative and integrated nature of the DW. The data model maps the source data into logical business relationships, and ETL is the technology that makes those relationships happen through transformative operations. Running through this process is data governance oversight that ensures the quality and consistency of the data being loaded into the DW (and this oversight should include business stakeholders, not just the dedicated data team).
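One way to picture how the model drives the ETL is a source-to-target mapping. The sketch below is illustrative only, with hypothetical source fields: the model is expressed as a mapping to business nomenclature, and the transform applies it along with a simple governance check on required fields.

```python
# Minimal sketch of model-driven ETL (hypothetical source fields):
# the data model is captured as a source-to-target mapping, and the
# transform applies it plus a basic data-quality (governance) check.
SOURCE_TO_TARGET = {
    "cust_nm": "customer_name",  # rename to business nomenclature
    "ord_amt": "order_amount",
    "ord_dt":  "order_date",
}
REQUIRED = {"customer_name", "order_amount"}

def transform(row: dict) -> dict:
    out = {target: row.get(source) for source, target in SOURCE_TO_TARGET.items()}
    out["order_amount"] = float(out["order_amount"])  # conform the data type
    missing = [field for field in REQUIRED if out.get(field) is None]
    if missing:
        raise ValueError(f"governance check failed, missing: {missing}")
    return out

loaded = transform({"cust_nm": "Acme", "ord_amt": "100.50", "ord_dt": "2024-01-05"})
print(loaded["order_amount"])  # 100.5
```

Keeping the mapping as data rather than buried in code is what lets the model serve double duty as both an ETL specification and documentation the governance team can review.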
Finally, there is the question of how the data warehouse will be presented to non-technical business users. The concept of a “semantic layer” has been used for decades by various business intelligence tools to map technical data into business nomenclature. This can be achieved directly in the DW through “virtualization”, or views (which can themselves be a means of data integration). This has the double benefit of not only making the data easy to understand and consume, but also serving as a data governance tool by standardizing data definitions and enforcing common usage across multiple departments.
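A view-based semantic layer can be sketched in a few lines. The table and column names below are hypothetical: the view maps terse technical names to business-friendly ones and bakes in one shared definition of “net revenue”, so every department queries it the same way.

```python
import sqlite3

# Minimal sketch of a semantic layer via virtualization: a view renames
# technical columns to business terms and standardizes a derived metric.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fct_ord (ord_id INTEGER, gross_amt REAL, disc_amt REAL);
    INSERT INTO fct_ord VALUES (1, 100.0, 10.0), (2, 200.0, 0.0);

    -- The presentation layer: business-friendly names, one shared definition.
    CREATE VIEW orders AS
    SELECT ord_id               AS order_id,
           gross_amt            AS gross_revenue,
           gross_amt - disc_amt AS net_revenue  -- standardized definition
    FROM fct_ord;
""")

net = conn.execute("SELECT SUM(net_revenue) FROM orders").fetchone()[0]
print(net)  # 290.0
```

Because the definition of net revenue lives in the view rather than in each department’s report logic, a change to that definition is made once and propagates everywhere, which is the governance benefit described above.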
Utilizing the Snowflake/Matillion/ThoughtSpot cloud platform that we discussed in the last installment, the enterprise data warehouse can be built out by creating distinct subject-oriented databases within Snowflake (or separate fact and dimension stores that can be combined at the query level) and using SqlDBM to create models that serve as mapping guides for the ETL as well as reference architecture for the users. Jobs are then created in Matillion to extract, load, and transform the data from the source systems based on the models. With self-service analytics available to users via ThoughtSpot, the business now has access to curated data for meaningful insight and decision making.
With the right guidance, an organization can reach this second level of data maturity in relatively little time (3 to 6 months, depending on the number of subject areas involved). New subject areas can be developed iteratively through ongoing managed services. Before long, the organization is ready for the next level of the modern data journey: the three tiers of raw, curated, and analytics-ready data.