Three Levels of Data Maturity – Part One

Today’s businesses continue to strive for data maturity and to create a culture that is data-driven and data-literate. But the perception remains that starting such a journey is expensive and time-consuming. There is a belief that a combination of high start-up costs and a lengthy implementation period prevents the business from seeing any near-term value in its data and analytics investment. The reality, however, is that today’s cloud-based data platform technologies, specifically the triumvirate of Snowflake (data storage and compute), Matillion (data loading and transformation), and ThoughtSpot (data analytics and insight), enable rapid analytics capability with a minimal initial investment of time and capital. In fact, once accounts for these three applications have been established, an organization can run interactive analytics against shared or direct-loaded data in 90 minutes or less!

In this series of blog posts, we will present the three levels of data maturity that an organization can quickly and easily achieve by pursuing a cloud-based data architecture:

  • Level One: Single-tier architecture with an operational data store and near-real-time operational reporting.

  • Level Two: Two-tier architecture with a modeled data warehouse and semantic layer.

  • Level Three: Three-tier architecture to include a “raw” data lake and multiple “curated” data zones for deep insight and predictive modeling.

A company can reach Level One in as little as 4 to 6 weeks, depending on the number of operational systems it wishes to incorporate. Overall, the full three-level journey can be blueprinted as a three-to-five-year plan that iteratively adds new analytic capabilities as the data needs of the organization grow.

This installment looks at creating a Level One “operational” data architecture. This is often an easy win for organizations that are just beginning their journey to data maturity. The objective here is to capture transactional data in near-real-time (whether that means up-to-the-minute, up-to-the-hour, or up-to-the-day, depending on the nature of the business) in its native form as an operational data store. There is no transformation or curation between the source and the data repository, nor is the data integrated across source systems. The data is also non-persistent: it is overwritten with each load, since the objective is tactical, actionable reporting rather than historical trending.
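The non-persistent, overwrite-on-load behavior described above can be sketched in a few lines. This is purely illustrative pseudologic, not any vendor's API: the `OperationalDataStore` class and its method names are hypothetical, and stand in for whatever replication mechanism actually performs the load.

```python
# Minimal sketch of the truncate-and-reload pattern behind a non-persistent
# operational data store: each scheduled load fully replaces the previous
# snapshot, so the store always reflects only the latest source state.
# All names here are illustrative, not taken from any specific product.

class OperationalDataStore:
    """Holds one snapshot per source table; every reload overwrites in full."""

    def __init__(self):
        self.tables = {}

    def load(self, table_name, rows):
        # Full replace: no history is retained, matching the ODS objective
        # of tactical, actionable reporting rather than historical trending.
        self.tables[table_name] = list(rows)

ods = OperationalDataStore()
ods.load("orders", [{"id": 1, "status": "open"}])
# A later load replaces the earlier snapshot entirely.
ods.load("orders", [{"id": 1, "status": "shipped"},
                    {"id": 2, "status": "open"}])
```

The design trade-off is deliberate: giving up history keeps the load simple and fast, which is what makes Level One achievable in weeks rather than months; historical trending is deferred to the Level Two warehouse.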

An operational repository can be spun up fairly quickly within Snowflake by essentially creating a simple landing area for the system data. Matillion Data Loader can then be employed as an “intake pipeline” to schedule automatic replication loads at a meaningful frequency. Finally, ThoughtSpot would be overlaid on the Snowflake database to present updated visual snapshots for front-line workers or anyone else needing immediate, actionable data.
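As a rough sketch of what that Snowflake landing area looks like, the helper below generates the handful of DDL statements involved: a database, a schema, and one table per source system table. The function and all object names (`OPS_DB`, `LANDING`, `ORDERS`) are hypothetical, and in practice the replication tool would issue equivalent statements itself; `CREATE OR REPLACE TABLE` is standard Snowflake DDL that pairs naturally with the overwrite-on-load approach.

```python
# Hypothetical helper that emits Snowflake DDL for a simple landing area:
# one database and schema, plus a CREATE OR REPLACE TABLE per source table.
# No transformation or modeling is applied, matching the Level One goal of
# landing data in its native form. All identifiers are illustrative.

def landing_area_ddl(database, schema, tables):
    """tables maps a table name to an {column name: Snowflake type} dict."""
    stmts = [
        f"CREATE DATABASE IF NOT EXISTS {database};",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema};",
    ]
    for table, columns in tables.items():
        cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
        stmts.append(
            f"CREATE OR REPLACE TABLE {database}.{schema}.{table} ({cols});"
        )
    return stmts

ddl = landing_area_ddl(
    "OPS_DB", "LANDING",
    {"ORDERS": {"ORDER_ID": "NUMBER", "STATUS": "VARCHAR"}},
)
```

Because the landing tables mirror the source schemas one-for-one, adding a new operational system later is just another entry in the table map and another scheduled load, which is what keeps the Level One build measured in weeks.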

This, then, is the first part of the journey to being data-driven, one that is relatively simple and quick in its execution. Operational workers and mid-level management will see immediate benefits from the new visibility into system operations. The next step is to engage decision-makers in reflecting on the performance indicators that are vital to monitoring and understanding their business domain. This will lead to the requirements for Level Two: the subject-oriented data warehouse.