The Data Integration Portfolio - Part Four: Putting It All Together (In The Cloud)

This blog series has examined the hybrid data portfolio as a mix of technologies and approaches for building a data foundation for the modern enterprise. We’ve examined a variety of data integration strategies and technologies, including virtualization, replication, and streaming data. We’ve shown that there is no “one size fits all” approach to an integrated data foundation; instead, we’ve seen how a variety of disciplines, each suited to specific business and technical challenges, can make up a cohesive data policy.

This final chapter puts it all together under the umbrella of “time-to-value” and its importance to the agile enterprise data platform. No matter the technology, data strategies invariably involve moving or copying “raw” (operational or transactional) data from Point A to Point B, with transformations applied either between the points (ETL) or upon consumption (ELT). Historically, Point B has primarily been the enterprise data warehouse, but today it can be anything from a data vault to a data lake. Various “Point C’s” can occur as well, in the form of data marts, analytic data cubes, cloud compute clusters, virtualization layers, etc. In any case, it is paramount that this lifecycle take place in as short a time as possible in order to quickly bring data value to the analysts and business users who consume it.
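The ETL/ELT distinction above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: it uses an in-memory SQLite database as a stand-in for the warehouse (Point B), and the order records and table names are hypothetical. The point is where the transformation happens — before loading (ETL) versus at query time on raw data (ELT).

```python
import sqlite3

# Hypothetical raw operational records from Point A: untyped strings, as
# they might arrive from an extract file.
raw_orders = [
    ("1001", "2024-01-05", "149.99"),
    ("1002", "2024-01-06", "89.50"),
]

warehouse = sqlite3.connect(":memory:")  # stand-in for Point B

# --- ETL: transform between the points, then load the cleansed result ---
warehouse.execute(
    "CREATE TABLE orders_etl (order_id INTEGER, order_date TEXT, amount REAL)"
)
transformed = [(int(oid), d, float(amt)) for oid, d, amt in raw_orders]
warehouse.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", transformed)

# --- ELT: load the raw data as-is, transform upon consumption ---
warehouse.execute(
    "CREATE TABLE orders_raw (order_id TEXT, order_date TEXT, amount TEXT)"
)
warehouse.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_orders)

# The cast happens at query time, inside the warehouse's compute layer.
total = warehouse.execute(
    "SELECT SUM(CAST(amount AS REAL)) FROM orders_raw"
).fetchone()[0]
```

Both paths yield the same answer; what differs is whether the transformation cost is paid up front (ETL) or deferred to the consumer (ELT) — the trade-off at the heart of time-to-value.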

Enter the cloud data warehouse. By offloading data management to the cloud, businesses can take advantage of many of the data strategies discussed here without heavy up-front cost, long deployment times, or complex configuration. ETL takes the form of bulk file loading for batch data loads, and even streaming data can be uploaded using continuous data ingestion services (e.g., Snowflake’s Snowpipe). On-prem-to-cloud ETL technologies such as those from Talend and Matillion offer a full suite of transformations not unlike traditional ETL platforms. Virtualization layers, whether for business semantic presentation, analytic manipulation, or departmental segmentation, are often provided as native compute clusters or can be enabled via third-party products such as Denodo. Cloud DW vendors such as Snowflake also combine authentication, security, metadata, and infrastructure management to provide a true “Enterprise Data Warehouse-as-a-Service” (EDWaaS).
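A virtualization layer for business semantic presentation can be as simple as a view over the raw landed data: consumers see friendly names and derived measures, and no data is copied or moved. The sketch below assumes an in-memory SQLite database standing in for a cloud warehouse; the table, view, and column names are illustrative only.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Raw table as landed by a bulk or continuous ingestion service.
db.execute(
    "CREATE TABLE raw_sales (sku TEXT, qty INTEGER, unit_price REAL, region_cd TEXT)"
)
db.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?, ?)",
    [("A-100", 3, 19.99, "NE"), ("A-100", 1, 19.99, "SW")],
)

# The view is the virtualization layer: business-friendly names and a
# derived revenue measure, computed at query time over the raw data.
db.execute("""
    CREATE VIEW v_sales AS
    SELECT sku              AS product,
           qty * unit_price AS revenue,
           CASE region_cd WHEN 'NE' THEN 'Northeast'
                          WHEN 'SW' THEN 'Southwest' END AS region
    FROM raw_sales
""")

rows = db.execute("SELECT product, region, revenue FROM v_sales").fetchall()
```

Tools like Denodo, or the native compute layers of cloud DW platforms, generalize this idea across many sources, but the principle is the same: present, don’t copy.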

Whether your data is managed on premises, in the cloud, or via a hybrid architecture, and whether you use traditional batch ETL, virtualization, or replication to move or present data, the disciplines around modeling and logically designing your data for optimal enterprise value remain the same. In a future blog series we will focus on these disciplines as your “roadmap” for building and navigating your data integration portfolio.