The challenges organizations face in integrating and making sense of the plethora of internal and external data continue to grow. Not only are data sources diverse, but business units require a mix of batch, real-time, on-demand, and virtualized capabilities, often from the same source for different use cases. It is rare to find a "one-stop-shop" solution to these varied data integration needs, so organizations end up with a hodgepodge of redundant, overlapping products or, worse, rely on custom internal coding with no traceability or modularity.

In this series of blog posts, we will examine the "hybrid data integration portfolio" as a planned approach to handling multiple data integration requirements with a managed portfolio of solutions. Through a combination of on-premises and cloud integration strategies around data replication, synchronization, virtualization, ETL, and streaming delivery, organizations can combine best-of-breed approaches to meet their full spectrum of data use cases. The series will examine each of these integration solutions separately, with examples of how they can be cross-integrated. The final installment will focus on the strategic approach to implementing the full data integration portfolio to meet the needs of the enterprise.

Part one of this series focuses on data virtualization (DV), a rapidly growing technology that uses an abstraction layer to integrate and present data from a variety of sources. A recent Forrester survey shows that in 2017, 56% of technology decision-makers across the globe had already implemented or planned to implement a DV solution, up from 45% in 2016. Driving this adoption are the many benefits a virtualization platform provides, including:

  • Simplification: The abstraction layer can be used to standardize and codify complex data into common business nomenclature, as well as maintain an enterprise metadata dictionary and comply with regulatory requirements.

  • Real-time Integration: High-latency data integration methods like traditional batch ETL cannot serve the new demands for streaming analytics and real-time, actionable platforms such as those related to IoT. Since virtual data layers are built solely on a logical model and are divorced from physical deployments, they can present data structures at any velocity.

  • Agility: DV can rapidly respond to changes and additions to business use cases, since automated processes can be adapted to different data sources and outcomes. In most cases, these processes can be optimized for whatever source vendor or architecture is required.

  • Security: Sensitive data can be more easily masked and controlled in a common virtualization layer. Standard GDPR-compliant "pseudonymization" and other data masking techniques are more readily incorporated into a logical layer without physically changing the source data systems.
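To make the security point concrete, here is a minimal sketch of keyed pseudonymization as it might be applied in a virtual layer. All names here (the key, the `pseudonymize` helper, the sample record) are illustrative assumptions, not part of any particular DV product; the idea is that the logical layer substitutes stable tokens for identifiers while the source system stays untouched.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice this would come from a secrets manager.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym.

    HMAC-SHA256 keeps the mapping consistent (the same input always
    yields the same token, so joins across virtual views still work)
    while remaining irreversible without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# A virtual view could apply this per row, without changing the source data:
record = {"customer_id": "C-1001", "email": "jane@example.com", "region": "EMEA"}
masked = {
    "customer_id": pseudonymize(record["customer_id"]),
    "email": pseudonymize(record["email"]),
    "region": record["region"],  # non-sensitive fields pass through unchanged
}
```

Because the token is deterministic per key, analysts can still count and join on `customer_id` in the virtual layer; rotating the key re-pseudonymizes everything at once.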

So how does data virtualization fit into a modern data integration platform? DV can coexist with traditional batch ETL that may be loading a historical data warehouse or archiving data repositories such as a data vault. A good use case for this integration is virtualizing source edge devices (e.g. manufacturing sensors) for use in IoT analytics. Instead of batch loading directly from such devices (which will likely have a wide variety of architectures and proprietary structures), DV can provide a single logical model representing standard transactional data. Traditional ETL can then be used to integrate this data with other enterprise sources (e.g. SCM, ERP, etc.) to create a responsive, predictive enterprise-wide analytical platform.
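The edge-device scenario above can be sketched as a small adapter layer: each vendor's raw payload is mapped into one logical schema, so downstream ETL or analytics see a single model. The vendor names, payload fields, and adapter functions below are invented for illustration, assuming two sensor feeds that report the same temperature in different formats.

```python
from datetime import datetime
from typing import Any, Callable, Dict

# Hypothetical raw payloads from two sensor vendors with different schemas.
VENDOR_A_READING = {"ts": 1700000000, "temp_c": 71.5, "dev": "press-04"}
VENDOR_B_READING = {"timestamp": "2023-11-14T22:13:20+00:00",
                    "fahrenheit": 160.7, "device_id": "press-07"}

def from_vendor_a(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Vendor A already reports Celsius and epoch seconds; just rename fields.
    return {"device": raw["dev"], "celsius": raw["temp_c"], "epoch": raw["ts"]}

def from_vendor_b(raw: Dict[str, Any]) -> Dict[str, Any]:
    # Vendor B reports Fahrenheit and an ISO-8601 timestamp; normalize both.
    epoch = int(datetime.fromisoformat(raw["timestamp"]).timestamp())
    return {"device": raw["device_id"],
            "celsius": round((raw["fahrenheit"] - 32) * 5 / 9, 1),
            "epoch": epoch}

ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}

def virtual_view(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    # The "virtual" layer: consumers query one schema regardless of source.
    return ADAPTERS[source](raw)
```

A commercial DV platform does this declaratively (and adds query pushdown, caching, and security), but the logical-model idea is the same: the mapping lives in the layer, not in each consuming pipeline.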

In part two of this series we will examine data replication and how it can complement virtualization and message-oriented data movement for application integration.

About the Author:

Joe Caparula is a Managing Partner at Pandata Group and continues to work alongside our clients as they grow their data and analytics competency. He can be reached at