The Data Integration Portfolio - Part Three: Streaming Data

In previous installments of this series we examined recent trends in data integration, specifically data replication and synchronization, as well as data abstraction through virtualization. Taken individually, these approaches are suited to use cases that tolerate high data latency, such as historical reporting and trend analysis. In this installment, we look at real-time streaming data and how it can complement high-latency data integration approaches to create a complete enterprise data foundation.

Streaming data delivery is often perceived as the "holy grail" of data integration because it gives users immediate, actionable insight into current business operations. In reality, streaming has primarily been used with sensor data from IoT sources, particularly in manufacturing and logistics. Its use cases revolve largely around operational, event-driven decision-making and "shop floor" transaction management. However, when streaming is combined with batch-oriented, higher-latency data integration approaches, its full potential can be realized within a modern enterprise analytics strategy.

Real-time data flows are most effective in a data integration portfolio when they are combined with bulk/batch data at varying levels of granularity and then virtualized for user consumption. The advantage is that real-time data becomes more than just “in the moment”: it becomes part of a historical analytic platform. One viable architecture derives data from an event stream processor, then integrates it with historical data warehouses and enterprise applications through an enterprise service bus (a messaging-based, service-oriented architecture used to integrate diverse systems). The combined streaming, historical, and application data can then be virtualized and presented for user analysis. This would support use cases such as real-time fraud monitoring or responsive purchase-pattern marketing.
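To make the fraud-monitoring case more concrete, here is a minimal Python sketch of the core idea: enriching each real-time event with historical context before deciding whether it looks unusual. It is an illustration under stated assumptions, not a reference implementation. The event stream processor is stood in for by a plain list of events, the data warehouse by a hypothetical `HISTORICAL_AVG` dictionary of per-customer average spend, and the enterprise service bus and virtualization layer are reduced to a simple join in the hypothetical `combine` function. The "five times the historical average" rule is likewise an assumed threshold chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Transaction:
    """A single real-time event, as an event stream processor might emit it (hypothetical shape)."""
    customer_id: str
    amount: float


@dataclass(frozen=True)
class CombinedRecord:
    """A streaming event enriched with historical context, as a virtualized view might present it."""
    customer_id: str
    amount: float
    historical_avg: Optional[float]
    suspicious: bool


# Hypothetical stand-in for a data warehouse query: average transaction amount per customer.
HISTORICAL_AVG: Dict[str, float] = {"c1": 40.0, "c2": 250.0}


def combine(
    stream: Iterable[Transaction],
    history: Dict[str, float],
    threshold: float = 5.0,  # assumed rule: flag amounts above threshold x historical average
) -> Iterator[CombinedRecord]:
    """Join each streaming event with its historical baseline and flag outliers."""
    for tx in stream:
        avg = history.get(tx.customer_id)
        # Customers with no history are not flagged here; this is a simplifying assumption.
        suspicious = avg is not None and tx.amount > threshold * avg
        yield CombinedRecord(tx.customer_id, tx.amount, avg, suspicious)


# Usage: a small simulated stream of events.
events = [
    Transaction("c1", 35.0),
    Transaction("c1", 500.0),
    Transaction("c2", 300.0),
    Transaction("c3", 90.0),
]
records = list(combine(events, HISTORICAL_AVG))
flagged = [r for r in records if r.suspicious]
print(flagged)
# [CombinedRecord(customer_id='c1', amount=500.0, historical_avg=40.0, suspicious=True)]
```

The point of the sketch is the design choice the article describes: the streaming event on its own ($500 from customer c1) says little, but joined with historical data it stands out against that customer's typical spend. In a production setting the join would more likely happen in the stream processor or the virtualization layer than in application code.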

In the final installment of this series, we will look at combining modern data integration technologies with traditional batch ETL to put together the complete enterprise portfolio.

About the Author

Joe Caparula is a Senior Consultant - Data Engineering with Pandata Group. He works alongside client teams to evaluate and recommend the data integration option that best supports cost-effective, efficient delivery of data to a modern data platform.