The Warehouse

Short reads. Big insights.
Industry trends. Thought Leadership. Opinions. Hot Tips. And so much more.
 

Sep
30
Easily Connect to Any API Source From Matillion ELT

If you are like many ETL developers, you’ve struggled to find an easy way to source cloud services data via REST API. Although standards are in place for REST API web services protocols, it seems that every vendor has its own variation of them, creating new challenges for each new source. Matillion’s cloud ELT product has long featured an API profile creator that sources from JSON files and creates RSD (Really Simple Discovery, an XML format) scripts for use with API query components. This approach, however, is only as good as the JSON files provided by the vendor.


Now, with version 1.47, Matillion introduces much simpler functionality for extra...


Mar
23
Where's the "T?" A look at ETL vs. ELT

In a previous blog post, we examined the differences between traditional ETL (extract, transform and load) and ELT, where the “heavy lifting” of data transformation is handled by the robust and scalable (usually cloud-hosted) target platform. In the modern cloud data warehouse environment, ELT maximizes the speed at which data is staged and ingested, while leveraging massive computing power in the cloud to cleanse, aggregate and otherwise prepare that data for general consumption. But where is the best place to manage that transformation piece? Is it in cloud-friendly ETL tools, or is it within the management consoles of the cloud DWs themselves?


A common perception a...


May
29
Talend v. Matillion for Cloud Migration

The two leading ETL/ELT tools for cloud data migration are Talend and Matillion, and both are well-positioned for moving and transforming data into the modern data warehouse. So if you’re moving to any type of cloud-hosted DW, whether it is a cloud-dedicated warehouse such as Snowflake, or part of a larger cloud platform such as AWS Redshift, Azure SQL Data Warehouse or Google BigQuery, which tool should you use to move your existing on-prem data?


Both Talend and Matillion can source any kind of on-prem data and land it in a cloud-hosted data environment. They can also move data to and from AWS’s S3 cloud storage as well as Azure’s Blob storage (which can be used to s...


Mar
08
Data Integration Roadmap - Part One: The Logical Data Model

Our recent blog series on the data integration portfolio introduced a variety of new architectures that help the enterprise manage its data resources, including replication, virtualization and cloud data warehousing. Organizations are now able to integrate multiple data management solutions to address a variety of business sources and requirements. But it is important to understand that the foundation of any enterprise data management portfolio remains the same: a roadmap to data management must be created that is independent of the underlying technology. This series of blogs will examine the three main elements of the data integration roadmap: the logical data model, master data ma...


Oct
18
ETL vs. ELT - What's The Difference and Does It Matter?

For most of data warehousing’s history, ETL (extract, transform and load) has been the primary means of moving data between source systems and target data stores. Its dominance has coincided with the growth and maturity of on-premises physical data warehouses and the need to physically move and transform data in batch cycles to populate target tables efficiently and with minimal resource consumption. The “heavy lifting” of data transformation has been left to ETL tools that use caching and DDL processing to manage target loads.


However, the data warehouse landscape is changing, and it may be time to reconsider the ETL approach in the era of MPP appliances and cloud-hosted D...


Aug
15
The Data Integration Portfolio - Part Three: Streaming Data

In previous installments of this series we examined recent trends in data integration, specifically data replication and synchronization, as well as data abstraction through virtualization. Taken individually, all of these approaches are suited to high-latency requirements around historical reporting and trend analysis. In this chapter, we look at real-time streaming data, and how it can complement high-latency data integration approaches to create a complete enterprise data foundation.


Streaming data delivery is often perceived to be the "holy grail" of data integration in that it provides users with immediate and actionable insight into current business operations. In reality, st...


Sep
28
Agile and ETL? Like Peas and Carrots.

Data Architecture, SAP Data Services, Agile data mart, ETL Development



Can ETL Be Agile?


Business intelligence projects benefit greatly from an agile development approach. Since BI closely aligns IT with business, an iterative delivery model ensures that business stakeholders are always involved in the design process and that a constant dialogue is maintained. The objectives and benefits of agile project management include:




  • Response to rapidly changing requirements
  • High degree of customer involvement
  • Quick results
  • Progress measurement
  • Team motivation




This approach has traditionally applied to the development of the presentation, or “customer-facing,” layer of BI. But how does an agile pr...