The Warehouse

Short reads. Big insights.
Industry trends. Thought Leadership. Opinions. Hot Tips. And so much more.
 

Jun 20
Change Data Capture and Matillion Data Loader

As enterprise data volumes continue to grow and the velocity of data change increases, companies are more challenged than ever to provide timely and complete data sets for business analysis. Change Data Capture (CDC) is a methodology that has long been used to propagate source changes into target data environments in near real time, loading source data as it updates rather than waiting for regularly scheduled batch cycles. The objective is for the provisioned data store to reflect the current source system data as closely as possible. There are several approaches to CDC employed by data engineers, but in the end, which is the most effective and least costly strategy that ensures complete ...


Mar 22
Three Levels of Data Maturity – Part One

Today’s businesses continue to strive for data maturity and to create a culture that is data-driven and data-literate. But the perception remains that starting such a journey is expensive and time-consuming. There is a belief that a combination of high startup costs and lengthy implementation time prevents the business from seeing any near-term value in its data and analytics investment. The reality, however, is that today’s cloud-based data platform technologies, specifically the triumvirate of Snowflake (data storage and compute), Matillion (data loading and transformation), and ThoughtSpot (data analytics and insight), enable rapid analytic ability with minimum initial inv...


Sep 30
Easily Connect to Any API Source From Matillion ELT

If you are like many ETL developers, you’ve struggled to find an easy way to source cloud services data via REST API. Although standards are in place for REST API web services protocols, it seems that every vendor has its own variation of them, creating new challenges for each new source. Matillion’s cloud ELT product has long featured an API profile creator that sources from JSON files and creates RSD (Really Simple Discovery, an XML format) scripts for use with API query components. This approach, however, is only as good as the quality of the JSON files provided by the vendor.


Now, with version 1.47, Matillion introduces much simpler functionality for extra...


Jun 22
Matillion Data Loader: The Fast, Easy (and Free!) Way to Populate Your Cloud Data Warehouse

There is no question that companies are moving their on-premises data warehouses to the cloud at an increasing pace. The benefits of a cloud data warehouse (instant scalability, minimal up-front costs, rapid deployment, ubiquitous access, etc.) are being fully realized and appreciated by a growing number of enterprises both large and small. The major players in cloud DW (Snowflake, AWS Redshift, Azure Synapse, Google BigQuery) are all vying for market share, and the customers are seeing benefits from the varying and competitive costs and features of each platform.


But how do you get your data to the cloud? Isn’t the time and cost comparable to any data load project, whether it is on-pre...


Mar 30
Lakes, Swamps, and Puddles: The "Data Wetlands" Ecosystem

If you feel like you’re “drowning” in jargon and buzzwords surrounding the recent developments in data lakes and their ilk, you are not alone. A recent TDWI survey showed rapidly increasing adoption of data lakes as a source of big data analytics, though it also revealed barriers to success and confusion around implementation value. Much of this confusion stems from myths and misperceptions around the technical and business uses of a data lake. This article will examine the proper use of a data lake, and how proper governance can prevent it from becoming the dreaded data swamp.


To be clear, a data lake is not a data management platform, in that it is not an integrated, ce...


Mar 23
Where's the "T?" A look at ETL vs. ELT

In a previous blog post, we examined the differences between traditional ETL (extract, transform and load) and ELT, where the “heavy lifting” of data transformation is handled by the robust and scalable (usually cloud-hosted) target platform. In the modern cloud data warehouse environment, ELT maximizes the speed at which data is staged and ingested, while leveraging massive computing power in the cloud to cleanse, aggregate and otherwise prepare that data for general consumption. But where is the best place to manage that transformation piece? Is it in cloud-friendly ETL tools, or is it within the management consoles of the cloud DWs themselves?


A common perception a...


Pandata Group

Chicago

WeWork / Fulton Market

220 N. Green Street

Second Floor

Chicago, IL 60607

Madison

316 W Washington Ave

Suite 525

Madison, Wisconsin 53703
