There is no question that companies are moving their on-premises data warehouses to the cloud at an increasing pace. A growing number of enterprises, both large and small, are realizing the benefits of a cloud data warehouse (instant scalability, minimal up-front costs, rapid deployment, ubiquitous access, etc.). The major players in cloud DW (Snowflake, AWS Redshift, Azure Synapse, Google BigQuery) are all vying for market share, and customers benefit from the resulting competition on cost and features.
But how do you get your data to the cloud? Aren’t the time and cost comparable to any data load project, whether on-prem or not? Fortunately, there is a way to leverage the benefits of cloud technology to make direct loads fast and painless. Matillion is known for its 100% cloud-native ELT tool, but it has recently released a cloud “data pipeline” product called Matillion Data Loader. Data Loader is a no-code, cloud-based platform that retrieves data from numerous proprietary sources and lands it in your cloud DW. It can source from a mix of on-prem platforms (e.g. Oracle, Microsoft SQL Server) and cloud-based environments (e.g. Google Analytics, Salesforce). Once a source is provisioned, you specify the account and credentials of your target cloud DW (Snowflake, AWS Redshift or Google BigQuery). Depending on the structure of the source data, you are free to specify data ranges and parameters for your load. You can then set the load frequency (as often as once a minute) and configure failure notifications. Once loads begin to run, a convenient dashboard provides a record of the load history.
That’s it! You now have data in your cloud DW. Now that it’s there, the use cases for Matillion Data Loader as part of an overall data management strategy are many and varied:
- Land source data in a staging area: you can periodically (daily, hourly, etc.) stage data in the cloud DW for further use in ETL transformation. Some data sources have readily identifiable CDC (change data capture) fields, such as timestamps, that make incremental loads possible.
- Create a data lake: data lakes capture and store data in its native, untransformed format for analysis by data scientists. Feeding the lake consistently from multiple data sources through Data Loader is a great way to quickly make your organization’s raw data available for insight.
- Social media analytics: Data Loader can source from Facebook (and soon Twitter) to provide content and advertising insight into your online social presence.
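The incremental-load idea mentioned in the staging use case can be sketched in a few lines of Python. This is only an illustration of the general timestamp "watermark" technique, not a description of how Data Loader works internally; the `updated_at` field, the `incremental_rows` helper and the sample rows are all hypothetical.

```python
from datetime import datetime


def incremental_rows(source_rows, last_loaded_at, ts_field="updated_at"):
    """Return the rows changed after the watermark, plus the new watermark.

    Hypothetical helper: assumes every source row carries a timestamp
    field (the CDC column) that is updated whenever the row changes.
    """
    changed = [row for row in source_rows if row[ts_field] > last_loaded_at]
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = max((row[ts_field] for row in changed), default=last_loaded_at)
    return changed, new_watermark


# Made-up sample data standing in for a source table.
source = [
    {"id": 1, "updated_at": datetime(2020, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2020, 1, 1, 9, 30)},
    {"id": 3, "updated_at": datetime(2020, 1, 1, 10, 15)},
]

# The previous load finished at 09:00, so only rows 2 and 3 are picked up.
watermark = datetime(2020, 1, 1, 9, 0)
changed, watermark = incremental_rows(source, watermark)
print([row["id"] for row in changed])  # [2, 3]
print(watermark)                       # 2020-01-01 10:15:00
```

Each scheduled run would store the returned watermark and pass it to the next run, so only new or changed rows are staged instead of the full table.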
Coupling Matillion Data Loader with Matillion ETL provides the full spectrum of data transformation tools, from the sourcing pipeline to staging to transformative data integration. And it’s all cloud-based with minimal footprint and deployment time. Begin your journey to the modern “data-in-the-cloud” architecture with these tools today!
About the Author:
JOE CAPARULA is a Senior Consultant with Pandata Group who specializes in delivering data modeling and data integration services for clients across several industries.