As enterprise data volumes grow and the velocity of data change increases, companies are more challenged than ever to provide timely and complete data sets for business analysis. Change Data Capture (CDC) is a long-established methodology for extracting changes into target data environments in near-real time, loading source data as it updates rather than waiting for regularly scheduled batch cycles. The objective is a provisioned data store that reflects the current source-system data as closely as possible. Data engineers employ several approaches to CDC, but in the end, which is the most effective and least costly strategy that ensures complete ...
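The polling style of CDC described above can be sketched in a few lines. This is a minimal, hypothetical example (table and column names are invented, and SQLite stands in for a real source system): each extract pulls only rows changed since a stored high-water mark instead of reloading the whole table on a batch schedule.

```python
# Minimal sketch of timestamp-based change data capture (CDC).
# The 'orders' table and 'last_updated' column are hypothetical;
# SQLite stands in for a real source system.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", "2024-01-01T10:00"), (2, "shipped", "2024-01-02T09:30")],
)

def extract_changes(conn, high_water_mark):
    """Return only the rows changed since the last successful extract."""
    rows = conn.execute(
        "SELECT id, status, last_updated FROM orders "
        "WHERE last_updated > ? ORDER BY last_updated",
        (high_water_mark,),
    ).fetchall()
    # Advance the watermark to the newest change we saw.
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark

changes, mark = extract_changes(conn, "2024-01-01T12:00")
print(changes)  # only order 2 changed after the watermark
```

Log-based CDC (reading the database transaction log) avoids even this polling overhead, but the watermark pattern above is the simplest way to see the core idea: move only what changed.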
In the first installment of this series, we examined how mid-sized enterprises can quickly get started on their journey to data maturity by implementing an operational reporting platform in as little as 4 to 6 weeks. The target users for this type of data service are the mid-line operational managers looking for actionable, tactical insight into system operations. The next user group to reach on the data maturity journey is the decision-makers at the departmental level (finance, sales, marketing, supply chain, etc.) who require strategic insight for planning and resource management. The architecture required for this next level of data analytics is the 2-tiered subject-oriented data warehou...
If you are like many ETL developers, you’ve struggled to find an easy way to source cloud-services data via REST APIs. Although standards are in place for REST API web services protocols, it seems that every vendor has its own variation of them, creating new challenges for each new source. Matillion’s cloud ELT product has long featured an API profile creator that sources from JSON files and creates RSD (Real Simple Discovery, an XML format) scripts for use with API query components. The effectiveness of this approach, however, is only as good as the quality of the JSON files provided by the vendor.
Now, with version 1.47, Matillion introduces much more simplified functionality for extra...
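Underneath any REST-sourcing tool, the core job is the same: pull a JSON payload and flatten its nested records into rows fit for tabular staging. The sketch below shows that generic pattern; the payload shape and field names are invented for illustration, and a real extract would fetch from the vendor's endpoint (e.g. with `urllib` or `requests`) rather than a literal string.

```python
# Generic pattern behind REST API sourcing: flatten nested JSON
# records into single-level rows ready for tabular staging.
# The payload shape and field names are hypothetical.
import json

payload = json.loads("""
{"customers": [
  {"id": 1, "name": "Acme",   "address": {"city": "Austin", "state": "TX"}},
  {"id": 2, "name": "Globex", "address": {"city": "Boston", "state": "MA"}}
]}
""")

def flatten(record, parent="", sep="_"):
    """Flatten nested dicts into one-level rows (address.city -> address_city)."""
    row = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            row.update(flatten(value, name, sep))
        else:
            row[name] = value
    return row

rows = [flatten(c) for c in payload["customers"]]
print(rows[0])
# {'id': 1, 'name': 'Acme', 'address_city': 'Austin', 'address_state': 'TX'}
```

Tools like Matillion's API profiles automate exactly this mapping from nested JSON to flat, queryable columns.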
The rise of personal computing in the 1980s and 90s led to a boom in business productivity that was transformative in its scope. Suddenly businesses had the power of what were formerly room-size computers on their desktops. This period saw the rise of the “knowledge worker” and the digitization of business.
But it was the impact of the internet that really drove business to the next level. All of those isolated desktop computers were now connected via the World Wide Web to enable communication, marketing and commerce without barriers. An open exchange of innovation and ideas fostered rapid growth, collaboration, and the global marketplace. The inter...
There is no question that companies are moving their on-premises data warehouses to the cloud at an increasing pace. The benefits of a cloud data warehouse (instant scalability, minimal up-front costs, rapid deployment, ubiquitous access, etc.) are being fully realized and appreciated by a growing number of enterprises both large and small. The major players in cloud DW (Snowflake, AWS Redshift, Azure Synapse, Google BigQuery) are all vying for market share, and the customers are seeing benefits from the varying and competitive costs and features of each platform.
But how do you get your data to the cloud? Isn’t the time and cost comparable to any data load project, whether it is on-pre...
The data vault has long been viewed as a model best suited for historical and archival enterprise data. Its “insert only”, business-process approach to raw, unadulterated data is ideal for low-maintenance storage of all enterprise-generated information from all systems. Use cases for data vaults have traditionally revolved around historical tracking and auditing; however, the perception has largely been that the model is ill-suited to analytics due to its many-to-many relationships and dispersed structure. In fact, data vaults are often used as a “lightly modelled stage” for traditional star-schema data warehouses.
But the data vault may be best suited for a use case that...
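The insert-only pattern at the heart of the data vault can be illustrated with a tiny hub-and-satellite example. This is a hypothetical sketch (keys and table names are invented, and SQLite stands in for the warehouse, not a production Data Vault model): attribute changes arrive as new satellite rows, never as updates, so full history is preserved by construction.

```python
# Minimal sketch of data vault structures: a hub holds the business
# key, a satellite holds descriptive attributes insert-only, one row
# per change. Names are illustrative; SQLite stands in for a warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: one row per business key, never updated.
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash key for the business key
    customer_id TEXT,
    load_date   TEXT
);
-- Satellite: descriptive attributes, a new row per change (insert only).
CREATE TABLE sat_customer (
    customer_hk TEXT,
    load_date   TEXT,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('h1', 'C-100', '2024-01-01')")
conn.execute("INSERT INTO sat_customer VALUES ('h1', '2024-01-01', 'Acme', 'Austin')")
# A customer move is captured as a new satellite row, not an UPDATE.
conn.execute("INSERT INTO sat_customer VALUES ('h1', '2024-02-01', 'Acme', 'Dallas')")

# "Current" view: latest satellite row per hub key.
current = conn.execute("""
    SELECT s.name, s.city FROM sat_customer s
    WHERE s.load_date = (SELECT MAX(load_date) FROM sat_customer
                         WHERE customer_hk = s.customer_hk)
""").fetchone()
print(current)  # ('Acme', 'Dallas')
```

The same satellite table that yields the "current" view also answers point-in-time questions, which is what makes the vault attractive for auditing and history.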
In a previous blog post, we examined the differences between traditional ETL (extract, transform and load) and ELT, where the “heavy lifting” of data transformation is handled by the robust and scalable (usually cloud-hosted) target platform. In today’s modern cloud data warehouse environment, ELT maximizes the speed at which data is staged and ingested, while leveraging massive computing power in the cloud to cleanse, aggregate and otherwise prepare that data for general consumption. But where is the best place to manage that transformation piece? Is it best handled with cloud-friendly ETL tools, or within the management consoles of the cloud DWs themselves?
A common perception a...
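Whichever tool manages it, the ELT pattern itself is simple: land the raw data first, then push the transformation down to the target engine as SQL so the "T" runs where the data lives. A minimal sketch, with SQLite standing in for a cloud warehouse and illustrative table names:

```python
# Minimal sketch of the ELT pattern: stage raw rows untouched, then
# transform inside the target engine via SQL pushdown. SQLite stands
# in for a cloud warehouse; table names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Extract + Load: stage the raw rows as-is.
conn.execute("CREATE TABLE stg_sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# 2. Transform in-database: the warehouse's own compute does the work.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total
    FROM stg_sales
    GROUP BY region
""")

result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(result)  # [('east', 150.0), ('west', 75.0)]
```

In a real deployment the `CREATE TABLE ... AS SELECT` step is what an ELT tool orchestrates against Snowflake, Redshift, Synapse or BigQuery, scaling with the warehouse rather than with the tool's own servers.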
The two leading ETL/ELT tools for cloud data migration are Talend and Matillion, and both are well-positioned for moving and transforming data into the modern data warehouse. So if you’re moving to any type of cloud-hosted DW, whether it is a cloud-dedicated warehouse such as Snowflake, or part of a larger cloud platform such as AWS Redshift, Azure SQL Data Warehouse or Google BigQuery, which tool should you use to move your existing on-prem data?
Both Talend and Matillion can source any kind of on-prem data and land it in a cloud-hosted data environment. They can also move data to and from AWS’s cloud data storage, S3, as well as Azure’s Blob storage (which can be used to s...
Our recent blog series on the data integration portfolio introduced a variety of new architectures that help the enterprise manage its data resources, including replication, virtualization and cloud data warehousing. Organizations are now able to integrate multiple data management solutions to address a variety of business sources and requirements. But it is important to understand that the foundation of any enterprise data management portfolio remains the same: a roadmap to data management must be created that is independent of the underlying technology. This series of blogs will examine the three main elements of the data integration roadmap: the logical data model, master data ma...
This blog series has examined the hybrid data portfolio as a mix of technologies and approaches to a data foundation for the modern enterprise. We’ve examined a variety of strategies and technologies in data integration, including virtualization, replication and streaming data. We’ve shown that there is no “one size fits all” approach to an integrated data foundation, but instead have seen how a variety of disciplines that suit specific business and technical challenges can make up a cohesive data policy.
This final chapter puts it all together under the umbrella of “time-to-value” and its importance to the agile enterprise data platform. No matter what the techn...