Data governance is a discipline that continues to get a high level of attention from CDOs, data consultants and data practitioners. Almost all data-driven organizations recognize the need for it, many have attempted to implement it, and many have failed. Why? The reasons given are familiar: not enough executive buy-in, not enough data-user buy-in, not enough time to devote to it, no comprehensive technological solution, etc. It seems that the importance of data governance is at odds with the difficulty of its implementation. Does it need to be this hard? Perhaps a rethinking of data governance is in order. Perhaps we need a “Data Governance 2.0”.
Traditionally, data governance has been a “top down” approach of setting policy, identifying data stewards, and holding them accountable for keeping data aligned with business rules. This has led to the dreaded “data police” and a culture where analysts feel hamstrung by governance policies rather than enabled by them. Data analysts and data scientists, on the “front lines” of the enterprise’s data infrastructure, should instead be active participants in data policymaking. In the worst cases, IT drives the policy from a purely security-minded perspective. And while data security is more important than ever, this narrow view of data governance removes the business stakeholders from the equation and reduces governance to purely technical oversight.
What modern data governance needs is a business-centric, collaborative program. This approach broadens data governance to include data cataloging as well. Where data governance concerns policy and monitoring, data cataloging pertains to collecting, publishing and sharing data knowledge collaboratively. These two pillars, paired with technical data and process management, form the enterprise-wide data governance strategy. It is a proactive, people-driven approach; it is not driven by technology or rote policymaking. Most importantly, it helps promote a data-driven culture by involving all stakeholders and data users in the oversight and ownership of enterprise data.
A good data catalog tool such as Alation allows for easy setup of stewardship and smart collection of not only structured data sources, but also file systems, reused queries, BI applications, etc. In other words, whenever data is used by the organization, it gets cataloged. Couple this with a “crowdsourced” glossary of business terminology, including how that terminology relates to the data (“data dictionary”) and to other terms and documentation (“taxonomy”), and you have a full picture of the enterprise data landscape and how it “maps” the business landscape. The next step is a curation process by the stewards, which covers anything that makes the data assets more meaningful to the business and adds value: e.g. optimal usage techniques, tags, descriptive metadata, semantic labeling, etc. A well-curated data landscape helps organizations move from a data-searching and reporting culture to one of analytics and discovery.
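To make the relationship between assets, glossary terms and curation concrete, here is a minimal Python sketch of how such a catalog might be modeled. It is an illustration only: the class names (`DataAsset`, `GlossaryTerm`, `Catalog`), their fields, and the sample data are all hypothetical, and none of it reflects Alation's or any other product's actual API. It also assumes a simple rule that only an asset's assigned steward may curate it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Anything the organization uses as data: a table, file, saved query, BI report."""
    name: str
    kind: str          # e.g. "table", "file", "query", "dashboard"
    steward: str       # person responsible for curating this asset
    description: str = ""
    tags: set[str] = field(default_factory=set)


@dataclass
class GlossaryTerm:
    """A business term that any data user can contribute."""
    name: str
    definition: str
    assets: set[str] = field(default_factory=set)         # "data dictionary" links
    related_terms: set[str] = field(default_factory=set)  # "taxonomy" links


class Catalog:
    """Hypothetical in-memory catalog; a real tool would persist and index this."""

    def __init__(self) -> None:
        self.assets: dict[str, DataAsset] = {}
        self.glossary: dict[str, GlossaryTerm] = {}

    def register_asset(self, asset: DataAsset) -> None:
        self.assets[asset.name] = asset

    def define_term(self, term: GlossaryTerm) -> None:
        # A term may only point at data that has already been cataloged.
        missing = term.assets.difference(self.assets)
        if missing:
            raise KeyError(f"uncataloged assets: {sorted(missing)}")
        self.glossary[term.name] = term

    def curate(self, asset_name: str, steward: str,
               description: str = "", tags: tuple[str, ...] | set[str] = ()) -> None:
        # Assumption for this sketch: only the assigned steward curates an asset.
        asset = self.assets[asset_name]
        if steward != asset.steward:
            raise PermissionError(f"{steward} is not the steward of {asset_name}")
        if description:
            asset.description = description
        asset.tags.update(tags)

    def assets_for_term(self, term_name: str) -> list[DataAsset]:
        # Answers "where does this business term live in our data?"
        term = self.glossary[term_name]
        return [self.assets[name] for name in sorted(term.assets)]


# Example usage with made-up assets and people.
catalog = Catalog()
catalog.register_asset(DataAsset("sales.orders", "table", steward="dana"))
catalog.register_asset(DataAsset("q3_revenue.sql", "query", steward="lee"))
catalog.define_term(GlossaryTerm(
    "Net Revenue",
    "Gross sales minus returns and discounts.",
    assets={"sales.orders", "q3_revenue.sql"},
    related_terms={"Gross Sales"},
))
catalog.curate("sales.orders", steward="dana",
               description="One row per order line.",
               tags={"finance", "certified"})
print([asset.name for asset in catalog.assets_for_term("Net Revenue")])
```

The point of the sketch is the shape of the links rather than the mechanics: terms point at assets and at each other, and curation enriches the assets without changing the data itself.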
Data governance does not have to be a necessary evil. Instead, it should be viewed as a methodology for collaboratively enabling data democratization through cataloging, curation, and sharing. In our current “lockdown” climate of remote work, analysts need the support of a collective governance solution to empower and enrich the data environment. Cloud-hosted data catalogs like Alation are well suited to providing robust data cataloging for the remote analyst. Perhaps today “governance” is the wrong word. Let’s learn to embrace “data comprehension”.
About the Author:
JOE CAPARULA is a Senior Consultant with Pandata Group who specializes in delivering data modeling and data integration services for clients across several industries.