Good data governance matters not only for compliance, but also for the data-driven competitive initiatives that business leaders care about. In the wake of Sarbanes-Oxley, few companies have been able to thrive without at least a serviceable culture of corporate governance. The same is now true for newer data protection and privacy laws such as the General Data Protection Regulation (GDPR).
While every company faces deadlines to meet compliance requirements, the most successful ones view effective governance as a business imperative, and they tend to get one thing right: in addition to governing and securing their data so that no data, especially personally identifiable information, falls into the wrong hands, they also ensure that the right data is accessible to the right person at the right time.
Beyond defining access privileges, effective data governance means that companies need to be able to document or label their data assets, much like the labels on pharmaceuticals. Business users must be able to search across data assets, but before they access a given data set, they should be able to read about it to know what it contains, how trustworthy it is, who the data is intended for, when and how the data can be used, and other critical pieces of information. Finally, effective data governance means that data is labeled by common, standard definitions, enhanced by usage and user information.
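To make the "pharmaceutical label" analogy concrete, a catalog entry might be modeled as a small record. This is an illustrative sketch only; the field names (`trust_rating`, `allowed_uses`, and so on) are hypothetical, not drawn from any particular product.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """A metadata 'label' for a data asset, analogous to a drug label."""
    name: str
    description: str
    trust_rating: int          # e.g. 1 (unverified) to 5 (steward-certified)
    intended_audience: list    # who the data is intended for
    allowed_uses: list         # when and how the data can be used
    standard_definition: str   # the common, enterprise-wide business term

# A user can read all of this before ever touching the underlying data.
entry = CatalogEntry(
    name="customer_orders",
    description="Orders placed through the web storefront, updated nightly.",
    trust_rating=4,
    intended_audience=["sales", "finance"],
    allowed_uses=["reporting", "forecasting"],
    standard_definition="Order: a confirmed purchase with a payment record.",
)
```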
What companies need is a centrally accessible catalog or marketplace for data assets from both within and outside the enterprise, updated in real time. But what is the best way to implement that and what is holding organizations back?
Impediments to an Effective Data Catalog
Clearly, a catalog such as this would have to overcome some challenges or risk failure: breadth of sources, unified views, dynamic updates, and a direct link to the actual data delivery platform are all potential barriers. If the catalog were a BI dashboard reporting on a data warehouse, for example, it would not necessarily have access to the latest data, depending on the schedule of the relevant ETL updates.
Many data governance tools also cannot access every data source, such as cloud repositories, transactional systems, unstructured big data sources, or social media feeds. A data catalog that is disconnected from data delivery platforms is likewise a passive window on the infrastructure, unable to provide usage statistics, suggest relevant uses, or prevent multiple individuals from creating multiple ad hoc definitions for the same data sets.
The Ideal Data Cataloging Platform
Today, organizations are creating these catalogs in just a few clicks, using advanced data virtualization technology that provides unified data delivery. Data virtualization is a modern data integration approach that creates real-time, integrated views across a myriad of disparate sources, without replicating any data. It works with existing data warehouses to establish logical data warehouses or data services, which can also access data scattered across multiple sources including cloud-based and transactional sources.
Data virtualization enables a single layer from which to access all data across the enterprise for analytics, digital applications, and directly to users for information self-service.
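The core idea described above can be sketched in a few lines: a virtual view joins two live sources at query time, without copying either into a warehouse. This is a toy illustration (an in-memory SQLite table standing in for a transactional system and a CSV string standing in for a flat-file feed), not how any particular virtualization product is implemented.

```python
import csv
import io
import sqlite3

# Source 1: a "transactional" SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 99.0), (2, 15.5)])

# Source 2: a flat-file feed (here, a CSV string).
csv_feed = "customer_id,name\n1,Ada\n2,Grace\n"

def virtual_view():
    """Join both sources at query time -- no data is replicated."""
    names = {int(row["customer_id"]): row["name"]
             for row in csv.DictReader(io.StringIO(csv_feed))}
    rows = db.execute("SELECT customer_id, amount FROM orders")
    return [(names[cid], amount) for cid, amount in rows]

# Each call reflects the sources' current state, not a stale snapshot.
print(virtual_view())
```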
With this architecture, companies can establish enterprise-wide security privileges and governance rules from a single control point closer to the point of business data consumption and directly linked to data delivery. As a result, the catalog adds dynamic usage information, user feedback, and can suggest uses automatically.
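A single control point means every consumer's request passes through one set of rules before any source is touched. The following is a minimal, hypothetical sketch of such a check; real platforms express these policies declaratively, but the principle is the same.

```python
# Hypothetical role-based rules enforced once, at the virtualization layer,
# rather than separately in each downstream tool.
ACCESS_RULES = {
    "customer_pii": {"compliance", "support"},
    "sales_summary": {"compliance", "support", "marketing"},
}

def can_read(user_roles, dataset):
    """True if any of the user's roles is allowed to read the data set."""
    allowed = ACCESS_RULES.get(dataset, set())
    return bool(set(user_roles) & allowed)

assert can_read(["marketing"], "sales_summary")
assert not can_read(["marketing"], "customer_pii")
```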
Data virtualization is also expanding its data governance capabilities by enabling users to enrich the data’s business definitions and trust ratings on top of what data stewards can do. Because the data virtualization layer provides real-time access across the enterprise, the catalog is kept up-to-date and adds dynamic usage statistics, automatically. And because the catalog is implemented in a common, shared semantic layer across the enterprise, it can easily enable the enforcement of standard definitions that are in business users’ chosen terminology.
An active data catalog based on data virtualization can also coexist with other passive catalogs of disconnected data stores, by sharing metadata with them.
Leveraging the data virtualization layer, the catalog contains complete documentation in terms of data type, suggested (or required) usage, and other parameters, depending on an organization's needs. The data catalog shows the relationships between different data sets and the associations among different data types, and can display the full lineage of any data: from its origins across different sources, through the ways it was combined with other data, to its current form. The data catalog can then track any ongoing changes and add those changes to the lineage records, providing a much more holistic view of enterprise information than is possible using traditional tools.
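Lineage of this kind is naturally a graph: each derived data set records its inputs, and walking those links backward recovers the full chain to the original sources. A minimal sketch, with hypothetical data set names:

```python
# Each derived data set maps to the inputs it was built from.
lineage = {
    "quarterly_report": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
}

def full_lineage(dataset):
    """Walk back through every ancestor of a data set, depth-first."""
    result = []
    for parent in lineage.get(dataset, []):
        result.append(parent)
        result.extend(full_lineage(parent))
    return result

print(full_lineage("quarterly_report"))
# ['orders_clean', 'orders_raw', 'fx_rates']
```

Tracking an ongoing change is then just a matter of appending a new edge to this structure.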
In addition to establishing a key foundation for effective data governance, a data catalog also solves a fundamental challenge shared by many organizations today: they often lack a single place to go to learn about all data assets throughout the company. By providing real-time views across a company's disparate data sources, data virtualization enables catalogs that can easily serve as the single source of information about an enterprise's data assets.
Modern data virtualization platforms can support an extremely wide range of sources, including databases, data warehouses, on-premises and cloud applications, flat files, NoSQL, and many other modern and legacy sources.
Information Self-Service, at Business Users’ Fingertips
Information self-service is the precursor to self-service BI, and a full-featured data catalog provides clear benefits to data consumers. Instead of first chasing down a data set's owner (assuming that person is even known, and still with the company), consumers can simply search and browse information assets as easily as they search the web. As a result, and without even accessing the data, they can learn detailed information about it.
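That search-and-browse experience reduces, at its simplest, to matching a term against catalog metadata rather than against the data itself. A toy sketch, with hypothetical entries:

```python
# Consumers search the catalog's metadata, never the underlying data.
catalog = [
    {"name": "customer_orders", "description": "Web storefront orders"},
    {"name": "fx_rates", "description": "Daily currency exchange rates"},
]

def search(term):
    """Return the names of entries whose metadata mentions the term."""
    term = term.lower()
    return [e["name"] for e in catalog
            if term in e["name"].lower() or term in e["description"].lower()]

print(search("orders"))  # ['customer_orders']
```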
A centrally accessible data catalog, unified across disparate sources, updated in real time, and linked to agile data delivery enables true information self-service. It collects and curates the right data for the right users, depending on their access privileges. It provides common definitions, even across business units and multiple data management and BI tools, all while supporting centrally controlled data stewardship and strong governance capabilities enriched by users at the edge.
These are the critical “guardrails” for not only information self-service for analytics and applications, but also continued success in a world where companies must go beyond compliance to empower business users, partners, and customers.