Cloud computing is the next generation “analytic data infrastructure,” according to a study recently published by Dresner Advisory Services. Whether on-premises, private, hybrid or multi-cloud, organizations taking advantage of cloud benefits require an alternative data governance approach.
Even with the promise of cloud improving data availability, Dresner’s report indicates nearly 50 percent of those surveyed have difficulty “locating and accessing data,” and a new report on data catalogs finds a “direct relationship” between having a catalog and the perceived success of “BI initiatives.”
Why? Potentially several reasons:
- Proliferation of disparate data that may be co-located, but not necessarily integrated.
- Plethora of technologies and APIs that provision, consume or syndicate data, including open source software, many of which have limited metadata capabilities.
- Security monitoring to continuously assess threats and vulnerabilities.
- Distributed protection of personal and other sensitive data.
- Dissimilar complex supply chain workflows that could create data anomalies.
- Third-party applications that cannot be integrated.
- Insatiable demand for more information.
- Burgeoning data diversification.
- Analytic complexity.
- More onerous regulatory, audit, and compliance requirements.
Cloud is a multiverse – a confederation of distributed apps. But regardless of the cloud implementation approach, while some apps may be co-located, they are not necessarily integrated. Therefore the cloud often complicates data management by creating distributed, non-integrated data environments, which require more governance – not less.
If data governance is not yet an inherent part of cloud planning, strategy, design, and implementation, then addressing challenges like those above is harder.
As with all new technologies, there is an adoption, activation and adaptation cycle. Today, cloud benefits are self-evident, and the uptake will accelerate. This article isn’t about cloud virtues, but about why data governance – to increase data availability and accessibility – is essential for cloud enablement.
Some things don’t change, including traditional data governance concerns such as charter, scope, principles, organization, roles, responsibilities, operating model, etc.
So what does change? Several things:
- Data policies – Are they sufficient to address security, storage, syndication, regulatory, and retention requirements? If such data management policies exist, are they consistently followed across the enterprise?
- Data quality – Is the data improved as a result of migrating legacy apps to the cloud? Are the business rules documented, and consistently applied?
- Data architecture – Are changes to conceptual, logical, and physical models coordinated and synchronized?
- Data security – Is the data consistently protected wherever it is stored? Does the data conform to security policies?
- Data stewardship – Does all of the cloud data have a steward? That is, someone who is accountable for the integrity of the data wherever it originates, and is stored or processed?
- Data science – How are models effectively managed in the distributed environment? Are quants using the same data consistently? How can cloud improve quant productivity?
- Data management and operational procedures – How will cloud impact data archiving, backup, recovery, business continuity? What is the impact on service and operating levels?
- Master data and metadata management – Is the data defined well enough to be understood by all stakeholders? Does the data have consistent format, rules, etc.?
- DevOps – Are the development methodology, data transformation, data integration approach, operational procedures, and gate reviews sufficient to prevent data anomalies?
- Technical architecture – Introducing and integrating cloud stack(s) with existing stack(s) also presents challenges. Not all stacks are created equal, and given the nascent nature of many of the new technologies, are performance expectations adequately managed?
One other characteristic of analytic data infrastructure (ADI) that is often overlooked is financial management. For example, what are the implementation, consumption, maintenance and support costs? Would chargeback improve cost allocation?
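To make the chargeback question concrete, here is a minimal sketch of one common approach: splitting a shared cloud bill across business units in proportion to their consumption. The function name, the unit names, and the usage figures are all hypothetical and are not drawn from the Dresner study; a real allocation model would likely also account for fixed costs, reserved capacity, and support tiers.

```python
def chargeback(total_cost, usage_by_unit):
    """Allocate a shared cost to business units in proportion to usage.

    total_cost     -- the shared bill for the period (any currency)
    usage_by_unit  -- hypothetical mapping of unit name -> consumption
                      (for example compute hours or terabytes scanned)
    """
    total_usage = sum(usage_by_unit.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive to allocate cost")
    # Each unit pays its share of total usage, rounded to cents.
    # Rounding can leave the allocations a cent or two off the total.
    return {
        unit: round(total_cost * usage / total_usage, 2)
        for unit, usage in usage_by_unit.items()
    }


if __name__ == "__main__":
    # Illustrative numbers only.
    bill = chargeback(1000.0, {"finance": 50, "marketing": 30, "operations": 20})
    for unit, cost in bill.items():
        print(f"{unit}: {cost:.2f}")
```

Proportional allocation is only one option; it is shown here because it is simple to explain to the business units being charged.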
So what? Dresner’s findings suggest a “strong correlation” between those organizations that have a catalog and BI success. For many organizations, with hundreds, if not thousands of disparate systems, often distributed across heterogeneous platforms, knowing and keeping track of what data exists where is very difficult.
Clearly, finding data, and remediating data anomalies, cannot be performed manually. Therefore, automation and orchestration are the only reasonable, practical solutions to intelligently address quality issues for “data in motion,” and catalog key “data at rest.”
Without sophisticated analytics, improving data quality is daunting. Through the use of advanced “scanners,” organizations can automatically catalog and classify all types of data, making sense of both similar and dissimilar data, and storing this information in a central metadata repository, which can then be searched and enriched with additional context.
Contextualizing the content for consumers completes the 360 degree view of the metadata. Once this view exists, an organization can then decide how best to address data anomalies, integration, and synchronization.
Dresner’s report reminds us that the “top priority” ADI use cases are: 1) “reporting and dashboards” and, 2) “discovery and exploration.” Done right, the cloud enables data-driven decision making. And, therefore, successful mastery of data – governance – in the cloud necessitates some form of enterprise data catalog.
How else can decision-makers trust cloud ADI if they don’t know where the data comes from, what rules were applied, what the data quality is, who’s accountable for data integrity, and so on? Data is the new economic fuel, and the data catalog is the new jet propulsion engine. Savvy executives, who recognize today’s data-first, cloud-driven world, know that good ADI = better business.
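The scan, classify, store, and search flow described above can be illustrated with a small sketch. It is a toy, not a depiction of any vendor's scanner: the classification rules, the `CatalogEntry` fields, and the source and dataset names are all assumptions made for illustration, and a production catalog would inspect data values and lineage rather than column names alone.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical classification rules: column-name patterns mapped to tags.
RULES = {
    "personal": re.compile(r"email|phone|ssn|birth", re.IGNORECASE),
    "financial": re.compile(r"amount|price|invoice", re.IGNORECASE),
}


@dataclass
class CatalogEntry:
    source: str                       # system the dataset lives in
    dataset: str                      # table or file name
    columns: List[str]
    tags: Set[str] = field(default_factory=set)
    steward: Optional[str] = None     # accountable person, if known


def classify(columns: List[str]) -> Set[str]:
    """Return every tag whose pattern matches at least one column name."""
    return {
        tag
        for tag, pattern in RULES.items()
        if any(pattern.search(col) for col in columns)
    }


def scan(source: str, schemas: Dict[str, List[str]],
         stewards: Optional[Dict[str, str]] = None) -> List[CatalogEntry]:
    """Build catalog entries for every dataset found in one source."""
    stewards = stewards or {}
    return [
        CatalogEntry(source, name, list(cols), classify(cols), stewards.get(name))
        for name, cols in schemas.items()
    ]


def search(catalog: List[CatalogEntry], tag: str) -> List[CatalogEntry]:
    """Find datasets carrying a given classification tag."""
    return [entry for entry in catalog if tag in entry.tags]


def without_steward(catalog: List[CatalogEntry]) -> List[str]:
    """List datasets that nobody is accountable for yet."""
    return [f"{e.source}.{e.dataset}" for e in catalog if e.steward is None]


if __name__ == "__main__":
    catalog = scan(
        "crm_cloud",  # hypothetical source name
        {"customers": ["id", "email", "birth_date"],
         "orders": ["id", "customer_id", "amount"]},
        stewards={"customers": "alice"},
    )
    print([e.dataset for e in search(catalog, "personal")])
    print(without_steward(catalog))
```

Even in this reduced form, the central repository answers two of the governance questions raised earlier: where sensitive data lives, and which datasets still lack a steward.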