Reimagining data governance in the age of AI and machine learning
Emerging technologies such as artificial intelligence, machine learning and the Internet of Things are growing far faster than businesses' ability to govern and protect their data and information assets.
In many cases, business leaders think that this is an either/or decision: they can either innovate rapidly using emerging technologies, or govern and control using existing governance models. Unfortunately, the larger the gap between technological advancement and the governance and protection of the underlying data, the greater the risk and potential loss for the business.
Is there a way for the two to co-exist? Can capitalizing on AI, ML and other emerging technologies be done in a manner that ensures governance?
Traditional data governance provides a rigorous framework designed to establish data standards, business rules, data protection policies and a set of organizational roles and responsibilities such as data stewardship. Like conventional methods, new architectures should also serve the principles of trust, data quality and overall protection of data, but they need to be reimagined for the context in which organizations now operate.
Onboarding Data Despite the Lack of Predefined Rules
With agility at its core, the future landscape of any data governance architecture should not require business definitions to be provided a priori.
Take IoT data for instance. IoT devices are both data gatherers and generators. Wearable devices, sensors and smart electronics collect data by the millisecond and stream that data into a cloud of infinite possibilities. With proper design, IoT data can become the foundation of disruptive modes of customer engagement, new product and service offerings, and new business models, and ultimately drive pervasive digital transformation initiatives.
The fact is that onboarding of IoT devices or the ingestion of data from these uncertified data sources is extremely difficult in an environment governed by conventional data validation requirements. In these early stages of the data lifecycle, conformity with predefined rules or standards should not be a data governance objective.
Instead, governance should enable quick and efficient incorporation of new data and provide tools that facilitate exploration, pattern detection, and discovery of risk-causing anomalies that eventually will shape governance rules and policies.
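As a concrete illustration, a first pass over newly onboarded sensor data might simply flag statistical outliers for human review rather than rejecting records against predefined rules. The sketch below assumes nothing about the data's business meaning; the threshold and readings are invented for illustration:

```python
import statistics

def flag_anomalies(readings, threshold=2.0):
    """Flag readings more than `threshold` standard deviations from the mean.

    A crude first pass with no predefined business rules: the anomalies
    surfaced here are candidates for review, and the review outcomes can
    later inform formal governance rules and policies.
    """
    mean = statistics.mean(readings)
    stdev = statistics.pstdev(readings)
    if stdev == 0:
        return []
    return [(i, r) for i, r in enumerate(readings)
            if abs(r - mean) / stdev > threshold]

# Example: a temperature stream with one obviously bad value at index 4
stream = [21.3, 21.5, 21.4, 21.6, 98.7, 21.5, 21.4]
print(flag_anomalies(stream))  # → [(4, 98.7)]
```

The point is the ordering, not the statistics: exploration and anomaly discovery come first, and the rules they reveal are codified afterward.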
Starting with the Data Before Automating Governance
Unlike the old world, where governance came first followed by business applications and analytic models, the new world requires the process to be reversed. Today it’s best to start with a killer application of data, define its success metrics, validate and test it in the market and, finally, create a governance framework around it.
This new world of "apps first, governance second" has emphasized the need for automated governance, where the hallmarks of traditional data governance, such as data lifecycle management, data lineage and data quality, remain important but happen as a natural byproduct of the data management and analytic cycle without much human oversight.
In lifecycle management, data lineage and the ability to swiftly trace data's journey are critical to support compliance. Modern data governance capabilities can automatically capture when and where data enters the system, the transformations it goes through, the applications it lands in and ultimately the decisions or processes it supports. This automated data governance also provides audit trails that record all changes to the data, ensuring the organization is always alerted to shifts in key metrics affecting business decisions and strategies.
In the area of robotic process automation, where you are enabling business processes to take place without requiring a human touch, this becomes even more critical. After all, an audit trail of how the robot has performed its learned human tasks can be invaluable for root cause analysis as well as regulatory reporting.
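One way to picture this kind of automatic capture is a thin wrapper that fingerprints data before and after every transformation, so the audit trail accumulates as a byproduct of normal processing. This is a minimal sketch, not any vendor's API; the class and field names are illustrative:

```python
import datetime
import hashlib
import json

class GovernedDataset:
    """Wrap a dataset so every transformation is logged automatically.

    Each step records a timestamp plus content fingerprints before and
    after, yielding lineage and an audit trail without manual bookkeeping.
    """

    def __init__(self, data, source="unknown"):
        self.data = data
        self.trail = []
        self._record("ingest:" + source, None, data)

    @staticmethod
    def _fingerprint(data):
        if data is None:
            return None
        blob = json.dumps(data, sort_keys=True, default=str).encode()
        return hashlib.sha256(blob).hexdigest()[:12]

    def _record(self, step, before, after):
        self.trail.append({
            "step": step,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "before": self._fingerprint(before),
            "after": self._fingerprint(after),
        })

    def apply(self, step, fn):
        before = self.data
        self.data = fn(self.data)
        self._record(step, before, self.data)
        return self

# Usage: ingest raw readings, then trace a cleaning step
ds = GovernedDataset([21.3, None, 21.5], source="sensor_feed")
ds.apply("drop_nulls", lambda d: [x for x in d if x is not None])
for entry in ds.trail:
    print(entry["step"], entry["before"], "->", entry["after"])
```

The same trail that serves root cause analysis also serves regulatory reporting: each entry states what changed, when, and what the data looked like on either side of the change.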
Bridging the Trust Gap
Transparency and explainability are another part of the new requirements.
Picture this: during a board meeting at a large financial institution, the line of business leader responsible for consumer loans is asked to explain the rationale behind the denial of loans to applicants of a certain age group or race. Behind the scenes, an AI algorithm produced the results or augmented decisions made by loan officers in the field. Unfortunately, this is hard to explain, because the algorithm's reasoning is opaque.
Moments similar to this are playing out in recruiting, consumer marketing, healthcare, higher education and college admissions. As organizations build advanced, continuous-learning technologies that far exceed the capabilities of the human mind, it is imperative to pay very close attention to the governance aspects of these models, and to the trust placed in the data, as algorithms can be black boxes. That's why, in the midst of enormous excitement around AI, confidence in knowing how the machines have made decisions is a big consideration.
In the wake of the General Data Protection Regulation, the California Consumer Privacy Act, and other regulatory regimes, governance has become increasingly mission critical too. For example, if an organization suddenly needs to start tracking geo data, it needs a way to make sure geo data is indexed in the system and can be searched easily, so it can act on that information. The same requirements apply to data retention policies.
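A hypothetical data catalog makes this concrete: if datasets are tagged on ingestion, questions like "where do we hold geo data?" and "what is past its retention window?" become simple queries. Every name, tag and date below is invented for illustration:

```python
import datetime

# Illustrative catalog entries: each dataset is tagged with the kinds of
# regulated data it holds and a retention period, so categories such as
# geolocation can be located (and purged) on demand.
CATALOG = [
    {"name": "web_clickstream", "tags": {"geo", "device_id"},
     "retention_days": 90, "ingested": datetime.date(2024, 1, 10)},
    {"name": "support_tickets", "tags": {"email"},
     "retention_days": 365, "ingested": datetime.date(2024, 6, 1)},
]

def datasets_with_tag(catalog, tag):
    """Answer 'where do we hold this kind of data?' from the catalog index."""
    return [d["name"] for d in catalog if tag in d["tags"]]

def past_retention(catalog, today):
    """List datasets whose retention window has expired as of `today`."""
    return [d["name"] for d in catalog
            if (today - d["ingested"]).days > d["retention_days"]]

print(datasets_with_tag(CATALOG, "geo"))                       # → ['web_clickstream']
print(past_retention(CATALOG, datetime.date(2024, 6, 1)))      # → ['web_clickstream']
```

Because the tags and retention periods live in the catalog rather than in application code, a new regulatory requirement becomes a new tag and a new query, not a rewrite.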
Today we have reached a tipping point in creating effective governance models that can easily be adopted alongside our AI and ML initiatives. With opposing forces at play, namely the speed, agility and better decision making that emerging technologies offer, and the pressure organizations are under to control, govern, and explain the data and insights they generate, there has to be a better way.
Governance in the age of AI and ML has to become the catalyst for responsible adoption of these technologies, addressing the risk that complex algorithms could introduce and building trust, confidence, and credibility for our organizations and our decision makers.