Avoiding one of AI's greatest challenges: the misuse of data

It’s clear that the next evolution of enterprise investment will be dedicated to the automation of business workflows. More specifically, investment will be targeted at the augmentation of human operations with machine learning and artificial intelligence, or ML/AI.

This investment is predicated upon the notion that business-critical insights will derive from powerful, complex algorithms crunching huge amounts of data, creating the speed and efficiency to out-compete, or at least keep pace with, new technology-driven competition (Hint: I’m thinking of Amazon).

The move towards ML/AI is not about yet another database, or about consolidating all of your data into a single “analytics” environment. This development will be focused on addressing a long backlog of critical business questions that need answers, which previously took humans too long to address or required data far too complex for humans to digest.

The first big movement will be in the data science platform space. Enterprises will invest heavily in next-generation predictive tooling, much like they have in business intelligence tools over the past 10 years. The torch will be passed to companies like DataRobot, which automates machine learning for advanced business analysts, and to Dataiku, which simplifies the data scientists’ prototyping environment.

While all of this is good news, there is a catch: the rise of AI as a mainstream enterprise technology has real potential to drive greater misuse of corporate data. Take Equifax for example, where poor handling of customer data led to the firing of C-Suite executives, and not just the CISO.

For the first time, to my knowledge, the entire executive suite within major corporations will have to ask hard questions about data protection, regulations, cyber insurance, and more. Their jobs will literally depend on it. Risk will take on a whole new meaning to corporate executives.

Ultimately, however, the AI hype cycle is doing a real disservice to a number of organizations by obscuring the importance of risk, setting them up for failure in 2018. Too many executives are, for example, myopically focused on being “AI-driven” without focusing on what data is fueling their algorithms, how that data is being managed, and what the corresponding risk of each algorithm is.

Historically, through business intelligence tools and reporting, enterprises could go back and fix issues within their data repositories. Master data management (MDM) and self-service data preparation ballooned under this premise, where users would “update” data within data warehouses. For all intents and purposes, this model worked. It became easy and painless to fix BI reports - most are easy to interpret and users can work backwards to find the data and fix any problems that may emerge.

The problem with ML/AI is that data scientists need access to a lot of raw data from many disparate systems. Each one of those systems implements rules on the data without centralized, understandable policy controls. And each algorithm is frequently treated like its own snowflake, with no single source of understanding, data access or control.

This approach to risk management frequently stymies data science projects, holding data science programs back and stifling innovation in the exact departments tasked with innovating. Data projects are often delayed, for example, until data science teams finalize per-project policies on the data they need and how it can be used.

Once approved data is accessed, the slow speed at which it is processed and the distributed nature of the results typically mean that it is too late to go back and fix erroneous data by the time any errors are discovered.

Making matters worse, data science teams quickly lose insight and understanding into how many of their models are making decisions. If rules on the data change, the impact on models isn’t trivial. Frequently, the model must be brought down, re-trained, evaluated for accuracy, and then re-deployed.

Understanding these risks up front is going to be critical for enterprises in 2018. The most successful organizations will not only leverage machine intelligence to make better, more accurate decisions, but they will do it in ways that foster the durability of models and minimize their risk.

This can be done by evaluating the metadata pushed into the model and determining the likelihood that the rules on that metadata could be impacted. Frameworks like LIME, which provide insight into the model itself, will be a major factor in helping companies understand where their algorithms create new sources of risk. If a highly risky piece of metadata is heavily weighted within a model, algorithms may have to be reevaluated or re-written, potentially creating operational impacts.
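To make the idea concrete, here is a minimal sketch of how a team might combine a model's feature weights with an estimate of how likely the rules on each field are to change. Everything in it is an assumption for illustration: the function names, the field names, the numbers, and the review threshold are hypothetical, and the weights are simply taken as given (they might come from an explanation framework such as LIME or from any other feature-importance method). Fields with no risk estimate are treated as maximally risky, which is a deliberately conservative choice rather than an established practice.

```python
from typing import Dict, List, Tuple


def risk_exposure(
    weights: Dict[str, float],
    rule_change_risk: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank fields by (share of model weight) x (chance the rules on that field change).

    `weights` holds hypothetical per-field importances (sign is ignored).
    `rule_change_risk` holds hypothetical probabilities between 0 and 1.
    Fields missing from `rule_change_risk` default to 1.0 (assumed worst case).
    """
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        return []
    scored = [
        (name, abs(w) / total * rule_change_risk.get(name, 1.0))
        for name, w in weights.items()
    ]
    # Highest exposure first, so reviewers see the riskiest fields at the top.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def flag_for_review(
    weights: Dict[str, float],
    rule_change_risk: Dict[str, float],
    threshold: float = 0.2,  # hypothetical cut-off, to be set by the review board
) -> List[str]:
    """Return the fields whose exposure meets or exceeds the threshold."""
    return [
        name
        for name, score in risk_exposure(weights, rule_change_risk)
        if score >= threshold
    ]


if __name__ == "__main__":
    # Made-up example values, not taken from any real model.
    example_weights = {"zip_code": 0.5, "purchase_total": 0.3, "account_age": -0.2}
    example_risk = {"zip_code": 0.8, "purchase_total": 0.1, "account_age": 0.2}
    for field, score in risk_exposure(example_weights, example_risk):
        print(f"{field}: {score:.2f}")
    print("Needs review:", flag_for_review(example_weights, example_risk))
```

In this made-up example, the heavily weighted and highly regulated field is the only one flagged, which is the kind of signal that would prompt a closer look before the model ships.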

Put simply, ML/AI is coming. The best companies will look at risk early and begin to adopt processes focused on managing the risk of their AI. This will be done by bringing lawyers and data scientists together, especially within companies affected by the EU’s GDPR, which will begin to be enforced in May 2018.

Model review boards will be a common gathering across IT, line of business, and executive suites. Organizations that don’t address these new risks early on will, at best, move slower than their competitors and, at worst, severely harm their business and their bottom line.
