© 2019 SourceMedia. All rights reserved.

Robust data governance is key for machine learning success

The terms artificial intelligence (AI) and machine learning (ML) are often presented as two sides of the same coin. In principle, though, while ML algorithms enhance AI capabilities and enable more cutting-edge, intelligent computing, they also add a layer of perceived impenetrability that now cloaks the machine’s capacity to reason and arrive at impactful decisions.

Industry pundits speculate that machine learning algorithms are a potential ‘black box’, primarily because of scepticism about trusting an ecosystem that offers limited transparency into its data compliance and decision-making processes.

The global data analyst community has helped design semi- or fully automated analytics systems that are AI- or ML-driven. However, the core and often niggling issue of data quality may always prevail. Added to this are the multifarious and disparate data sources, immense data volumes, and unstructured data types that compound existing data management problems, especially those relating to data governance.

As ML gains momentum and continues to be at the forefront of transforming the way organizations operate, it may be advisable to exercise some caution. In the absence of robust data governance processes, the zeal to let ML take over the decision-making process entirely has the potential to unleash some critical issues: unreliable and misleading information, and unexpected cost overheads.

So how do we do this effectively?

  • How do we bridge the gap between the need to build, organize, and implement effective and robust ML models and the growing demand to understand them?
  • How do we keep up with that exponentially growing demand while still being able to comprehend and decipher how those models work?
  • How do we comprehend the data that is being accessed and harnessed by the ML algorithms? And what are the long-lasting and often irreversible consequences of its use?

Data governance is undoubtedly the most logical answer.

Data governance, as a framework, defines and helps implement the overall management of the availability, usability, integrity, security and effectiveness of the data used in any ecosystem.

In today’s competitive world, every organization needs a well-designed and sustainable data governance model that strikes the right balance between strengthening data governance and not limiting the far-reaching potential of machine learning.

So how does Data Governance impact ML and AI?

The initial steps in implementing data governance models may be the hardest and may face the most resistance. Data governance is no longer about compliance alone; it is also a discipline that can accelerate ML efforts and make AI a force to reckon with.

What are the pros?

  1. Data governance offers a simple and direct method not only to track and safeguard usage of the right data, but also to recognize data errors, promptly raise red flags, and help eliminate those errors.
  2. It empowers an organization to spend less time unearthing the right data source to feed the ML algorithms and dedicate more time to creating and refining the AI models.
  3. The biggest benefit of data governance is that it ensures data is reliable and consistent. This is imperative, as more and more organizations now rely on large volumes of data to make business decisions, augment operations, generate new business, and enhance profitability.
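
To make the first point concrete, here is a minimal Python sketch of how a governance rule set might flag bad records before they reach an ML pipeline. It is an illustration only: the names `Rule`, `QualityReport`, and `apply_rules`, the field names, and the approved-source list are all hypothetical and do not come from any particular governance product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    """Hypothetical governance rule: a name plus a validity check."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


@dataclass
class QualityReport:
    clean: list = field(default_factory=list)    # records safe to feed to a model
    flagged: list = field(default_factory=list)  # (record, [names of failed rules])


def apply_rules(records, rules):
    """Split records into clean and flagged, recording which rules failed."""
    report = QualityReport()
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            report.flagged.append((record, failed))  # raise the red flag
        else:
            report.clean.append(record)
    return report


# Assumed example rules; a real policy would be defined by the data owners.
RULES = [
    Rule("customer_id present", lambda r: bool(r.get("customer_id"))),
    Rule("age in plausible range",
         lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
    Rule("source is approved", lambda r: r.get("source") in {"crm", "billing"}),
]

records = [
    {"customer_id": "C1", "age": 34, "source": "crm"},
    {"customer_id": "", "age": 34, "source": "crm"},
    {"customer_id": "C3", "age": 430, "source": "spreadsheet"},
]

report = apply_rules(records, RULES)
for record, failed in report.flagged:
    print(record, "failed:", failed)
```

Only the first record would go on to model training in this sketch; the other two are held back along with the names of the rules they broke, which gives data stewards something specific to correct rather than a silent drop.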

What are the cons?

  1. Too much governance may be limiting: one of the biggest risks in trying to govern the flood of data is losing sight of business needs and objectives. Organizations may end up wasting valuable time and resources filtering through unimportant data before finally arriving at the data of significant value to fuel ML algorithms.
  2. Data governance models may also impose strictures on how data is handled, which can become controversial and often limit organizational progress.
  3. The data filtering and curation rules may prove too stringent for ML algorithms, and the constraint may limit their intrinsic capability to perform effectively in modern, dynamically evolving data environments.

As the usage and scope of ML and AI evolve, and newer technologies are implemented, data governance will gain wider application and acceptance. The recent spate of high-profile data security violations has made data security a vital part of data governance efforts. The European Union's (EU's) General Data Protection Regulation (GDPR) is a prime example of a data governance measure and reinforces the need to establish more robust models.

We still have a long way to go to discover ML and AI’s complete potential and true capabilities for the enterprise. At the end of the day, in a world of disruptive data, intelligent ML algorithms and swiftly evolving AI environments, data governance is the only way to bring some much-needed method to the madness.
