Understanding the data enablers critical for success with AI
Organizations wanting to take advantage of artificial intelligence technologies will quickly learn that several enablers must be in place before data can be used effectively for AI. They include:
- Data quality - Accuracy, completeness, validity, currency, availability, coverage, structural and semantic consistency.
- Data governance - Corporate guidance and policy.
- Content management - Relevancy and lineage.
- Architecture - Data integration and aggregation.
But what do these terms mean in practice? Consider the following:
Accuracy: Clean data is crucial to getting the desired outcome from machine learning. The scale and diversity of the data also matter. The key question is whether the data is accurate enough to produce a usable outcome.
Coverage and availability: Machine-learning services and algorithms are easy to access, but data remains the prime constituent of AI. The predictive performance of an AI model is determined by the diversity, scale and quality of its input data.
Structural consistency and semantic consistency: Most of the data held by information aggregators and large institutions is neither consistent across systems and processes nor consistently formatted across the organization.
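As an illustration of the structural side of this problem, the sketch below converts dates that arrive in different formats from different systems into a single ISO 8601 form. The system names and their formats (`core_banking`, `crm`, `warehouse`) are hypothetical and stand in for whatever an organization actually runs; this is a minimal sketch under that assumption, not a complete standardization layer.

```python
from datetime import datetime

# Hypothetical mapping of source systems to the date format each one emits.
# Keying by source avoids guessing between ambiguous layouts such as
# day/month versus month/day.
SOURCE_DATE_FORMATS = {
    "core_banking": "%d/%m/%Y",
    "crm": "%m-%d-%Y",
    "warehouse": "%Y%m%d",
}


def normalize_date(raw: str, source: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD).

    Raises KeyError for an unknown source and ValueError when the value
    does not match the format expected from that source.
    """
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


# The same calendar day, as three systems might record it.
samples = [
    ("31/12/2023", "core_banking"),
    ("12-31-2023", "crm"),
    ("20231231", "warehouse"),
]
normalized = [normalize_date(raw, source) for raw, source in samples]
```

Raising an error on a mismatch, rather than silently passing the value through, keeps inconsistently formatted records visible so they can be corrected at the source.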
Integration: In a typical financial services landscape, for example, data is usually spread across disparate systems. These systems create, acquire, store, maintain and archive data in varied ways. The challenge lies in integrating and aggregating this data, for instance in a common data lake, as an input to AI-based services.
Lineage and currency: AI is driving the need to build real-time data flows across institutions to access essential data. Real-time data flows are still a distant goal for most organizations. The challenge here is architecting data flows that make streaming data available to AI-based services with less information lag.
Completeness: This refers not just to internal data, which can be partly trusted, but also to the external and public data required to scale the data for AI use. Organizations must address missing and invalid data.
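A first step toward addressing missing and invalid data is simply measuring it. The sketch below counts, per field, how many records have no value and how many have a value that fails a validity rule. The record layout, field names and rules are hypothetical examples chosen for illustration; real rules would come from the organization's own data definitions.

```python
from typing import Any, Callable, Dict, Iterable, Mapping


def profile_fields(
    records: Iterable[Mapping[str, Any]],
    validators: Mapping[str, Callable[[Any], bool]],
) -> Dict[str, Dict[str, int]]:
    """Count missing and invalid values for each field that has a validator."""
    report = {name: {"missing": 0, "invalid": 0} for name in validators}
    for record in records:
        for name, is_valid in validators.items():
            value = record.get(name)
            if value is None or value == "":
                report[name]["missing"] += 1  # absent or empty
            elif not is_valid(value):
                report[name]["invalid"] += 1  # present but fails the rule
    return report


# Hypothetical customer records with some gaps and bad values.
records = [
    {"customer_id": "C001", "age": 42, "country": "GB"},
    {"customer_id": "C002", "age": -5, "country": ""},
    {"customer_id": "", "age": None, "country": "Germany"},
]

# Hypothetical validity rules, one per field.
validators = {
    "customer_id": lambda v: isinstance(v, str) and v.startswith("C"),
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}

report = profile_fields(records, validators)
```

Separating "missing" from "invalid" matters because the remedies differ: missing values may need to be sourced from external or public data, while invalid values usually point to a problem in the system that produced them.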
Data governance gives an organization the direction to embrace AI
Corporate guidance and policy: Organizations want to monetize their proprietary data, yet AI may require that this data be shared with competitors to reach a minimum level of efficiency. The tension between monetizing data and sharing it needs to be addressed through clear direction and guidance.
Data governance: The financial services industry is making large-scale investments in artificial intelligence. However, regulators see substantial uncertainties in the use of AI in banking and financial institutions, and these need to be addressed through guidance in the form of policy.
Collaborative solutions built on shared datasets could radically improve the accuracy, timeliness and performance of non-competitive functions. But the question remains: is there governance, guidance and oversight over the sharing of data?
Relevancy: Not all data is fit for purpose or relevant to the context of an AI use case.
Consider an insurance firm that uses alternative data, such as channel usage characteristics, rather than traditional and passive data to price insurance products for cybersecurity risk. Much of the vast pool of external and alternative internal data (perhaps unstructured) might not be relevant to the outcome the model is meant to provide. This makes it all the more important to simplify and understand the data before applying it.