Data governance in the age of AI: Beyond the basics
If you want some basic advice about launching a data governance program, it’s easy to find: Get leadership buy-in, appoint some data stewards or champions, and make sure everyone meets regularly to talk about it.
But there’s a lot more to it than that, especially in the age of data-hungry tools born from artificial intelligence. When you’re dealing with data on a machine learning scale, you’re dealing with “garbage in, garbage out” on steroids.
“The only thing worse than not having enough data is having too much,” says Colin Zick, who co-chairs the healthcare data privacy and security group at the law firm Foley Hoag. “That’s the challenge as we move into a world where there are increasingly large data sets—the data governance. How do you manage the data, how do you ensure its quality?” With AI, he adds, “you really have to get into making sure the data and the analysis and the quality are there, because, if not, you’re just going to make bigger mistakes.”
Data integration, security, integrity and quality take a lot of work, Zick says. “On the other side, if you put in the effort, you can get very valuable results. But you cannot expect that to happen without putting in a lot of time and effort.”
Cary Smithson, director of digital transformation and management at Grant Thornton, an accounting and consulting firm, agrees: “You won’t really get the full benefit of AI ... if you don’t have your data model well defined and have the data that you need to leverage it,” she says.
“AI content management and information management and associated governance is foundational to getting the benefit out of AI and other technologies, even analytics. You need to really focus on defining what your master data is, sources of record, who owns the data and then get a governance model in place,” she adds.
So, what specific steps can organizations take to avoid the pitfalls and reap the rewards of data governance in the age of artificial intelligence? It starts with a strong foundation.
Build the right team
Organizations need buy-in from leaders. And yes, they need data stewards. But that’s not nearly enough.
C-suite executives, for example, aren’t necessarily the right people to lead data governance programs. Data stewards often lack even basic marching orders. And while communication is important, endless meetings could be counterproductive if the right people aren’t in the room.
“Nine times out of 10, it becomes data governance by committee,” says John Walton, senior solution architect for IT consulting firm CTG. “Don’t try data governance by committee.”
Walton cites as an example a data governance program that kicked off with great intentions and a bevy of C-level titles at the table, from the CEO to the CIO.
“How long do you think that lasted?” he asks. “After two meetings with no progress, everybody’s arguing about [algorithms]. They finally said, ‘Well, this is a waste of our time,’ and eventually it just died on the vine. Honestly, I can’t tell you how many times I’ve seen that.”
And who should be at the table?
“The right people are subject matter experts or business data stewards. They’re the people that have subject matter expertise in the data domain that you’re trying to manage,” Walton says.
But, he adds, you can’t bestow the title of data steward and call it a day. “For those that don’t take the governance-by-committee approach, they identify data stewards and don’t really tell them how to do their job—nor do they make it part of their job description, which is essential,” Walton says. Data governance will fail “literally 100 percent of the time” when the appointed data steward doesn’t understand business processes or workflows.
One solution: Ensure governance team members have defined roles, including tactical and high-level strategy responsibilities, Smithson says.
Split data champions into two groups: data stewards, who make recommendations about formulas or algorithms, for example, and director- or VP-level data owners who make the decisions, Walton adds. And put roles and responsibilities into job descriptions. “The job responsibilities come from the workflows and the tasks that need to be accomplished.”
Those job descriptions should fall into two buckets, he says: data quality assurance and information consistency. For the former, tasks include identifying a data quality issue, remediating that issue (with a workflow change, for example) and monitoring to ensure the effectiveness of the data governance initiative. For the latter, tasks include creating a business measure to support key performance indicators, modifying it when business rules change, and sunsetting any items that are no longer relevant.
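The identify, remediate and monitor cycle for data quality can be made concrete with a small sketch. The Python below is only an illustration, not a method described by Walton: the QualityRule class, the violation_rate function, the "diagnosis_code" field and the sample records are all hypothetical. It measures how often records break a rule before and after a workflow change, which is one simple way a steward might monitor whether a fix worked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class QualityRule:
    """A hypothetical data quality rule that a data steward might define."""
    name: str
    is_valid: Callable[[dict], bool]  # returns True when a record passes


def violation_rate(records: Iterable[dict], rule: QualityRule) -> float:
    """Return the share of records failing the rule (0.0 if there are none)."""
    records = list(records)
    if not records:
        return 0.0
    failures = sum(1 for record in records if not rule.is_valid(record))
    return failures / len(records)


# Illustrative rule: every record needs a non-empty diagnosis code.
has_diagnosis = QualityRule(
    name="diagnosis_code_present",
    is_valid=lambda record: bool(record.get("diagnosis_code")),
)

# Made-up snapshots taken before and after a workflow change.
before = [
    {"id": 1, "diagnosis_code": "DX-001"},
    {"id": 2, "diagnosis_code": ""},
    {"id": 3},
    {"id": 4, "diagnosis_code": "DX-002"},
]
after = [
    {"id": 5, "diagnosis_code": "DX-001"},
    {"id": 6, "diagnosis_code": "DX-003"},
    {"id": 7, "diagnosis_code": ""},
    {"id": 8, "diagnosis_code": "DX-002"},
]

rate_before = violation_rate(before, has_diagnosis)
rate_after = violation_rate(after, has_diagnosis)
improved = rate_after < rate_before

print(f"{has_diagnosis.name}: {rate_before:.0%} -> {rate_after:.0%}")
```

In practice the snapshots would come from the system of record rather than hard-coded lists, and a steward would likely track several rules over time rather than one before-and-after pair.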
A bonus tip: Tie data owners’ bonuses to data quality. “That will get people’s attention,” Walton says.
When it comes to data governance—or any project, really—you hear a lot about the importance of culture and change management. But what does that mean, exactly?
“Organizational change is a key that tends to get forgotten as part of a lot of these initiatives,” Smithson says. It’s about “defining who owns what data and looking across the key stakeholders in the organization. Where are they from a readiness and change perspective? Maybe you need to put a communication plan in place to … tailor the information to different levels and roles.”
The American Society of Clinical Oncology’s (ASCO) CancerLinQ initiative collects and analyzes massive amounts of data from patient encounters to give oncologists a quality monitoring system to ensure patients are getting the best care.
“As the representative body for oncologists we wanted to be able to build something that they can trust. Ultimately, you live and die on the trust that you have,” says Alaap Shah, an attorney at Epstein Becker Green, who was the ASCO’s chief privacy officer at the time.
How to build trust?
Step one was to develop a set of governing bodies to provide guidance on issues, to formalize them into policies, to govern how the organization operates and how it builds and uses technology, he says.
The data governance team delved into policy, ethical issues, and legal and regulatory requirements. Those were distilled into principle documents, then more fleshed-out policy statements.
“Those became the internal bellwether by which we operationalized a lot of this program,” Shah says. “They were living and breathing documents which we revisited from time to time, partly through those committees we formed but also through the staff that was operationalizing around it. We were essentially developing a culture of compliance, a culture of privacy, a culture of data protection, a culture of responsible data stewardship. All these are the high-level principles by which we started to operate and think and live and breathe.”
Step two was to “think about, internally and externally, what responsible data use and stewardship looks like and carry that out. We don’t want to say to participants ‘hey, give us all your data, we’re going to do great things with it’ and then go and sell it to some bad actor somewhere in the market,” he says. “The point is, what do we need to be thinking about and doing to make sure that we’re ... acting responsibly relative to the data we’re getting from participants in our network and also using that data for downstream purposes that are responsible from a public perception perspective?”
Setting boundaries means disclosing up front what the organization plans to do with the data and then not deviating from that promise. “Ensure you’re safeguarding the data technologically and otherwise and then ultimately create something good with that,” Shah says. “You’ve been charged with this great responsibility because you have all this data; make sure that you don’t screw it up.”
The ASCO’s first and foremost goal was to serve oncologists and their patients. But it also had to work with pharma, medical device manufacturers and the government—and figure out how to do so responsibly.
“All these stakeholders have roles that will benefit the ecosystem if they get the right access to the right data and [can] use it in the best way possible,” Shah says. “We had to think critically about what are the boundaries within which we’re comfortable doing that?”
The data governance team recognized the need to share with the pharma industry, for example, but set up guardrails: data would always be de-identified, and the team would respond only to specific requests that had clear time and scope limits and aimed to solve specific, concrete problems.
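Guardrails like these can be written down as explicit checks so that every request is screened the same way. The Python sketch below is an assumption-laden illustration, not the ASCO’s actual process: the DataRequest fields, the guardrail_violations function and the sample requests are invented for the example. It covers only the mechanical conditions named above (de-identified data, a time limit, a defined scope and a stated problem); the judgment calls still belong to people.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataRequest:
    """A hypothetical data-sharing request from an outside organization."""
    requester: str
    problem_statement: str
    de_identified: bool
    start: Optional[date]
    end: Optional[date]
    fields: Tuple[str, ...]


def guardrail_violations(request: DataRequest) -> List[str]:
    """Return the guardrails a request breaks; an empty list means none."""
    violations = []
    if not request.de_identified:
        violations.append("data must be de-identified")
    if request.start is None or request.end is None:
        violations.append("request needs explicit start and end dates")
    elif request.end <= request.start:
        violations.append("end date must fall after start date")
    if not request.fields:
        violations.append("request must name the fields in scope")
    if not request.problem_statement.strip():
        violations.append("request must state a specific problem")
    return violations


# Made-up requests for illustration only.
bounded = DataRequest(
    requester="Example Pharma",
    problem_statement="Treatment patterns in one narrowly defined cohort",
    de_identified=True,
    start=date(2024, 1, 1),
    end=date(2024, 6, 30),
    fields=("diagnosis", "regimen"),
)
open_ended = DataRequest(
    requester="Example Vendor",
    problem_statement="",
    de_identified=True,
    start=date(2024, 1, 1),
    end=None,
    fields=("diagnosis",),
)

print(guardrail_violations(bounded))
print(guardrail_violations(open_ended))
```

A check like this would be a first filter at most. It says nothing about a requester’s legitimacy or security practices, which need human review.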
The guiding question, Shah says: “Is there a specific segment of the population that has an immediate need for innovation in the drug space that we could help support?”
A research and publications committee created a process to vet requests for legitimacy, feasibility and even the character of the organizations before agreeing to share data. The team would also look at the technology an organization would use for the analysis and ensure it had appropriate security safeguards in place. Organizations that didn’t might be credentialed to view limited data in the ASCO’s own system.
In addition to vetting, contracts are crucial, Shah says.
“You need to have contract vehicles that contemplate and hammer out some thorny issues [such as] the data ownership pieces, so everyone knows who owns the data, who’s just licensing it and borrowing it for a short period of time so that you don’t have disputes about that later. You have to hammer out issues [about] de-identified data. You have to be very careful to put in clear language about not linking it, not reidentifying it, not sharing it with other third parties without … assurances in place,” Shah says.
“One of the big problems with the industry today is people don’t appreciate this issue because they don’t want to go through the time and expense of vetting,” he adds. And organizations can contract out that function, but then they run the risk of losing sight of patients’ data privacy.
“We can fight about who’s going to carry the bag at the end of the day. But at the end of the day is when the patient’s privacy interest is lost,” he says.
Use technology—to a point
Too often, organizations don’t equip their data stewards with the technology tools they need. Data profiling tools, for example, might be expensive—but will pay for themselves quickly, Walton says. “I just can’t emphasize enough how important it is to have a tool that can help the data stewards do their job.”
Other tools might include enterprise content management solutions or services for unstructured content, Smithson says. Look for solutions that integrate with other systems, such as customer relationship management tools or the major enterprise resource planning applications.
“These content management solutions that go enterprise-wide, they can have content services that are already built to work with some of the leading tools that are out there. So the documents that get generated or used by these other solutions … they can just manage that along the process and serve up documents in the other applications wherever the eyeballs are and where you’re working along the business process,” she says.
Then again, technology isn’t a cure-all.
At Facebook, scientists famously trained two AI systems to communicate with each other and solve a problem, but didn’t stipulate that the systems should do the work in English. To earn rewards faster, the systems developed a kind of shorthand that humans couldn’t easily understand.
“The AI has done something completely unforeseeable and unexpected and actually sometimes problematic,” Shah says. “You can see that there could be a scenario where you give AI this power and all this data and it goes off and does something really wild with it.”
To be fair, online reports of creepy computers making up their own language to the shock and dismay of researchers were overstated. In fact, getting unexpected results when deploying learning systems is a feature, not a glitch. Surprises could lead to a medical breakthrough, after all.
Then again, surprises can also cause a data breach. Zick recounts, for example, working with a client that thought it had scrubbed its large data set of all protected health information (PHI). But it turned out physicians were entering PHI in a field that was supposed to be PHI-free.
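A simple automated scan can surface this kind of leak in a supposedly PHI-free field. The Python sketch below is a rough illustration under stated assumptions: the three patterns, the flag_possible_phi function and the sample notes are invented for the example, and pattern matching catches only a narrow slice of identifiers (it will miss names, for instance). It is a tripwire that prompts a human review, not a de-identification method.

```python
import re
from typing import Dict, List

# Assumed, deliberately narrow patterns for U.S.-style identifiers.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def flag_possible_phi(text: str) -> List[str]:
    """Return the labels of every pattern that appears in the text."""
    return [label for label, pattern in PHI_PATTERNS.items()
            if pattern.search(text)]


def scan_field(notes_by_record: Dict[int, str]) -> Dict[int, List[str]]:
    """Map record IDs to pattern labels, keeping only records with hits."""
    flagged = {}
    for record_id, text in notes_by_record.items():
        hits = flag_possible_phi(text)
        if hits:
            flagged[record_id] = hits
    return flagged


# Made-up free-text notes from a field that should hold no PHI.
notes = {
    101: "Stable disease, continue current regimen.",
    102: "Pt called from 555-010-1234 on 3/14/2019 about side effects.",
    103: "SSN 123-45-6789 on file.",
}

flagged = scan_field(notes)
print(flagged)
```

Any record the scan flags should go to a person for review, which is the human check on the technology that Zick describes.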
That’s why humans have to keep an eye on AI outputs and react in near real-time to the unexpected—and it’s why data governance is so important.
Although there are huge benefits to AI and machine learning, when the technology is left to its own devices, “the parade of horribles is extensive,” Zick says. “The moral of the story is you’ve got to go slow [and] verify the data and you can’t just trust the technology. … If you don’t check it with something tangible you’ve got a problem.”
There are different facets to managing data in the age of AI, Shah says. “You can’t solve it with any one thing. It’s not a technology fix; it’s not a policy fix. You need to have a holistic data governance plan that deals with people, processes and technologies, with all of it working together in concert.”
Don’t be “enamored by some of these shiny tools without working [on the] foundation first,” Smithson adds. “If you don’t … then your information is not going to be governed.”
(This post originally appeared in the sister publication Health Data Management.)