Governance for your data platform: The sooner the better
To meet today’s competitive demands, companies are rebuilding their data platforms, hoping to achieve new levels of business insight by eliminating data silos, utilizing the cloud and ensuring high performance, scalability and agility.
During the initial stages of development, a data governance strategy to ensure compliance with internal controls and regulatory requirements, including new privacy regulations, is often not a priority. Since few users and limited amounts of data are involved, the logic goes, data risks are low, so the emphasis should be on faster iteration. Governance can always be added later.
This logic is risky—and wrong. In the same way that better software testing speeds development, imposing data governance early on can speed data platform iteration, while also protecting data and ensuring compliance with the increasing demands of privacy regulations. In short, by investing early in data governance, you can build faster and stay out of trouble.
Governance Can’t Wait
Privacy regulations are here to stay, and there will be more of them. The General Data Protection Regulation (GDPR) went into effect last year, and non-compliance is proving financially painful. The U.K. Information Commissioner’s Office (ICO) has announced its intention to fine British Airways roughly $230 million and Marriott International roughly $123 million. Those figures don’t include legal fees, the cost of repairing brand damage, or lost opportunities.
More fines, and larger ones, are expected, and the California Consumer Privacy Act (CCPA) goes into effect next year, with more regulations to follow. Companies ignore these regulations at their peril.
Yet a typical data platform development process may start out something like this:
1. Install software.
2. Build some automation for loading data.
3. Determine the initial tools that can access the data.
4. Add useful data to the platform for specific users.
5. Iteratively add more tools, data, users and automation for further testing.
6. Establish initial metadata management.
7. Add data governance capabilities.
The problem here is clear. Adding data before a governance solution is in place creates risk, no matter how “careful” developers insist they will be to avoid using sensitive data. Enterprises are typically good at distinguishing clearly non-problematic data (zero personally identifiable information, or PII) from clearly problematic data (SSNs, credit card numbers and the like).
However, in an age of soaring data volumes and increasing regulatory complexity, there is a large and growing gray area whose boundaries depend on a company’s industry and regulatory status. That makes it hard to know what sensitive data exists and where it resides, a problem we have tried to make easier for data lake owners. As a result, adding data to a new platform before a governance solution is in place can lead to misuse of that data, as well as to potentially time-consuming and expensive fixes as the platform grows.
Consider the following scenario. At Stage 3, a developer enables a small group of testers to access a curated subset of customer data containing no PII. After a successful test of the use case, the developer expands the platform to include a group of business users and a larger subset of the data, without realizing that this new subset has not been curated and includes unencrypted SSNs.
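The gap in this scenario is that nothing checks the new subset before it is exposed. A minimal sketch of such a check, assuming tabular data sampled as Python dictionaries, scans each column for values that look like US SSNs (written as `ddd-dd-dddd`) or credit card numbers (13 to 19 digits that pass the Luhn checksum). The pattern rules, the 50% threshold and the function names are illustrative assumptions rather than any product's API; real classifiers cover many more formats and data types.

```python
import re
from typing import Dict, List, Optional

# Illustrative detection rules (assumptions): SSNs as ddd-dd-dddd, and
# 13-19 digit card numbers, optionally separated by spaces or hyphens.
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_RE = re.compile(r"^\d(?:[ -]?\d){12,18}$")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_value(value: str) -> Optional[str]:
    """Return a sensitivity label for a single value, or None if no rule matches."""
    v = value.strip()
    if SSN_RE.match(v):
        return "SSN"
    if CARD_RE.match(v) and luhn_valid(v):
        return "CREDIT_CARD"
    return None


def scan_dataset(rows: List[Dict[str, str]], threshold: float = 0.5) -> Dict[str, str]:
    """Tag a column when at least `threshold` of its non-empty sampled values share a label."""
    tags: Dict[str, str] = {}
    columns = sorted({col for row in rows for col in row})
    for col in columns:
        values = [row[col] for row in rows if row.get(col)]
        if not values:
            continue
        hits: Dict[str, int] = {}
        for v in values:
            label = classify_value(v)
            if label:
                hits[label] = hits.get(label, 0) + 1
        if hits:
            # Keep the most frequent label if it clears the threshold.
            label, count = max(hits.items(), key=lambda kv: kv[1])
            if count / len(values) >= threshold:
                tags[col] = label
    return tags


sample = [
    {"customer_id": "1001", "region": "EU", "tax_id": "123-45-6789"},
    {"customer_id": "1002", "region": "US", "tax_id": "987-65-4321"},
]
print(scan_dataset(sample))  # {'tax_id': 'SSN'}
```

A pipeline built this way could refuse to expose a dataset until every tagged column has a masking or encryption rule attached, which would have stopped the uncurated subset from reaching business users.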
Even if developers understand the risks inherent in the data they are using and take extraordinary care, preventing inadvertent exposure requires a significant amount of manual gatekeeping. For example, they may build a bespoke point-to-point transformation that hides the SSNs from business users. But this remains an error-prone, risky process that must be updated whenever new tools, datasets or users are added.
Because a comprehensive governance solution must be deployed eventually anyway, deploying it early gives developers guardrails that automatically ensure only the right data, in the right form, is exposed to the right users, no matter which tools they use. This frees developers from acting as gatekeepers, letting them iterate faster while the solution protects the organization.
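The guardrail idea can be sketched as a single, centrally defined policy that every query result passes through, whichever tool issued the query. This is a minimal illustration under assumed names (`Policy`, `mask_ssn`, and the `analyst` and `compliance` roles are all hypothetical). It assumes columns have already been tagged by sensitivity, for example by a data catalog, and that any role not listed sees no tagged data in the clear.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


def mask_ssn(value: str) -> str:
    # Keep only the last four digits, e.g. "***-**-6789".
    return "***-**-" + value[-4:]


def redact(value: str) -> str:
    return "REDACTED"


# Hypothetical masking rules, keyed by the sensitivity tag a column carries.
MASKERS: Dict[str, Callable[[str], str]] = {"SSN": mask_ssn, "CREDIT_CARD": redact}


@dataclass
class Policy:
    column_tags: Dict[str, str]        # column name -> sensitivity tag
    clear_access: Dict[str, Set[str]]  # role -> tags that role may see unmasked

    def apply(self, role: str, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
        """Return copies of rows with every tagged column the role is not cleared for masked."""
        allowed = self.clear_access.get(role, set())  # unknown role: default deny
        result = []
        for row in rows:
            out = {}
            for col, value in row.items():
                tag = self.column_tags.get(col)
                if tag is None or tag in allowed:
                    out[col] = value
                else:
                    out[col] = MASKERS.get(tag, redact)(value)
            result.append(out)
        return result


policy = Policy(
    column_tags={"tax_id": "SSN"},
    clear_access={"compliance": {"SSN"}, "analyst": set()},
)
rows = [{"customer_id": "1001", "tax_id": "123-45-6789"}]
print(policy.apply("analyst", rows))     # [{'customer_id': '1001', 'tax_id': '***-**-6789'}]
print(policy.apply("compliance", rows))  # [{'customer_id': '1001', 'tax_id': '123-45-6789'}]
```

Because the policy, rather than each tool integration, decides what is masked, adding a new tool or user group means updating a role mapping instead of writing another bespoke transformation.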
Foundations for Governance
Whether or not you heed the advice to invest in governance early on, when the time comes, be sure you are building, buying or subscribing to these foundational governance capabilities:
- Visibility into user activity – You need to have insight into who is accessing what data in order to keep sensitive data out of the wrong hands. You also need this visibility to be able to comply with new privacy regulations, such as ensuring data is used only with the proper consent for an intended purpose.
- Discoverability/cataloging of data – If you do not know which datasets contain sensitive information, such as SSNs or credit card numbers, you cannot protect them.
- Access controls – Now that you have insight into what data you have and how users interact with it, you need to be able to impose granular control over this access. The more granular the control—down to the individual level for users and the individual cell for structured data—the more protection and flexibility you will have.
- Tool agnosticism – Your governance solution must be tool agnostic. Otherwise, you’ll have a new bespoke project every time you want to add a new tool.
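The visibility and tool-agnosticism requirements can be illustrated together with a minimal sketch of a hypothetical access gateway that every tool calls through. It checks both a user grant and whether the requested purpose is one the data subjects consented to, and it records every request, allowed or denied, in an audit log that can answer "who has read this column?" The class and field names are assumptions for illustration, and consent is simplified to a per-dataset set of purposes; real consent records are usually tracked per data subject.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


@dataclass
class AccessEvent:
    timestamp: datetime
    user: str
    dataset: str
    columns: Tuple[str, ...]
    purpose: str
    allowed: bool


@dataclass
class Gateway:
    """Hypothetical single entry point: every tool requests data through request()."""
    consented_purposes: Dict[str, Set[str]]  # dataset -> purposes consented to
    grants: Dict[str, Set[str]]              # user -> datasets the user may read
    audit_log: List[AccessEvent] = field(default_factory=list)

    def request(self, user: str, dataset: str, columns: List[str], purpose: str) -> bool:
        allowed = (
            dataset in self.grants.get(user, set())
            and purpose in self.consented_purposes.get(dataset, set())
        )
        # Log denied requests as well as allowed ones.
        self.audit_log.append(
            AccessEvent(datetime.now(timezone.utc), user, dataset,
                        tuple(columns), purpose, allowed)
        )
        return allowed

    def who_accessed(self, dataset: str, column: str) -> Set[str]:
        """Users whose requests for this column were allowed."""
        return {
            e.user for e in self.audit_log
            if e.allowed and e.dataset == dataset and column in e.columns
        }


gw = Gateway(
    consented_purposes={"customers": {"billing"}},
    grants={"alice": {"customers"}, "bob": {"customers"}},
)
gw.request("alice", "customers", ["email"], "billing")   # allowed
gw.request("bob", "customers", ["email"], "marketing")   # denied: no consent for marketing
print(gw.who_accessed("customers", "email"))  # {'alice'}
```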
If you cannot acquire all these capabilities at the earliest stages of development, insist at a minimum on user visibility and data discoverability. With those two in place, you will at least be able to see potential privacy risks before they damage your organization.
Given the new sensitivity to data privacy and protection, we need a new mindset. “Move fast and break things” is no longer a strategy for success. It’s a recipe for disaster. For data platform development, the only safe strategy is building in governance early on. Better yet, make Privacy by Design a core element of your corporate culture and put privacy at the heart of everything you do.