What is data privacy, really, and what tools are required for it?
During the first decade of the 21st century, organizations came to realize the value of the data held in their IT systems. That data could be used to support a variety of organizational goals, including individualized marketing campaigns, improvements to operational efficiencies, and assurance of regulatory compliance. Accordingly, methodologies and tools were developed to manage, maintain, and leverage data, such as data governance, data quality, and data analytics tools.
In the second decade of the century, we have witnessed a growing social and corporate awareness regarding organizations’ use of data. The concerns range from consumers who want to limit the use of their personal data to corporations that require close management of data that could affect their reputation or brand.
This emerging discipline is called data privacy. It is distinct from data security, which focuses on technical methods designed to protect data from unauthorized access.
Three general types of data are addressed by data privacy:
- Regulatory Data: Data covered by specific governmental or institutional regulations. Typically, this is personal information (PI) that may be used to identify individuals.
- Contractual Data: Data restricted by contracts between organizations. Examples of such data might include marketing plans, product development plans, etc.
- Sensitive Data: Data held within an organization which, if made public, would negatively impact the organization’s reputation or brand. An example of such data would be details of pending litigation not subject to legal disclosure requirements.
Data privacy regulations are designed to protect an individual’s personal information (PI) – data that can be used to identify the individual – and will be the focus of this discussion henceforth as it is representative of the tools and methodologies required.
The template for such regulations has been the EU’s General Data Protection Regulation (GDPR), effective May 25, 2018, which governs the use of the personal data of individuals in the EU, regardless of where the using organization is located. The California Consumer Privacy Act (CCPA), effective January 1, 2020, similarly governs the use of CA residents’ data, again regardless of where the using organization is located. Other similar regulations are under consideration by various US states.
These regulations broadly outline:
- Consumers’ rights regarding the collection, storage, usage, and retention of their personal information
- Reporting requirements of the subject organizations regarding protection, usage, and breaches of that protection
- Fines and penalties for compliance failure, including fines for not reporting data breaches within specific timelines
While organizations located in the EU and CA are clearly impacted by these regulations, the regulations apply more broadly to any organization with consumers in those locations. For example, the Chicago Tribune website was, for over a year, inaccessible to users in the EU. Marriott, a US-based corporation, faced a proposed fine of $123M from UK regulators for a data breach under the UK Data Protection Act (similar to GDPR). It is therefore imperative that organizations comply with the relevant data privacy regulations. What tools do they need to ensure compliance?
In addition to data security software, compliance with data privacy regulations requires the use of two broad categories of data tools:
- Data catalog
- Data lineage
Data catalogs are tools that store metadata (data about data) describing an organization’s critical data, including the data owner, data classification, data location, business usage, and data sensitivity. Gartner defines a data catalog as:
"A data catalog creates and maintains an inventory of data assets through discovery, description, and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists, and other line of business (LOB) data consumers to find and understand relevant datasets for the purpose of extracting business value."
Data catalogs are necessary to track the identity, location, and acceptable usage of data subject to data privacy requirements. Without an organization-wide data catalog, compliance with privacy regulations is difficult, if not impossible, because the privacy impact on individual data elements is unknown.
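To make the idea of a catalog entry concrete, here is a minimal sketch in Python. It is not how any particular catalog product works; the names `CatalogEntry`, `DataCatalog`, `allowed_purposes`, and the sample systems and purposes are all hypothetical, chosen only to mirror the metadata fields listed above (owner, classification, location, business usage).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Classification(Enum):
    # Mirrors the three data types discussed earlier, plus a default.
    REGULATORY = "regulatory"
    CONTRACTUAL = "contractual"
    SENSITIVE = "sensitive"
    UNRESTRICTED = "unrestricted"


@dataclass
class CatalogEntry:
    name: str                        # logical name of the data element
    owner: str                       # accountable data owner
    classification: Classification
    location: str                    # hypothetical "system.table.column" path
    business_usage: str
    allowed_purposes: Set[str] = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog keyed by data element name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find_by_classification(self, classification: Classification) -> List[CatalogEntry]:
        return [e for e in self._entries.values() if e.classification is classification]

    def is_use_permitted(self, name: str, purpose: str) -> bool:
        entry = self._entries.get(name)
        if entry is None:
            # Assumption: uncatalogued data is denied, since its privacy impact is unknown.
            return False
        if entry.classification is Classification.UNRESTRICTED:
            return True
        return purpose in entry.allowed_purposes


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="customer_email",
    owner="CRM team",
    classification=Classification.REGULATORY,
    location="crm.customers.email",
    business_usage="Order confirmations",
    allowed_purposes={"order_fulfilment"},
))
catalog.register(CatalogEntry(
    name="product_sku",
    owner="Merchandising",
    classification=Classification.UNRESTRICTED,
    location="erp.products.sku",
    business_usage="Inventory reporting",
))

regulated = [e.name for e in catalog.find_by_classification(Classification.REGULATORY)]
```

The deny-by-default rule for unknown data elements is a design assumption of this sketch: it reflects the point that uncatalogued data cannot be shown to comply.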
Data privacy requirements necessitate not only identifying the location and nature of impacted data, but also tracking how that data flows and is transformed throughout the application landscape. This functionality is addressed through data lineage tools, which provide various representations of how data flows through an organization’s IT ecosystem and the transformations that are applied.
Understanding data lineage is also vital to complying with data privacy requirements because, as impacted data travels through the application landscape, it may reside in multiple locations in addition to its system of record. Moreover, transformations applied to impacted data may be reversible, in which case the transformed copies can be converted back into the original personal information and must also be treated as impacted data, multiplying the number of locations to manage.
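The lineage idea can be sketched as a directed graph in which each edge records a transformation and whether it is reversible. The Python below is an illustration only: `LineageGraph`, `add_flow`, `impacted_locations`, and the sample system names are hypothetical. It also makes a simplifying assumption that data leaves scope once it passes through an irreversible transformation; actual regulations may treat some irreversible transformations, such as hashing, more strictly.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# An edge is (target location, transformation description, reversible?)
Edge = Tuple[str, str, bool]


class LineageGraph:
    """Hypothetical lineage store: locations are nodes, data flows are edges."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Edge]] = {}

    def add_flow(self, source: str, target: str, transformation: str, reversible: bool) -> None:
        self._edges.setdefault(source, []).append((target, transformation, reversible))

    def impacted_locations(self, origin: str) -> Set[str]:
        """Breadth-first search for every location where the original data
        is still recoverable, i.e. reachable through reversible flows only."""
        impacted = {origin}
        queue = deque([origin])
        while queue:
            current = queue.popleft()
            for target, _transformation, reversible in self._edges.get(current, []):
                # Simplifying assumption: an irreversible step takes data out of scope.
                if reversible and target not in impacted:
                    impacted.add(target)
                    queue.append(target)
        return impacted


lineage = LineageGraph()
lineage.add_flow("crm.customers.email", "warehouse.dim_customer.email",
                 "copy", reversible=True)
lineage.add_flow("warehouse.dim_customer.email", "marketing.campaign_list.email_enc",
                 "reversible encryption", reversible=True)
lineage.add_flow("warehouse.dim_customer.email", "reports.signup_counts",
                 "aggregate count", reversible=False)

locations = lineage.impacted_locations("crm.customers.email")
```

In this example the system of record, the warehouse copy, and the encrypted marketing copy are all returned, while the aggregated report is not. One record of personal information therefore has to be managed in three places.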
As with any major industry trend, data privacy opens numerous opportunities for consulting services as well as tool implementations. Potential consulting engagements range from support for tool selection, acquisition, and configuration, to implementation-focused projects populating the data catalogs and lineage information, to advisory services focused on the development of privacy policies, guidelines, and standards.
Additionally, organizational training and certification in data privacy are available from several commercial organizations as well as the International Association of Privacy Professionals (IAPP).