Privacy Compliance and Databases

Published June 1, 2005, 1:00am EDT

Perhaps it's appropriate that the privacy community seems to consist mostly of people talking to themselves.

In one corner are public policy advocates who examine every new technology for privacy risks and inevitably find some. Their usual recommendation is to regulate or prohibit the new technology's use. Another corner holds academic and industry researchers working to build a detailed conceptual foundation for comprehensive privacy management. Yet another corner is reserved for software vendors offering standalone products with specific privacy-related functions. In a final corner, or maybe another room altogether, are corporate technology professionals whose only real goal is to satisfy their compliance departments. The corporate managers rarely interact with the other groups except when searching for vendors to help solve an immediate problem.

Each group performs valuable work. The policy advocates are right: the privacy risks of new technologies do need to be considered. The researchers are also right: reliable privacy can only be provided if it's built into technology and business infrastructures. The software vendors are useful too: absent comprehensive infrastructures, their point solutions are better than nothing. And the corporate technology managers really can't, and probably shouldn't, do anything more than meet actual business requirements.

Still, the disjointed nature of the privacy discussion has a cost. The policy advocates often seem unconcerned with the practical implications of their suggestions, even though some advocates are themselves quite knowledgeable about business and technology. The researchers' conceptual frameworks could be very helpful to corporate systems designers, but only if they relate to infrastructures that actually get built. The value of the software point solutions is limited when there is no larger standardized framework for them to fit into.

The only place where everything actually comes together is in corporate systems themselves. The privacy components of these systems are driven by compliance requirements, which are determined by a hodgepodge of legal standards and regulations. The details vary with each situation, making a single solution impossible. However, the general approaches are similar enough that researchers and vendors can design, and corporate systems staff can look for, technologies that will make all types of compliance easier. One interesting set of technologies has been developed by IBM researchers under the heading of "Hippocratic Database" (see www.almaden.ibm.com/software/quest/index.shtml). Although far from a comprehensive privacy solution, it does illustrate several components worth having. These include:

  • Middleware to manage access to protected data. IBM calls this "active enforcement" and describes a module to intercept queries from application systems, rewrite them to comply with privacy policies and return only the permitted results. Policies are defined in terms of permitted users, recipients and purposes for each data element, and may be limited further by preferences of individual data owners. IBM has worked on a privacy specification language to express such policies effectively.
  • Audit trails to prove compliance. IBM has a technology called "Eunomia" that can reconstruct query results using a combination of logged queries and data changes. IBM argues this takes less storage than keeping copies of all query results. Logging the queries also gives more complete information than logging outputs alone, because seemingly innocent outputs can still reveal private data. For example, a query that selects a sensitive segment of customers might return only an innocuous data element, such as their names, yet still reveal that each named customer belongs to that segment. Whether Eunomia or another method is used, audit capabilities are a critical part of privacy assurance and legal compliance.
  • Identity-protective data exchange. One privacy issue is how to share data in ways that expose only the minimum information necessary. IBM proposes "sovereign information integration," which lets two entities identify records they share without revealing records they do not. This might find matches between airline passenger lists and terrorism watch lists without letting the government know who else is flying or revealing the watch list to the airlines. IBM's solution is for each party to encrypt its own data and then send it to the other party to encrypt again. If the encryption methods are commutative, meaning you get the same result whichever encryption is applied first, then a name or ID number appearing in both files would have the same double-encrypted value and be recognized as a match. Non-matching values would be unreadable by either party, because they would be protected by the other party's encryption.
  • Privacy-protective data analysis. There are many ways to analyze data without revealing private information. IBM mentions three. "Order preserving encryption" allows ordinal values, such as ages, incomes or dates, to be compared, sorted and searched by range without revealing the values themselves. "Privacy-preserving data mining" adds random noise to individual values while preserving the accuracy of aggregate results. "Data de-identification using BA k-Anonymity" ensures that every combination of identifying characteristics is shared by at least a minimum number of records, so that individual identities cannot be inferred. For example, this might involve replacing specific birth dates with age ranges. "BA" are the initials of the researchers; "k" is the minimum group size.
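
The active-enforcement idea in the first item can be sketched as query rewriting. This is a minimal sketch under stated assumptions, not IBM's design: the `POLICY` table, the `optin_marketing` column and the `rewrite_query` function are hypothetical, and the policy covers only purposes and per-owner opt-ins, not users or recipients. A column not permitted for the stated purpose comes back as NULL, and a permitted but consent-dependent cell is revealed only when its owner has opted in.

```python
import sqlite3

# Hypothetical policy: for each purpose, the columns it may read. A value of
# None means "always visible"; otherwise it names the opt-in column that must
# equal 1 before that owner's cell is revealed.
POLICY = {
    "billing": {"name": None, "address": None},
    "marketing": {"name": None, "email": "optin_marketing"},
}


def rewrite_query(columns, table, purpose):
    """Rewrite 'SELECT columns FROM table' so it returns only permitted data.

    Identifiers are assumed to come from trusted application code; this
    sketch does no quoting or SQL-injection protection.
    """
    allowed = POLICY.get(purpose, {})
    items = []
    for col in columns:
        if col not in allowed:
            items.append(f"NULL AS {col}")  # not permitted for this purpose
        elif allowed[col] is None:
            items.append(col)  # permitted unconditionally
        else:
            # CASE without ELSE yields NULL when the owner has not opted in.
            items.append(f"CASE WHEN {allowed[col]} = 1 THEN {col} END AS {col}")
    return f"SELECT {', '.join(items)} FROM {table}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, email TEXT, address TEXT, optin_marketing INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [("Ann", "ann@example.com", "1 Elm St", 1), ("Bob", "bob@example.com", "2 Oak St", 0)],
)
marketing_rows = conn.execute(
    rewrite_query(["name", "email", "address"], "customers", "marketing")
).fetchall()
billing_rows = conn.execute(
    rewrite_query(["name", "email", "address"], "customers", "billing")
).fetchall()
```

Returning NULL rather than rejecting the query keeps the application's result shape unchanged, so existing applications can sit behind the rewriter without modification.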
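
The audit-trail idea behind Eunomia, reconstructing results from logged queries and data changes, can be illustrated in memory. This sketch is not how Eunomia is built: the `Change` and `LoggedQuery` types and the `reconstruct` function are hypothetical, and Python callables stand in for SQL clauses. Replaying changes up to a query's timestamp rebuilds the table as the query saw it, which recovers both the rows returned and the records the selection condition touched.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Change:
    ts: int               # logical timestamp of the change
    key: str              # primary key of the affected record
    row: Optional[dict]   # new contents, or None for a delete


@dataclass
class LoggedQuery:
    ts: int
    user: str
    predicate: Callable[[dict], bool]  # stands in for the WHERE clause
    columns: list                       # stands in for the SELECT list


def state_at(changes, as_of):
    """Replay the change log up to and including as_of to rebuild the table."""
    table = {}
    for ch in sorted(changes, key=lambda c: c.ts):
        if ch.ts > as_of:
            break
        if ch.row is None:
            table.pop(ch.key, None)
        else:
            table[ch.key] = ch.row
    return table


def reconstruct(changes, query):
    """Return (keys of records the query selected, rows it returned)."""
    table = state_at(changes, query.ts)
    hits = sorted(k for k, row in table.items() if query.predicate(row))
    result = [{c: table[k][c] for c in query.columns} for k in hits]
    return hits, result


changes = [
    Change(1, "c1", {"name": "Ann", "diagnosis": "diabetes"}),
    Change(2, "c2", {"name": "Bob", "diagnosis": "none"}),
    Change(5, "c1", {"name": "Ann", "diagnosis": "none"}),  # later correction
]
# The output is only names, but the predicate selected a sensitive segment.
query = LoggedQuery(3, "analyst7", lambda r: r["diagnosis"] == "diabetes", ["name"])
touched, returned = reconstruct(changes, query)
```

Re-running the query against today's data would return nothing, because Ann's record was later corrected; reconstruction shows that at time 3 the analyst's harmless-looking list of names disclosed a diagnosis.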
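
The matching scheme in the third item needs only a commutative cipher. A standard textbook example is modular exponentiation, E_k(x) = x^k mod p, since (x^a)^b = (x^b)^a. The sketch below uses it with deliberately toy parameters (the Mersenne prime 2^127 - 1 and SHA-256 hashing of normalized names) and is not secure as written; the function names are hypothetical, and both parties are simulated in one process.

```python
import hashlib
import math
import secrets

# Toy modulus for illustration only; real deployments need carefully chosen
# groups and a vetted protocol.
P = 2**127 - 1  # a Mersenne prime


def hash_to_group(value: str) -> int:
    """Map a normalized identifier to a nonzero residue modulo P."""
    digest = hashlib.sha256(value.strip().lower().encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_key() -> int:
    """Random exponent coprime to P-1, so encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def encrypt(x: int, key: int) -> int:
    # Commutative: encrypt(encrypt(x, a), b) == encrypt(encrypt(x, b), a).
    return pow(x, key, P)


def private_match(list_a, list_b, key_a, key_b):
    """Simulate both parties; return the entries of list_a also in list_b."""
    # Each party encrypts its own identifiers and sends them to the other.
    a_once = [encrypt(hash_to_group(v), key_a) for v in list_a]  # sent A -> B
    b_once = [encrypt(hash_to_group(v), key_b) for v in list_b]  # sent B -> A
    # B re-encrypts A's values and returns them in the same order.
    a_twice = [encrypt(c, key_b) for c in a_once]
    # A re-encrypts B's values.
    b_twice = {encrypt(c, key_a) for c in b_once}
    # Equal double-encrypted values mean the same identifier is on both lists.
    return [v for v, c in zip(list_a, a_twice) if c in b_twice]
```

Usage: `private_match(passengers, watch_list, new_key(), new_key())`. Each side only ever sees the other's identifiers under a key it does not hold, so names that appear on just one list stay unreadable to the other party.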
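
Order-preserving encryption can be illustrated with a toy mapping: each plaintext in a small integer domain is sent to a running total of random positive gaps, so comparisons on ciphertexts give the same answers as comparisons on plaintexts. This is a sketch under loud assumptions, not the scheme IBM describes: `make_ope` is hypothetical, `random.Random` is not a cryptographic generator, and the key holder must keep the whole mapping table.

```python
import random
from bisect import bisect_left


def make_ope(key: int, domain_size: int, max_gap: int = 1000):
    """Toy order-preserving encoding for integers 0..domain_size-1.

    Strictly positive random gaps make the mapping strictly increasing, so
    x < y implies enc(x) < enc(y). The key only seeds the gaps.
    """
    rng = random.Random(key)
    table, total = [], 0
    for _ in range(domain_size):
        total += rng.randint(1, max_gap)
        table.append(total)

    def enc(x: int) -> int:
        if not 0 <= x < domain_size:
            raise ValueError("plaintext outside domain")
        return table[x]

    def dec(c: int) -> int:
        i = bisect_left(table, c)
        if i == len(table) or table[i] != c:
            raise ValueError("not a valid ciphertext")
        return i

    return enc, dec


enc, dec = make_ope(key=42, domain_size=130)
ages = [34, 27, 61, 45]
cipher = [enc(a) for a in ages]          # what the database stores
low, high = enc(30), enc(50)             # client encrypts the query bounds
in_range = [c for c in cipher if low <= c <= high]  # server filters ciphertexts
```

The database can filter and sort the stored ciphertexts without learning any ages; only the key holder can turn `in_range` back into 34 and 45.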
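
The randomization behind "privacy-preserving data mining" can be shown in its simplest form: add independent zero-mean noise to each value, so individual records are masked while averages over many records stay close to the truth. The research behind the term goes further and reconstructs whole value distributions; this sketch, with a hypothetical `randomize` function and uniform noise, only shows that the mean survives.

```python
import random
import statistics


def randomize(values, spread, seed=None):
    """Add independent uniform noise in [-spread, spread] to each value.

    The noise has mean zero, so the mean of many randomized values is an
    unbiased estimate of the true mean, even though each value is masked.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-spread, spread) for v in values]


gen = random.Random(7)
incomes = [gen.randint(20_000, 120_000) for _ in range(10_000)]
noisy = randomize(incomes, spread=25_000, seed=8)

true_mean = statistics.mean(incomes)
noisy_mean = statistics.mean(noisy)
```

Any single noisy income may be off by as much as 25,000, but with 10,000 records the two means typically differ by only a few hundred.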
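
The k-anonymity condition is easy to check once identifying attributes are generalized, for instance by replacing birth dates with age ranges. The sketch below assumes hypothetical records with a birth date, a ZIP code and a diagnosis, treats age band and ZIP prefix as the identifying attributes, and applies a hand-chosen generalization rather than any search for an optimal one. The check simply counts how many records share each combination of identifying values.

```python
from collections import Counter
from datetime import date


def age_band(birth: date, today: date, width: int = 10) -> str:
    """Generalize an exact birth date to an age range such as '30-39'."""
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    low = age // width * width
    return f"{low}-{low + width - 1}"


def generalize(records, today):
    """Birth date -> age band; five-digit ZIP -> three-digit prefix."""
    return [
        {"age": age_band(r["birth"], today), "zip": r["zip"][:3] + "**",
         "diagnosis": r["diagnosis"]}
        for r in records
    ]


def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


today = date(2005, 6, 1)
records = [
    {"birth": date(1971, 3, 2), "zip": "02139", "diagnosis": "flu"},
    {"birth": date(1974, 11, 20), "zip": "02141", "diagnosis": "asthma"},
    {"birth": date(1958, 7, 9), "zip": "02142", "diagnosis": "flu"},
    {"birth": date(1960, 1, 30), "zip": "02139", "diagnosis": "diabetes"},
]
released = generalize(records, today)
```

Every raw record is unique on birth date and ZIP, so the original table is not even 2-anonymous; after generalization each (age band, ZIP prefix) pair covers two people, so it is.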

The Hippocratic Database represents something more than academic research and something less than a commercial software product, although portions are being tested by IBM clients. It doesn't address major issues such as authentication or data owners' access to their own data. However, it does illustrate how components of a privacy-sensitive system could be deployed to make compliance easier, without waiting for an overall architecture that may never appear.

All Information Management content is archived after seven days.