For structured data, the problem is even more challenging given current database management practices. For every production database application, many IT organizations create multiple copies of the database for production support. These copies are used for testing, quality assurance (QA), standby, training and new application development. In many cases, the copies are created in environments that do not have the same security controls as the production environment. If the production database contains sensitive information, so do all of its copies, and each copy increases the risk of insider theft of, or tampering with, that information.
During an e-discovery process, data is searched, classified and presented as evidence in a legal case. A large part of presenting electronic data as evidence is proving that the information is authentic and that the company has placed the proper controls around the data to protect it from insider theft or alteration. Many application and database vendors provide features that allow IT departments to implement controls that prevent fraudulent activity, but if those features are not deployed properly on every copy of the production data, the risk of theft or tampering remains.
Examples of controls available in database applications include encryption, digital certificates, read-only mode and auditing. Many of these controls, if deployed improperly, can degrade application performance. The controls may also raise the cost of the application if the features carry additional license fees. To offset the performance impact, IT departments may upgrade application servers with more CPUs, which drives the total cost of ownership higher still. However, an evaluation of the types of information stored in these database applications may show that not all of the data needs these controls. Deploying data classification policies on the database data addresses many of these issues, because the controls can then be applied only to the data that requires them.
Data classification for structured data requires a deep understanding of both the database schema (the table structures and the relationships between tables) and the application logic (how business policies map to the way the data is stored and manipulated). Tools that assist in classifying structured data require three components: an object definition, criteria and a policy. Each component is described in more detail below.
The object definition identifies which tables in a database application together represent a single business object. For example, a general ledger (GL) transaction does not include every table in a financial application. Rather, it includes an organization identifier, balances, journals and a booking period (i.e., month-year). The object definition includes all of the tables in the database that contain these four components. Similarly, the object definition for a patient record in an electronic patient record database includes the patient's personal information, symptoms, diagnosis, prescriptions and physician notes. When database data is classified, the object definition translates into the SELECT statement of a structured query language (SQL) query: the tables it names, and the joins between them, determine what data is selected.
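To make the mapping concrete, the following minimal Python sketch represents an object definition as a driving table plus the tables joined to it, and renders it as the SELECT ... FROM ... JOIN part of a query. The class name and all table and column names (`gl_journals`, `organizations`, `booking_periods`, `gl_balances`, `org_id`, `period_id`) are hypothetical; a real GL schema will differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectDefinition:
    """A business object: a driving table plus the tables joined to it (hypothetical schema)."""
    name: str
    driving_table: str
    # Each entry is (table, ON condition) linking that table to tables already in the object.
    joins: List[Tuple[str, str]] = field(default_factory=list)

    def to_select(self) -> str:
        """Render the object definition as the SELECT ... FROM ... JOIN part of a SQL query."""
        parts = [f"SELECT * FROM {self.driving_table}"]
        for table, condition in self.joins:
            parts.append(f"JOIN {table} ON {condition}")
        return "\n".join(parts)


# The four GL components named above: journals, organization, booking period and balances.
gl_transaction = ObjectDefinition(
    name="GL transaction",
    driving_table="gl_journals",
    joins=[
        ("organizations", "organizations.org_id = gl_journals.org_id"),
        ("booking_periods", "booking_periods.period_id = gl_journals.period_id"),
        ("gl_balances", "gl_balances.org_id = gl_journals.org_id "
                        "AND gl_balances.period_id = gl_journals.period_id"),
    ],
)

print(gl_transaction.to_select())
```

The sketch deliberately stops before any WHERE clause: keeping the tables and join keys in the object definition lets the same definition be reused with different criteria and policies.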
A criterion defines how the data within the tables of an object definition is classified. Continuing the GL example, the criterion selects those transactions in the GL tables whose booking period is closed. When a business closes its books, all transactions for the period are reconciled. When the closing process is complete, the transactions in the GL tables should be placed in read-only mode, and controls should be in place to prevent modification. The criteria therefore identify closed GL transactions as those belonging to a booking period whose status, stored as a value or a combination of values in a database table, is closed. The criteria translate into the WHERE clause of a SQL query: SELECT all data from the tables in the object definition WHERE the booking period is closed.
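A small, runnable illustration of the criterion as a WHERE clause, using Python's built-in `sqlite3` module and an in-memory database. The schema, the sample rows and the `OPEN`/`CLOSED` status values are assumptions made for the example; in a real financial application the period status may be spread across several columns or tables.

```python
import sqlite3

# Hypothetical schema: booking_periods holds the period status; gl_journals holds transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking_periods (
        period_id INTEGER PRIMARY KEY,
        period    TEXT NOT NULL,   -- month-year, e.g. 2023-12
        status    TEXT NOT NULL    -- OPEN or CLOSED
    );
    CREATE TABLE gl_journals (
        journal_id INTEGER PRIMARY KEY,
        period_id  INTEGER NOT NULL REFERENCES booking_periods(period_id),
        amount     REAL NOT NULL
    );
    INSERT INTO booking_periods VALUES (1, '2023-11', 'CLOSED'), (2, '2023-12', 'OPEN');
    INSERT INTO gl_journals VALUES (101, 1, 250.0), (102, 1, -75.5), (103, 2, 40.0);
""")

# The criterion becomes the WHERE clause; the object definition supplies the FROM/JOIN part.
CLOSED_PERIOD_CRITERION = "booking_periods.status = ?"


def closed_gl_transactions(conn):
    """Return journals whose booking period is closed (candidates for read-only controls)."""
    sql = (
        "SELECT gl_journals.journal_id, booking_periods.period "
        "FROM gl_journals "
        "JOIN booking_periods ON booking_periods.period_id = gl_journals.period_id "
        f"WHERE {CLOSED_PERIOD_CRITERION} "
        "ORDER BY gl_journals.journal_id"
    )
    return conn.execute(sql, ("CLOSED",)).fetchall()


print(closed_gl_transactions(conn))  # [(101, '2023-11'), (102, '2023-11')]
```

If the period status is stored as a combination of values, the criterion would simply combine several predicates with AND; the rest of the query is unchanged.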
The policy maps the business context to the object definition and criteria. Examples of policies include a data retention policy (how long must this data remain online or available?), a security or audit policy (who should have access to which data, and how is each access event tracked and audited?) and an availability or disaster recovery policy (how available must this data be in the event of a disaster or equipment failure?). Policies, also called service level agreements (SLAs), vary across industries, corporations, departments and types of data. It is important to know which policies are required by government regulations and which are adopted as corporate best practices.
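One way to picture a policy is as a record attached to an object definition and a criterion. The sketch below is illustrative only: the `Policy` and `ClassificationRule` classes, their field names, the role names and the replica count are hypothetical, chosen to mirror the retention, security/audit and availability policies listed above, plus a field recording whether the policy comes from regulation or from corporate best practice.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Policy:
    """Business context attached to classified data (field names are illustrative only)."""
    retention_years: int            # data retention: how long the data must stay available
    allowed_roles: FrozenSet[str]   # security: who may access the data
    audit_access: bool              # audit: whether each access event is logged
    read_only: bool                 # tamper protection, e.g. for closed GL periods
    replicas: int                   # availability / disaster recovery: copies kept elsewhere
    source: str                     # "regulation" or "best practice"


@dataclass(frozen=True)
class ClassificationRule:
    """Ties an object definition and a criterion to the policy that governs the matching data."""
    object_definition: str
    criterion: str
    policy: Policy


closed_gl_us = ClassificationRule(
    object_definition="GL transaction",
    criterion="booking period is closed",
    policy=Policy(
        retention_years=7,
        allowed_roles=frozenset({"auditor", "controller"}),  # hypothetical roles
        audit_access=True,
        read_only=True,
        replicas=2,                                          # hypothetical value
        source="regulation",
    ),
)


def may_access(rule: ClassificationRule, role: str) -> bool:
    """Security policy check: is this role permitted to read data classified by the rule?"""
    return role in rule.policy.allowed_roles


print(may_access(closed_gl_us, "auditor"), may_access(closed_gl_us, "developer"))
```

Keeping the policy separate from the object definition and criterion means the same closed-GL classification can carry different policies per country or department.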
Continuing with the GL example, a commonly applied data retention period for GL transactions in the United States, based on Internal Revenue Service (IRS) requirements, is seven years, so all GL data needs to be kept available for seven years. In Germany, the GDPdU requires financial data to be stored for ten years. For patient records, a policy might require that all patient record information be stored in an environment where access is controlled in accordance with HIPAA, and that the information be retained and available for the life of the patient. When database data is classified, the policy translates into parameter values in a SQL query that place the data into buckets according to the business policy being executed. Business policies may change over time as laws and regulations are created or updated, so the data classification tool should provide an easy way to apply policy changes to previously classified data. For example, if the retention period for GL data in the U.S. increases from seven years to ten years, the data retention controls at the storage tier need to be updated to reflect the new retention period.
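The sketch below shows a policy expressed as parameter values in a SQL query, again using Python's `sqlite3` module. It assumes a hypothetical `closed_gl` table of already-classified data and uses the seven-year (U.S.) and ten-year (Germany) retention periods from the text; `as_of_year` and the year-granularity rule are simplifying assumptions, since real retention clocks usually start at the period close date.

```python
import sqlite3

# Hypothetical table of already-classified data: closed GL journals with jurisdiction and period year.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE closed_gl (
        journal_id   INTEGER PRIMARY KEY,
        jurisdiction TEXT NOT NULL,
        period_year  INTEGER NOT NULL
    );
    INSERT INTO closed_gl VALUES (1, 'US', 2014), (2, 'US', 2016), (3, 'US', 2020), (4, 'DE', 2014);
""")

# The retention policy as parameter values (years), per jurisdiction.
retention_years = {"US": 7, "DE": 10}

# Year granularity is a simplification: data is retained while period_year + years > as_of_year.
BUCKET_SQL = """
    SELECT journal_id,
           CASE WHEN period_year + ? > ? THEN 'RETAIN' ELSE 'RETENTION_EXPIRED' END
    FROM closed_gl
    WHERE jurisdiction = ?
    ORDER BY journal_id
"""


def bucket(conn, jurisdiction, as_of_year):
    """Place closed GL data into retention buckets using the current policy parameters."""
    params = (retention_years[jurisdiction], as_of_year, jurisdiction)
    return conn.execute(BUCKET_SQL, params).fetchall()


before = bucket(conn, "US", 2024)
# [(1, 'RETENTION_EXPIRED'), (2, 'RETENTION_EXPIRED'), (3, 'RETAIN')]

retention_years["US"] = 10  # regulation changes: only the parameter value is updated
after = bucket(conn, "US", 2024)
# [(1, 'RETENTION_EXPIRED'), (2, 'RETAIN'), (3, 'RETAIN')]
```

Because the policy lives in a parameter rather than in the query text, a change such as the move from seven to ten years is applied to previously classified data by re-running the same query; the storage-tier retention controls would then be updated from the new buckets.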









