Avoiding Data Warehouse Performance Detours on the Road to Data Security

Information Management Magazine, October 1, 2008

Dan Sandler

The information in a well-conceived data warehouse is like a must-have piece of furniture: once you commit to the set you want, it must be transported, stored and put on display for everyone to see as soon as possible. At the same time, transporting, storing and displaying the furniture requires the utmost care to protect both the furniture and the surrounding infrastructure. The essence of information and furniture is the same: you want to both protect it and use it as soon as it arrives.

 

This article attempts to identify measures that can disrupt the balance of speed and protection during the transport, storage and use of information in a data warehouse environment. Figure 1 illustrates the typical data flow from the source through the data warehouse environment and highlights the data security threat level during transport (i.e., from the source), storage (i.e., the data warehouse/marts and cubes that may be points of exposure for unencrypted sensitive data) and use (i.e., business functions), assuming data is secured at the source. For the purpose of this article, threat level is assessed in terms of the ability to secure the data in a particular area of the environment versus the potential for theft.


 

 

A Prime Target

 

A mature data warehouse is a prime target for any data thief. It contains enterprise-wide data, integrated and modeled according to enterprise business definitions. The breadth of the data is immense: it includes industry-agnostic yet sensitive areas like HR, finance and customers, as well as other subject areas that are increasingly sensitive, such as insurance claims. The depth of the data is equally profound, because the data warehouse includes transaction-level detail and historical snapshots of dimensional data. That combination of breadth and depth is what makes the warehouse such an attractive target for a security breach.

 

Addressing data security is a must, but efforts to implement tight controls in the data warehouse can have a negative impact on performance. Over time, a mature data warehouse has been optimized and tuned for common usage patterns. Indexes, partitions, 64-bit processing and other performance-boosting measures help the data warehouse contend with growing data volumes and expanding usage. But just when data warehouse managers have optimized their environment for internally driven business requirements, the security measures prompted by headline-making data breaches threaten to undermine those performance optimizations.

 

Fast, Secure Transport

 

When delivering data to the data warehouse, there are several staging points en route to the target. Each persistent staging point represents both a security threat and a performance hit. In terms of performance, over-staging the data produces a high degree of input/output (I/O) activity, which delays the transformation and delivery of the data. In terms of security, if the staging area is a database, encrypting the data adds overhead to the overall processing, while persisting sensitive data in the clear exposes it in a staging area that typically is unmonitored. Thus, to avoid performance and security pitfalls in transit, avoid over-staging the data.

 

In addition to limiting I/O-intensive processing, limiting the data volumes in the extract, transform and load (ETL) pipeline constrains the amount of data exposed to risk and reduces the overall cycle time. To limit data volumes, change-data capture (CDC) should be implemented for all source data acquisition. CDC shortens overall processing time for an obvious reason: less volume from the source means less transformation and loading. While optimizing performance, CDC also minimizes the amount of data that is staged along the way. Because only the minimum amount of data required is persisted, any data that unwanted users manage to access would be sparse. More importantly, not all of that data would be related, given that most slowly changing dimensions change infrequently and facts change independently of the dimensions. Thus, data thieves would have to gain access to the data warehouse itself to gain knowledge of a complete transaction. If full refreshes are processed instead, data thieves have all the data they need at both the dimensional and transactional level.
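
To make the CDC idea concrete, here is a minimal Python sketch. The article does not say how changes should be captured, so comparing row fingerprints between runs is an assumption chosen for brevity. Log-based or timestamp-based capture would serve the same purpose. The function names, the `id` key and the sample customer rows are all hypothetical.

```python
import hashlib


def row_fingerprint(row):
    """Return a stable hash of a row's values (column order is normalized)."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capture_changes(previous_fingerprints, current_rows, key="id"):
    """Compare the current extract against fingerprints from the last run.

    Returns (changed_rows, new_fingerprints). Only changed_rows would move
    on through the ETL pipeline; unchanged rows are never staged.
    Deleted source rows are not detected in this sketch.
    """
    changed_rows = []
    new_fingerprints = {}
    for row in current_rows:
        fingerprint = row_fingerprint(row)
        new_fingerprints[row[key]] = fingerprint
        if previous_fingerprints.get(row[key]) != fingerprint:
            changed_rows.append(row)
    return changed_rows, new_fingerprints


# Hypothetical customer extract on two consecutive runs.
run_1 = [
    {"id": 1, "name": "Ada", "city": "Boston"},
    {"id": 2, "name": "Ben", "city": "Denver"},
]
run_2 = [
    {"id": 1, "name": "Ada", "city": "Boston"},  # unchanged
    {"id": 2, "name": "Ben", "city": "Austin"},  # changed
    {"id": 3, "name": "Cy", "city": "Miami"},    # new
]

first_delta, fingerprints = capture_changes({}, run_1)
second_delta, fingerprints = capture_changes(fingerprints, run_2)
print(len(first_delta), [row["id"] for row in second_delta])  # 2 [2, 3]
```

On the second run only rows 2 and 3 enter the pipeline, so anything persisted in staging is a partial delta rather than the full customer set. Note that this snapshot comparison still reads the whole source extract. A log-based mechanism would reduce the volume leaving the source as well.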

 
