NOV 14, 2008 10:36am ET

Related Links

Predictive Modeling Making Insurer Inroads
February 8, 2012
Biting the Bullet for a Core Upgrade
February 6, 2012
The CRM Shift
February 3, 2012

Web Seminars

Getting Started with Big Data
Available On Demand
Transactions & Interaction: The Correlation of Structured and Unstructured Data
Available On Demand
Deliver Better Enterprise Data through Better Reference Data Management
Available On Demand

Data Transformation in the Intelligent Enterprise

Print
Reprints
Email

Many companies, particularly those undertaking business intelligence (BI) initiatives, struggle with data issues. The problem is rarely a lack of data, but rather an overabundance. To make matters worse, despite decades of experience in using databases to manage information, disparate data is often still spread across the enterprise, which poses serious challenges. Thus, the difficulty business analysts struggle with when looking for the few gems of valuable information that will help propel the organization toward its goals is beyond “finding a needle in a haystack.” It is more akin to finding the head of a pin in a haystack and the shaft in a wheat bale and then assembling them into a completed pin.

 

Strategies that Create Disparate Data

 

The problem has many constituent parts that result from both legacy systems and current actions. Considering the legacy issues first, organizations still suffer the negative impacts of past efforts to decentralize processing. When low-cost servers and client/server computing entered the scene, many companies delegated some or, in a few cases, all IT decisions to user departments. This strategy had advantages. User departments were more likely to choose applications that actually fulfilled their true business needs, rather than the needs perceived by the IT department.

 

One disadvantage of a decentralized approach is that there often aren’t enterprise-wide standards. The “customer_number” field in one department’s database might be “cust_id” in another. One department might use Oracle, while another uses DB2. The engineering department and all European subsidiaries might use metric measurements, while the American marketing department uses U.S. measures. One department might store dates in a “yyyymmdd” format, while another uses “dd/mm/yy.” The list of possible disparities is almost endless.

 

Decentralized computing isn’t the sole cause of disparate data. For example, a company that adopts a new enterprise-wide database standard typically can’t replace all of its legacy databases immediately. Or, when mergers and acquisitions unite companies with different applications, they may need to be maintained until all of the systems are migrated, and a few applications from each predecessor company may be retained beyond that point. In addition, some organizations tolerate disparate data because they believe that the benefits achieved by using best-of-breed applications rather than a comprehensive, integrated application suite exceed the costs of having to deal with dissimilar databases.

 

The Need for Data Integration

 

Data format and platform inconsistencies are not issues as long as data use is isolated within individual applications. However, data needs to be aggregated, and discrepancies must be reconciled for enterprise-wide query, reporting and analysis. Furthermore, some operational applications require the integration of data from a variety of diverse sources. For example, an effective customer relationship management (CRM) application aggregates data from sales, accounts receivable, customer service and other databases to provide users with 360-degree views of customers and prospects. When integrating data, any incongruities need to be reconciled by transforming the data coming from one or more of the source databases to conform to the data schema and definitions used in the merged or target database.

 

Even in the absence of inconsistencies, data transformations are typically required to support BI initiatives. For example, in the name of efficiency, data may be denormalized in a data warehouse to avoid the frequent, massive table joins that business analysis tasks would require if they were run against databases structured to support transaction processing. Other transformations, which are common when moving data from production databases to data warehouses, include changing codes to meaningful labels to avoid the need for frequent table lookups and deriving search keys that will be useful to business analysts.

 

It’s important to note that even in the absence of a need for data transformations, when an organization decides to undertake ongoing comprehensive BI initiatives, a data warehouse that is isolated from the production databases is essential. Otherwise, the strain that business analysis places on the databases would cause transaction processing to slow unacceptably. Stocking this separate data warehouse is the job of extract, transform and load (ETL) or data replication software, which is also the most appropriate component of an information infrastructure to host data transformation functionality.

Filed under:

Advertisement

Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.