My customer information is spread over multiple systems on different platforms.

  • Joe Oates, Sid Adelman, Chuck Kelley, Clay Rehm, Les Barbusinski
  • February 03 2003, 1:00am EST


My customer information is spread over multiple systems on different platforms (Windows, UNIX, mainframe). The requirement is to create a customer data warehouse for dimensional analysis. Once the data warehouse is completed, the user group would like to use it as a master database for a single version of truth. They would like to change the customer information first in the data warehouse and then populate the corresponding source system for that particular customer. Does this make sense? What type of architecture should one follow? What tools are available? What is the possibility of using a tool like MQ series?


Sid Adelman’s Answer: A single version of the truth makes sense, and since your customer data is spread across systems, your challenge will be integrating those systems and reconciling their differences. This is hard, laborious work, and the tools won’t help you much. There will be inconsistencies from one source to another: different formats, rules, valid values, and standards (the sources were probably all created at different times by different people, each with their own idea about how best to do it).

Most organizations take a different approach from the one you suggest. They create that single version of the truth (SVOT) in a staging area, perhaps an ODS, and then populate the data warehouse from it. Rather than populate the source (operational) systems with customer data, why not have those operational systems access the SVOT customer data directly? This means you will only have to maintain customer data in one place, but it will require changes to those operational systems.

Clay Rehm’s Answer: What you are trying to build is really a customer relationship management (CRM) system. It would be best to design an operational type database or operational data store (ODS) that houses the current customer data and have that feed the data warehouse.

Les Barbusinski’s Answer: Providing a "back feed" of cleansed and integrated information from a data warehouse to operational systems is actually a very common practice. This is especially true of Customer data. It’s a good way to leverage the power of the DW and gain new economies. Once a DW has done the hard work of cleansing and integrating Customer data from a variety of operational systems, it’s a relatively simple exercise to add a few online inquiry/update functions to allow direct manipulation of the data, and a data distribution module to disseminate the integrated Customer information back to the operational systems.

As to architecture, there are two pieces to consider: 1) the online inquiry and maintenance functions, and 2) the data distribution functions. For the first, consider employing a Web services approach using either J2EE or .Net technology. Design and develop a set of Web service functions that can query and/or update various portions of the integrated Customer record in the DW, while enforcing all relevant business rules. This will, in effect, provide you with a platform-independent mechanism for allowing any Web portal and/or ERP application in the company to retrieve or update Customer information in the DW in a consistent manner. Two caveats: 1) make sure your Web service functions provide adequate security provisions, and 2) limit your update functions to dimensional data (i.e., allowing direct updates to facts and metrics in a DW is very dangerous).
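Les’s second caveat can be made concrete. The sketch below (all field names and the function itself are hypothetical, not from the article) shows one way a web-service-style update handler could restrict callers to dimensional attributes of a customer record and enforce a simple business rule, so that facts and metrics in the DW are never updated directly:

```python
# Hypothetical sketch: an update handler that allows changes only to
# dimensional attributes of a customer record, rejecting any attempt to
# touch fact/metric fields. Field names are illustrative.

ALLOWED_DIM_FIELDS = {"name", "address", "phone", "email", "segment"}

def update_customer(record: dict, changes: dict) -> dict:
    """Apply validated changes to a customer dimension record.

    Raises ValueError if the caller tries to modify a non-dimensional
    field or violates a simple business rule (non-blank name).
    """
    illegal = set(changes) - ALLOWED_DIM_FIELDS
    if illegal:
        raise ValueError(f"non-dimensional fields rejected: {sorted(illegal)}")
    if "name" in changes and not changes["name"].strip():
        raise ValueError("customer name must not be blank")
    updated = dict(record)      # never mutate the stored record in place
    updated.update(changes)
    return updated
```

In a real deployment the same whitelist-and-validate logic would sit behind the J2EE or .Net web service facade, so every portal and ERP application hits one consistent rule set.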

For the data distribution functions, your best bet is to employ a "message bus" architecture utilizing message-oriented middleware (MOM) software such as MQSeries or SonicMQ. Design and develop a set of XML-based messages and publication processes that will notify subscribing applications of new and/or updated Customer information on a near real-time basis. This approach has several advantages, including assured once-only delivery, asynchronous processing, automatic queuing, etc. Hope this helps.
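The publish side of such a back feed can be sketched briefly. In this illustrative example, an in-memory `queue.Queue` stands in for a real MOM queue (MQSeries or similar would replace it in practice), and the message layout is an assumption, not a standard:

```python
# Sketch of a message-bus back feed: each customer change is serialized
# as a small XML message and fanned out to every subscriber's queue.
# queue.Queue is a stand-in for a real MOM queue; field names are
# illustrative.

import queue
import xml.etree.ElementTree as ET

def build_customer_message(customer_id: str, fields: dict) -> bytes:
    """Serialize a customer change as an XML message."""
    root = ET.Element("customerUpdate", id=customer_id)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="utf-8")

def publish(subscribers: dict, customer_id: str, fields: dict) -> None:
    """Fan the change message out to every subscribing application."""
    msg = build_customer_message(customer_id, fields)
    for q in subscribers.values():
        q.put(msg)
```

Because each subscriber drains its own queue asynchronously, a slow legacy system never blocks the warehouse or the other consumers, which is the main appeal of the MOM approach Les describes.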

Chuck Kelley’s Answer: From what I understand you want to clean the data before it goes into the data warehouse (this is good), but then you want to change the data in the data warehouse (this is not good) and move it out to the source systems. I would strongly recommend that you first build the single version of truth in the data warehouse and then you would have nothing to change in the data warehouse. Afterwards, if you want to update the source systems with the new data, I think you have to build a process to do that. There will need to be close consultation with the development groups from your source systems, but there is certainly nothing wrong with that (with the exception that some places require exact audit trails, so it may be harder than you think to do!).

Joe Oates’ Answer: Certainly, a single source for customer information is very desirable. However, it may be unwieldy to set up the single source and populate it directly. This would require changes to many systems.

What has worked well for several of my customers is to keep the add, change and delete functions in the legacy systems, do the normal cleansing and merging steps in the ETL process, and then add an output feed from the data warehouse, in the specific format of each legacy system, for every system that would be affected by a change. The staging area should have a cross-reference table that identifies every legacy system that has the person or organization as a customer. You would probably not want to send new customer records to legacy systems that did not have the person or organization as a customer.
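The cross-reference routing Joe describes can be sketched in a few lines (the data structures here are illustrative assumptions; a real staging area would hold this as a table, not a dict):

```python
# Sketch of cross-reference routing: an updated customer record is sent
# only to the legacy systems that already carry that customer, per the
# staging area's cross-reference table. Structures are illustrative.

def route_update(xref: dict, customer_id: str, record: dict) -> dict:
    """Return a per-system feed {system_name: record} for each legacy
    system the cross-reference table lists for this customer.

    Customers unknown to the table produce no outbound feeds, so new
    customer records are not pushed to systems that never had them.
    """
    systems = xref.get(customer_id, [])
    return {system: record for system in systems}
```

This keeps the "no effect on the legacy systems" property: each system simply receives a file or message in its own format, produced by the warehouse-side routing step.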

The advantage of this approach is that there is no effect on the legacy systems. Of course when the data warehouse is initially loaded, a lot of records would be sent to the various legacy systems because each may have been updated at different intervals with different information. This implies that a good deal of work will have to be done to identify the authoritative source for customer information.
