Continue in 2 seconds

What is the best solution for a data capture issue, where the source tables don't have proper timestamp capture for the updates?

  • Chuck Kelley, Tom Haughey
  • April 10 2006, 1:00am EDT

Q: We are in the design phase of a data warehouse, stuck with a change data capture issue, where the source system is JDE and the source tables don't have proper timestamp capture for the updates and inserts, which deprives us of proper incremental load capture. It is not an option to use the source transaction logs for change data capture. A third option of full-table comparison leads to major overhead, so even that is ruled out. What would be the best solution other than the three options above? Do you have any suggestions regarding the "check sum" method? How feasible is the method and how would we achieve it?

Chuck Kelley's Answer:

I did some testing and wrote about the check sum method about three years ago. I think that it works and may be the only solution you have left to do. You may be able to use a different algorithm (I like the MD5) to do this. Also, the full table comparison should not be ruled out. There are great techniques that can deal with this, unless you are running on a Windows server.

Tom Haughey's Answer:

Figure 1 is a summary of different change data capture methods, modified from Microsoft. The check sum method is not among them. Perhaps you can find an appropriate method that will suit your needs.

Figure 1: Data Capture Methods

In the check sum method, you create a hash check-sum for the existing records and stage it. You only need to stage the key and the check-sum for the existing records, say a dimension. A check-sum is generated for each incoming record, based on columns whose change you care about. It is possible to generate a check-sum on the entire record as long as you care about any change in the entire record. The new check-sum is compared with the staged one. Clearly, if the input extract is huge, there is some level of performance issue in generating the incoming check-sum. If however the keys of both are indexed, and the check-sum included in a covering index in both, performance of the compare can be improved. The bottleneck is the volume of input records and the number of columns you need to use to generate the check-sum, which could only be those whose change you are interested in.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access