Question: I'm trying to implement change data capture (CDC) on an Oracle source database to maintain a real-time staging database, and then load the changed data into a specific "data layer" through an extract, transform and load (ETL) tool for intra-day reporting. I found two types of CDC methods: synchronous and asynchronous. Asynchronous is further divided into four methods: hotlog, distributed hotlog, autolog archived and autolog online.
I've implemented all of the methods on small source databases, but I'm not able to judge the performance of the individual methods.
Now, my question is: Which method is the best way to achieve my goal (a staging database with changed data), taking into consideration performance, latency, impact on the source database and so on?
Chuck Kelley's Answer: I think it depends on your requirements, in particular how soon the changed data has to be in the data warehouse. If it must be immediate and always in sync with the source system, then use synchronous. The downside is that it will slow down your source system, since you are in effect applying a two-phase commit between source and staging. I have found that, in most cases, distributed hotlog is probably the best of the four asynchronous methods.
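To make the synchronous trade-off concrete, here is a minimal sketch of trigger-based capture, which is the general mechanism behind synchronous CDC: the change row is written inside the same transaction as the source change, so every source write pays for the capture. This is not Oracle code. It uses Python's built-in sqlite3 module as a stand-in, and the table names (orders, orders_changes), trigger names and the extract_changes helper are all hypothetical, chosen only for illustration.

```python
import sqlite3


def make_source():
    """Create an in-memory source table plus a change table fed by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);

        -- Hypothetical change table: one row per captured operation.
        CREATE TABLE orders_changes (
            seq    INTEGER PRIMARY KEY AUTOINCREMENT,
            op     TEXT,
            id     INTEGER,
            amount REAL
        );

        -- The triggers fire inside the writing transaction, so the capture
        -- cost is added to every insert and update on the source table.
        CREATE TRIGGER orders_ins AFTER INSERT ON orders
        BEGIN
            INSERT INTO orders_changes (op, id, amount)
            VALUES ('I', NEW.id, NEW.amount);
        END;

        CREATE TRIGGER orders_upd AFTER UPDATE ON orders
        BEGIN
            INSERT INTO orders_changes (op, id, amount)
            VALUES ('U', NEW.id, NEW.amount);
        END;
    """)
    return conn


def extract_changes(conn, last_seq):
    """Return change rows newer than last_seq and the new high-water mark.

    An ETL job would call this repeatedly, passing back the returned mark.
    """
    rows = conn.execute(
        "SELECT seq, op, id, amount FROM orders_changes "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seq
    return rows, new_last


if __name__ == "__main__":
    conn = make_source()
    with conn:  # one committed source transaction
        conn.execute("INSERT INTO orders VALUES (1, 10.0)")
        conn.execute("UPDATE orders SET amount = 12.5 WHERE id = 1")
    changes, mark = extract_changes(conn, 0)
    for row in changes:
        print(row)
```

Because the change rows commit or roll back together with the source rows, the change table never drifts from the source, which is the "always in sync" property described above. The asynchronous methods avoid the per-write trigger cost by reading changes from the redo logs after commit, at the price of some latency.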
Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, has co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at chuckkelley@usa.net.









