My question is in the realm of operational reporting.

By
  • Chuck Kelley, Clay Rehm, Les Barbusinski, Scott Howard
Published
  • October 09 2003, 1:00am EDT

Q:

My question is in the realm of operational reporting. My client (a state government) does not want the expense of an ODS or a data warehouse, but they want to be able to copy the production database, index a few fields to make the reports run faster and run reports against it. What tools are there to make copying the database easy? Are there any tools that can just apply the changes on a day-to-day basis instead of having to copy the complete database?

A:

Les Barbusinski’s Answer: There are tools that will accomplish what you want, but they’re expensive. Given that your client wants to minimize costs, you may instead want to utilize one or more of the data management tools embedded in your RDBMS. These fall into three categories: data copying, export/import and mirroring.

Data copying involves using either a COPY utility (not available in all RDBMS packages) to refresh a target table or simply truncating the target table and performing a bulk insert using the following command:

INSERT INTO <target_table> SELECT * FROM <source_table>

Banal as this may seem, the bulk insert approach works surprisingly well as long as you drop the indexes on the target table before the operation, and re-create them afterward. Indeed, I’ve seen this approach copy well over 100 million rows in under 20 minutes on an Oracle database using the /*+ APPEND */ hint. Furthermore, both the COPY utility and the bulk insert approach allow you to limit the number of columns that are copied.
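The drop-indexes, bulk-insert, re-create-indexes sequence described above can be sketched as follows. This is a minimal illustration, not a production script: the table, index and database-link names (`rpt_orders`, `orders@prod_link`, etc.) are hypothetical, and the `/*+ APPEND */` direct-path hint is Oracle-specific.

```sql
-- 1. Drop the reporting copy's indexes so the load runs unimpeded.
DROP INDEX rpt_orders_ix1;

-- 2. Truncate the target, then bulk-insert from production.
--    The Oracle APPEND hint requests a direct-path (bulk) load.
TRUNCATE TABLE rpt_orders;
INSERT /*+ APPEND */ INTO rpt_orders
SELECT order_id, customer_id, order_date, total_amt   -- copy only needed columns
FROM   orders@prod_link;                              -- database link to production
COMMIT;

-- 3. Re-create only the indexes the reports actually need.
CREATE INDEX rpt_orders_ix1 ON rpt_orders (customer_id, order_date);
```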

Oracle has one additional approach to maintaining table copies called "materialized views." The concepts are too complex to cover here, but it provides an efficient mechanism for maintaining customized copies (rather than exact replicas) of various source tables. If your client is an Oracle shop, this option is well worth investigating.
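As a rough sketch of the materialized-view option, the following Oracle DDL keeps a customized, pre-aggregated copy of a source table and refreshes it nightly. The object names, the aggregation and the 2 a.m. schedule are all illustrative assumptions, not part of the original answer.

```sql
-- Customized copy of the production orders table, rebuilt nightly.
CREATE MATERIALIZED VIEW rpt_daily_sales
  BUILD IMMEDIATE
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT TRUNC(SYSDATE) + 1 + 2/24          -- refresh each night at 2 a.m.
AS
SELECT order_date,
       customer_id,
       SUM(total_amt) AS daily_total      -- aggregate rather than exact replica
FROM   orders@prod_link
GROUP  BY order_date, customer_id;
```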

Export/import involves using an EXPORT utility to dump a source table into a flat file and an IMPORT utility to load the contents of the flat file into a target table. Handy as these utilities can be, they’re usually S-L-O-W … so limit your use of this approach to small tables only.

Finally, mirroring involves creating a mirror of a selected database, then splitting it off and mounting it as a separate, static read-only database. Once mounted as its own database, unused tables can be dropped, views can be added and tables can be re-indexed to optimize reporting queries. On a nightly basis, the read-only database is re-mirrored (i.e., synchronized with the operational database), split off and remounted as a separate database. DB2 calls this cloning, SQL Server calls this server-less snapshots and Oracle calls this Data Guard mirroring. This approach is a little involved and will tax your DBA’s abilities, but it can be a very fast and efficient means of copying an operational database.

Please note that none of these approaches performs "net change" updates to table copies, but their speed negates the need for such updates. Hope this helps.

Scott Howard’s Answer: You should look at the replication tools provided by each production database vendor. These solutions are usually low- or no-cost and very capable of recognizing changed data in a way that’s non-disruptive to the operational sources. Working with these sources in a non-disruptive manner, that is, without affecting source transactional throughput, is a requirement. An example is Data Propagator, a tool set provided at no additional cost to DB2 on UNIX, Linux and Windows users. Oracle and Microsoft also have equivalent replication technologies.

If these tools are not available to you, or your vendor wants to charge additional fees for their use, you could consider poor-man’s replication. Simply create a new timestamp-based column on all your source tables of interest, setting that column to NOT NULL with a default value of the current timestamp. This requires no changes to the source application and will systematically timestamp each change with the current timestamp. You would then periodically copy the rows with a timestamp greater than the last copy time over to your reporting system. This is not a foolproof, system-managed method like those provided by the replication tools, so you’d need to recognize its gaps: you’d miss changes to hot spots (rows that change multiple times, or appear and then disappear, between your copy cycles), and you’d miss deletes. You’ll need to accommodate these problems, in addition to figuring out how to handle in-flight transactions that could be copied over should you choose the non-disruptive dirty read on your source tables. See why I recommend the tools approach?
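The poor-man’s replication just described might look like the following sketch. All names (`orders`, `rpt_orders`, `last_changed`, `:last_copy_time`) are hypothetical, and note the caveat in the comments: a column default stamps inserts automatically, but on most RDBMSs a trigger (or a vendor-specific generated column) is needed to re-stamp rows on update.

```sql
-- Add a change-timestamp column to the source table.
-- The default stamps every new row; updates need a trigger
-- or a vendor feature to keep the stamp current.
ALTER TABLE orders
  ADD last_changed TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL;

-- Periodic copy cycle: pull only the rows changed since the last run.
-- :last_copy_time is the high-water mark saved after the previous cycle.
INSERT INTO rpt_orders
SELECT *
FROM   orders
WHERE  last_changed > :last_copy_time;
```

As the answer warns, this catches neither deletes nor intermediate versions of rows that change more than once per cycle.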

Chuck Kelley’s Answer: This is the old "let’s not do it right the first time; we can always do it over" syndrome. Michael Schiff gave a presentation in the fall of 1990 about the evolution of the data warehouse, and Phase 2 (of 6) was about how IT copied the production database to another place and used that as its reporting database. As the advertising suggests (paraphrased), "We have come a long way, baby." The problem I have with this approach is that no standardization and integration process is applied to the data. OLTP databases were created to get the data in as fast as possible, hence the ability to duplicate customers, tolerate bad referential integrity, etc. Data warehouse databases are about creating an accurate place to query data that is integrated, accurate and historic in nature. Copying the OLTP database from one system to another every day does not work.

However, if you really have to do this, you could use replication, bulk copies (i.e., export/import, bcp, etc.) or the approaches databases use for failover: journal shipping, store and forward, etc. Hopefully, you won’t do this on a single machine! The performance will be, at best, abysmal.

Clay Rehm’s Answer: I have been in similar situations. If you don’t want to copy data, have you explored views or materialized views?
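As a sketch of the plain-view option mentioned above: a view lets reports run without copying any data at all, at the cost of querying the operational database directly. The table and column names here are purely illustrative.

```sql
-- Reporting view over production tables; no data is copied,
-- but report queries hit the operational database.
CREATE VIEW rpt_customer_orders AS
SELECT c.customer_id,
       c.customer_name,
       o.order_id,
       o.order_date,
       o.total_amt
FROM   customers c
JOIN   orders    o ON o.customer_id = c.customer_id;
```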

Are there one or two tables that are really the performance problems that could be redesigned, and thus made to query faster?

If you are looking to copy only changes and do not have the budget to purchase ETL tools, your best bet is to create new programs, or modify existing ones, using company-approved programming languages. This includes SAS, COBOL and RDBMS utilities.
