Continue in 2 seconds

My company is now investigating whether you could use one database model and/or one database for both OLTP and OLAP worlds.

By
  • Sid Adelman, Les Barbusinski, Scott Howard
Published
  • May 03 2002, 1:00am EDT
More in

Q:

My company is now investigating whether you could use one database model and/or one database for both OLTP and OLAP worlds. Can you give me your opinion? Am I living in the past? Could we do justice to both worlds with a single model and/or database?

A:

Sid Adelman’s Answer: You are right! You cannot use the same database or even physical database model for both OLTP and any data warehouse (including OLAP) for the following reasons:

  1. The designs are different. Trying to develop a design to satisfy both will be a compromise neither will like and the performance will be bad for at least one of them.
  2. OLTP and data warehouses have different timeliness requirements. You do not need real time data for a data warehouse that you do for OLTP. In fact, analysts do not like a changing data warehouse.
  3. A data warehouse query can sometimes suck resources to such an extent that you may severely hurt OLTP response time. Once that happens, the OLTP folks will kick you off their database.

The data warehouse has more stringent data quality requirements than are required in the OLTP system.
Les Barbusinski’s Answer: A "one size fits all" approach to database design never works. The database structures for OLTP and OLAP are totally at odds with each other because of the nature of the systems they serve. Bill Inmon covers this very basic dichotomy in his landmark book, Building the Data Warehouse.

Whereas OLTP database structures:

  • Are "atomic" (i.e., detailed)
  • Are transaction-oriented
  • Represent the current state of an entity
  • Serve the clerical community
  • Serve well-defined processes

OLAP database structures:

  • Are aggregated and/or summarized
  • Are analysis- oriented
  • Represent a historical view of an entity
  • Serve the management community
  • Serve undefined ad hoc processes

As the saying goes: "form follows function." A database structure must reflect the function it is intended to perform, or it will not work.
Scott Howard’s Answer: Single Database? Possible. Single Model? You are living in the past.

Let's start with the easy part, single database. It is possible to have a single database engine, especially a parallel RDBMS handle both OLTP and OLAP needs. However, this is seldom recommended because the two workloads are very different. OLTP systems usually have a high constant transaction rate usually consisting of very simple read/write transactions. Systems administrators tune these systems to take most advantage of the resources available at constant rates, thus drive CPU and I/O usage as close to 100 percent as possible. This is in contrast to OLAP systems use which is inconsistent, primarily long- running and complex read-only transactions. This pattern leads to peaks and valleys in resource usage that when combined with OLTP usage can cause usage spikes well over resource capacity. These spikes can result in service-level violations for your OLTP system. Now this is a general scenario that may not apply to your specific implementation, so that's why I reserved judgment and claimed it's still possible to combine systems.

You can't combine models. The OLTP model is one that is intended to capture and efficiently manage the current state of your business. Short-term transactions, current inventory, monitoring current manufacturing processes and the like are the focus of most OLTP applications and systems. OLAP systems represent history and need to function in a way contrary to OLTP systems. That is they need to capture everything that goes on within our business including the net business result of an OLTP transactional update or delete, and represent and preserve that net meaning in a historical model. They also need to combine that with external events (promotions) and special external events (holidays, weather, manufacturing floor conditions, economic conditions etc.) so business analysts can make sense of the changes in our business captured from the OLTP models. These external events are also not generally represented in the OLTP models. Now we don't have room or time today to expand on how to do just that, but that's what OLAP or data warehouse modeling is all about and why if differs so from OLTP modeling.

Mike Jennings’ Answer: In most cases combining OLTP and OLAP traffic to a single database structure would be a mistake. Assuming that your OLTP application is running some segment of your company’s business, you risking impacting its performance and ability to quickly process transactions by combining OLTP transaction processing and reporting with OLAP. In order for transactions to be processed efficiently, the data store would have to be in third normal form. This construct works fine for transaction processing but is inefficient for OLAP queries due to the excessive amount of joins that will be required to answer business questions. Both OLTP transactions and OLAP reporting will be competing for disk I/O which will degrade performance. End users running OLAP queries will experience ever- changing result sets of information in their queries as OLTP transactions are processed throughout the day. Aggregation, calculations, derived data and multipass processing will have to be performed during an OLAP query further degrading performance. In many cases, such as ERP systems, operational reporting against an OLTP system is performed in a secondary data store separate from where transaction processing occurs just to avoid the performance impact of operational reporting. Your company may decide to go down this path initially to save money but will quickly see the need to create a secondary data store for OLAP in order to be able to analysis strategic information in a efficient manner.

Chuck Kelley’s Answer: You are living in the now. NO, you should not do both in a single database. MAYBE you can use the same data model, as long as your model deals with historic views of data as it changes over time (which it probably doesn’t – very few do). Vendors of middleware talk about this all the time, but there are some problems as I see it. Here is an excerpt I wrote as a Letter to the Editor of Computerworld published November 20, 2000.

  1. "Do you really want hundreds of end users doing analysis of millions of rows asking queries into your transaction system, which is probably already undersized? I think not.
  2. Do you have multiple definitions of the same object (Gender = M/F; 0/1; 1/2)? If so, do you really want users to be interpreting these in the product each time a user runs a query? I think not.
  3. Do you have multiple applications that have different definitions for customers using different data types? If so, how is that handled within the product you are using? Do you really want that product to interpret "12345" to be different things to different systems each time a user runs a query? I think not.
  4. Do you really want to keep 10-plus years of history in a transaction system, slowing it down? I think not.
  5. Are your measurements in different metrics (currencies, metric vs. U.S. measurements)? Do you really want conversions on the fly? I think not.
  6. Do you really want to process the same set of requests every time a user issues a query? I think not.

Granted, if you have a small single application that has an integrated environment (as very few do), then these products may work (though I would still be leery because of No. 1 above)."
Of course, then my last statement has to discuss tuning. How do you tune an operating system with applications that does five reads, seven writes (typical transaction) and reads 100,00 rows and aggregate (typical data warehouse) at the same time? OK, if you could get past the operating system (Yes, I know all about MVS and the ability to run multiple versions of the OS, but there are some major limitations!), how do you do that for the database?

Well, I guess you can tell I have strong opinions on this topic.

David Marco’s Answer: You can do justice with a single logical model; however, different physical models will definitely be needed. The key to managing all of this data is a meta data repository. It is the system that manages your systems.

Joe Oates’ Answer: The simple answer is that you should have a separate database and machine for transaction systems and analytical systems. Data warehouse analytical reporting can often saturate I/O channels. This would certainly have a severe impact on transaction processing, especially OLTP. The same can be said of running a lot of reports while the OLTP system is up and running. I have seen many cases where the volume of operational reports made it necessary to duplicate the database on another computer and run reports only from that computer.

There are a couple of less desirable alternatives to the two separate machines. First, some of the larger hardware platforms can be partitioned so that a certain group of processors can be dedicated to OLTP and another group of processors can be dedicated to the data warehouse or other reporting functions. However, depending on the architecture, there still might be adverse impact on the OLTP systems because of the high I/O requirements of the data warehouse or other reporting requirements.

Second, analytic reports could be run at night when the OLTP systems are not running. However, this would probably interfere with nightly batch processing. Also, most employees would not be willing to come in at night to run ad hoc queries.

Clay Rehm’s Answer: In a perfect world, there would be one data model and one database. However in the operational world, companies are bought and merged, operational systems are retired, enhanced or newly built and it would be impossible to have a single database. The beauty of a data warehouse is that it is a separate database that integrates all of the operational databases into one, and it is designed for ad hoc query performance, not OLTP transaction update performance. I know there are RDBMS vendors who are working on improving their database system to handle both; however I am not sure we are there quite yet. And even so, for the reasons stated above, it just does not make political or financial sense.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access