Tuning the Application-Centric Data Warehouse

Published June 1, 1998

Database performance seems to be the "black box" of business applications. Ask about application performance and you will invariably hear: "Well, it's been in production for almost three years!" as if longevity alone guaranteed that the system was well tuned.

Data warehouses are instrumental in the consolidation of business information across the enterprise, allowing information sharing and providing for decision support. The performance and availability of the underlying database is critical to the productivity and efficiency of the business personnel and applications that access it. This is also true for operational databases across the organization. One of the major benefits of using a data warehouse in an open systems environment is the ability to integrate both homegrown and third-party application information into the enterprise automation scheme. As applications grow and the number of users and amount of data increase, it is critical that a database continues to be tuned for optimum performance.

There are three steps that can be taken to dramatically improve the performance of production data warehouses and the operational databases accessed by business applications.

1. Tune "offensive" SQL statements.

Experience has shown that a small number of SQL statements are responsible for most of the database activity. The first tuning effort should therefore focus on that small number of "offensive" SQL statements: analyze what they are doing and take corrective action. This eliminates the need to learn and understand every program in its entirety.

Unfortunately, the collection and analysis of currently executing SQL statements is not trivial. The work an SQL statement requires is roughly proportional to the number of logical reads (buffer gets) and physical disk reads involved in its execution. Offensive statements may be short-running statements executed many times or a single long-running statement. But identifying the amount of work is not enough; you also need to correlate the amount of work to the execution path. It is both complicated and time-consuming for a DBA or developer to capture and analyze every SQL statement in this way. Instead, commercial tuning solutions can be used quickly and easily, with great results.
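
On Oracle, for example, a first cut at this list can be pulled from the shared SQL area. The following query is a minimal sketch, assuming Oracle's V$SQLAREA view; other databases expose similar statistics under different names:

  SELECT sql_text,
         executions,
         buffer_gets,                                   -- logical reads
         disk_reads,                                    -- physical reads
         buffer_gets / GREATEST(executions, 1) AS gets_per_exec
    FROM v$sqlarea
   ORDER BY buffer_gets DESC;

The statements at the top of this list, whether cheap statements executed millions of times or a single expensive query, are the ones worth studying first.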

2. Tune long-running applications.

Once the most offensive SQL statements are either tuned or accepted for what they are, you need to look at individual applications and long-running batch jobs with poor response times. You can try using trace options. However, some applications may not support dynamic tracing, and you do not want to have to alter production versions of applications. When multiple developers and batch jobs generate trace information, the amount of output produced can be so large that it is almost impossible to correlate and sift through the information. Again, by using a commercial tuning solution, you can specify criteria to retrieve information on only the most active SQL statements, as they are executed, without putting a heavy burden on the system.
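
As a sketch of what the manual approach looks like on Oracle: a session can trace itself, and a DBA can switch tracing on in another session through the unsupported but widely used DBMS_SYSTEM package (the sid, serial# and values below are illustrative):

  -- Trace the current session:
  ALTER SESSION SET SQL_TRACE = TRUE;

  -- Find the target batch session, then trace it:
  SELECT sid, serial#, username FROM v$session;
  EXECUTE dbms_system.set_sql_trace_in_session(12, 345, TRUE);
  -- (12 and 345 stand in for the sid and serial# found above)

The resulting trace files can be formatted with the tkprof utility, which also illustrates the volume problem noted above: every traced statement lands in the output, relevant or not.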

3. Tune the access to individual tables.

Every application has a set of common tables that are used throughout. Access to these tables should be monitored regularly to determine how the tables and their indexes are being used. The location and condition of tables and indexes should be reviewed periodically to ensure load balancing and to avoid I/O bottlenecks. Most database vendors do not provide ample facilities for monitoring the relationship between SQL statements and the tables they use. A good tuning tool solves this problem.
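
Lacking such a tool, a crude approximation on Oracle is a text match against the shared SQL area; CUSTOMER_ORDERS here is a hypothetical table name:

  SELECT sql_text, executions, buffer_gets
    FROM v$sqlarea
   WHERE UPPER(sql_text) LIKE '%CUSTOMER_ORDERS%'
   ORDER BY buffer_gets DESC;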

When working through the above steps, a number of common areas typically cause the biggest part of the problem in tuning production databases and the applications that access them. These include:

Missing indexes. The number one cause of poor performance is missing indexes. A missing index means that even simple accesses to a table turn into full table scans, which are time-consuming and generate excessive I/O. Missing indexes can generally be identified by collecting and analyzing the most "offensive" SQL statements.
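
A quick way to confirm the diagnosis is to check the execution plan. This sketch uses Oracle's EXPLAIN PLAN with hypothetical table and column names:

  EXPLAIN PLAN FOR
    SELECT * FROM customer_orders WHERE customer_id = 1001;

  -- plan_table is created by Oracle's utlxplan.sql script:
  SELECT operation, options, object_name FROM plan_table;
  -- "TABLE ACCESS  FULL  CUSTOMER_ORDERS" confirms the full scan

  CREATE INDEX cust_ord_cust_idx ON customer_orders (customer_id);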

Indexing scheme does not match the business use. The original indexing scheme may no longer match your ever-changing business profile. As tables grow and applications mature, the selectivity of the indexes changes. By tracking the distribution of work among all access paths to a specific table, you can find which indexes are used the most. To accomplish this, collect and analyze, over a few days, all the SQL statements that access the main tables. Typically, some tables have too many indexes, and in many cases nobody knows why those indexes were created. Indexes that are rarely used and not otherwise required can be dropped. Other problems result from erroneous SQL statements or an improper choice of primary key columns during database design. Also, if an index contains columns with few distinct values, it is better to place those non-selective columns last.
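
For example, a composite index can put the low-cardinality column last (hypothetical names again):

  -- status has only a handful of distinct values; customer_id is far
  -- more selective, so it leads the index and status trails:
  CREATE INDEX cust_ord_cust_status_idx
    ON customer_orders (customer_id, status);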

Table stagnation--the "high-water mark problem." The high-water mark of a database segment is a pointer to the last block ever occupied in the segment. When a table is scanned, some databases read only up to the high-water mark rather than through the last allocated extent. However, deleting all the rows from a table does not lower its high-water mark, so a full scan still reads every block below the mark: the table actually holds far less data than its scan cost suggests. The high-water mark can be reset by truncating the table or by recreating it. Tables used as temporary data containers within a batch process are especially susceptible to this problem; such tables benefit from periodic reorganization, or a reorganization step can be built into the end of the batch job.
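
A minimal sketch of that end-of-job step, assuming Oracle-style TRUNCATE and a hypothetical work table:

  -- DELETE leaves the high-water mark untouched; TRUNCATE resets it:
  TRUNCATE TABLE batch_work_area;

  -- Or keep the allocated extents around for the next run:
  TRUNCATE TABLE batch_work_area REUSE STORAGE;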

Index stagnation. Since indexes are always kept sorted, every update to a table that modifies indexed columns translates into a delete and an insert in the index. The freed space in the index block may not be reused, and the block is split to add the new index value. A good performance monitor will indicate high database activity when accessing such a table, even when only a few rows are retrieved; in many cases, a large share of the index blocks are nearly empty. Rebuilding the index should result in much less overhead on the system.
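
On Oracle, the damage can be measured and repaired as follows; the index name is hypothetical:

  -- Populate the INDEX_STATS view for this index:
  ANALYZE INDEX cust_ord_cust_idx VALIDATE STRUCTURE;

  -- A high proportion of deleted leaf rows signals stagnation:
  SELECT lf_rows, del_lf_rows FROM index_stats;

  -- Rebuild to reclaim the empty space:
  ALTER INDEX cust_ord_cust_idx REBUILD;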

I/O bottleneck. After the "offensive" SQL statements are tuned and many of the tables and indexes are reorganized, poor response time may still be present when accessing some of the large common tables. A good SQL tuning tool will indicate if most of the I/O is directed to a small number of tables and indexes. Additionally, a good diagnostic monitor will show if the I/O rate to the disk is approaching saturation. Scripts can move these tables and indexes to dedicated tablespaces on separate disks. The most active tablespaces can be put on RAID disk utilizing raw devices. This should alleviate the response time problem.
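
As a minimal sketch in Oracle syntax (names and file paths are hypothetical), a hot index can be rebuilt into a dedicated tablespace on its own disk:

  -- Dedicated tablespace on a separate device:
  CREATE TABLESPACE order_idx_ts
    DATAFILE '/disk07/oradata/order_idx01.dbf' SIZE 500M;

  -- Rebuild the hot index into it:
  ALTER INDEX cust_ord_cust_idx REBUILD TABLESPACE order_idx_ts;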

Multiple database writers. When you monitor the system during peaks of activity, you may see database activity "hiccup": it suddenly freezes for a fraction of a second and then continues. Increasing the number of database writer processes usually makes the hiccups disappear, reduces buffer busy waits and allows more batch jobs to run simultaneously.
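
On Oracle 8, for instance, the writer count is set in the init.ora parameter file; the values below are illustrative, not recommendations, and the right setting depends on the platform and workload:

  # init.ora
  db_writer_processes = 4    # multiple database writer processes
  # On platforms without genuine multiple writers, I/O slaves are
  # the alternative (the two parameters are not combined):
  # dbwr_io_slaves = 4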

Chaining. For tables with very high update activity, reorganizing the indexes may not help much. A good diagnostic monitor will identify whether many chained reads occur when those tables are accessed; if so, reorganizing the tables themselves will solve the problem. For 24x7 applications, it is important to choose a tablespace reorganization tool that can reorganize selected tablespaces without requiring the entire database to be stopped.
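
Oracle, for example, can list chained rows into a holding table created by its utlchain.sql script; the table name here is hypothetical:

  ANALYZE TABLE customer_orders LIST CHAINED ROWS INTO chained_rows;

  SELECT COUNT(*) FROM chained_rows
   WHERE table_name = 'CUSTOMER_ORDERS';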

Tuning views. Many implementations use views to simplify complex database access, yet the view definitions themselves can be difficult to track down when needed. If some of your "offensive" SQL statements use views, tuning the view itself can improve performance significantly without changing the application source code.
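
A minimal sketch with hypothetical names: tightening the view definition, say by adding a restrictive predicate or removing an unnecessary join, immediately benefits every statement that references it:

  CREATE OR REPLACE VIEW open_orders_v AS
    SELECT order_id, customer_id, order_date
      FROM customer_orders
     WHERE status = 'OPEN';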

Tuning production and operational databases can be likened to walking on a tightrope. You don't want to make changes that shake the stability of the system, but you need to constantly improve performance where possible. The three-step approach outlined in this article can accomplish just that. Unfortunately, the tools provided with a standard installation of your database may not provide all the capabilities necessary to accomplish the job. Therefore, third-party tools are needed to pick up the slack where database vendors leave off. An experienced DBA can do much of this work; but with the drastic shortage of DBA resources, every minute of a DBA's time needs to be utilized to the fullest. Routine tasks should be automated whenever possible, and less experienced DBAs need to be able to perform many of the tasks of more senior personnel.

By having a handful of good, solid database utilities, DBAs can extend the reach of existing resources and achieve increased system reliability and performance. As a result, management reaps significant productivity gains across the company. For the quickest return, the DBA toolset should start with: a good SQL tuning solution, a diagnostic response time monitor, a tablespace reorganization tool and a comprehensive impact analysis solution. With these tools, DBAs can open the "black box" of packaged applications and aging homegrown applications, and easily look inside the guts of the database. By doing so, they can learn a great deal about what the programs are doing and how the tables and indexes are being used. These tools will not only increase database and application performance, but they will also improve the overall quality of your database environment.
