JAN 1, 1999 1:00am ET

Related Links

Visiting Nurse Service Cares About Cloud Security
October 25, 2011
Light at the End of the Silo
October 28, 2010
Pitney Bowes Releases Enhancements to MapInfo Professional
September 13, 2010

Web Seminars

Achieving Real-Time Agility with Operational Warehousing
June 21, 2012
Data Replication for Real-time (Big) Data Warehousing
Available On Demand
Improving your Overall Analytical Environment by Migrating to a New Data Warehouse Platform
Available On Demand

Proactive Performance For the Data Warehouse

Print
Reprints
Email

Diseases are named either after the doctor who discovers the disease (Alzheimer's, Crohn's) or after the unfortunate patient on whom the disease or disorder is diagnosed. (There is an Eastern European curse, "May you have a disease named after you.") Baron von Munchausen was unfortunate enough to have given his name to a debilitating disorder. His disorder was psychological. He believed he had a variety of diseases and, in fact, displayed the symptoms of those diseases but with no underlying illness. His legacy is the Munchausen Syndrome.

A more complex and unfortunate syndrome is Munchausen by Proxy in which a parent causes his or her child to become ill or injured, and the manifestation can then be treated medically. The parent becomes the center of attention as the child is cured and all is set right again. A more benign version of Munchausen by Proxy is a form of neglect (malnutrition, no immunization, etc.) that most probably will require medical care.

An extreme example of Munchausen by Proxy in IT was the case of a DBA who set database parameters so that when volumes exceeded an anticipated threshold, performance became intolerable. This DBA would then come in and fix the problem. He was then heaped with accolades, which he otherwise never received. He had set up the problem and was now getting the same attention as the parent in Munchausen by Proxy would get.

Gerald Hodge, the president of HLS Technologies in Sugar Land Texas, specializes in anticipating performance problems in the DB2 world. He points out that most organizations seem to have no interest in keeping performance problems from happening - only in fixing them once the problems occur. This is a form of benign neglect, somewhat akin to not immunizing your children.

Performance service level agreements (for example, "two seconds or less for 90 percent of the transactions") are the norm for large organizations that recognize the heavy costs of terrible response time. Hodge observed, "The main problem with service level agreements is that they have no economic support. If they are not adhered to, someone may lose his bonus, but no money is put in play the solve the problem." This goes to the heart of the problem; few organizations are proactive about poor performance, but only wait until response time become unacceptable.

Hodge also addressed threshold management. "If the service level is sub-second, then tuning should start well before the [sub-second] threshold is reached." There are early warning systems that can alert the installation to impending doom long before the users notice anything. Only some organizations have established service level agreements for data warehouse ad hoc queries as the access paths and the resources used can be highly variable. Some have established benchmarks for selected predefined queries, and the response times on these queries are monitored.

Data warehouses have their own set of performance problems, albeit somewhat different than those in operational systems. When left to fester through benign neglect, they can lead to the failure of the data warehouse.

Let us take a closer look at some of these problems. Data warehouses are becoming far too large. In many shops with a mature data warehouse, their sizes exceed those of the operational systems.

Queries and reports are not as evenly distributed as they are in operational systems. For example, for financial and sales data warehouse queries, the activity is usually concentrated around month end and the beginning of the month.

Query volumes are difficult to predict. Users have a difficult time estimating how many queries they will run. In fact, if the data warehouse is successful, users will generate far more queries than they might have predicted. Interesting results will raise new questions that need to be answered, and more queries will be generated.

It may even be difficult to get a good estimate of the number of users. As the data warehouse becomes more popular and the word spreads, more and more users will log on and try it out, at first with simple queries. These simple queries may quickly grow into long, complicated and resource-intensive queries. If access to the database and user activity is not monitored and analyzed, these users can remain incognito for quite awhile.

The access patterns of ad hoc queries are difficult to predict. A new query may cause an access for which no index has been created, and the result may be a long-running sequential job with some expensive joins.

The load/refresh/update time often exceeds the allowed window. This has been one of the major performance problems with the data warehouse as the number of source files increases, as volumes get larger, as the data cleansing becomes more extensive and timeliness requirements become more stringent. This is also a reason why hardware capacity planning for a data warehouse is difficult and cannot be considered as definitive as capacity planning for an operational system.

Poor performance does not appear all at once. The problems gradually increase in size and magnify their impact on the system. The timing of when these problems surface is predictable, but only if the performance of the system is measured. A surprisingly small number of data warehouse installations measure performance. The issue is usually one of lack of management awareness, concern and attention.

The following steps should be considered to keep your data warehouse from becoming ill. Sell management on the need to have good performance in the data warehouse. This may seem superfluous, as we assume that management would always recognize the need. Usually, management will give it lip service but will not allocate the right resources to make it happen.

Sell them on the idea of being proactive to eliminate performance problems before they become apparent to the users. You may also consider what some organizations have done, which is to quantify the cost of poor performance and communicate that cost to management. Establish the role of a person responsible for data warehouse performance, usually a DBA. This person would work closely with people responsible for the hardware, the operating system and the network.

Keep the data warehouse from growing uncontrollably. See "The Data Warehouse Database Explosion" in the December 1996 issue of DM Review.

Work with your performance management tool vendors, query/report tool vendors and DBMS vendors to provide a robust monitoring capability. You want to know who is generating the queries and reports, how many they are generating, CPU time, number of I/Os, time of day the query/report is executing, what data is being accessed, the access path, the size of the result set and the response time.

Set some series of service level agreements (SLAs) - even if they are only internal IT agreements - for performance. These would include SLAs for benchmark queries and reports that are: 1) simple, 2) medium and 3) complex. These benchmark queries would have known characteristics for CPU time and number of I/Os. Do not try to establish an SLA for undefined ad hoc queries, as there is no way to know if they will generate ten I/Os or ten million I/Os. In the class that trains the users responsible for generating ad hoc queries, include a module on performance that explains the use of indexes, how to avoid Cartesian products, the impact of asking for more than what is needed, the use of meta data and the use of existing canned queries/reports.

Establish a canned query/report library with a responsibility for its administration. This responsibility would include an acceptance and performance test prior to the query/report being added to the library. Be sure to communicate to all users that such a library is available and describe all the queries/reports in it. This would significantly improve the use of the library.

Establish lines of communication between the power users and the person responsible for data warehouse performance. The power users would indicate the intention for access to new data or to data in a different access pattern. This would give the person assigned to data warehouse performance time to estimate the impact of the new query and determine if anything should be done, even before it is run for the first time.

Be sure the database subsystem (DB2, Oracle, etc.) has the proper attention. DBMSs have a variety of tuning knobs, choices for sizes and numbers of pools, initialization parameters, free space management, indexes, partitioning, disk configuration and data set placement that will have a marked impact on performance. There are ways to encourage efficient access paths - those paths being entirely dissimilar between data warehouse and OLTP. Since the data warehouse and OLTP environments have such different characteristics, well-tuned and well- configured DBMS subsystems will have significant differences between their data warehouse and OLTP versions. Most large organizations have people or even whole departments devoted to controlling performance. They are responsible for estimating performance, capacity planning, monitoring performance and tuning and correcting the problems. They work closely with other members of the organization who are directly responsible for the system, the database and the network. Some of these organizations are now focusing on the performance of the data warehouse.

A healthy data warehouse can remain healthy, but only if IT takes proactive steps to monitor, measure and tune the data warehouse databases. In addition, there needs to be a mind-set and standards (SLAs) for excellent performance where activity should be invoked prior to hearing from complaining users.

Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or visit his Web site at www.sidadelman.com.

Larissa Moss is founder and president of Method Focus Inc., a company specializing in improving the quality of business information systems. She has more than 20 years of IT experience with information asset management. Moss is coauthor of three books: Data Warehouse Project Management (Addison-Wesley, 2000), Impossible Data Warehouse Situations (Addison-Wesley, 2002) and Business Intelligence Roadmap: The Complete Project Lifecycle for Decision- Support Applications (Addison-Wesley, 2003). Moss can be reached at methodfocus@earthlink.net.

Filed under:

Advertisement

Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.