Building the Performance Assurance Tests
Last month, I discussed defining the metrics that will become the basis for ensuring that your warehouse continues to meet its performance requirements even as it organically grows and evolves with your organization's changing needs. This month, I'll discuss how to measure the actual performance of your warehouse. There are actually a few different performance areas that must be tested, but since I already discussed the performance assurance metrics last month, I'll begin with those.
Benchmarking the Performance Assurance Metrics
To determine whether or not your system satisfies your performance metrics, you need to build some benchmarks. These benchmarks must be written to either replicate or simulate the specific aspects that you're trying to test. For example, to test load performance, you can use your real data and replicate the real load process to test how fast data can be loaded into your system. Or, to test the ability to support the required number of users, you can write a benchmark that simulates users executing various queries at a specified rate. These benchmarks are generally not complicated to build, and many can be built using simple scripting languages or by writing short programs in traditional programming languages. In fact, many organizations already have some benchmarking programs that they've used in the past, and these can be modified to suit the needs of the warehouse's Performance Assurance tests.
To be able to proactively test the scalability of your warehouse, these benchmarks must also be able to test different scales. That is, you must be able to test load times not only for X amount of data, but also 1.5X, 2X and so on. And, you must be able to simulate not just 25 users, but also 50, 75 and 100. Each time you run a larger scale test, you also want to add the appropriate amount of computing resources to keep performance stable. That is, if the workload doubles, then you will probably want to double the number of processors, double the disk I/O bandwidth, etc. The goal is to determine not just how your data warehouse performs today, but also how it is likely to perform in the future based on your growth estimates. What you want to look for is unchanging response times as your workload increases. If you increase your workload and increase your computing resources accordingly, then response times should maintain their current level if your system has good scalability. However, if performance begins to suffer, then you have a scalability problem lurking in your system. Figure 1 illustrates this. In the scenario displayed on the left, I show an example where you have good scalability. I increase the amount of data to be processed and also increase the number of CPUs (and any other necessary resources) accordingly, so the response times are unchanged. However, in the scenario displayed on the right, I also increase the amount of data and the system resources, but I see response times increase, alerting me to a scalability problem.
A real-world issue arises when we try to test different scales for a warehouse: access to the required amounts of hardware. If your machine only has four processors, how can you test scalability up to six processors? Or, if you only have 100GB of disk space available, then how can you test performance with 200GB of data? If possible, borrow resources from another system in your enterprise. Or, see if the hardware vendors will loan or lease you additional hardware for a brief period. However, it's a fact of life that this isn't always possible. The goal is to test as much scalability as your computing resources will allow. Draw as much of the graph shown in Figure 1 as possible. If you simply don't have the available resources, then you must skip this step. However, even drawing a little of the scalability graph is better than nothing at all.
Using System Measurements
The benchmarks I have been discussing are critical toward ensuring that your warehouse meets its performance goals, but by themselves they are insufficient. To ensure scalable performance, you must also look at a few critical system measurements that cover how the CPUs and the disks are being used. To do this, you need not write any additional benchmarks you need only use the system measurement tools that are supplied as standard utilities with all major operating systems.
Focusing first on the CPUs, you need to know CPU utilization across the system. If you're at 50 percent utilization, you have room to grow your warehouse a bit before you'll have to purchase additional CPU power. But, if you're at 95 percent utilization, you're close to hitting your performance limits; and, in fact, you probably already are hitting these limits during peak periods, and you'll need to buy additional CPUs soon. (Incidentally, you could argue that if you have 50 percent utilization, you bought a system too large for your current needs since you have an excessive amount of headroom. In reality, targeting 80-90 percent utilization gives you a good balance between having enough headroom and not buying too much hardware.)
Another important system measurement regarding CPUs is the load balance between the CPUs. In a well-performing system, the workload should be evenly distributed across the processors; and, therefore, all the CPU utilizations for the individual CPUs would be roughly equivalent. If one of the CPUs has a utilization that is markedly higher than the others (that is, one of the CPUs is "hot"), you will have a scalability problem regardless of whether or not you are currently meeting performance requirements. If one processor is hot, that means that the way the workload is being divided is ineffective. Sometimes, in a shared-nothing database architecture, this can be caused by incorrectly striping/partitioning the data such that frequently accessed data is being managed by a single CPU. Or, sometimes the culprit is a piece of code which was incorrectly written and which is only able to run on a single processor. In this situation, adding more processors will not improve performance (at least not very much), because the workload is not being evenly distributed across all available CPU resources. Essentially, the one hot CPU becomes a bottleneck. You need to find the culprit and eliminate it.
There are also identical system measurements that we must look at regarding the disk drives. First, you need to look at overall disk subsystem utilization. (By utilization, I am referring to how much of the I/O bandwidth is being used, not how much of the storage capacity is being used.) If your utilization is too high, you will need to add more I/O resources. Also, if your utilization is too low, then you want to either determine if there is a bottleneck elsewhere in the system (which causes the disks to sit idle) or increase the level of parallelism in the system to either increase system throughput or decrease response times. Also, similar to CPUs, you need to look at the load balance on the I/O subsystem. If one of the disks is hot, then you will have a scalability problem, because adding more disk drives by themselves will not improve performance for the same reasons we saw in our discussion of CPU load balance. To solve this problem, you need to choose a new disk striping/partitioning scheme that better balances the distribution of the data for your current workload.
In summary, creating a Performance Assurance Test Environment (which includes writing benchmarks to test performance metrics and tracking CPU and disk I/O utilization numbers) is a critical part of not just building a successful data warehouse, but also maintaining it successfully. It will give you a tool to allow you to proactively test the incremental additions to your warehouse and will alert you to performance problems before you release the latest incremental additions to your end users. As your warehouse grows over time, the Performance Assurance Test Environment is the primary mechanism you will use to proactively ensure that your system does not drop below your end-users' performance requirements.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access