In the IT world, wouldn't it be great if there was some magical solution to all our performance problems? Wouldn't you just love to have a button on your hardware platform labeled "Increase Performance" that you could push as often as necessary? Or, how about being able to improve your database performance by simply setting the value of a "Go Faster" parameter to "True?" In reality, some people mistakenly think of parallelism and parallel queries as this type of magic performance elixir. For those IT shops that are fortunate enough to own a scalable hardware platform and a parallel database, if one of their queries isn't running fast enough, they initially react in one of two ways. Either they turn parallelism on for that query; or if it was already being executed in parallel, they increase the level of parallelism to get more processors involved.
At first blush, this seems to make sense. A query should complete faster if you increase the number of processors working on the query, since each processor has to perform a smaller portion of the overall query workload. But, why then do I get calls from clients who tell me that they recently increased the level of parallelism for a particular query, but the performance didn't improve at all (or, even worse, the performance degraded)? Usually, these clients call to complain about the database, thinking they've found a bug. But, it's usually not a problem with the database. Nor is it a problem with the concept of parallel queries. The problem is caused by an incomplete understanding of how parallelism speeds up queries.
Previously in this column, I have written about how parallel queries use techniques known as partitioning and pipelining to make remarkable improvements in the time it takes to process a query. (See "Parallelism in the Database Layer" in the December 1996 issue of DM Review.) But, it is important to realize that parallelism does not reduce the amount of work that needs to be done. It just divides it up into a number of smaller pieces, each of which can be worked upon simultaneously. Actually, parallel queries add overhead, due to the fact that a large query first must be broken down into smaller pieces (called sub-queries), then these sub-queries must be distributed to various processors, and then the results must be merged before handing them back to the user. Yes, that's right- -parallel queries slightly increase the total amount of work that needs to be done. Because of this, my definition of a good parallel query algorithm is one that increases the overall workload by as little as possible. With this in mind, let's look at the two most common reasons why parallel queries sometimes fail to improve query performance.
Lack of Available Resource Capacity
The most common culprit is the lack of any available excess processing capacity. (Actually, it's the lack of excess capacity in any system resource, including disk drives, disk controllers, system bus, etc. If any of these components are at their bandwidth limits, parallelism will not increase performance. But, I'll focus on CPUs for this discussion.) Even if a machine has many CPUs, if it is running close to 100 percent busy (that is, very few idle CPU cycles), then parallelism cannot improve overall system performance. Remember, we are not eliminating any work, parallel queries will just redistribute the work. For example, let's look at an eight CPU machine. For scenario 1, assume there are eight queries running (without parallel query capability), with each one assigned to its own CPU. Also assume that these queries are keeping the machine nearly 100 percent busy, and each processor can complete its own query in 16 minutes. In this scenario, at the end of the 16th minute, all eight queries will be completed.
For scenario 2, let's turn on parallel queries and have each query run across all eight processors (via the use of eight sub-queries). The first query will finish in about one- eighth of the time (that is, two minutes), and each subsequent query will also finish in two minutes. Notice that it will still take 16 minutes of elapsed time to complete all eight queries, just as it did in scenario 1. Therefore, we really have not increased overall system throughput at all. And, due to the increased overhead described above, it would actually take slightly more than two minutes to complete each query, so overall throughput would go down.
Many of you are probably thinking that there is, in fact, a gain here, because unlike scenario 1, at least some users in scenario 2 get their answers in less than 16 minutes. But, there are significant downsides too. In scenario 1, all users can begin getting the first rows of their result set fairly quickly, because processing on all eight queries starts immediately. In scenario 2, processing of a query only begins once the previous query is completed. For example, even if all queries are submitted at roughly the same time, the user who submits his or her query just a single second after everyone else will have to wait 14 minutes before that query even begins to be processed. So, the time to get the first row back will greatly increase for most users.
Now let's look at scenario 3, where only four queries are running (without parallel query capability), so the system is roughly only 50 percent busy. As before, assume each query takes 16 minutes to complete, so at the end of the 16th minute, all four queries are complete. If we again turn on parallel queries (call this scenario 4) and have each query run across all eight CPUs, then each query again finishes in two minutes. But notice that now, all four queries finish in eight elapsed minutes, not 16. Because excess processing capacity was available, we were able to increase throughput by a factor of two.
The bottom line is this: With a few important caveats, increase the level of parallelism until some component of your system approaches 100 percent busy. After that, the only way to increase overall performance is to add more resources to your system (which is exactly what scalable systems were designed to allow you to do).
Problems That Are Too Small
The other common culprit is when the size of the problem to be solved is too small. When clients ask me, "What types of problems do not lend themselves to being efficiently solved with parallelism," the most straightforward answer I can give is "small problems." Because of the overhead associated with running a query in parallel, it is possible that the overhead added can outweigh the benefits of parallelism if the problem to be solved was very small in the first place. For example, assume it takes 1.5 seconds to create and distribute the eight sub- queries described above and that it takes another 1.5 seconds to merge the eight results sets back into one final result set to be returned to the user. In the earlier example, the sub-queries each ran in two minutes, so these additional three seconds of overhead have no real noticeable effect. But, what if the original query only took eight seconds to run (instead of 16 minutes)? Then each sub-query would take only one second, but now the three additional seconds of overhead have a significant impact. Your query will complete in four seconds, which is only twice as fast as before (rather than eight times as fast, as you would expect). And if the original query only took two seconds to complete, then turning on parallelism would slow it down.
This brings me to the caveat I mentioned above. Even if you have available excess system capacity, it doesn't always make sense to increase the level of parallelism for a parallel query, because at some point the sub-queries will become too small and response times will increase. To see this more clearly, imagine that I gave you a 100-page book and asked you to find all occurrences of a particular word. To help you with your task, I also tell you that you can use as many people as you like to work with you (that is, you have unlimited available resources). It's not hard to imagine that 10 people could probably solve the task faster than one (i.e., parallelism level set to 10--each would scan 10 percent of the book) and that maybe even 100 people could solve the task faster than 10 (i.e., parallelism level set to 100- -each would scan one percent). But, how about 10,000 people? Your performance would go down. The overhead of trying to divvy up the work so finely and the effort involved in trying to coordinate 10,000 people would surely outweigh any performance gains. Applied to parallel queries, the rule is that short-running queries should not be run with parallelism turned on.
Knowledge--Better Than Magic
Parallelism is an extremely powerful tool for ensuring good performance for your data warehouse (and for just about every other application too, such as OLTP and data mining). But, it is not magic. Like all things, it has places where it is useful and places where it isn't. In a perfect world, query optimizers would know exactly when to use parallelism and how much, but unfortunately they're not magic either. So, to fully leverage the capabilities of parallel queries, you must truly understand how they work.
Ken Rudin is the CEO and co-founder of LucidEra. Rudin has held key roles with leading software as a service and business analytics companies, including Siebel, Salesforce.com, NetSuite and Oracle.










Be the first to comment on this post using the section below.