Clients often ask me whether a particular RDBMS or hardware platform is scalable, or whether the design of their data warehouse or OLTP application will scale. What they are looking for is an answer of either "yes" or "no," but neither of those answers is very meaningful. While the marketing departments of various vendors would love to classify all things as being either "scalable" or "not scalable," it's not that black and white. Scalability is a continuum, not a binary variable, and it can be measured in various ways.
There are no real standards to determine exactly how to measure scalability. Various people measure it differently, and this can make interpretation of the measurement difficult. Comparisons between two different measurements are sometimes impossible. For a measurement to have any meaning, you must know two things:
- How the measurement was calculated from the data that was generated by the scalability tests.
- What type of application was used for these tests.
Diminishing Returns
When running an application (such as a data warehouse or an OLTP application) in a multiprocessor environment, it is a well-known (but unfortunate) phenomenon that adding another processor rarely gives you a full processor's worth of incremental throughput capacity. As you continue to add more processors, you get diminishing performance returns. For example, if a single processor can process 100 units of work per second, it's unlikely that two processors will process 200 units per second. Likewise, four processors are unlikely to be twice as fast as two, and so on.
There are two reasons for this. The first reason is really only noticeable in the case where a user starts with a single SMP box and then moves to a clustered environment by adding additional SMP boxes. Usually, when the user was running on a single SMP, they were running their RDBMS in "single node" mode. However, when they move to the clustered environment, they have to run in "multi-node" mode. This means that additional functionality is invoked to coordinate the actions of these multiple nodes. This additional functionality increases the length of the code path that must be executed for most database operations and, therefore, has a negative impact on performance. In fact, there will be a noticeable performance difference even on a single node system if the RDBMS setting is changed from "single node" to "multi-node." So, when adding a second node, it is unreasonable to expect twice the performance that you were used to when you were just using a single node.
The second reason is more important because it affects all scalable hardware platforms, regardless of the number of processors (or nodes) that you begin with. Let's again assume that a single processor can process 100 units of work per second. When there's just a single processor, all 100 units can be dedicated to processing database requests. However, when a second processor is added, some of this capacity must be dedicated to communicating with the other processor. (Assume this communications overhead takes one unit of work per second for a specific application in which we're interested.) Note that this affects both processors, so each processor can only perform 99 units of database work. When we add a third processor, each processor must now communicate with two others, so each can only perform 98 units of database work. Obviously, I'm simplifying matters tremendously, but you can see why each additional processor not only performs less incremental work itself, but also puts a drain on all the other processors. And, you can see that at some point, an additional processor will actually reduce the overall system throughput.
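The simplified model above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the preceding paragraph, assuming each processor starts with 100 units of capacity and loses one unit for every other processor it must communicate with. The function names and parameters are hypothetical, chosen for this sketch.

```python
def total_throughput(n, base=100.0, overhead=1.0):
    """Total database work per second for n processors in the toy model.

    Each processor gives up `overhead` units of capacity for every other
    processor it must communicate with (an assumption of this sketch).
    """
    per_processor = max(base - overhead * (n - 1), 0.0)
    return n * per_processor


def peak_processor_count(base=100.0, overhead=1.0, limit=1000):
    """Smallest processor count at which total throughput is at its maximum."""
    best_n = 1
    for n in range(2, limit + 1):
        if total_throughput(n, base, overhead) > total_throughput(best_n, base, overhead):
            best_n = n
    return best_n


if __name__ == "__main__":
    for n in (1, 2, 3, 50, 51, 52):
        print(n, total_throughput(n))
    print("peak at", peak_processor_count(), "processors")
```

With these assumed numbers, two processors deliver 198 units and three deliver 294. Total throughput peaks at 2,550 units with 50 or 51 processors, and a 52nd processor lowers it to 2,548. Real systems do not behave this neatly, but the shape of the curve is the point.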
Interpreting the Numbers
What we're trying to do is quantify these diminishing returns. There is only one case where this is easy. If you happen to be running an application where there is virtually no interprocessor communication needed, then no CPU cycles need to be dedicated to communication overhead, and this is defined as 100 percent scalability (also known as "linear scalability"). Applications such as these, often referred to as "embarrassingly parallel," require that data access be completely partitioned, so that no processor ever needs any data managed by another processor. Though few real-world applications are 100 percent scalable (there's always some overhead involved), large table scans performed via parallel query can be extremely close because each processor can scan its own unique partition of the database file.
Quantifying scalability in other cases is less straightforward. However, two approaches are generally used. The first compares the actual performance of n processors to the theoretical performance of n processors if the system and application were 100 percent scalable. For example, if a single CPU executes a query in 20 minutes, and 20 CPUs execute the query in 1.05 minutes, then the system is about 19 times faster (20 / 1.05 is roughly 19). If the system were 100 percent scalable, it would be 20 times faster. So, the scalability is 19/20, or 95 percent.
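As a minimal sketch, the first approach is just the measured speedup divided by the ideal speedup. The function name and argument names below are hypothetical.

```python
def speedup_scalability(time_one, time_n, n):
    """Scalability as actual speedup divided by the ideal speedup of n.

    time_one: elapsed time on a single processor
    time_n:   elapsed time on n processors (same units as time_one)
    """
    actual_speedup = time_one / time_n
    return actual_speedup / n


if __name__ == "__main__":
    # The example from the text: 20 minutes on 1 CPU, 1.05 minutes on 20 CPUs.
    print(speedup_scalability(20.0, 1.05, 20))
```

For the example in the text, the result is slightly above 0.95, because 20 / 1.05 is a little more than 19.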
The second approach defines the scalability number as the overall system throughput that is gained when a processor is added compared to the gain added by the previous processor. For example, 95 percent scalability using this definition means that if the first processor contributes 100 units of work, then the second adds another 95 units, the third adds 0.95 x 95 or 90.25 units, and so on. In other words, each processor adds 95 percent as much as the previous processor added. Using this definition, 95 percent scalability means the 20-CPU query mentioned above would complete in 1.56 minutes rather than 1.05. Even though this approach is more complex to calculate, the advantage is that it intuitively captures the notion that the incremental gains diminish as more processors are added, and it allows you to predict the performance gains that each additional processor will provide.
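The second definition amounts to summing a geometric series. The sketch below reproduces the 1.56-minute figure, assuming that elapsed time is inversely proportional to total throughput. The function names are hypothetical.

```python
def incremental_throughput(n, ratio=0.95, first=100.0):
    """Total throughput of n processors when each added processor
    contributes `ratio` times as much as the one before it."""
    return sum(first * ratio ** k for k in range(n))


def predicted_time(time_one, n, ratio=0.95):
    """Predicted elapsed time on n processors, assuming time scales
    inversely with total throughput (an assumption of this sketch)."""
    relative_capacity = sum(ratio ** k for k in range(n))
    return time_one / relative_capacity


if __name__ == "__main__":
    print(incremental_throughput(3))               # 100 + 95 + 90.25
    print(round(predicted_time(20.0, 20), 2))      # the 20-minute query on 20 CPUs
```

Under this definition, 20 processors at 95 percent scalability deliver roughly the capacity of 12.8 ideal processors, which is why the 20-minute query takes about 1.56 minutes rather than 1.05.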
Don't Forget the Application!
If I told you that my car got 30 miles to the gallon, you would probably want to ask me whether I was referring to driving in the city or on the highway. In the same way, if I tell you that this system is 95 percent scalable, you need to ask me about the application I was running. The same hardware and database combination will have very different scalability characteristics for different applications.
For example, query-based applications are usually partitionable, which lowers communication overhead and results in higher scalability numbers. OLTP applications, on the other hand, tend to require either centralized locking coordination (in shared-disk database architectures) or the use of a two-phase commit protocol (in shared-nothing database architectures), and this overhead results in lower scalability numbers.
The next time someone is discussing scalability numbers with you, don't take the numbers for granted. Ask questions to ensure that you understand the context in which the numbers are being used. And the next time you are curious about the scalability of a certain product or design, don't ask "Does it scale?" but rather "How well does it scale and under what conditions?"
Ken Rudin is the CEO and co-founder of LucidEra. Rudin has held key roles with leading software as a service and business analytics companies, including Siebel, Salesforce.com, NetSuite and Oracle.









