Continue in 2 seconds

Scalable I/O, Part II: Database Partitioning

Published
  • June 01 1998, 1:00am EDT

Last month, I stated that good query performance requires (among other things) scalable I/O performance and that there are two major techniques that enable scalable I/O. I discussed the first technique (called "striping") which allows you to take a pool of data and distribute it across a set of disk drives. However, it doesn't let us place various pieces of data at certain physical locations based on the value of the data itself. Why would we want to use the value of a particular piece of data to determine where it should go? To understand, let's reuse the example from last month where we have a data warehouse that stores information pertaining to foreign automobile sales for the United States. Assume our warehouse only has information on Mercedes, Porsche, BMW and Volvo, and that it has roughly an equivalent amount of information on each type of car, all of which is stored in a single table called "Car_Sales." Also, assume that our hardware platform has four CPUs and four disks.

Now, suppose you wanted to issue a query that only looked at BMW sales. Traditionally, you would have to scan the entire table looking for rows that pertain to BMW sales. But, wouldn't it be better if we could group all BMW sales records together in one portion of the table? We could then hone in on only those parts of the table that are relevant, thereby speeding up our query because we don't have to scan through lots of irrelevant data. The technique we need to accomplish this is called "database partitioning" or "table partitioning." There are three common partitioning strategies: range partitioning, hash partitioning and round-robin partitioning.

Range partitioning allows you to define non-overlapping ranges of data values (such as 0-100, 101-200, etc.) and then place each row into the appropriate range, based on the value of a particular column in that row. This, therefore, requires selecting the column (known as the "partition key") by which the data will be physically separated. For example, if we had a customers table, we might choose to use the "Last_Name" column as the partitioning key and define ranges such as A-E, F-J, etc. We would then assign the A-E range to one set of disks, the F-J range to another set of disks, and so on.

Choosing an effective partition key requires an understanding of the types of queries users will be executing. Generally, you want to use as your partition key a column that is frequently used in the WHERE clause of your SELECT statement. In our Car_Sales example, if we frequently look for a specific type of car in our database, using the car type column might be a logical choice for a partition key. Then, to create our partitioned Car_Sales table, we would first allocate multiple chunks of space on our disks, and call them "partition_1," "partition_2," etc. We would then create our table using a syntax similar to the following:

CREATE TABLE Car_Sales

( type CHAR

model CHAR

price NUM

dealer CHAR

sale_date DATE )

PARTITION BY VALUE

type='Mercedes' IN partition_1,

type='Porsche' IN partition_2,

type='BMW' IN partition_3,

type='Volvo' IN partition_4;

Note: In this simplified example, we're not really using "ranges," since each partition holds just a single type of car. If we had more car types, we could define real ranges with a syntax such as "type BETWEEN ('Audi' and 'Ford') IN partition _1." Then any make of car that falls into that alphabetic range would be placed in partition_1.

Now assume we want to execute the following query:

SELECT type, model, price, dealer, sale_date

FROM Car_Sales

WHERE type='BMW' AND

price BETWEEN (30000 and 45000)

The database engine knows that the data which has been requested (BMW vehicles) resides only in partition_3, so it does not use any system resources to retrieve the data other than the one disk and one CPU which services the one scan process. What has occurred is known as partition elimination. Instead of using all four CPUs and all four disk drives, we use partition elimination and are able to use just one CPU and one disk drive and scan 75 percent less data to get our answer.

With range partitioning, there is deterministic logic behind where a particular row was placed, and that can be used to make some queries more efficient via partition elimination. However, we have to be careful to avoid data skew, which occurs when the data is not evenly divided up between the partitions. The goal of the second type of partitioning, called hash partitioning, is to try to get the best of both worlds--deterministic placing of rows while minimizing the chances of having data skew.

To achieve this goal, hash partitioning applies the hashing function to one of the columns in your table (the "hash key") and uses the resulting output value to determine into which partition the row should be placed. The hashing function is chosen so that its output values are evenly distributed given the range of possible input values. Unfortunately, hash partitioning won't really enable us to perform partition elimination when performing table scans, because the data in any particular partition is not associated with a single range of data. However, if we're looking for an individual row, hash partitioning can be used to instantly hone in on the correct partition if the hash key is used as part of the query.

Both range and hash partitioning are also valuable in helping improve the performance of joins on clustered and MPP architectures. In these architectures, partitions are assigned to physically separate nodes, and these nodes are connected by a high-speed interconnect. Even though the interconnect is a high-speed transfer medium, you must minimize data transfer across this interconnect in order to obtain maximum scalability. By using range or hash partitioning on the join keys for all tables which will be joined for a query, we can guarantee that any rows that need to be joined will always be found in the same partition and, therefore, on the same node. Data transfer between nodes will be minimized (because all the joins will be local), and performance will increase.

The third common partitioning method is basic round- robin partitioning. Since it uses the round-robin technique, the results of this "partitioning" are exactly the same as if you used the basic striping technique I discussed last month. This method of partitioning provides the easiest way to evenly distribute data across multiple disk drives, because you don't need to worry about selecting a partitioning key or a hash key. However, it doesn't give the database any way of performing partition elimination; and it doesn't help with joins, because the rows are randomly distributed.

Using the techniques of striping and partitioning enables you to get the maximum throughput from your I/O subsystem while minimizing the amount of I/O work that needs to be done to execute a query (via partition elimination). But, while the concepts are straightforward, tuning your I/O can be very complex, and there are a number of pitfalls. Next month, I'll wrap up this series on scalable I/O with a discussion of the most common pitfall (data skew) and how the combination of striping and partitioning can bring further performance benefits.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access