Last month, I stated that good query performance requires (among other things) scalable I/O performance, and that there are two major techniques that enable it. I discussed the first, called "striping," which takes a pool of data and distributes it across a set of disk drives. Striping, however, doesn't let us place individual pieces of data at particular physical locations based on the value of the data itself. Why would we want the value of a piece of data to determine where it goes? To see why, let's reuse last month's example: a data warehouse that stores information on foreign automobile sales in the United States. Assume the warehouse holds information only on Mercedes, Porsche, BMW and Volvo, in roughly equal amounts, all stored in a single table called "Car_Sales." Also assume our hardware platform has four CPUs and four disks.
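To make the striping recap concrete, here is a minimal sketch of the placement rule striping uses. It assumes fixed-size blocks dealt round-robin across four disks; real volume managers operate at the device level, but the rule is the same. The function name and block/disk counts are illustrative, not from any particular product.

```python
NUM_DISKS = 4  # matches the four-disk platform in our example

def disk_for_block(block_number: int) -> int:
    """Striping: block i lands on disk i mod NUM_DISKS.

    Placement depends only on the block's position, never on the
    data inside it -- which is exactly the limitation discussed above.
    """
    return block_number % NUM_DISKS

# Blocks 0..7 land on disks 0,1,2,3,0,1,2,3, so a sequential scan
# drives all four disks in parallel.
layout = [disk_for_block(b) for b in range(8)]
```

Because the disk is chosen by position alone, a query for BMW rows still has to read every disk; that is the gap partitioning fills.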

Now suppose you want to issue a query that looks only at BMW sales. Traditionally, you would have to scan the entire table for rows pertaining to BMW sales. Wouldn't it be better if we could group all the BMW sales records together in one portion of the table? We could then home in on only the relevant parts of the table, speeding up the query because we no longer scan through lots of irrelevant data. The technique that accomplishes this is called "database partitioning" or "table partitioning." There are three common partitioning strategies: range partitioning, hash partitioning and round-robin partitioning.
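The idea behind hash partitioning can be sketched in a few lines. This is a toy in-memory model, assuming made-up "Car_Sales" rows of (make, sale_amount); real database systems declare partitioning in DDL and handle placement internally, but the value-to-partition logic is analogous.

```python
import zlib

NUM_PARTITIONS = 4  # one partition per disk in the four-disk example

# Toy Car_Sales rows: (make, sale_amount). Values are illustrative.
CAR_SALES = [
    ("BMW", 42000), ("Volvo", 31000), ("Porsche", 95000),
    ("Mercedes", 60000), ("BMW", 47000), ("Volvo", 29000),
]

def hash_partition(make: str) -> int:
    """Hash partitioning: a hash of the key value picks the partition."""
    return zlib.crc32(make.encode()) % NUM_PARTITIONS

def build_partitions(rows):
    """Place each row in the partition its key value hashes to."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for make, amount in rows:
        partitions[hash_partition(make)].append((make, amount))
    return partitions

def query_make(partitions, make):
    """A query on one make scans a single partition, not the whole table."""
    return [row for row in partitions[hash_partition(make)] if row[0] == make]
```

Because every BMW row hashes to the same partition, `query_make(build_partitions(CAR_SALES), "BMW")` touches one of the four partitions and still finds all BMW sales. Range partitioning would replace the hash with a comparison against key boundaries, and round-robin would ignore the value entirely, like striping.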
