Last month, I showed how partition elimination allows us to satisfy a query by only having to scan a portion of a table rather than having to perform a full table scan. If you recall, the example we have been using involves a data warehouse that stores information pertaining to foreign automobile sales in the United States. For simplicity, we have assumed our warehouse only has information on Mercedes, Porsche, BMW and Volvo and that it has roughly an equivalent amount of information on each type of car, all of which is stored in a single table called "Car_Sales." Also, we assumed that our hardware platform has four CPUs and four disks. In last month's discussion of partitioning, we partitioned the data by car type, and data relating to each car type was assigned to its own disk. (That is, all Mercedes sales data was on disk 1, all Porsche sales data was on disk 2, etc.) We then examined a query to look at only the BMW sales data and showed how partition elimination allows us to look at only one of the partitions (the BMW partition that resides on disk 3) instead of all four. We were, therefore, able to eliminate the need to scan the other 75 percent of the Car_Sales table.
But, did this use of partition elimination really reduce the response time of the query any more than the response time improvement we observed by just using parallel scans on all the data? Assume for a minute that a full-table scan of the Car_Sales table using a single scan thread (that is, no parallelism) would result in a response time of T1. The scan thread would scan disk 1, then disk 2, etc. If we then add in parallelism and create four simultaneously executing scan threads, each thread would only have to scan a single disk (which contains one-fourth of the data), so the response time would be T1/4. Now by using partition elimination, we can remove the need to create and execute the scan threads for disks 1, 2 and 4. However, the scan thread for disk 3 still must scan one-fourth of the total data, so the response time would still be T1/4, the same as it would be if we performed a parallel scan of all the data.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access