Capacity planning is a problem for a data warehouse because it sets contrasting functional requirements against each other. On one hand, data warehouse customers consume data warehouse capacity as they query the data in the data warehouse business intelligence (BI) reporting. Meanwhile, applications consume data warehouse capacity as they load data into a data warehouse through the extract, transform and load (ETL) process. These two functions, BI reporting and ETL, grow in volume and frequency as a data warehouse grows. Additionally, database administrator (DBA) tasks, such as backups and table reorganizations can cause additional data warehouse bottlenecks because ETL, BI reporting and DBA tasks contend for the same resources.  

Often, the first solution proposed for resource contention is more resources. However, such contention for system resources can marginalize the value and impact of additional hardware and processing capacity. To paraphrase an old saying, we don’t need more cooks in the kitchen, and we don’t need more kitchen for the cooks.

 

We face these issues daily in another environment – street traffic. Daily, millions of cars and trucks use a shared resource - the street. We can take some principles from street traffic and apply them to capacity planning in a data warehouse. Rather than throw more money at resource contention, a data warehouse can resolve some of its resource and capacity constraints by using the following five lessons from the street.

 

1. Obey Traffic Signals

 

A four-way intersection would become a multivehicle traffic accident were it not for the lights hanging above the intersection. Each individual traffic light faces an individual lane of traffic. Drivers are not confused as to which traffic light determines their passage through the intersection. Instead, each driver waits for the one traffic light facing his/her lane to display the color green before proceeding. This system works because all the traffic lights are coordinated to allow only one flow of traffic through an intersection at a time. Likewise, a data warehouse needs “traffic lights,” a centralized control mechanism that coordinates the activities of a data warehouse. With a central control mechanism, a table would not be locked by an ETL application during a backup or locked by a backup during an ETL load. A data warehouse operates with its least efficiency when it contends with itself. By removing resource contention, the resources of a data warehouse can operate with greater efficiency, which improves the impact and value of the capacity already in a data warehouse.

 

2. Travel at Similar Speeds

 

If you’ve driven on the highway, it’s happened to you. You look in the rearview mirror and see a car coming up behind you, driving much faster than you. You immediately look for a path that will get you out of the way before the other car hits you. The approaching car swerves, misses you by a hair and you think, “I hope a police officer stops that car.” No car is allowed to consume more than its fair share of the road; otherwise, it will eventually collide with other cars on the road. Likewise, no process (i.e., ETL, BI reporting or DBA task) is allowed to consume more than its fair share of a data warehouse; otherwise, it will eventually collide with other processes in the data warehouse. Some processes can control their resource consumption, others cannot. For the processes that cannot control their resource consumption, a data warehouse must enforce its own speed limits; otherwise, an application or process can run amok through the data warehouse.

 

3. Big Vehicles Pay More than Small Vehicles

 

Large vehicles consume more than small vehicles. Large vehicles require a deeper and stronger foundation underneath the pavement, send more exhaust fumes into the air and occupy more physical space than small vehicles. As a result, owners of large vehicles pay more money for the license tag required to travel on public streets. Likewise, large applications and processes consume more of a data warehouse than small applications and processes. Large applications consume more physical data storage, memory, central processing unit (CPU) cycles and input/output (I/O) throughput than a small application or process. As a result, owners of large applications and processes contribute more funds to a data warehouse than small applications and processes. This has a balancing effect. Applications that require more from a data warehouse bring more to that data warehouse; and, applications that cannot justify funding a contribution to a data warehouse do not consume the resources of that data warehouse.

 

4. Upgrade Prior to Traffic Growth

 

Street traffic patterns change. A new residential zone, business district or shopping mall can drastically increase the traffic volume on one set of streets and simultaneously decrease the traffic volume on another set of streets. Rather than wait to see if drivers really do travel to and from the new residential location, planners consider streets to be part of the local infrastructure. When a new residential zone, business district or shopping mall is built and the people come to it, the streets are prepared to support the increased traffic volume. Likewise, a data warehouse planner must plan for new business growth, interfaces and subject areas. A data warehouse planner does not wait until new business growth has occurred, interfaces have been created or subject areas have been added to a data warehouse to plan for the increased data volumes and capacity consumption. A data warehouse planner should consider the hardware and processing capacity to be the infrastructure of a data warehouse and build up that infrastructure prior to increasing the load on a data warehouse.

 

5.One Set of Rules Applies to Everyone

 

Finally, rules of the road succeed only when they are applied universally. When rules are not applied universally, those to whom the rules do not apply will be perceived to be the privileged group. Everyone will want to be in that group and will do what they can to be in it. Chaos is likely to ensue, followed by unmet expectations as everyone trying to be in the privileged group realizes the group no longer exists or never existed at all. A system of resource allocation that is recognized as fair and universal is the first and best way to avoid political anarchy, which occurs when multiple people and groups contend for data warehouse resources.

 

A street is similar to a data warehouse; both are long-term investments built for the utility of many. As such, some of the principles of capacity planning for a street have similarities and parallels with capacity planning for a data warehouse. By learning and applying these lessons from the street, those responsible for a data warehouse can increase the value and ROI of a data warehouse.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access