There is every reason to do capacity planning for the data warehouse/DSS environment. Data warehouses grow at a tremendous rate. Data warehouses cost money. Data warehouses operate on a variety of technologies. Data warehouses have performance problems just like other technological environments. Therefore, it only makes sense to plan your data warehouse environment. In doing so, you place your corporation in a proactive position, not a reactive one.
How many corporations do capacity planning for the data warehouse/DSS environment? And of those few corporations that actually do capacity planning, how many have done capacity planning for the data warehouse environment effectively? The answer is that there are, at best, a handful of companies that have done data warehouse/DSS capacity planning effectively.
Does that mean that data warehouse/DSS capacity planning should not be done? Not at all. There is good reason to do capacity planning for the data warehouse environment; however, it is hard to do. It is different from anything that most capacity planners have ever done before.
This column will describe three approaches to capacity planning for the data warehouse/DSS environment. After reading about the three approaches, you can decide why capacity planning for the data warehouse/DSS environment is so difficult. And if you are one of the hardy souls who likes to be proactive, you might even choose one of the approaches.
What is it that needs to be planned in the data warehouse/DSS environment? While there are many facets to the data warehouse/DSS environment, the two most important aspects of capacity planning are planning for storage and planning for processors.
It is noteworthy that there are plenty of other things that come with the territory: what DBMS to use, what form of networking to be used, what end-user access and analysis tools should be used, what kind of integration and transformation tools should be used, and so forth. But the heart of the matter is how much storage and how many and what kind of processors should be used for the data warehouse.
The two key factors the capacity planner looks at are the amount of data there will be and the workload that will be run against the data warehouse. Unfortunately, both of these factors in the data warehouse/DSS environment are very difficult to ascertain.
The first approach to capacity planning is the analytical approach, where the capacity planner attempts to calculate and/or predict capacity needs before the equipment is purchased. In the analytical approach, the analyst attempts to quantify such things as:
- How many customers will be in the warehouse? At what rate will the customers grow?
- How many transactions will be in the warehouse? At what rate will the transactions grow?
- What other data will be in the warehouse? At what rate will the other data grow?
- What is the proper level of granularity for data in the warehouse? Can the level of granularity be changed if needed?
- What amount of history is needed in the warehouse? Will the user decide to add more history than anticipated?
Each of these interrelated questions must be answered in order for the analyst to determine how much data there will be in the warehouse. If you have ever struggled through an exercise of trying to accurately predict these questions, you know that obtaining accuracy is very difficult. In all honesty, a good guess is about the best that can be achieved.
Volumes of data are only one aspect of capacity planning. The other side of capacity planning in the data warehouse/DSS environment is that of workload projection. If you thought trying to predict volumes of data was difficult, wait until you try to predict what the workload for the DSS environment is going to look like. There are many factors that must be considered when trying to profile the data warehouse/DSS workload. Some of the more interesting factors are:
- How many farmers will you have? What does the average farmer query look like?
- What does the pattern of submission for the farmers look like?
- How many explorers do you have? What does the explorer query look like?
- Is there any pattern to the submission of analysis by the explorer community?
- Has the explorer community ever been addressed or graced with an infrastructure before?
- Are there predictable peaks and valleys of processing throughout the day, throughout the month or throughout the quarter?
- Will there be an attempt to use a resource governor?
There are many questions that need to be answered in order to portray the data warehouse/DSS workload. In the case of volumes of data, an accurate picture simply cannot be painted.
Perhaps the biggest enemy of the analytical approach is that of the attitude of the community of people using the data warehouse/DSS environment. In most cases, the data warehouse/DSS environment is one of discovery. People simply don't know what is going to happen until they get there. People don't know what they will do until they know the possibilities. And where people really don't know what they will do, trying to look into a crystal ball to predict what will happen is black magic.
Calibrated Extrapolation Approach
The calibrated extrapolation approach is one where there is, at best, a rudimentary attempt at analytical capacity planning. After the first or second iteration of the warehouse is created and after the first few users have become enamored with the data warehouse, then careful track is kept for the warehouse and its usage. Over calibrated periods of time, the growth of the warehouse is tracked. Based on the incremental growth that is being measured, an extrapolation of future capacity needs is made. The extrapolation of capacity needs then becomes an educated guess. Of course, the educated guess can be refined. The analyst can factor in known growth factors such as addition of new subject areas, addition of history and the like. In doing so, the analyst combines the best of the calibrated extrapolation approach and the analytical approach.
However, even when the calibrated extrapolation approach is used wisely and well, this approach has only a short time horizon for effectiveness. In other words, trying to project into the long-term future using the calibrated extrapolation approach is a dicey venture. Extrapolation can be done for three months or maybe even for six months, but anything beyond that is questionable.
The third approach is the copycat approach. In the copycat approach, you find a company (with roughly the same characteristics as your company) that has advanced into data warehousing further than your company. In this case, you simply ask what environment they are operating in and how things are going. The copycat approach is by far the easiest approach. When the copycat approach to capacity planning works well, nothing beats it. There are, however, pitfalls with the copycat approach, such as:
- The company being examined has not fed you accurate information.
- The company being examined has fundamental business and technological differences of which you are not aware.
- The company being examined is affected by and is responding to business pressures of which you are not aware.
All of these factors mean that the comparison between your company and the examined company may produce very misleading results.
There is, of course, a fourth alternative. That alternative is to let a hardware vendor come in and do capacity planning for you. This is surely the laziest way to go. Don't be surprised when the vendor discovers that the only way to meet your capacity needs is to buy the vendor's hardware. In short, the capacity planning done by hardware and DBMS vendors is an exercise in subtle hard selling. While you may actually get some useful capacity projections, you will certainly get a hard sale for the vendor's products.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access