I’m in charge of defining the administration tasks of a data warehouse application, and I was asked to define a method to determine the growth of the data warehouse (capacity planning). How can I achieve this in the most accurate way?
Sid Adelman’s Answer: It’s very difficult to plan for the growth of the data warehouse.
So what’s a capacity planner to do? You will estimate the size of the database from the raw data that will be loaded. Some organizations have used a 5X factor to estimate disk requirements. If you have 100 gig of raw data, buy 500 gig of disk space to support the indexes, the summary tables and the workspace. The hardware vendors should be able to offer some assistance, but be sure to understand all their assumptions as they configure a system that is sure to support you (and the marketing rep) for years to come.
Measure right from the moment the system is in operation. Discover what assumptions were wrong let your boss know that some estimates will be wrong and learn how to estimate more accurately for the next phase. Capture what you have learned and make this information available to others who will be estimating capacity.
Chuck Kelley’s Answer: One way is to capture how much data is going into the data warehouse and how it is being used (including by whom). You can keep statistics from your ETL tool (or your hand- written programs!) in terms of loading, you can keep statistics from your DBMS (hopefully) on usage. There are also products on the market that can aid in the task. Pine Cone Systems (www.pine-cone.com) is one such company.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access