Across the industry, a common metric is in cost per GB stored. Is there a standard for totaling the costs (does it including hardware, software, maintenance and personnel)? We are trying to find a means to compare to the standard but need a consistent industry-wide formula.
Evan Levy's Answer: Cost per GB stored is a very tricky metric to use for data warehousing systems. Data warehouses can typically require storage 3 to 5 times greater than the underlying data. This is due to the storage method, table and row overhead, working storage, indexes, and other associated DBMS details. It also doesn't accurately reflect the inherent differences between storage systems (host attached/JBOD, NAS, or SAN). The costs of these alternatives are dramatically different - as is their functionality.
I don't know of an industry standard calculation. While I've seen many total cost of ownership calculations, they are typically focused on individual platform components (e.g. the cost of storage, or the cost of a server platform). You're focused on determining the cost of the business solution.
When we use total cost of ownership of an application or a system with our clients, we like to break the costs into two areas: initial development and ongoing maintenance.
We frequently use a three-year schedule when determining costs because of the depreciation and support/replacement schedules that many companies use (your company might be different). It's common for vendor and internal support costs to change on a yearly basis, so it's important to note these changes. A simple calculation for three-year ownership would include
Yr1dev - year 1 development costs; Yr1Mnt - year 1 maintenance costs
Sid Adelman's Answer: There aren't any standards that are meaningful but these are some ideas on DW costs and a template for calculating your costs taken from Data Warehouse Project Management by Sid Adelman and Larissa Moss.
User support staff or the help desk staff will be the users' primary contact when there are problems. Providing adequate user support will require more people, and more training of those people, to answer questions and help the users through difficult situations. The cost for the additional people, the training and possibly an upgrade in the number and knowledge-level of the staff answering the phones must be added into the data warehouse costs.
Consultants and Contractors
Consultant and contractor expenses can balloon a project's cost. Consultants are used to supplement the lack of experience of the project team; contractors are used to supplement the lack of skilled personnel. There are two types of consultant/contractors:
Very often, contractors and consultants are asked to participate in activities beyond their original statement of work. These may or may not be useful activities and they may or may not contribute to the success of your project. Keep the contractors and consultants focused on their stated objectives and on your project and, if they are working on something other than your project, be sure some other department is paying for them.
The software products that support the data warehouse can be very expensive. The first thing to consider is which categories of tools you need. Do not bring in more categories of products than you need. Do not try to accomplish everything with your first implementation. Be very selective.
Hopefully, you have someone in your organization experienced in dealing with vendors and understanding their contracts. You will be working closely with this person. They will know the things to watch out for in a contract, but you will need to give them some help to acquaint them with data warehousing. You will also have to give them some warning if you heard anything negative about the vendor. Your contract people will know how to include protection in the contract to keep the vendor from arbitrarily raising their prices. They will know how to control maintenance costs. They will know about protection if the vendor sells out to a less-accommodating company. They will know about enforcing satisfactory service from the vendor. If you do not have anyone in your company who can perform these functions, engage a procurement consultant.
Most of the products have base prices and many add-ons. For example, add-ons with the ETL tools could include additional costs for each different type of source file or target database. The tools are often priced based on the platform and the size of the platform. For the query tools, they are based on the number of seats and these could either be designated users or concurrent users. When you talk to the references you will want to ask them what they planned to buy from the vendor, and what they had to eventually buy. There often are surprises.
Anticipate growing your environment. The growth will be in the number of users (web delivery could significantly increase the number of users), the size of the database, much more machine resources necessary to perform the more complex queries, an extension to users beyond your enterprise (for example, customers and suppliers through an internet capability), and possibly, the need to migrate to a more robust and better performing platform. Be sure your contract allows for these types of growth factors, and that the growth, while it may increase, does not explode your costs.
Capacity planning for a data warehouse is extremely difficult because:
All these unknowns mean that whatever hardware is chosen, it must be scalable. It must be able to scale and grow to at least three times the largest anticipated size.
For the data warehouse, you will need CPUs, disks, networks and desktop workstations. The hardware vendors can help size the machines and disks. Be aware that unanticipated growth of the data, increased number of users and increased usage will explode the hardware costs. Existing desktop workstations may not be able to support the query tool. Do not ask the query tool vendor for the minimum desktop configuration. Ask for the recommended configuration. Call references to find out if and how they had to upgrade their desktop workstations.
Raw Data Multiplier
There are many debates over how much disk is needed as a multiplier of the raw data. Besides the raw data itself, space is needed for indexes, summary tables and working space. Additional space may be needed for replicated data that may be required for both performance and security reasons. The actual space is very dependent on how much is indexed and how many summary tables are needed. The summary tables may be created as you learn more about what the users are asking for and how often they ask. From this information, you will be creating summary tables, one at a time. Considerable disk space will be required for the staging areas where the raw data is prepared and then loaded to the target data warehouse databases. The RDBM vendors should be able to help you with estimates of how much working space you will need. We did a study of what other data warehouse installations were using as a multiplier. It was highly varied but we arrived at a five times multiplier - if you have 100 gigabytes of raw data you should provide for 500 gigabytes of disk space.
Internal People Costs
These people are getting paid anyway regardless of whether we use them on this project or not. Why should we have to include their costs in our budget? We have to assume these people would be working on other productive projects. Otherwise, there is no reason for the organization to keep them employed. Count on having to include the fully burdened costs of the people on your project. Keep in mind that you are much better off with a small team of highly skilled and dedicated workers than with a larger team of the type of people who should be avoided.
User training is usually done on the premises and not at a vendor site. There are four cost areas for user training that must be considered.
Training may be simplified if the delivery is Web based.
Generic training may be appropriate. Examples are classes in logical data modeling, data warehouse project management or star schema database designs. Data warehouse conferences and seminars can provide an overall perspective as well as training in specific areas. IT will need to attend training on the complex tools and products. IT will also need enough time to work with the products to become proficient. The cost of training is sometimes included in the price of the tool.
Most organizations focus on the cost to implement the initial data warehouse application and give little thought to ongoing expense. Over a period of years, the continuing cost will very likely exceed the cost of the initial application. The data warehouse will grow in size, in the number of users and in the number of queries and reports. The database will not remain static. New data will be added, sometimes more than for the initial implementation, the design most likely will change, and the database will need to be tuned. New software will be introduced, new releases will be installed and some interfaces will have to be rewritten. As the data warehouse grows, the hardware and network will have to be upgraded.
User support of the data warehouse will be an ongoing expense both from the maintenance team and the help desk. Desktop workstations will need to be upgraded as will the software on those workstations although the cost may be significantly reduced by deploying web-based systems.
Total Cost of Ownership
The total cost of ownership (TCO) includes all the direct and the indirect costs. Examples of direct costs are the software and hardware purchased specifically for the data warehouse project, training costs and consultants. Examples of indirect costs are internal costs related to improving standards, procurement time and management attention. TCO includes costs that are normally hidden from a budget such as the additional time the users spend working with the data warehouse directly, assuming that in the past they sent their requests to IT. TCO includes costs you incur initially as well as costs in future periods. If your recruiting requires using placement agencies, their fees must be included. Insurance, tax, floor space and utility costs will all be part of the TCO. The TCO should be calculated for the initial implementation and for projected costs in later years for the enhancement and sustenance of the data warehouse.
In the calculation column, indicate how you arrived at your numbers. For example in the consulting row, Number of Consulting days X Day Rate.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access