Q:  

I am preparing a document for effort estimation for data warehousing projects. Currently we are using WBS (work breakdown structure) for effort estimation. I want to have some ideas about how others are doing the estimation.

A:  

Clay Rehm's Answer: Congratulations on using the WBS! No matter what kind of project you are doing, including an evaluation of the WBS is highly recommended. Other factors include number of users, number of entities and attributes, number of proposed programs and numbers of queries and reports.

Chuck Kelley's Answer: I think that the work breakdown structure is the most widely used structure for estimation. The problem comes when you delve into an extremely low level of detail. That causes the "iterative" processes not to be executed very well.

Joe Oates' Answer: In my experience, using a WBS is the best approach if you have not estimated and built a couple of data warehouses for experience. The key is to have an estimate with close to the number of tasks that will actually be performed. A good place to look for an example is Building the Operational Data Store by Bill Inmon.

One of the most common mistakes that I have seen in estimating the effort (and consequently cost) for a data warehouse project is to underestimate the effort needed for ETL and reporting. It is very important to base the ETL on the number tables to be extracted from the source system. In my experience, you should allow an average of three to 10 days per table, depending on a) which tool, if any you are using and b) the level of experience your people have with the tool and c) whether there is anyone available who really understands what's in each table/file and how the contents of the table/file are used. The same concept is applicable for the reporting tool, even though the time for producing each report should be less than for doing ETL for each file.

Sid Adelman's Answer: What you want is ideas about how others who are successful are doing the estimation. The WBS itself does not provide any estimates. It does give you a place to capture the estimates and, with most project management tools, the WBS and the estimates for effort and duration you can generate a project plan.

The following is taken from the chapter on Project Planning in Data Warehouse Project Management by Sid Adelman and Larissa Moss.

Estimating

The job of estimating how long a task should take, the work effort and the duration are difficult at best and almost always too optimistic. There are many factors that should be taken into account in the estimating process.

Experience

Skill/experience level - A DB2 or Oracle database administrator (DBA) with skills and experience in data warehouse will have already made the big mistakes and, hopefully, will have learned from those mistakes and not make them again on your project. It will take the experienced DBA far less time to create the database design; he will make far fewer missteps and backtracks. He will require less time to research and will be surer of himself, thereby cutting down on wasted effort, reducing the learning curve and on hesitancy in unsure situations. A DBA with skills and experience may be anywhere from two to six times more efficient than one new to the job.

Knowledge of the organization - A person that is newly hired will take some time to learn how things work in your organization, and will take some time to familiarize himself with the standards and procedures specific to your organization. Most importantly, it will take him some time to learn the people organization, the power structure, who honors their commitments, who is competent and who are either allies or foes.

Documentation - Another aspect to experience is the accurate, documented knowledge of the data, the processes of the systems and the organization. The completeness, accuracy and availability of this documentation will impact the work that needs to be done.

Dependencies

Some tasks are dependent on the completion of others while other groups of tasks may be done concurrently.

Task dependencies - When you are building a house, you have to lay a foundation before you start the framing. By the same token, you must understand the user requirements before much else is done. For example: the logical data model must be fairly complete before work is started on the physical model. Most tasks will have a dependency on a preceding task, and it is critical to identify and understand these dependencies as the project plan is built. The task dependencies will determine the critical path of the project plan.

People dependencies - Tasks have associated resources, i.e., skilled people who are either primarily or secondarily responsible for a task. Let's assume that two tasks are independent and can be performed in parallel, but you may have only one person who has the skill needed to perform either task. In this case you cannot take advantage of scheduling the tasks to be worked on in parallel, because both tasks are now dependent on one person. In addition, people are not always available as scheduled. They become ill, go on unscheduled leave or get committed to some other project (as is so often the case). Obviously, the tasks assigned to them will not be completed according to the original start and completion times on the project plan. People resources must be constantly reviewed and revised to stay on top of the plan.

Decision and sign-off dependencies - Many decisions, sign-offs and validations are made in the course of a data warehouse project, both by management and by the technical people. We have seen projects slip due to the inability, unwillingness or inattentiveness of decision-makers to make decisions. This has resulted in schedule impacts from key decisions that have not been made in a timely manner. If the real decision-maker is not in the meeting where the decision is supposed to be made the schedule may also be impacted. The stand-in may not have the authority or knowledge to make the decision, and so, time is lost. The people responsible for the decisions and sign-offs must be made aware of the impact of their indecision, their unwillingness to delegate authority, and the impact of their not attending a decision-making meeting.

Industry Standards

There are no real standards for how long it should take to perform any data warehouse task. Some of the project management tools have suggested times but these times should be taken only as guidelines and must be modified for the project, the team and for the environment. The suggested times are best used to alert the project manager to a possible misunderstanding of what the task involves. If a suggested time is 35 hours and the project manager was about to estimate three hours, it alerts the project manager to a discrepancy in an understanding of the depth of the task.

Because of the absence of standards due to the high variability between data warehouse projects and the high risk of a data warehouse project, it is essential that the project manager has some prior experience in managing large dynamic projects. A data warehouse is not a training ground for a rookie project manager.

In the Best of All Possible Worlds

If estimates are derived with the idea that everything will go perfectly, that the software will install without any problems, that meetings will happen on time, and that decisions will always be made in a timely fashion, then the work effort and especially the duration will be grossly underestimated. If your organization's experience of installing software from a particular vendor has been a problem in the past, it will probably be a problem with your project. In your organization, if meetings usually start late, are postponed or unproductive, they will probably start late or be postponed and be less than efficient with your project. If your business sponsor has a reputation for not making decisions, you should factor in the delays as you estimate duration.

Hours and Elapsed Time

Except for small, contained tasks, elapsed time will almost always exceed the work or effort hours. Activities are usually multi-tasked with personnel working on multiple tasks at the same time, and occasionally the same task may be assigned to multiple (usually no more than two) people. Events that are non-project related will absorb both work hours and durations. Events such as meetings unrelated to the project, company outings, weather-related problems, illnesses, holidays, vacations, personal time, World Series, titillating Senate Committee hearings, earthquakes, pestilence and floods and countless other interruptions mean you will be unable to accurately predict and these events must be taken into account. Due to these types of interruptions, some project managers will only consider using five or six hours each day to be allotted to productive project and non-project activities. Other managers give an overall 25 - 30 percent bump to the overall project estimate and schedule.

Do not initially schedule work for Saturdays and Sundays, holidays or any of your team's scheduled vacation. Toward the end of the project, a few weekend days may be necessary but do not include them in your initial project plan. Although it may not have occurred to some readers to have their staff scheduled to work seven days a week, we have seen this practice.

Rework

The reality of projects is that some tasks will need to be redone - hopefully not more than once. A design review may uncover flaws that require the task to be reworked. Tests may indicate problems with design, with a vendor's product or with the user interface. Circumstances with the sponsor, with the organization, with competition, or with new regulations may cause the results of some tasks to be scraped and the task to be reworked under the constraints of new requirements or new direction. Estimates should include the time and effort for rework even though it is not be possible to predict where the rework will be required or how much will have to be redone.

What about the role of those wonderful data warehouse tools? Shouldn't they have a positive impact on the productivity of the team and therefore the schedule? They should have a positive impact but the usefulness of the tools will be determined by how well trained your people are. Do not believe the productivity representations of the vendors. Even when they use real situations with real productivity improvements, the examples are almost always taken from the best of the best. Your situation may not be as fortunate. Also there is the learning curve that usually results in an initial loss of productivity.

As a final note on estimating, a mentor once suggested coming up with a best estimate for each task and then doubling it. At the risk of appearing terribly imprecise, this technique has worked for many of our project managers.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access