Don't you just dread being asked the question, "How long will the modeling take?" I usually react to this question with another question: "How much time are you giving me?" Often, however, it comes down to me rolling up my sleeves and coming up with an estimate. I have noticed over the years that my estimates hover somewhere between 20 and 40 percent of the overall project effort. (Note that the 20 percent estimate is given rarely - only when the requirements are very well defined and the modeling and mappings are relatively simple.) What techniques do you use to estimate the modeling portion of your projects?

After determining the resources that will be provided to us and clarifying which artifacts the data modeler will be responsible for delivering, we can apply an equation to come up with a reasonable estimate to complete the data models.

Resources and Expectations

So what will the data modeler deliver? Richard Kooijman, data warehouse architect, suggests tackling this question first and offers these included artifacts: "We see this as the design phase ... including the data model, mapping rules and metadata information for end user tooling." Both Norman Daoust, data modeling consultant, and Georgia Prothero, data modeler, ask several questions prior to providing an estimate. Norman asks:

a) Could you produce me a complete list of attributes, including definitions, of your current system tomorrow?

b) Who is the customer for the model?

c) Who will approve the model?

d) Is this for a new system or for existing systems?

"I use the answers as the basis for my estimate," Norman explains. "If the answer to a) is 'We don't have any existing documentation,' I know what I'm in for and my response won't be at the low end of the range. I use the answer to b) to then speak with those people to understand what they want. I use the answer to c) to talk to those people and understand their acceptance criteria. I use the answer to d) to determine who will be the source for the data requirements and then speak with them."

Georgia asks three questions that neatly complement Norman's: "First, is the project using an existing database? Second, is the project an interface, a data entry application or a processing engine? ... Interfaces are the easiest, so the data modeling is likely to take up about 40 percent [of the project]. Applications and processing engines are more complex, so the data modeling is likely to take a smaller percentage of the overall effort. And third, how experienced is your data modeler?"

Estimate Approaches

Here are some of the estimating approaches and formulas that were submitted in response to this challenge:

  • Data warehouse architect John Stinnett says, "If we're building a new mart, we roughly determine the number of fact and dimension tables and apply an hourly multiplier for each type. We'll also apply a complexity factor for a large number of attributes, snowflakes, versioning, alternate keys, etc."
  • Jan Cohen, data modeler, says, "My definition of modeling is completing the physical model. My manager guesstimates how much time I can afford to put into a project per week, say 60 percent of my time. I ask the project manager when they need to start development, say eight weeks from now. So 40 hours times eight weeks times .6 equals how much time it will take to complete 80 percent of the model. The other 20 percent is reserved for requirement changes that come up during development."
  • Richard Kier, data architect, shares his technique: "My preference has usually been to allocate based on the number of developers. I generally assume that one modeler can support a team of four to eight developers, depending on the skills of both the development and modeling team. For a clean sheet project going through a full analysis and design session, I allocate a modeler 100 percent for each set of developers and then scale back - small enhancements to add functionality to an existing system would usually get one-fourth of a resource."
  • Chris Welch, data administrator, says: "I typically apply a confidence level as a factor. If I am very confident in my overall domain knowledge and I feel the time is about 20 percent, then I'll give a confidence rating of one and the estimate is at 20 percent. A two to five confidence level I will tend to add a quarter percent. So that breaks down as:

2 = +25% or 25% of total time
3 = +50% or 30% of total time
4 = +75% or 35% of total time
5 = +100% or 40% of total time"
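
Three of the approaches above boil down to simple arithmetic, so a short sketch may help make them concrete. The Python below is only an illustration: the function and parameter names are invented for this example, and the hourly multipliers and complexity factor in the last call are hypothetical placeholders, since John does not state the values his team uses.

```python
"""Sketch of three estimating approaches quoted in the list above.

All function and parameter names are illustrative assumptions, not the
contributors' own tooling.
"""


def available_modeling_hours(hours_per_week, weeks, allocation):
    """Jan Cohen's arithmetic: weekly hours x weeks until development
    starts x the share of the modeler's time given to the project."""
    return hours_per_week * weeks * allocation


def confidence_adjusted_share(base_share, confidence_level):
    """Chris Welch's scale: level 1 keeps the base share of project time;
    each further level adds a quarter of the base, so level 5 doubles it."""
    if confidence_level not in range(1, 6):
        raise ValueError("confidence_level must be between 1 and 5")
    return base_share * (1 + 0.25 * (confidence_level - 1))


def mart_modeling_hours(fact_tables, dimension_tables,
                        hours_per_fact, hours_per_dimension,
                        complexity_factor=1.0):
    """John Stinnett's approach: table counts times an hourly multiplier
    per table type, scaled by a complexity factor. The multipliers are
    parameters because the article gives no actual values."""
    base = fact_tables * hours_per_fact + dimension_tables * hours_per_dimension
    return base * complexity_factor


if __name__ == "__main__":
    # Jan's example: 40 hours a week, eight weeks, 60 percent allocation.
    print(round(available_modeling_hours(40, 8, 0.6), 1), "hours")

    # Chris's scale, starting from a 20 percent base estimate.
    for level in range(1, 6):
        print(level, f"{confidence_adjusted_share(0.20, level):.0%}")

    # Hypothetical mart: 3 facts at 16 hours, 10 dimensions at 8 hours,
    # with a 1.25 complexity factor. These numbers are made up.
    print(mart_modeling_hours(3, 10, 16, 8, complexity_factor=1.25), "hours")
```

With Jan's numbers the first calculation comes to 192 hours, which by her definition is the time available to complete 80 percent of the model; the remaining 20 percent is held back for requirement changes during development.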

There were other creative estimation approaches submitted, including what Bob Schork, senior principal engineer, calls rapid data modeling: "After the initial [joint application design] session or use case creation, when I have most of what I need, I tell the business user to give me a couple of days and let's see what I come up with. When I show the user the logical data model, I always get more detail because when they see the model visually, they start to think of things they have missed. So I get those missing elements and then modify the model and I am back the next day or two with the updated data model."

