Question: I need to prepare an effort estimate for our ETL process. Is it right to calculate the effort based only on the number of sources, targets and transformations? If not, what other criteria should we consider when preparing the estimate? Also, please let me know what percentage should be allowed for complexity level and data quality issues. It would be great if anyone could post a sample ETL effort estimation template for reference.
Sid Adelman's Answer: In your question you've identified some very important determinants of ETL effort. Here are a few more thoughts you need to consider:
- How well documented are the source files?
- How knowledgeable are the ETL developers about the source data?
- How clean does the data need to be? We often find that not all data has the same data quality requirements.
- How knowledgeable are the ETL developers about the ETL tool?
- Will the ETL developers be assigned full time to the project?
- How well is the project being managed?
- How much data has to go through the ETL process? Very large amounts of data result in challenges (read: problems) that take time and effort to correct.
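One way to make factors like these visible in an estimate is to start from a base figure driven by the counts of sources, targets and transformations, then adjust it with an explicit multiplier per factor. The following is only a minimal sketch of that idea. The names `BASE_HOURS`, `Factor` and `estimate_hours` are hypothetical, and every hour and multiplier value is a placeholder rather than a recommended percentage; real values would need to come from your own project history.

```python
from dataclasses import dataclass

# Hypothetical placeholder hours per item. These are NOT recommended
# figures; calibrate them against your own completed projects.
BASE_HOURS = {"source": 8.0, "target": 6.0, "transformation": 4.0}


@dataclass
class Factor:
    """One adjustment factor, e.g. poor documentation or an unfamiliar tool."""
    name: str
    multiplier: float  # 1.0 is neutral; values above 1.0 add effort


def estimate_hours(sources, targets, transformations, factors):
    """Return (base_hours, adjusted_hours) for one ETL estimate."""
    base = (
        sources * BASE_HOURS["source"]
        + targets * BASE_HOURS["target"]
        + transformations * BASE_HOURS["transformation"]
    )
    adjusted = base
    for factor in factors:
        # Multipliers compound, so several weak areas grow the estimate quickly.
        adjusted *= factor.multiplier
    return base, adjusted


# Illustrative factors with placeholder multipliers (assumptions only).
factors = [
    Factor("poorly documented source files", 1.25),
    Factor("developers new to the ETL tool", 1.5),
]
base, adjusted = estimate_hours(sources=2, targets=1, transformations=5, factors=factors)
print(f"base: {base} hours, adjusted: {adjusted} hours")
```

Keeping each factor as a named entry means the assumptions behind the final number can be reviewed and challenged one at a time, rather than being hidden inside a single contingency percentage.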
Clay Rehm's Answer: I don't know where the technique of estimating from the number of sources, targets and transformations came from. It simply misses so many other factors, such as:
- What is the skill level of the programming resources?
- How available are the resources (how many other projects are they working on, and how much time off will they take for sickness and vacations)?
- Who is doing the testing?
- Who is writing the test cases?
- How well were the test cases written?
- How well were the requirements written?
- Is the scope still changing?
- What is the level of data quality?
- What is the level of understanding of the data?
My suggestion is to perform some research before providing estimates. This can be done by:
- Reviewing how the data will be used,
- Reviewing the data in the data sources by writing queries and manually looking at the data, and
- Performing some simple pseudocoding of the solution first.
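As an illustration of reviewing source data by writing queries, the sketch below profiles a single column for row count, null count and distinct values. It assumes a SQL-accessible source and uses Python's built-in `sqlite3` module with an in-memory table so that it runs as written. The `customers` table, its contents and the `profile_column` helper are hypothetical stand-ins for a real source.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return row count, null count and distinct count for one column."""
    # Identifiers cannot be bound as query parameters, so they are quoted
    # here. Only pass trusted table and column names to this helper.
    sql = (
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    rows, nulls, distinct = conn.execute(sql).fetchone()
    # SUM returns NULL on an empty table, so fall back to 0.
    return {"rows": rows, "nulls": nulls or 0, "distinct": distinct}


# Hypothetical source table, built in memory for the sake of the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "US"), (2, "us"), (3, None), (4, "DE")],
)

profile = profile_column(conn, "customers", "country")
print(profile)
```

In this made-up data, the missing value and the inconsistent casing ("US" versus "us") are the kind of findings that usually translate into extra cleansing rules, and therefore extra effort, in the estimate.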
Sid Adelman is a principal in Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, in data warehouse and BI assessments, and in establishing effective data architectures and strategies. He is a regular speaker at DW conferences. Adelman chairs the "Ask the Experts" column on www.dmreview.com. He is a frequent contributor to journals that focus on data warehousing. He co-authored Data Warehouse Project Management and is the principal author on Impossible Data Warehouse Situations with Solutions from the Experts and Data Strategy. He can be reached at (818) 783-9634 or visit his Web site at www.sidadelman.com.
Clay Rehm, CCP, PMP, is president of Rehm Technology (www.rehmtech.com), a consulting firm specializing in data integration solutions. Rehm provides hands-on expertise in project management, assessments, methodologies, data modeling, database design, metadata and systems analysis, design and development. He has worked on multiple platforms and his experience spans operational and data warehouse environments. Rehm is a technical book editor and a co-author of the book Impossible Data Warehouse Situations with Solutions from the Experts. In addition, he is a Certified Computing Professional (CCP) and a certified Project Management Professional (PMP), and holds a Bachelor of Science degree in Computer Science and a Master's degree in Software Engineering from Carroll College. He can be reached at clay.rehm@rehmtech.com.









