It has long been known that the data warehouse environment, with its many components, is best developed iteratively. But the iterative development process itself has some interesting aspects that are not immediately obvious. This column explores some of the more interesting features of the iterative nature of data warehouse development.

Iterative Development

What is meant by iterative development? Iterative development is the opposite of classical, requirements-driven development, in which activities cascade in a "waterfall" sequence. With iterative development, limited results are achieved in short, fast bursts, and only a fraction of the project is developed in any one effort. Rather than flowing like a waterfall, iterative development is often described as circular.

Fitting the Pieces Together

One of the secrets of iterative development is ensuring that the different iterations fit together. The iterations are coordinated by developing each burst from a common road map. In data warehouses, that road map is a data model. Each iteration of data warehouse development focuses on a different part of the road map.

The coordination required to ensure that the different parts of the warehouse fit together as they are developed is achieved by following the road map assiduously. Without a common road map to direct the autonomous iterative development efforts, coordination is difficult. Iterative development without a unifying road map produces something that looks like a quilt made by blind sewers, while an iterative data warehouse built from a well-designed road map looks like a highly integrated jigsaw puzzle.

How Much Redevelopment is There?

The second interesting aspect of iterative development for the data warehouse is that often as much as 90 percent of the product developed in the first iteration will not be right and will be discarded. This is why building the first iteration in a small, fast manner, rather than in a "big bang" fashion, makes so much sense. If you know you are going to have to throw something (or most of something) away, why not throw away a small effort rather than a large one?

The real purpose of the first iteration of development is to start the end users down the road of "discovery." End users begin unable to say what they want until they see the possibilities, and until the first iteration of the data warehouse exists, there is no way for them to start the "Aha!" process. The fact that 90 percent of the initial development effort will not be used is therefore not an indication of waste. Instead, the discarded work is an indication that progress is being made and that the end users are starting to discover the requirements of a decision support system.

How do Iterations Progress?

Does every iteration waste 90 percent of the development effort? Interestingly, no: as iterations progress, the percentage of wasted development decreases. The second iteration may experience a 75 percent waste factor, the third a 50 percent waste factor, and the fourth as little as a 20 percent waste factor. As the end users begin to understand what data warehousing makes possible, they start to make much more reasonable requests and state more realistic requirements. Over successive iterations, the end users climb a steep learning curve and improve their ability to articulate requirements.

As end users become better at articulating requirements, both the pace at which requirements emerge and the pace of development slow down. To some extent this slowdown occurs because the developer must rework old material rather than start with a clean slate. It always takes longer to work from an established base than from a clean foundation.

Heuristic Development

At the foundation of the iterative development process is the notion of heuristic development: a style of developing systems in which the results of any one step determine the requirements for the next step. You cannot plan a heuristic development effort; you allocate resources and use experienced people. Sometimes you achieve what you set out to do in less time than you allocated. In other cases, the development effort requires more time and resources than originally allocated. With heuristic development you cannot see beyond the nearest horizon of development. Each step of development is its own development project, its own iteration.

Iterative development applies in more than one place in the DSS environment: to building the infrastructure (the data warehouse itself) as well as to using the data in the warehouse.

Iterative analysis applies particularly well to the sampling techniques often used with large data warehouses. Here, a small sample is extracted from a large data warehouse, and the DSS analyst works against that sample iteratively in order to develop the requirements. Because the sample is small, each round of iterative analysis runs efficiently. Once the DSS analyst is satisfied that the requirements are known, the analysis is run against the full data warehouse. In this way the DSS analyst has his/her cake and eats it too: iterative analysis runs quickly against the small sample, and the final analysis can still be run against the entire data warehouse (if needed and if desired).
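A minimal Python sketch of the sample-then-full workflow described above follows. Everything in it is hypothetical: the row layout (`region`, `amount`), the helper names `build_demo_warehouse`, `draw_sample`, and `avg_amount`, the 1 percent sampling fraction, and the fixed seed are illustrative assumptions, and an in-memory list stands in for a real warehouse table, which would more likely be sampled with its own query tooling.

```python
import random
from statistics import mean

# Hypothetical schema: each row is a dict with "region" and "amount" keys.


def build_demo_warehouse(n=100_000):
    """Synthetic stand-in for a large warehouse table (illustration only)."""
    regions = ["east", "west"]
    return [{"region": regions[i % 2], "amount": i % 100} for i in range(n)]


def draw_sample(rows, fraction, seed=42):
    """Extract a reproducible random sample, drawn without replacement."""
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)


def avg_amount(rows, region):
    """One candidate analysis: mean amount for a region, or None if absent."""
    amounts = [r["amount"] for r in rows if r["region"] == region]
    return mean(amounts) if amounts else None


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    sample = draw_sample(warehouse, fraction=0.01)

    # Iterate quickly against the small sample while the question is refined.
    for region in ["east", "west", "north"]:
        print("sample", region, avg_amount(sample, region))

    # Once the requirement is settled, run the final analysis on all rows.
    print("full east", avg_amount(warehouse, "east"))
```

The fixed seed keeps the sample reproducible, so every pass of the iterative analysis sees the same rows; only the final call touches the full table.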

One of the interesting issues of iterative development is whether the iterative development process applies to all components of the data warehouse. Indeed, some architectural components of the DSS environment lend themselves less readily to iterative development than others. Certainly the enterprise data warehouse is best built iteratively, and exploration warehouses are almost always built iteratively (in some ways it is hard to imagine an exploration warehouse that is not). But data marts and ODS structures must be driven by known requirements, so they are somewhat less suited to iterative development.

Iterative development is the backbone of development for the DSS data warehouse environment. Iterative development applies across the architecture, but in varying degrees to different structures. Iterative development is often used as the basis for starting the end users down the path of discovery.
