During my computer science classes in college, professors typically presented program development as a process of translating an algorithm into real code in a programming language. The algorithms were almost always described as a sequence of steps to be taken to solve the problem. At a higher level, my experience and training in algorithm design and division of labor suggested breaking each problem into a collection of discrete stages to be implemented in isolation. When all the stages are finished, they are combined to form the complete application.
However, this stepwise approach to constructing applications leads to an assembly-line model of information processing, in which data and partial results are forwarded from one processing stage to the next. These processes take data (e.g., a transaction stream or extracted records from multiple data sets) as input and provide some product as output. That product can be a physical product (such as invoices to be sent to customers), a side effect (such as the settlement of a sequence of transactions) or an information product (such as a business intelligence report).
By developing applications in this manner, we impose a control structure on the way information flows through processing. However, this control structure does not always reflect the true dependencies inherent in the original application. We may have broken a process into two stages that could truly have been executed in parallel, but an implementation decision forces one stage to precede the other. Now imagine that system 20 years later: the basic processing scheme remains the same, yet there have been multiple additions, changes and enhancements to the base processing system. Many decisions have imposed a strict flow of processing control that may no longer reflect the true data dependence of the application. When we then attempt to dissect the way data sets are being used within the system, many analysts throw up their hands in frustration.
An interesting information management tool that can be used effectively in this case is an information flow model. This modeling scheme characterizes how information flows through an application by describing the kinds of processing that take place and how data moves between those stages. The model is valuable because it provides a basis for distinguishing between data dependencies, control dependencies and artificially imposed implementation dependencies. That distinction, in turn, can lead to flow optimization, identification of bottlenecks, placement of data validation monitors and opportunities for additional business analysis points.
In an information flow model, each processing stage is described as one of the following stage classes:
Data Supply: the point where data suppliers forward information into the system.
Data Acquisition: the stage that accepts data from external suppliers and injects it into the system.
Data Creation: a stage internal to the system where data is generated and then forwarded to another processing stage.
Data Processing: any stage that accepts input and generates output (as well as generating side effects).
Data Packaging: any point at which information is collated, aggregated and summarized for reporting purposes.
Decision Making: the point where human interaction is required.
Decision Implementation: the stage where the decision made at a decision-making stage is executed, which may affect other processing stages or a data delivery stage.
Data Delivery: the point where packaged information is delivered to a known data consumer.
Data Consumption: as the data consumer is the ultimate user of processed information, the consumption stage is the exit stage of the system.
Data moves between stages through directed information channels: pipelines that indicate the flow of information from one processing stage to another and the direction in which data flows. An information flow model is represented by the combination of the processing stages connected by directed information channels. Once the flow model has been constructed, names are assigned to each of the stages and channels.
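The structure described above is, in effect, a directed graph of named stages connected by channels. The following is a minimal sketch of how such a model could be represented; the stage names, stage classes and channels here are illustrative assumptions, not drawn from any particular system.

```python
from collections import defaultdict

class InformationFlowModel:
    """A sketch of an information flow model: named processing stages,
    each tagged with a stage class, connected by directed channels."""

    def __init__(self):
        self.stage_class = {}              # stage name -> stage class
        self.channels = defaultdict(list)  # directed channels: source -> targets

    def add_stage(self, name, stage_class):
        self.stage_class[name] = stage_class

    def add_channel(self, source, target):
        self.channels[source].append(target)

    def downstream(self, stage):
        """All stages reachable from `stage` along directed channels."""
        seen, frontier = set(), [stage]
        while frontier:
            current = frontier.pop()
            for nxt in self.channels[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen

# Hypothetical example flow: supplier feed -> intake -> settlement -> report.
model = InformationFlowModel()
model.add_stage("supplier_feed", "Data Supply")
model.add_stage("intake", "Data Acquisition")
model.add_stage("settle", "Data Processing")
model.add_stage("report", "Data Packaging")
model.add_stage("consumer", "Data Consumption")
model.add_channel("supplier_feed", "intake")
model.add_channel("intake", "settle")
model.add_channel("settle", "report")
model.add_channel("report", "consumer")
```

With the model in hand, a query such as `model.downstream("intake")` answers the kind of question the modeling exercise is meant to enable: which stages are affected by data entering at a given point.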
An information flow model can be used to identify the source of a data quality problem. The effects of a data quality problem might manifest themselves at different stages within an information flow, perhaps at different data consumption stages. However, what may appear to be multiple problems may all be related to a single point of failure earlier in the processing. By identifying a set of data quality expectations and creating validation rules that can be imposed at the entry and exit of each processing stage, we can trace through the information flow model to the stage at which the data quality problem occurred. Fixing the problem at the source will have a beneficial effect across the board, as all subsequent manifestations should be eliminated.
The ability to evaluate a system for business intelligence opportunities from an abstracted point of view can add significant value to your business environment.