How to Calculate Data Warehouse Reliability
InfoManagement Direct, September 10, 2009
Data Flow Architecture

Reliability Revisited
R = 1/((1/R1 + 1/R2 + 1/R3 + ... + 1/Rn)/n)
where Ri = reliability of the individual components of the system.
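As a minimal sketch, the formula as written above can be evaluated directly in Python. Written this way, it is n divided by the sum of the reciprocals, that is, the harmonic mean of the component reliabilities. The function name and the input checks are illustrative assumptions, not part of the original article.

```python
def combined_reliability(component_reliabilities):
    """Evaluate R = 1 / ((1/R1 + ... + 1/Rn) / n) exactly as the formula is written.

    The function name and validation rules are assumptions made for this sketch.
    """
    n = len(component_reliabilities)
    if n == 0:
        raise ValueError("at least one component is required")
    if any(r <= 0 or r > 1 for r in component_reliabilities):
        raise ValueError("each reliability must be greater than 0 and at most 1")
    # n divided by the sum of reciprocals (the harmonic mean of the inputs)
    return n / sum(1.0 / r for r in component_reliabilities)


if __name__ == "__main__":
    # Hypothetical component reliabilities, chosen only for illustration
    print(combined_reliability([0.9, 0.9, 0.9]))
```

With identical components the result equals the common value, so three components at 0.9 give approximately 0.9 under this formula.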
When these principles are applied to the data flow within a data warehousing framework, two basic variables determine the overall reliability:
n = number of stages or data flow components from start to end, and
Ri = reliability within a stage.
Number of Stages
The number of stages matters because overall reliability decreases as the number of stages increases. Consider a system with three stages in series, each with a reliability factor of 0.9. The overall reliability of the system is about 0.73 (0.9 * 0.9 * 0.9 = 0.729). Adding a fourth stage reduces it to about 0.66 (0.9 * 0.9 * 0.9 * 0.9 = 0.6561).
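The effect of stage count can be checked with a short Python sketch that multiplies the stage reliabilities of a series system. The function name series_reliability is an assumption introduced here for illustration, and math.prod requires Python 3.8 or later.

```python
import math


def series_reliability(stage_reliabilities):
    """Reliability of stages connected in series: the product of the stage values."""
    return math.prod(stage_reliabilities)


if __name__ == "__main__":
    # Each stage is assumed to have reliability 0.9, as in the example above
    for stage_count in (3, 4, 5):
        overall = series_reliability([0.9] * stage_count)
        print(stage_count, round(overall, 4))
    # prints:
    # 3 0.729
    # 4 0.6561
    # 5 0.5905
```

Each additional stage multiplies the overall figure by another factor of at most 1, so the result can only stay the same or fall as stages are added.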
Reliability of Individual Components
The following section describes how to compute reliability at each stage of ETL within the data flow. In all of these cases, it is presumed that the programs (new and old) have been well tested and validated with an estimated reliability of 100 percent.
Case 1
Extraction is done from the source system and stored in the form of flat files on a server/partition/folder; another ETL program uses this flat file to transform and load to tables (refer to Figure 2).

Reliability of this type of ETL stage can be computed as:
Ri = r1 * r2 * r3 where
r1 = Reliability attached to process 1 in Figure 2, which for all practical purposes can be assumed to be the availability of the source system, based on past experience.
r2 = Reliability attached to process 2 in Figure 2, which could be availability of the server/partition/folder storing the flat file.
r3 = Reliability attached to process 3 in Figure 2. This is equal to the availability of the server/partition/folder on which the destination tables are located.
Case 2
One program does ETL (refer to Figure 3). The reliability of such a stage can be computed as
Ri = r1 * r2 where
r1 = reliability attached to process A in Figure 3, which is equal to the availability of the source system.
r2 = reliability of process B in Figure 3, which can be assumed to be the availability of the destination server for all practical purposes.

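A quick comparison of the two cases shows that, for the same source and destination availability, Case 1 can never be more reliable than Case 2. The intermediate flat-file store in Case 1 contributes an extra factor of at most 1, so Case 1 is strictly less reliable whenever the availability of that store is below 100 percent.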
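The two stage formulas can be compared side by side with a small Python sketch. The function names, parameter names, and availability figures below are hypothetical values chosen for illustration; they do not come from the article.

```python
def case1_reliability(source_avail, flat_file_store_avail, destination_avail):
    """Case 1: extract to a flat file, then transform and load (Ri = r1 * r2 * r3)."""
    return source_avail * flat_file_store_avail * destination_avail


def case2_reliability(source_avail, destination_avail):
    """Case 2: a single program performs the whole ETL (Ri = r1 * r2)."""
    return source_avail * destination_avail


if __name__ == "__main__":
    # Hypothetical availabilities, assumed only for this example
    source = 0.99
    flat_file_store = 0.98
    destination = 0.995

    staged = case1_reliability(source, flat_file_store, destination)
    direct = case2_reliability(source, destination)
    print(f"Case 1 (staged via flat file): {staged:.4f}")
    print(f"Case 2 (single program):       {direct:.4f}")
```

With the same source and destination, the only difference between the two results is the flat-file store factor, which is why the staged design cannot come out ahead in this model.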
How to Compute: An Example