Is Your BI Problem Hiding in Your Data Warehouse?
Billions of dollars are spent each year on business intelligence tools. Every year, these tools improve, incorporating better reporting and analysis, visualization, dashboards, forecasting and planning. With all this advancement, shouldn't the use of BI be a productive, positive experience?
Yet according to the 2008 Forrester study, Current State of BI Adoption in Enterprises, only 40 percent of BI users find most or almost all of the information they need available and accessible from their BI applications. The BI tools do what they can with the data they have. Yet, if the data is incomplete, inconsistent, or inaccurate, even the most sophisticated tools can't provide managers with the right answers to basic business questions. Moreover, when business conditions change, managers have difficulty getting answers from their BI tools about how those changes will affect their business. Their BI tools can't access the data they need quickly enough to answer these questions.
An examination of the process used by most companies to develop their data warehouses reveals several points where disconnects occur and shows why changes are often long and costly. Fixing these disconnects is key to deriving business benefit from BI.
Lost in Translation
There are many areas of disconnect in the conventional data warehouse design process. BI's true potential can be realized if these 10 disconnects are avoided.
Disconnect #1: Business requirements. Business users in DW and BI projects usually document their requirements with text, lists, diagrams and report descriptions. However, there can still be significant disagreement and conflict. Typical issues include disparities between departments, such as different classifications of the same things, unclear business definitions and inconsistent or missing business rules. Too often, project leaders expect to define and agree upon an accurate set of business requirements in a single session. Given that business requirements can be difficult to articulate and frequently shift, such an expectation is not only unreasonable, it can be project-threatening from the start.
Disconnect #2: The logical model. Data architects combine their understanding of business requirements with their understanding of the source data to create a logical data model. Business representatives find it difficult to relate to this technical representation of the data they know, which leaves them uncertain about the exact details of their business requirements. As a result, business representatives tend to make few comments and corrections. The data architect and the business representatives then agree on the logical model so that the project can progress, without a full understanding of what they have defined together.
Disconnect #3: The physical design. The next step is to convert the logical model into a physical database schema. Data modeling tools generate an initial physical design, to which database designers add indices and foreign keys, declare primary keys and data types, and normalize or denormalize tables as needed. The architect may also apply data warehouse design best practices, resulting in a design that's different from the logical model agreed upon with the business representatives. This complexity triggers further discussions among the data warehousing professionals. The first version of the design is sealed in the modeling tool, which generates an impressive database creation script. This milestone marks the point where the data architect hands responsibility for the design to a separate development team.
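To make the handover concrete, here is a minimal sketch of what such a creation script looks like and how little of the design reasoning travels with it. The table and column names are invented for illustration (using SQLite as a stand-in for the warehouse DBMS); they are not from any particular project:

```python
import sqlite3

# Hypothetical physical design derived from a logical "Sales" model:
# primary keys declared, data types chosen, an index added for performance.
CREATION_SCRIPT = """
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    sale_date  TEXT NOT NULL,
    amount     REAL NOT NULL
);
CREATE INDEX ix_fact_sales_date ON fact_sales(sale_date);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(CREATION_SCRIPT)

# The development team receives only this script; the rationale behind each
# physical choice stays behind in the modeling tool.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['dim_product', 'fact_sales']
```

The script creates the schema faithfully, but nothing in it records which business requirement each structure serves, which is exactly what gets lost at the handover.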
Disconnect #4: DBAs fine-tune the design. With the design now controlled by the development team, the database creation script is run against the DBMS to create the data warehouse physical schema. At this stage, database administrators make additional edits for standardization or performance fine-tuning. Edits may also be required to adjust the schema for the underlying DBMS and hardware platform. The business has no visibility into these changes because they are deemed to be purely technical in nature. However, those changes may unintentionally impact the business requirements.
Disconnect #5: Changes to the design during development. During the development phase, the database design may need additional changes, often for technical reasons. When initial loading starts and tables are filled with data, issues surface: duplicate identifiers, nonstandard data values, or cardinalities that are not what was expected. Because development follows an iterative method, early prototypes and identified issues are reviewed with business representatives, who may propose changes. Where will all these changes be applied? Should the project manager incorporate them into the data modeling tool and then recreate the development instance? Very few teams go back to the modeling tool, because of time or resource restrictions. Nearly all changes identified during the development phase are applied directly to the development instance. When this occurs, the logical model becomes totally disconnected from the physical database, and we no longer have a logical record that ties to a physical instance.
Disconnect #6: The data modeling tool is offline! Today's data modeling tools allow changes to the logical or physical model and can then generate a delta DDL (a database modification script in Data Definition Language). However, some changes are not supported. Even when a delta DDL can be created, it is unlikely to run cleanly on the development DBMS because of changes made directly to that instance. Furthermore, graphical data modeling tools are available to data architects in the enterprise architecture group, whereas the development project team may not be using them or may not be given access to them.
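A small sketch shows why the delta DDL fails once the development instance has drifted. The table and column names are hypothetical, and SQLite stands in for the warehouse DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as originally generated from the modeling tool.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

# A change applied directly on the development instance, bypassing the tool.
conn.execute("ALTER TABLE customer ADD COLUMN region TEXT")

# Later, the same change is added to the model, and the modeling tool emits
# a delta DDL for it. The column already exists, so the script fails.
try:
    conn.execute("ALTER TABLE customer ADD COLUMN region TEXT")
    delta_applied = True
except sqlite3.OperationalError:  # "duplicate column name" on this DBMS
    delta_applied = False
print(delta_applied)  # False
```

From this point on, the model and the instance can only be reconciled by hand, which is why so few teams bother.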
Most data architects are not involved in changes subsequent to their initial design. The best organizations still capture the as-built design back into the data modeling tool, perhaps using reverse-engineering, but even this best practice serves only documentation purposes. The business representatives have already been cut off, because the IT experts are now talking a different language, and the business is waiting for the user acceptance test anyway. Changes proposed after the UAT, and any changes implemented during the production life of a data warehouse, face the same treatment: they are applied on the DBMS platform, not in the data modeling tool.
Disconnect #7: Semantics left in the business dictionary. Experienced architects and business analyst representatives, keen on capturing the meanings of their business information objects and business rules, document these either in the data modeling tools or in a separate business dictionary document. Very few data warehouse projects convey these semantic definitions into their development instances. Projects rarely carry all semantic definitions automatically from the modeling tool through to the BI reports - the end-to-end toolchain from sources to BI is too fragmented in conventional data warehousing. The business users see a complex and confusing set of terms in each part of the system and give up on keeping the definitions and rule descriptions together with the data.
Disconnect #8: Inadequate master data management practices. When data is received from multiple sources, we encounter data quality and integration issues, particularly concerning the shared master data. Examples include incomplete and inconsistent information about entities like suppliers, products, key performance indicators or financial data. Most data warehousing projects approach this issue from a data quality angle and attempt to correct master data inconsistencies using unmanaged lookup tables, transformation rules and extract, transform and load (ETL) processes. This is a major issue for business users expecting a harmonized and complete view from their reports. The reports may seem consistent when the inconsistency has been patched by ETL, but they are not accurate and don't reflect the business.
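The "unmanaged lookup table" pattern can be sketched in a few lines. The supplier names and amounts below are invented for illustration:

```python
# Hypothetical ETL-side patching of master data: two source systems spell
# the same supplier differently, and a hand-maintained lookup table inside
# the ETL job hides the inconsistency.
SUPPLIER_LOOKUP = {
    "ACME Corp.": "ACME",
    "Acme Corporation": "ACME",
}

source_a = [{"supplier": "ACME Corp.", "amount": 100}]
source_b = [{"supplier": "Acme Corporation", "amount": 250}]

def load(rows):
    # The transformation rule silently rewrites supplier names; any spelling
    # the lookup has never seen passes through unchanged and splits totals.
    return [{**r, "supplier": SUPPLIER_LOOKUP.get(r["supplier"], r["supplier"])}
            for r in rows]

warehouse = load(source_a) + load(source_b)
totals = {}
for r in warehouse:
    totals[r["supplier"]] = totals.get(r["supplier"], 0) + r["amount"]
print(totals)  # {'ACME': 350}
```

The total looks harmonized, but only because the lookup happens to cover both spellings; the patch lives in the ETL code rather than in any managed master data, so the next spelling variant quietly breaks the report.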
Disconnect #9: Inconsistencies in the BI layer. Most BI tools offer a conceptual layer enabling the definition of business objects in business terms, as well as the mapping of objects back to the data objects in the warehouses, data marts or other sources. These architectural components are a buffer between all the underlying complexity and the reports that users work with.
But this architectural layer is fragmented by nature because it does not hold consistent definitions across the business. There are generally multiple universes, each designed for a limited subscope and a certain group of users. There is no guarantee that a business rule hardwired into an object is consistent with a piece of SQL supposedly mapping to the same object in another report. Some BI tools provide the flexibility (or risk) of having a separate conceptual layer component for each report. These are defined through yet another interface, where the semantic layer is regenerated from the technical metadata of the data sources, adding to the complexity. Furthermore, these conceptual layer components are independently maintained, disconnecting them from their main source warehouses and marts.
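A tiny sketch makes the inconsistency concrete. The business object, its two rules, and the figures are all hypothetical:

```python
# The same business object, "Net Revenue", hardwired differently in two
# conceptual-layer components over the same underlying data.
orders = [{"gross": 100, "discount": 10, "tax": 5}]

# Universe A maps "Net Revenue" to gross minus discount.
net_revenue_a = sum(o["gross"] - o["discount"] for o in orders)

# A report built on universe B embeds its own rule: gross minus discount
# minus tax. Nothing forces the two definitions to agree.
net_revenue_b = sum(o["gross"] - o["discount"] - o["tax"] for o in orders)

# Two reports, two answers to "what is net revenue?"
print(net_revenue_a, net_revenue_b)  # 90 85
```

Because each component is maintained independently, neither answer is flagged as wrong; users simply see two reports that disagree.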
Disconnect #10: Responsiveness to business changes. When the business identifies new BI requests - such as a new view of sales or profitability, an organizational change or an acquisition - numerous parts of the data warehouse infrastructure need to be changed rapidly. The business also wants to keep a corporate memory so it can meaningfully compare what was planned last year with the latest view of the business today. How fast can IT respond to these requests? Weeks? Months? Will the design and development principles stay intact during those rush periods?
Going Beyond the Conventional
For the most part, the disconnects described here are caused by two primary faults of the conventional BI/DW design process.
First, business users are brought in at the very beginning, sporadically through the middle, and at the very end of what can be an extremely long process. They're asked for their requirements (as if they know exactly what they're looking for and those things will never change) and asked to confirm those requirements when presented to them in a language and format they cannot be expected to understand. They aren't consulted when minor tweaks are made during design and development, and there is little room for the changes to the business requirements that inevitably occur in the months between consultation and delivery. The obvious first step is to bring the business user into the fold early and often throughout the process rather than bookending it. By keeping the business intimately involved and engaged, the project has a far better chance of continued success.
The second major issue is the rigidity of the system and the isolation of each of its parts. Each stage represents a whole new set of players, with new tools, new understandings and new interpretations of the business requirements. The result is like a game of telephone with 40 people who speak different languages. The key for this second issue is to develop and adhere to a business model as a constant.
Business models capture the way a business works in a medium that both business managers and IT can easily understand. A business-model-driven approach to BI can automate dynamic modeling of a data warehouse's data structure, making the translation from business model to data model to data warehouse structure nearly automatic.
More importantly, if the data structure is actually based upon and driven by a business model, then changes to the business model can be automatically reflected in the data structure. This limits the issues that come with changing a data warehouse's schema, and speeds the BI system's delivery of answers to new business questions. This approach makes business intelligence far more dynamic.
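The model-driven idea can be sketched minimally: the schema is generated from a business-model description, so a model change regenerates the DDL instead of being patched by hand. All names are illustrative, not from any particular product, and SQLite stands in for the warehouse DBMS:

```python
import sqlite3

def ddl_for(entity, attributes):
    # Generate a table definition from the business-model entry. Types are
    # deliberately simplistic; a real generator would map them from the model.
    cols = ", ".join(f"{a} TEXT" for a in attributes)
    return f"CREATE TABLE {entity} ({entity}_id INTEGER PRIMARY KEY, {cols})"

# The business model is the single source of truth for the structure.
model = {"product": ["name", "category"]}

conn = sqlite3.connect(":memory:")
conn.execute(ddl_for("product", model["product"]))

# Business change: the model gains a "brand" attribute. The schema is
# regenerated from the model rather than edited directly on the instance.
model["product"].append("brand")
conn.execute("DROP TABLE product")
conn.execute(ddl_for("product", model["product"]))

cols = [r[1] for r in conn.execute("PRAGMA table_info(product)")]
print(cols)  # ['product_id', 'name', 'category', 'brand']
```

The point of the sketch is the direction of flow: the physical structure always follows from the model, so the model and the warehouse can never drift apart the way they do in disconnects #5 and #6.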
The disconnects described have become ingrained in the conventional development process. Identifying them and attempting to address them will take us one step toward bridging the greater IT/business gap we all face.