The literature of our industry is replete with advice about why data warehouses fail and what makes data warehouses successful. One topic that gets only cursory coverage, however, is what to do when they are in the process of failing. In my own practice, this is a pretty robust business; and I initially wanted to evoke the image of Red Adair and his gang rushing off to do the impossible extinguish oil well fires. Some more thoughtful marketing minds prevailed and I dropped it; but, in a way, data warehouse rescue is very much like the Red Adair act. Projects are caught in limbo between success and failure, and the precise steps needed to save them are not clear to the people involved. However, one principle stands out above all others: Rescue efforts are not the place for revisioning or going back to square one. If rescue is called for, the only appropriate thing to do is to get things back on track or, in some cases, write the whole thing off. Not every failing project can be saved.
Failure rates of data warehouse projects have been discussed for years. One study claimed it to be more than half, perhaps as much as 80 to 90 percent. Two columnists in this magazine had a heated debate over those numbers, with one quoting his own survey pegging the number at five percent. Such a discrepancy just proves Mark Twain's point that "There are lies, damned lies and statistics." Whatever the number, I personally believe that outright failures of data warehouses are far less common today; but, by the same token, achieving an unqualified success is just as uncommon. Between the rescue efforts I've been involved with and those I'm aware of, two broad categories of missteps that lead to data warehouse purgatory are apparent.
The first category is architectural errors. Even when the execution of the architectural design has been flawless, implementing the wrong architecture in the first place leads to a multitude of problems. In a typical case, the organization has relatively straightforward BI needs that could be served by a simpler architecture; but the designers have been influenced to believe that a complex, multitier architecture, heavy on technology, is the only respectable (or defensible) thing to do. There are many situations for which this architecture is a good solution. Unfortunately, there are many more situations for which it is not.
In a common example where the architecture is too complex for the application, we find there are four distinct (relational) schema areas as follows, with as many as 10, 15 or 20 distinct schemas:
- Staging area.
- Operational data store (in fact, the ODS often has multiple schema, one for each operational system).
- Enterprise data warehouse (i.e., EDW another large database that is not queried directly).
- Data marts, multidimensional online analytical processing and other downstream extracts the last two with their own assortment of (nondimensional) schemas.
It's important to remember that each of these areas has its own extract, transform and load (ETL), its own database administrator requirements and adds its own layer of complexity and latency. Each component requires ongoing maintenance and the collection of them poses a significant barrier to enhancement. They drain resources from other productive pursuits of the data warehouse, such as expanding information delivery and driving change management, to ensure the organization is deriving the benefits it should from the effort.
It is difficult to turn this situation around because the owners of this overbuilt structure resist the idea that a simpler approach might be better. Rescue also requires scrapping a lot of hard work and investment, which can be a political hurdle. The easy part is the technical solution, which just involves getting them back to basics. Usually, a single schema and BI tools that can query the database directly are the solution. Even if there are more complicated requirements, it's good to triage the whole set of them (what we need to do now, what we can get to soon, what is nice to have). Pushed-back items typically include data marts or specialized tools/extracts for data mining, campaign management or industry-specific needs. Remember, in a rescue of this type, the point is to make the data warehouse functional at the business level by realigning the architecture, not to deliver all of the requirements. There are basically four steps for getting to that point:
- Understand the BI requirements and map a solution that will meet the needs.
- Work backward and realign the architecture to meet those needs.
- Reengineer the back end (schema, physical design and ETL).
- Hook up the BI components and deliver the agreed-upon functionality.
Another common problem can be simply a loss of focus on the business perspective. Data warehouse projects are always funded on the basis of what they can provide to the organization in terms of information delivery, analysis and enhancement, benefits that inure to the business community. These funding decisions are typically not based on database size, cool SQL, the application of some new technology longing for a home or the skewed interpretation of some data warehouse best practices. Sadly, this becomes the focus in too many cases instead of staying focused on the business perspective, which is concerned with the flow of information. This skewing of direction is usually the result of poorly informed decisions and inexperience. As a consultant myself, I've been involved in projects where we've been brought in to do strategy, scoping and perhaps vendor selection and initial data modeling. Then the project is taken over completely by inexperienced in-house staff or offshore outsourcing. I've seen this succeed, but only occasionally.
Rescue efforts are necessary when the gap between what the IT organization is delivering and what the business recognizes as its requirements is so wide that people are in danger of falling into it. Depending on the organization and the culture, this can spark 1) an earnest discussion with the goal of understanding the problem and proposing a solution, 2) denial by one or both parties that a problem exists, or 3) lack of acknowledgement that there is a problem because money was spent and nothing changed. The prospects of resuscitating the patient are the slimmest in the last case.
The course of remediation for these cases can go in many directions, but the first step is to get a hold of the reins and calm the horses. Then, regardless of what other problems you find, you have to re-engage the business and establish a new agenda for reengineering information access, analysis and collaboration, or at least the subset of that which was called for originally. Though there are only a few steps, they represent a great deal of effort, skill and facilitation.
Through interviews, existing material and extrapolation, establish a cognitive model of the informational needs of the affected organization. By cognitive model, we mean understand how people think about their work, not the work products themselves. If time and budget permit, work with people one-on- one to observe their work.
Quantify the skills that exist and match them to tools. Project the time and effort to improve the skills to the desired level of competence. Make recommendations about the bridging the temporary gap.
Develop the game plan and present it in neutral language, heavily weighted with observable, quantified facts that support your position. Arguing on principle is never useful. Do not under any circumstances say you have a Kimball warehouse and you need an Inmon warehouse or vice versa. These concepts are not useful at this point. What you need is a functioning warehouse and a plan to get one.
In almost every case that I am aware of where data warehouse rescue is called for, IT veers toward what it knows and is comfortable with, which is a technology solution, and fails to operate in partnership with the business. No technology implementation can be successful unless there is a partnership between IT and the business. However, just because it's a partnership doesn't imply it's an equal one. In most cases, IT must work much harder than the business. In order to get it right, IT must have a cognitive grasp of what people do and their informational needs. It's hard to believe that after 10 years of intensive growth in data warehousing and business intelligence, this is still a novel idea. A data warehouse can be a transformative effort. It can change things for the better.
Data warehouses were invented to provide business intelligence, but that is changing. There is a rising tide in the data warehousing industry, a version 2.0 of our methodologies, approaches and tools, that promises much more. The real-time enterprise, enterprise information integration, the merging of operational and analytical processing, the re-emergence of e-business all of these elements are coming together to expand and extend the usefulness of data warehousing. The downside is that all of those under-performing data warehouses will be flushed into the harsh light of day. Many installations that appear to be operating are, in fact, on the razor's edge, barely able to meet their scaled-down objectives. The next set of requirements will be a tipping point, sending them tumbling. Every practitioner needs to understand the techniques of data warehouse rescue. It's the next big thing.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access