Douglas MacArthur said it best: "There is no substitute for victory." A paraphrase of that famous line applies to our profession: There is no substitute for reality. All of the theory and all of the talk about data warehousing and the corporate information factory are good; however, a periodic hard dose of reality confirms or refutes what we consultants and theoreticians have been proposing. Reality has a way of teaching us when nothing else seems to get through.
Recently I had the opportunity to visit Verizon in Dallas, Texas. In one form or another, they have been working on data warehousing for a number of years. Through starts and stops, they have several data warehouse/DSS initiatives. Some of the initiatives are more mature than others, but all seem to be heading in the right direction. Their efforts are bearing fruit quite nicely.
In particular, the "DSS data warehouse" which has been built for the Yellow Pages portion of the company seems to be the furthest along. Following are some of the observations that I made while visiting the company.
True to form, data warehouses are not built overnight. To be successful, you must build iterations of data warehouses, and you have to align the data warehouse effort with the business. A data warehouse built by technicians alone is no better than a data warehouse built by business people alone. Proper construction of a data warehouse requires a combination of both technical and business people working together. In this case, Verizon was entirely in sync with the established wisdom in the industry.
"Build it and they will come." Create a basis of useful and available information, let the world know about it and then step back. It is probably true that Verizon has done an above average job of internal marketing of their own data warehouse efforts. Notwithstanding that effort, once the organization discovers that there indeed is a collection of available, historical, integrated information, the data warehouse turns into a popular font of information.
There is a marked and real difference between operational reporting and informational reporting. Operational reporting is for clerical activities at the detailed level. Informational reporting is for the management level, where strategic and tactical thinking occurs. Informational reporting depends on historical data, integrated data, and both detailed and summarized data.
Verizon is starting to struggle with the issue of the need for an exploration/data-mining facility. Currently, ad hoc queries are lumped in with standard data mart loads and other processing. As expected, occasionally performance suffers. As the number of users grows and the volume of data grows, it is predictable that the need for a separate exploration/data-mining facility will become even more apparent. A good first step to determining the nature of the ad hoc processing is to start monitoring the usage of the data through an activity tracker.
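As a rough illustration of what an activity tracker might report, the following sketch summarizes a log of query events and computes how much of the elapsed time is consumed by ad hoc work. Everything in it is an assumption made for illustration: the QueryEvent fields, the "ad_hoc" and "scheduled" labels and the sample figures are hypothetical and do not describe Verizon's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    """One logged access to the warehouse (hypothetical record layout)."""
    user: str
    kind: str          # "ad_hoc" or "scheduled" (assumed labels)
    rows_scanned: int
    seconds: float


def summarize(events):
    """Aggregate query count, rows scanned and elapsed seconds per kind."""
    summary = {}
    for e in events:
        s = summary.setdefault(
            e.kind, {"queries": 0, "rows_scanned": 0, "seconds": 0.0}
        )
        s["queries"] += 1
        s["rows_scanned"] += e.rows_scanned
        s["seconds"] += e.seconds
    return summary


def ad_hoc_share(events):
    """Fraction of total elapsed time consumed by ad hoc queries."""
    total = sum(e.seconds for e in events)
    if total == 0:
        return 0.0
    return sum(e.seconds for e in events if e.kind == "ad_hoc") / total


# Made-up sample log, for illustration only.
events = [
    QueryEvent("analyst_a", "ad_hoc", 5_000_000, 30.0),
    QueryEvent("mart_load", "scheduled", 1_000_000, 10.0),
    QueryEvent("analyst_b", "ad_hoc", 2_000_000, 20.0),
    QueryEvent("mart_load", "scheduled", 1_000_000, 20.0),
]

if __name__ == "__main__":
    print(summarize(events))
    print(ad_hoc_share(events))
```

Tracked over weeks, a rising ad hoc share of elapsed time would be one signal that exploration work deserves its own facility.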
These observations were entirely consistent with the conventional wisdom of the industry. However, there were a few departures from conventional wisdom, and these departures were as instructive as they were interesting. Rather than contradicting conventional wisdom, however, these departures add to it.
The first interesting slight departure from the conventional data warehouse experience at Verizon is in the building of a data mart as the first step in data warehousing. Over time, the data mart is converted to a full-fledged data warehouse. Conventional wisdom states that you build the data warehouse first, then you build the dependent data marts from the data found in the data warehouse. In theory, that is how you should do things and for some pretty good reasons.
However, the reality at Verizon was that they built a data mart first and then over time mutated that data mart into a real data warehouse. They have done a good job and produced a success. Following are the circumstances that allowed Verizon's approach to succeed:
They started with a single integrated source.
They are still using the same technology, Oracle, that they started with.
The initial design was a star schema, but initially it contained a lot of data in a common format. From the beginning, the schema was designed to support multiple requirements and multiple uses.
New types of data have been added to the star schema continually. Over time, the star schema has been designed to support multiple uses of the data.
Users are asked to pay for the changes and alterations that have been made to the star schema as a result of adding dependent data marts and servicing new requirements.
The original amount of data was modest.
Under these circumstances, the practice of building a data mart and mutating the data mart into a data warehouse has produced very satisfactory results. Reality suggests that perhaps under the proper conditions, it is possible to start with a data mart and turn it into a data warehouse.
Verizon had another innovation that was very impressive. In their data warehouse, they have two kinds of data. There is the standard detailed data that comes directly from the ETL process. This data is stored in a relational-like normalized format. Verizon called this "native" data. The other data is called the "reporting" data. The reporting data is essentially data in a star schema format and arranged so that it can feed the needs of several data marts and other users. The reporting data is "prestructured" so that when data marts need to go to the data warehouse, the data has already been preconditioned. Verizon calls this data their "VZ" files. There is only one basic star schema rendition of reporting data for Verizon, and that single star schema serves many different users and many different purposes.
Both VZ data and native data are available to the users of the data warehouse. However, the majority of the accesses to the data warehouse look at the VZ data. Only rarely do the accesses of data in the data warehouse need data found in the native files.
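To make the distinction between the two forms concrete, here is a minimal sketch using Python's built-in sqlite3 module. It keeps normalized "native" tables alongside a single star schema of "reporting" data derived from them. The table names, column names and sample rows are all hypothetical; the actual layout of Verizon's native and VZ files is not described here, and Verizon's warehouse runs on Oracle rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "Native" data: normalized detail as it might arrive from ETL
-- (hypothetical tables and rows).
CREATE TABLE native_customer (
    customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE native_order (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER,
    order_date TEXT, amount REAL);

INSERT INTO native_customer VALUES
    (1, 'Acme Plumbing', 'North'),
    (2, 'Best Florist', 'South');
INSERT INTO native_order VALUES
    (10, 1, '2001-01-15', 100.0),
    (11, 1, '2001-02-03', 150.0),
    (12, 2, '2001-01-20', 200.0);

-- "Reporting" data: one star schema prestructured from the native tables,
-- so that data marts do not have to repeat the joins and rollups.
CREATE TABLE dim_customer AS
    SELECT customer_id, name, region FROM native_customer;
CREATE TABLE fact_sales AS
    SELECT customer_id,
           substr(order_date, 1, 7) AS month,
           SUM(amount) AS total_amount,
           COUNT(*) AS order_count
    FROM native_order
    GROUP BY customer_id, substr(order_date, 1, 7);
""")


def sales_by_region(connection):
    """Typical access path: read the prestructured reporting data."""
    rows = connection.execute("""
        SELECT d.region, SUM(f.total_amount)
        FROM fact_sales f JOIN dim_customer d USING (customer_id)
        GROUP BY d.region
        ORDER BY d.region
    """).fetchall()
    return dict(rows)


def order_detail(connection, order_id):
    """Rare access path: drop down to the native detail."""
    return connection.execute(
        "SELECT customer_id, order_date, amount FROM native_order "
        "WHERE order_id = ?", (order_id,)
    ).fetchone()


if __name__ == "__main__":
    print(sales_by_region(conn))
    print(order_detail(conn, 11))
```

Most queries in this sketch would go through sales_by_region or similar reads of the star schema, while order_detail stands in for the occasional request that needs the underlying detail.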
Storing the data in two forms in the data warehouse is a departure from conventional wisdom but it works and serves a very real purpose. It makes the data warehouse an easy place to access and do business. Maybe it is time that we added this technique to conventional wisdom.
The new twists that were the reality of Verizon ought to be added to the conventional wisdom of data warehousing. After all, there is no substitute for reality.