This month's column is excerpted from the white paper "Modernizing and Advancing Information Management across the Enterprise" by William McKnight.
The data warehouse faces a conundrum, and companies need to make a decision about it. One approach is to make the data warehouse real time, loading it in concert with operational structures and minimizing operational business intelligence (BI). This is simple in concept but can be extremely complex in practice. Operational systems need to cooperate with this vision by being robust enough not to break under intraday extracts. The data warehouse environment needs to be efficient enough that the extracts it requests are kept to a minimum. Neither condition always holds, so real time remains a challenge. Service-oriented architecture (SOA) has increasingly helped enable operational queries, but extracts are still the sticking point.
Sometimes, however, the value-add of a real-time data warehouse may prove to be so enormous that a company chooses to replace its operational system with one that is friendlier to a real-time data warehouse. Generally, these systems can tolerate extracts while also performing real-time operations. The irony is that many of these modern enterprise resource planning (ERP) systems provide much more analytics than previous ones and also control many of the functions that were previously the domain of the data warehouse. Consequently, some companies have found themselves in the enviable position of having analytical abilities in operations as well as in a real-time-enabled data warehouse.
Most shops need to choose where analytics and BI will prevail and give the appropriate attention to operational BI as a result. They cannot ignore operational BI any longer. However, whether the majority of BI occurs in the data warehouse or the operational arena is in question.
A lot of corporate M&A activity has occurred in the past few years, as has virtual M&A within organizations that are finally ascribing a sense of need and value to looking at the overall business, including customers, products, parts, etc., across the organization. As a result, information leadership is increasingly going to look suspiciously at its multiple data warehouses. Should there be a consolidation effort? If the warehouses reveal unwanted redundancy, especially inconsistent, redundant data (e.g., two versions of gross profit), the answer is probably yes.
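One way to test for the inconsistent redundancy described above is to compare the metrics that two warehouses both publish and flag the ones that disagree. The Python sketch below is illustrative only: the function name `find_metric_conflicts`, the warehouse dictionaries, the metric keys and all figures are hypothetical, and a real check would query each warehouse rather than use in-memory data.

```python
from decimal import Decimal


def find_metric_conflicts(warehouse_a, warehouse_b, tolerance=Decimal("0.01")):
    """Return metrics reported by both warehouses whose values disagree.

    Each warehouse is a dict mapping (metric_name, period) to a Decimal value.
    Metrics that appear in only one warehouse are not redundant and are ignored.
    """
    conflicts = {}
    shared_keys = warehouse_a.keys() & warehouse_b.keys()
    for key in sorted(shared_keys):
        value_a = warehouse_a[key]
        value_b = warehouse_b[key]
        if abs(value_a - value_b) > tolerance:
            conflicts[key] = (value_a, value_b)
    return conflicts


# Hypothetical figures: both warehouses report gross profit for the same period.
sales_dw = {
    ("gross_profit", "Q1"): Decimal("1250000.00"),
    ("net_sales", "Q1"): Decimal("4000000.00"),
}
finance_dw = {
    ("gross_profit", "Q1"): Decimal("1187500.00"),
    ("net_sales", "Q1"): Decimal("4000000.00"),
    ("headcount", "Q1"): Decimal("312"),
}

conflicts = find_metric_conflicts(sales_dw, finance_dw)
for (metric, period), (a, b) in conflicts.items():
    print(f"{metric} for {period} differs: {a} vs {b}")
```

In this sketch only gross profit is flagged; net sales agrees in both warehouses, and headcount is ignored because it is not redundant. A nonempty result is the kind of evidence that would support a consolidation effort.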
If there is little to no redundancy (e.g., there is a sales-focused data warehouse and a supply chain-focused data warehouse), analytical views that span both or all data warehouses still offer much benefit, and these needs will likely require physical cohabitation of the data. While the detail data may be left alone in the warehouses, a federated layer may need to be added in the extraction, transformation and load (ETL) process to physically meet those needs. If consolidation is needed, it must be justified on the basis of system cost savings or on the additional business benefit the consolidated data provides, such as a consolidated view of customer transactions across all touchpoints.
Half or more of the third-party data brought into an information environment could have multiple uses - for example, D&B demographic information on customers and prospects. Most third-party data is analytic in nature and won't act in real time with customer demographics. However, it needs to interact with detailed transaction data to determine detailed customer and prospect profiles for operational and analytical use.
Much third-party data is added for its value proposition to post-operational analytics, and the data warehouse is a good leverage point for these analytics. Aside from being the only environment where attention is realistically going to be given to modeling for access, data quality, metadata and multiuse in general, the data warehouse should be the launching point for all post-operational information.
So, at the least, the data warehouse becomes both the historical data store for regulatory-required data (high volume, low operational use, infrequent query) and the home for those exceptional and infrequent queries that 1) need all the data and 2) cannot be satisfied in earlier phases of the information lifecycle. The data warehouse is also where the summary data required for operational BI can be generated from the detail and provided back to the operational environment. At query time, this warehouse has terabytes of information at its disposal but should not have high concurrency needs. This paradigm is a good fit for a data warehouse appliance.
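The step of generating summary data from the detail and providing it back to the operational environment can be pictured as a simple rollup. The Python sketch below is only an illustration under stated assumptions: the `summarize_by_customer` function, the row layout and the field names (`customer_id`, `amount_cents`) are hypothetical, and in practice the aggregation would normally run inside the warehouse itself.

```python
from collections import defaultdict


def summarize_by_customer(detail_rows):
    """Roll detail transactions up to one summary record per customer.

    detail_rows is an iterable of dicts with hypothetical fields
    'customer_id' and 'amount_cents' (integer cents avoid float rounding).
    """
    totals = defaultdict(lambda: {"order_count": 0, "revenue_cents": 0})
    for row in detail_rows:
        summary = totals[row["customer_id"]]
        summary["order_count"] += 1
        summary["revenue_cents"] += row["amount_cents"]
    return dict(totals)


# Hypothetical detail rows as they might sit in the warehouse.
detail = [
    {"customer_id": "C1", "amount_cents": 1500},
    {"customer_id": "C2", "amount_cents": 700},
    {"customer_id": "C1", "amount_cents": 2000},
]

# The compact summary is what would be shipped back to the operational system.
summary = summarize_by_customer(detail)
for customer_id in sorted(summary):
    print(customer_id, summary[customer_id])
```

The operational system receives only the small per-customer summary, while the detail stays in the warehouse for the infrequent queries that need all the data.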
At most, the data warehouse goes operational, receiving real-time feeds from its sources and directly supporting BI in a near real-time manner. One major benefit to this approach is that all the data is available for analysis without needing summarization and ETL in reverse from the data warehouse to the operational system.
Most database management system technology does not cooperate with this strategy. Most are appliance-like and not built for heavy interaction with other systems. This will slowly change. However, what the market will prefer is something that provides the lowest short-term total cost of ownership and a smooth path into that approach. With average CIO tenures remaining at two years or less, getting operational data warehouses up and running is an ongoing challenge.