Advancing from traditional data warehousing to dynamic data warehousing requires a low-latency environment, not just a large atomic data warehouse. Most of the capabilities required to get from vanilla data warehousing to near real-time, on-demand data warehousing are implemented by the proprietary data warehouse vendors as workarounds to a legacy database and operating system.
The proprietary data warehouse vendors make it sound as though loading inconsistent, diverse data into their atomic data stores automatically rationalizes it and renders it consistent. Not so. What is required is a dynamic ecosystem that makes data warehousing as simple as front, middle and back end. The time horizon extends upstream and downstream to encompass the atomic data warehouse in an architecture designed to reduce latency at key points in the information supply chain. The key chokepoints that create delay are in-bound processing, data rationalization, closing the loop between transactional and business intelligence systems, and information delivery.
To succeed at dynamic data warehousing, installations require several capabilities in addition to a large, active parallel data store:
High performance in-bound processing. High performance, premium extract, transform and load (ETL) technology is required and not just for a wide bandwidth input. New data sources are being brought onto the stream and existing data sources are being changed or removed periodically. Few database utilities have the flexibility to handle heterogeneous data without the transformational and metadata capabilities of an ETL tool as a front end to them.
Ability to rationalize and normalize data. If an enterprise has 10 different customer master files and nine product files from diverse sources, then open architecture and design tools such as those found within the Eclipse framework will be needed to derive a consistent, unified representation of the customer and product that can then be implemented in the dynamic data warehouse. This is required to capture metadata, the rules of interoperation and meaning, in a dynamic repository to enable system maintenance, impact analysis and productivity improvements. This also points in the direction of mastering master data management.1
Closed-loop processing. Traditional data warehousing goes from transactional systems to decision support systems, aggregating data to track market trends, customer profitability, forecasting or demand planning. Dynamic data warehousing uses decision support to optimize transactional systems, such as sourcing a demand plan or issuing interactive promotions to key customers upon point of contact. This requires a design for closed-loop processing that leverages message brokers, ETL or other system interfaces.
Ability to send alerts to a variety of different devices and appliances such as personal digital assistants (PDAs). No database alone can do this. This is a perfect example of proprietary data warehousing vendors moving beyond FUD to hype - pure and simple. Broadcasting to remote devices is a function of a messaging broker or a broadcast server either from one of the established infrastructure vendors or one of the partners. The glossy brochure from the proprietary data warehouse vendors is vague on this point and lets the reader assume it is a function of the database. Not so.
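To make the rationalization capability described above more concrete, here is a minimal sketch in Python of deriving one unified customer representation from several customer master files while recording lineage metadata. It is an illustration only, not a description of any vendor's tooling: the field names (`id`, `name`, `postal_code`), the match key built from name plus postal code, and the helper names `normalize_key` and `rationalize` are all assumptions introduced for this example. Real matching rules are usually far richer.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedCustomer:
    """One rationalized customer plus lineage metadata (hypothetical shape)."""
    key: str
    name: str
    # (source file name, source record id) pairs: which records fed this customer
    sources: list = field(default_factory=list)


def normalize_key(name: str, postal_code: str) -> str:
    """Assumed match rule: lower-cased name with collapsed whitespace, plus postal code."""
    clean_name = " ".join(name.lower().split())
    clean_postal = postal_code.replace(" ", "").upper()
    return f"{clean_name}|{clean_postal}"


def rationalize(master_files: dict) -> dict:
    """Merge records from every master file into one customer per match key."""
    unified = {}
    for source_name, records in master_files.items():
        for rec in records:
            key = normalize_key(rec["name"], rec["postal_code"])
            if key not in unified:
                # The first record seen supplies the display name.
                display_name = " ".join(rec["name"].split()).title()
                unified[key] = UnifiedCustomer(key=key, name=display_name)
            # Record lineage so impact analysis can trace back to the source.
            unified[key].sources.append((source_name, rec["id"]))
    return unified


# Two hypothetical master files that describe the same customer differently.
master_files = {
    "billing": [{"id": "B1", "name": "ACME  Corp", "postal_code": "60601"}],
    "crm": [
        {"id": "C9", "name": "acme corp", "postal_code": "60601"},
        {"id": "C10", "name": "Globex", "postal_code": "10001"},
    ],
}
unified = rationalize(master_files)
```

The `sources` list is the point of the sketch: the unified record carries the metadata needed to trace each customer back to the files it came from, which is what supports maintenance and impact analysis.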
In conclusion, recommendations to move beyond active to dynamic data warehousing include:
- Get a database with a transaction processing heritage and more. Look for a database with a transaction processing heritage (OLTP) that has innovated to perform the heavy lifting of decision support queries through partitioning, parallelism and optimization for business intelligence (BI). As BI applications slide in the direction of advanced queries for forecasting, fraud detection, interactive recommendations and low-latency customer responses, the advantage shifts to the database with a transactional heritage. Proprietary, special purpose, back-end query machines will be increasingly marginalized as transactional processing converges with business intelligence into operational decision support. The shared-nothing architecture of a balanced configuration data warehousing appliance brings the advantages of real-time processing to multiterabyte volume points. Active data warehousing means dynamic, on-demand data warehousing.
- Change the context - focus on business use cases. Examples of advanced business solutions that end-user enterprises have implemented include proactive alerts that anticipate and avoid out-of-stock situations; calculations of customer lifetime value; interactive product recommendations to customers at time of purchase; using the demand plan to source the forecast; and cross-selling and up-selling based on market basket analysis. In every case, the information supply chain is able to connect the dots between the underlying technology - the data warehouse - and the end user thanks to superior in-bound ETL, messaging, data transport and information delivery capabilities. Simple as that.
- Define "real time" as a function of the service level agreement (SLA). Dynamic data warehousing requires a dynamic enterprise. If an enterprise is managing a supply chain that is one week long - a week is required to get the product from raw material to store shelf - then a minute-by-minute update of inventory will be just so much static in the communication channel. A robust, overnight batch process will do the job nicely, thank you. Even the Teradata white paper prepared by Desmond A. Martin, entitled "Active Data Warehouse: Where Agile Retailers Win by Capitalizing on Time," acknowledges, "There will be a significant human element to this retooling."2 However, that acknowledgement is well buried in the text, and the reader is left to infer naively that buying the product automatically gives the enterprise an active data warehouse. Debunk the implication by pointing out that real-time and near real-time data warehousing start with the definition of time used by the business and captured in the SLA. Apply the distinctions between real time, on time and right time early and often. If a large batch process prepares customer recommendations in the background the evening before they are presented in an interaction with a client on the phone, then the real-time interaction is simulated (see Figure 1). Find a vendor who excels in services science and in negotiating and implementing SLAs, which represent IT's commitment to the business.
- Get back to basics - demonstrate knowledge of the basics. The ratio of hype to fact is high in the area of active data warehousing. The proprietary data warehousing vendors' glossy brochures describe real-time scenarios and let the reader assume that if you buy one, then you will automatically get real-time everything. Be sure to read the fine print - as I have said previously, it is simply not so. In reply, point out that by using a balanced configuration appliance from a standard relational database vendor you only need one kind of database to handle both transactional and decision support systems. Dynamic data warehousing closes the loop between the two, enabling business intelligence results to optimize transactional, line-of-business processing. The atomic, high-performance database is one component - albeit an important one - of an overall information management strategy to reduce and manage latency, which includes ETL, federation, master data management, metadata and messaging delivery.
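The recommendation above to define "real time" through the SLA can be sketched in a few lines of Python. This is a hedged illustration, not a prescribed method: the function names `meets_sla` and `choose_refresh_strategy`, the strategy labels, and the 24-hour and 15-minute thresholds are assumptions made for this example. In practice the thresholds come out of the SLA negotiated with the business.

```python
from datetime import datetime, timedelta


def meets_sla(last_loaded: datetime, now: datetime, max_latency: timedelta) -> bool:
    """True if the data is no older than the latency the SLA allows."""
    return now - last_loaded <= max_latency


def choose_refresh_strategy(max_latency: timedelta) -> str:
    """Pick the cheapest refresh approach that still satisfies the SLA.

    Thresholds are illustrative assumptions, not industry standards.
    """
    if max_latency >= timedelta(hours=24):
        # A week-long supply chain does not need minute-by-minute updates.
        return "overnight batch"
    if max_latency >= timedelta(minutes=15):
        return "micro-batch"
    # Tighter SLAs call for message brokers or similar low-latency delivery.
    return "message-driven"


# Hypothetical SLA: inventory data may be up to one day old.
inventory_sla = timedelta(days=1)
strategy = choose_refresh_strategy(inventory_sla)
fresh = meets_sla(
    last_loaded=datetime(2006, 8, 1, 2, 0),
    now=datetime(2006, 8, 1, 20, 0),
    max_latency=inventory_sla,
)
```

The design choice is that the business-defined latency drives the technology decision, rather than the other way around: "right time" is whatever the SLA says it is.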
1. Mastering Master Data Management http://www.dmreview.com/article_sub.cfm?articleID=1060131
2. Desmond A. Martin. "Active Data Warehouse: Where Agile Retailers Win by Capitalizing on Time." Teradata. August 2006. p. 11.
Lou Agosta, Ph.D., is a business intelligence strategist with IBM World Wide Business Intelligence Solutions, focusing on competitive dynamics. He is a former industry analyst with Giga Information Group and has served many years in previous careers in the trenches as a database administrator. His book The Essential Guide to Data Warehousing is published by Prentice Hall. This article is © IBM. Agosta would like to hear from you, so please send comments and questions to him at LoAgosta@us.ibm.com.