
Data Warehousing Trends for 2007

  • November 15 2006, 1:00am EST

It is time to get out the crystal ball again, see what comes into focus in its dynamic depths and consider data warehousing trends for the year ahead.

Data warehousing trends will track the architecture of business intelligence (BI) systems as a whole with innovations in the front end, middle and back end. Data warehousing will move beyond data integration - in some cases without having fully mastered it - to business and technology integration. The already substantial pressure on CIOs and information technology staffs to close the gap between the business and IT will intensify further. Today all business processes have software and IT procedures that are essential enablers, facilitators and implementers on the critical path to results. Among these enabling systems, data warehouses are the first among equals in that they enable advanced applications such as forecasting, performance management and working smarter in one hundred and one ways. In turn, these are on the critical path to business model innovation, which itself is a priority for the CEO's agenda of increasing top-line revenue. According to the IBM Global CEO Survey: Expanding the Innovation Horizon, firms growing their operating margins faster than the competition were more than twice as likely to emphasize business model innovation as the underperformers (see Figure 1). [1] To succeed at this, the business will need to continue to team up with business intelligence. No scare quotes - this will happen in the reality of enterprise operations.

Figure 1: Innovation Priorities of Under Performers versus Out Performers. Source: IBM Global CEO Survey: Expanding the Innovation Horizon

At the front end, business value will continue migrating to the user interface. Regardless of the amount of up-stream data integration - which will continue to increase substantially in complexity and volume - when the user gets an answer to a business question, he or she credits the information delivery layer, even if the real work already occurred elsewhere. Innovations in structured search and multipass SQL ("relational OLAP") will outperform proprietary multidimensional databases in delivering business value to the front end. Power users will like the "what if" and forecasting capabilities of proprietary data cubes but be disenchanted with the latency (delay) in loading and maintaining them. Business analysts (users) will continue to be willing to pay a premium for the ability to pose seemingly simple, but actually intricate, business questions through end-user self service. However, a more balanced view will emerge that end-user self service is not for everyone and that rapid application development (RAD) prototyping, portals, dashboards and scorecards are just what is needed by many classes of users such as executives, administrators and line-of-business managers who know what they want to know. No one - or very few - will build front-end applications from scratch, but quick-start solutions and RAD tools will gain traction.

Get ready for at least one surprise at the front end. The supposed megatrend of BI for the masses will reveal that "the masses" are not what they used to be. Instead, a diversity of users - with different requirements, needs and jobs to do - will make clear that more than one tool or technology is required to transform dumb data into enterprise intelligence. Merely buying a branded suite of tools from a single vendor does not solve the problem of data integration, though it does have the advantage of "one throat to choke" if service or product issues arise. At the front end, different classes of users will still need more than one kind of tool to accommodate their different requirements. Executives will favor scorecards and dashboards, power users will leverage their data cubes and OLAP notwithstanding their latency, specialists will require advanced analytics for predictive analysis, clerk administrators will value standard predefined reporting, and all users will benefit from proactive notification and alerts.

The middle layer is where innovations in metadata, messaging and semantic analysis of meaning will make a difference in productivity, efficiency and system integration. Even as business value migrates to the front end and architecture has the buzz, plumbers will still find a market for their infrastructure plays. Extract, transform and load (ETL) technology, message brokers and information integration will enable real-time (dynamic) data warehousing, which is a trend that continues to make steady, incremental progress and has taken off. No end-user enterprise without a high performance message broker will be able to perform the sustained real-time and near real-time update required to squeeze latency out of the system and implement dynamic data warehousing, and no vendor without one (or a close partner with one) will be able to maintain its credentials in serving such enterprises. The ability to perform memory-to-memory transfer by locating the message broker on one of the nodes of the shared nothing processor that forms the data hub of the data warehouse is a proven tactic to perform dynamic, low-latency update. In addition, capabilities in parallel ETL, metadata and on-the-fly information integration make a useful complement to basic centralized, atomic data warehousing capabilities. In any case, ETL is an ideal way to connect the stage of data transformation with the underlying technology. [2]

Through trial and error, enterprises will understand that system integration costs track closely the number of interfaces between up- and down-stream transactional and BI systems. The more system interfaces, the higher the cost. Absent a data hub through which to rationalize data transport with an ETL tool, the number of point-to-point connections can grow roughly with the square of the number of systems as many-to-many interfaces multiply, putting the IT department on a slope of diminishing returns, running faster and faster just to stay even with maintenance. By capturing the rules of system interoperation as well as the meaning of system data elements in a dynamic metadata repository, productivity improvements can be gained directly through metadata-based impact analysis. Similar considerations apply to enterprise application integration (EAI) based on message brokers and on-the-fly information integration.
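
The growth in interface count can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the worst case, in which every system exchanges data directly with every other, and compares that with a hub in which each system has a single connection. The function names are hypothetical, and real landscapes are usually sparser than the worst case.

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Interfaces needed if every system connects directly to every other.

    Worst-case assumption: one interface per pair of systems, n * (n - 1) / 2.
    """
    return n_systems * (n_systems - 1) // 2


def hub_interfaces(n_systems: int) -> int:
    """Interfaces needed if every system connects only to a central data hub."""
    return n_systems


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        p2p = point_to_point_interfaces(n)
        hub = hub_interfaces(n)
        print(f"{n:>3} systems: {p2p:>4} point-to-point, {hub:>3} via hub")
```

Under these assumptions, doubling the number of systems roughly quadruples the point-to-point interfaces, while the hub count merely doubles.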

"Dynamic" refers to metadata that is used to generate system components, the most common example being where a logical data model is able to generate data definition language (DDL) structures for any given standard relational database such as IBM DB2, Oracle 10g or Microsoft SQL Server 2005. Metadata-driven design will continue to improve traceability of all sorts of system components, process integration and impact analysis, which will enhance developer productivity and reduce project coordination costs. The open source revolution will march on, encompassing tools for the rationalization of system processes and methods, for example, in the form of the Eclipse Modeling Framework (EMF), providing a useful backbone to model-driven software development. In every case - ETL, EAI, enterprise information integration (EII) - advances in metadata will lead to connecting the dots between the infrastructure and the answers to business questions, such as which customers are buying or using which products and services and when and where they are doing so.
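
As a rough illustration of metadata generating DDL, the following minimal sketch turns a tiny logical model into a CREATE TABLE statement for a chosen database. Everything here is an assumption made for the example: the `Attribute` and `Entity` classes, the `generate_ddl` function, the sample `customer` entity and the simplified type mappings are hypothetical and do not represent any particular modeling tool.

```python
from dataclasses import dataclass

# Hypothetical, simplified logical-to-physical type mappings (illustration only).
TYPE_MAP = {
    "db2": {"integer": "INTEGER", "string": "VARCHAR({n})", "date": "DATE"},
    "oracle": {"integer": "NUMBER(10)", "string": "VARCHAR2({n})", "date": "DATE"},
    "sqlserver": {"integer": "INT", "string": "NVARCHAR({n})", "date": "DATETIME"},
}


@dataclass(frozen=True)
class Attribute:
    name: str
    logical_type: str  # one of "integer", "string", "date"
    length: int = 0    # only used by "string"
    nullable: bool = True


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple


def generate_ddl(entity: Entity, dialect: str) -> str:
    """Render one logical entity as a CREATE TABLE statement for a dialect."""
    types = TYPE_MAP[dialect]
    columns = []
    for attr in entity.attributes:
        sql_type = types[attr.logical_type].format(n=attr.length)
        null_clause = "" if attr.nullable else " NOT NULL"
        columns.append(f"    {attr.name} {sql_type}{null_clause}")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(columns) + "\n);"


# Hypothetical sample entity from a logical data model.
CUSTOMER = Entity(
    name="customer",
    attributes=(
        Attribute("customer_id", "integer", nullable=False),
        Attribute("customer_name", "string", length=100),
        Attribute("first_order_date", "date"),
    ),
)

if __name__ == "__main__":
    for dialect in TYPE_MAP:
        print(f"-- {dialect}")
        print(generate_ddl(CUSTOMER, dialect))
```

The point of the design is that the logical model is stated once and the database-specific detail lives in a mapping table, so a change to the model flows through to every target platform.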

At the back end going forward, virtualization and autonomic management of systems will continue to lower - yes, lower - the bar on operational excellence by extending flexibility and reducing the need for human intervention and manual micromanagement of system configurations and resources. Incremental innovations such as improved compression algorithms will help reduce data warehousing obesity, though data volumes will continue to grow explosively. Deploying and implementing the native data type XML in the relational database will enable enterprises to capture and structure content that was previously unstructured, making it tractable for business processing in the data warehouse. This too will help narrow the gap between business and IT in delivering information in actionable packages and recommendations.

The special purpose, prescriptively defined, shared nothing database - now commonly known as the data warehousing appliance - will continue to have market traction with discounting reaching the point of no return. In the short term, this will result in the proliferation of singleton data marts, hastily implemented to address a tactical business pain, scratch a technology itch or demonstrate to a big vendor that it is not the only game in town. This is well and good. However, the intensifying competition will include appliance entries from the "big guys" - many already in production in client installations - resulting in the early innovators being co-opted by a second wave of major power players such as HP, IBM and Sun with significantly reduced business risk, superior maintenance and a coherent roadmap, all for a modest premium.

Services science will come to data warehousing - and vice versa. Both are inherently cross-functional. Both require a grasp of the underlying business process as well as the technology. Both are inherently "T" shaped - both broad across a number of disciplines as well as able to go deep in a given area (whether finance, marketing, production or customer relations). This will be driven by disenchantment with application functionality. End-user enterprises will continue to purchase applications because they have nowhere else to turn, but disenchantment will also continue to grow because it seems as though everything you really need is always in the next release. Instead, solutions rather than applications will look increasingly attractive because they allow for significant out-of-the-box functionality with the flexibility to address requirements that capture the competitive advantages of the enterprise.

End of life issues will surface for BI applications running on legacy client/server platforms, including enterprise resource planning (ERP) systems with such client/server architectures. Applications will be caught between a rock and a hard place. Granted that no one builds from scratch, applications still require customization to accommodate end-user enterprise requirements. The initiative will shift to quick-start data warehousing solutions that incorporate a data model and a front-end tool to deliver the best of both worlds.

This will enable enterprises to connect the dots between the two realms of business and technology with the data warehouse as the point of intersection. Separate initiatives such as business process management (BPM), service-oriented architecture (SOA) and master data management (MDM) will each in their own way raise the bar on and present challenges to data warehousing. BPM will leverage the data warehouse as a source of persisting key performance indicators; SOA will put this information over the wire as a service and enable action at a distance; and MDM will provide clean, consistent representations of customers, products and other essential business entities in support of advanced business intelligence. On the business side, sales and marketing will connect the dots between the business question "Which customers are leaving and why?" and the BI available from the data warehouse. Finance will connect the dots between the question "Which clients, products and categories are the profit winners and which are the profit losers?" and the consistent, unified view of customer and product master data in the warehouse. Operations will connect the dots between questions about supplier and procurement efficiency, stock outages, capital risks and reserves, and dynamic pricing on the one hand and the aggregations of transactional data in the warehouse on the other.


  1. IBM Global CEO Survey: Expanding the Innovation Horizon:
  2. Lou Agosta. "The Data Strategy Adviser: Six Degrees of Separation: Connecting the Dots Between Business and Technology using ETL." 19 October 2006.
