DEC 1, 2005 1:00am ET

Related Links

10 Sustainability Predictions for 2011
February 23, 2011
A Letter to Future Employees: Embrace Analytics
February 3, 2011
A Hunger for Risk
January 6, 2011

Web Seminars

Suit Yourself: An Effective Recipe for Self-Service Analytics
March 20, 2012
How to Narrow the IT/Business Communication Gap
March 21, 2012
Enhance and Expand BI with Mobile
Available On Demand

The Enterprise Data Warehouse Strikes Again, Part 2

Print
Reprints
Email

As we discussed in my previous column, big data warehouses are back in style. In a 2004 TDWI study, 60 percent of respondents indicated that they were adopting a centralized data warehousing architecture, i.e., using a single enterprise data warehouse (EDW) with no associated data marts or operational data store (ODS) systems. We have come full circle to the early 1990s, when this approach led the initial data warehousing wave.

In addition to the motivation of consolidation and cost cutting, many people felt that an EDW would be the easiest path to creating the elusive "single version of the truth." After all, it lets you avoid creating those pesky data marts, not to mention managing the associated data integration to ensure that all their reports and analysis are consistent. Many see the data silos - multiple data warehouses, data marts, ODSs and cubes - created during the Internet boom and cry, "No more!" They think it's overcomplicated and want to put everything into one data warehouse that is integrated and consistent. Because a centralized, standalone EDW is an easy architecture to describe and understand, it seems like the clear path to take. But is it really that simple?

Will EDWs become a best practice, and is that a good idea? Will the early successes of those adopting an EDW lead others toward this architecture?

An EDW was the first generation of data warehousing. It was simple and everyone could understand it. But that generation faced implementation and scope overload. Many projects failed outright or didn't live up to expectations. People today think they can do better. They think that with today's leading-edge thinking and technology they won't have the same problems. They've got databases and extract, transform and load (ETL) tools that handle huge volumes of data; real-time updates and access are now available from enterprise application integration (EAI) and enterprise information integration (EII) tools; business intelligence (BI) tools can access data from anywhere, anytime and any way they want. In addition, people have the experience gained from multiple generations of data warehouses that the first generation did not have.

However, just having the technology and experience now doesn't dictate that an EDW is a good idea.

Simplicity in a design is terrific, if it is the right approach. An EDW eliminates the complexity and expense of creating data marts with a hub-and-spoke architecture, but it is only the right choice if it meets your business users' needs.

Why Should We Complicate the Architecture?

Why did we have data warehouses, data marts, operational data stores and cubes - was it because we could? Was it because we couldn't agree on the EDW so everyone built their own? There is a little bit of truth to the first statement and, unfortunately, more truth than we care to admit for the latter. However, there is a method to the madness.

Data warehouses are built to integrate data from multiple source systems, cleanse the data, ensure consistency across the multiple sources and store historical data. The DW enables the data distribution that establishes an enterprise's information backbone. Data marts are built to subset, aggregate and perform transformations for their target information consumers (business users). A data mart's typical audience is a business function (such as finance) or a set of business processes (such as supply chain management). Data marts are really data franchises pulling data from your data warehouse and configuring, packaging and distributing it for the business community. OLAP cubes are also data franchises derived from data warehouses or data marts. These cubes can be considered mini-data marts.

An EDW is quicker and cheaper to build than a DW with associated data marts (hub-and-spoke architecture) at first blush. A more realistic cost comparison needs to take a holistic enterprise view that examines the total cost of ownership (TCO) of providing information for the business users to create their reports and perform their business analysis. The holistic view needs to account for the flow of data from its point of creation in the enterprise until it is consumed by the business users in their reports or analysis. Every time the data is accessed, extracted and transformed, it costs the enterprise time, resources and money. More importantly, it has the potential to reduce business productivity and lessen their ability to manage the business, because users may be spending time gathering and transforming data rather than analyzing the information and managing the business.

When you develop an EDW, you are building a database to perform data collection, cleansing and integration as well as data franchising and reporting. The same database is used as the vehicle to source, cleanse, transform, integrate, aggregate and provide access to the information that will eventually feed reports and business analysis. Databases have advanced significantly from the first generation of data warehouses, but have our design skills? When you design something to do all things, it may not do them all well. In fact, sometimes it does not do anything well. Let's look at what results from an all-in-one EDW.

The Loss of Specialization

The data franchising approach - building data marts and cubes from a data warehouse - initially takes more time to build. However, this architecture is a very efficient method to prepare the data for business reports and the analytics that business users perform. Without this data franchising, each report and business analysis requires repeatedly performing filtering, aggregation, denormalization and data transformation that could have been done once in their data mart. Regardless of whether the IT group builds them or not, the data franchising processes must occur. The question is: where do you want them to occur? What is the most effective and efficient approach?

If you don't have data marts, data franchising needs to take place within the BI reports generated by the EDW. It means that the reports are more complex to design, build and maintain. Queries are slower. Analysis is more difficult because business users need to understand much more complex data structures. Data marts, on the other hand, are prebuilt for better understanding and easier access, and they are customized for their customers.

Advertisement

Comments (0)

Be the first to comment on this post using the section below.

Add Your Comments:
You must be registered to post a comment.
Not Registered?
You must be registered to post a comment. Click here to register.
Already registered? Log in here
Please note you must now log in with your email address and password.
Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.