Data Warehouse Size Depends on the Size of the Business Problem

Published
  • August 01 2003, 1:00am EDT

The primary determinant of the size and shape of the data warehouse is the size and shape of the business problem, not the company building the system. If ever a question was squarely in the "it depends" area, this is it. Still, it is possible to make some generalizations concerning the functional dependencies on which "it depends." The dependencies include:

  1. The business problem being addressed and solved.
  2. The experience and maturity of the industry as to the use of business intelligence for decision support and competitive advantage.
  3. The costs, risks and benefits of designing, implementing and operating a data warehouse, with direct reference to the particular operational environment at the hosting enterprise.

Let's look at some examples by way of illustration. Think of three different enterprises, each with $3 billion in revenue. One is a telecommunications company with 3 million customers to whom the firm wants to cross-sell and up-sell additional telecom products and services. The second is a consumer packaged goods (CPG) firm that wants to improve logistics and distribution, and reduce inventory by means of superior demand planning. The third is a high technology manufacturer (not necessarily in information technology) that wants to improve quality and manage relations with suppliers, users of its products and regulatory oversight agencies.

The first warehouse will be the largest because of the amount of detailed transactional data to be aggregated. The second will be smaller in terms of data, but perhaps more computationally intense and complex in terms of the calculations needed to generate a multiplicity of forecasts by product. Therefore, less of the cost will go to disk and more to process design. The third will probably entail even less data but will be extremely challenging in terms of how the data is to be captured, represented and evaluated against the key performance indicators (KPIs). While the data and schema integration challenges are significant in all three cases, the third presents special challenges requiring considerable domain expertise in the manufacturing processes in question.

After a year of operation, the telecommunications warehouse will hold 1 terabyte or more of detailed transactional data, the CPG warehouse will hold a couple of hundred gigabytes of shipment and forecasting data (depending on the number of products), and the manufacturing warehouse may hold 50 to 100 gigabytes.
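
To make the sizing logic concrete, here is a rough back-of-the-envelope sketch of how detailed transactional data reaches the terabyte range. Only the 3 million customers comes from the example above. The function name, the records-per-customer figure and the row width are hypothetical placeholders chosen to show the arithmetic, not measurements from any real warehouse.

```python
def estimate_raw_bytes(entities, records_per_entity_per_year, bytes_per_record, years=1):
    """Raw detail-data volume, before indexes, aggregates or compression."""
    return entities * records_per_entity_per_year * bytes_per_record * years


GB = 10**9  # decimal gigabyte

# Hypothetical telecom inputs: only the customer count is taken from the text.
telecom_bytes = estimate_raw_bytes(
    entities=3_000_000,                 # customers (from the example)
    records_per_entity_per_year=1_800,  # assumed: roughly five call records a day
    bytes_per_record=200,               # assumed row width
)

print(f"telecom detail data: {telecom_bytes / GB:,.0f} GB")  # prints "telecom detail data: 1,080 GB"
```

The same multiplication applies to the other two cases, but the driving quantities differ: products and forecasts for the CPG firm, and process measurements for the manufacturer. That is why revenue alone says little about the resulting size.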

Second, vertical industry dynamics influence the business problem being addressed by the data warehouse and, in turn, influence the size and shape of the data warehouse. If an industry makes extensive use of data warehousing (think of retail or financial services), an enterprise in that vertical which does not build a data warehouse (or find a way of getting the same answers elsewhere) is at risk of incurring information asymmetries that put it at a competitive disadvantage.

The size and shape of the data warehouse in a given enterprise is a function of the experience and maturity of the industry as to the use of business intelligence for decision support and competitive advantage. Early adopters of data warehousing include market research firms such as A.C. Nielsen and marketing-driven enterprises in all aspects of retail and consumer goods. The basic question of data warehousing is: Which customers are buying and using which products or services, and when and where (through which channel) are they doing so?
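
As a minimal sketch of that basic question, the snippet below counts purchases along the customer, product, time and channel dimensions. The `Sale` record, the `purchases_by` helper and the sample rows are all invented for illustration. A real warehouse would answer the same question with queries over far larger fact tables.

```python
from collections import Counter
from typing import NamedTuple


class Sale(NamedTuple):
    customer: str  # who
    product: str   # what
    month: str     # when
    channel: str   # where


# Invented sample rows, for illustration only.
sales = [
    Sale("C001", "long-distance plan", "2003-06", "store"),
    Sale("C001", "voicemail", "2003-07", "web"),
    Sale("C002", "long-distance plan", "2003-07", "web"),
    Sale("C002", "long-distance plan", "2003-07", "web"),
]


def purchases_by(rows, *dimensions):
    """Count rows grouped by the named Sale fields."""
    return Counter(tuple(getattr(row, d) for d in dimensions) for row in rows)


by_product_channel = purchases_by(sales, "product", "channel")
by_customer_month = purchases_by(sales, "customer", "month")

for key, count in sorted(by_product_channel.items()):
    print(key, count)
```

Grouping by a different combination of fields answers a different version of the question without changing the underlying data, which is the property that makes detailed transactional data worth keeping.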

Retailers have many questions of this form, and those that operate a data warehouse also have the answers. That is now largely the case in customer-facing industries. CPG firms and manufacturers have exploited data warehousing to reduce inventory and optimize distribution through superior demand planning. Transportation (airlines) and hospitality have exploited data warehousing to build loyalty through frequent flyer and related programs. Banking and financial services exploit data warehousing once they overcome the challenge of finding the customer behind all the various accounts in which he or she is hidden. Insurance presents a mixed bag: property and casualty insurers are committed users of the technology, but the healthcare industry is ambivalent about the entire concept. The pharmaceutical domain and related suppliers of medical devices do indeed have sophisticated, complex and substantial data requirements, but they have been relatively late adopters of data warehousing (as have the public sector and education). The reasons for this are many, but relate closely to the power of suppliers and the amount of regulatory oversight. In the latter sectors, data warehousing exists in a form limited to good, solid data management practices (without particular reference to infrastructure for business intelligence), and "data management" is often what these clients mean when they use the term "data warehouse." Data warehousing is also sometimes confused with data archiving, but that is a different story for another month.
