To ensure that your data warehouse is scalable, you have to view all of the hardware and software components, and all of the processes that are part of data warehousing (such as extracting, cleansing and transforming data), as parts of an overall "performance chain." If any component or process is not scalable, you have a weak link in the chain, and the warehouse as a whole will not scale well. It doesn't matter where that non-scalable component or process sits in the chain; its mere existence will eventually create a bottleneck. And if you have a bottleneck, you will not have a true "organic data warehouse": your warehouse will not be able to grow and adapt as rapidly as your organization's needs increase and change. So, we need to be vigilant about ensuring the scalability of all components and processes.

However, what happens when we have legacy components that we still want to use in our new warehouse, but which were built a long time ago and weren't originally designed to be scalable? For example, we may have very large and very complex batch-processing programs written in COBOL that contain critical data manipulation routines. In many cases, because of their complexity, rewriting these programs to make them more scalable may be a very expensive proposition. In other cases, no individual legacy program is very complex, but there may be hundreds of them, developed over time, and each one would have to be rewritten. Again, the time and resources needed to rewrite all these programs might be prohibitive.
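The "performance chain" idea can be made concrete with a little arithmetic. The following is only a minimal sketch, not a model of any real warehouse: the stage names, the rows-per-second rates and the helper functions are all hypothetical, and it assumes the stages run as a simple sequential chain whose sustained rate is limited by its slowest stage.

```python
# Illustrative sketch: all stage names and rates below are made-up assumptions.

def chain_throughput(stage_rates):
    """Sustained rate of a sequential chain: limited by its slowest stage."""
    return min(stage_rates.values())

def bottleneck(stage_rates):
    """Name of the stage that limits the whole chain."""
    return min(stage_rates, key=stage_rates.get)

def scale(stage_rates, factor, non_scalable=()):
    """Multiply every stage's rate by `factor`, except stages that cannot scale."""
    return {
        name: rate if name in non_scalable else rate * factor
        for name, rate in stage_rates.items()
    }

# Hypothetical rows-per-second rates for each step in the chain.
stages = {"extract": 50_000, "cleanse": 40_000, "transform": 5_000, "load": 45_000}

# Quadruple the hardware, but assume "transform" is a legacy program
# that cannot take advantage of it.
with_weak_link = scale(stages, 4, non_scalable={"transform"})

# Same upgrade, assuming every stage can scale.
fully_scalable = scale(stages, 4)

print(chain_throughput(stages), bottleneck(stages))
print(chain_throughput(with_weak_link), bottleneck(with_weak_link))
print(chain_throughput(fully_scalable))
```

Under these assumptions, quadrupling the capacity of every other stage leaves the end-to-end rate unchanged as long as the one non-scalable stage remains in the chain, which is the weak-link effect described above.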

A similar issue can arise if we want to use off-the-shelf applications that simply weren't written to take advantage of scalable hardware platforms and, therefore, cannot scale in their generic form. Or your data warehouse may need some custom code, forcing you to write scalable/parallel applications, which in turn may require training your developers in how to write such programs.
