There's finally an answer: the egg came first. If you accept evolutionary theory (but remember, it's just a theory, along with the theory of gravity) and the idea that birds evolved from dinosaurs, then because dinosaurs laid eggs and the chicken is still classified as a bird, it's clear that the egg came first. So how does this relate to selecting data warehouse tools?

It mostly has to do with where you start your evaluation. You start with the egg, which is your environment. Long before you let a vendor in the front door, you will want a full understanding of your environment and your infrastructure. This will be your starting point. Answer the following questions:

  • What is your preferred platform? This is your operating system (Windows, UNIX) and database management system.
  • What tools do you already have in place, how well are they working and do you plan to use them in your data warehouse?
  • Do you have preferred vendors? This may eliminate some you might want to evaluate.
  • What is your standard for buy versus build? For the data warehouse, buy is usually preferred.
  • Are you looking for vendors with suites of products or do you prefer best of breed? Best of breed takes more software integration effort and testing. Buying a suite from the same vendor makes it clearer who is responsible when there are problems, but the suite rarely has the best products.
  • Do you have an ongoing relationship with a consulting/contracting organization that you plan to use? Even when the consulting/contracting organization states its ability and willingness to work with any product, it will try to encourage the use of the products it knows best.
  • What skills are available for implementing the software? In-house skills count only if the people who have them are actually available to work on the implementation. A budget for contractors and consultants can provide some mitigation when they are not.
  • Does the CIO play golf with any of the vendors' reps? Unfortunately, this often drives the final decision.

Consider the Future

In addition to where you are today, you will want to know your future plans before talking to the vendors. You will want to know:

  • What is your application: CRM, finance, HR, campaign management?
  • What are your requirements for data integration? You may have a requirement for integrating your customer or supplier data.
  • What are the data sources, both internal and external?
  • What is the level of data quality in your data sources? If you haven't already started, it is worth beginning a data profiling process to identify missing data, data that is outside the valid values, uniqueness violations, redundant data and data that violates business rules.
  • Who is your user community? Are they power users, business analysts, casual users, executives, or does your community include all of them?
  • What are your expectations for scheduled hours (24x7, 18x6)?
  • What service level agreements (SLAs) are you expecting for the data warehouse regarding availability during the scheduled hours? Most data warehouse users are satisfied with 98 percent availability. Availability of 99.99 percent is achievable, but at additional expense.
  • What are your requirements for historical data? Older data will have more data quality problems.
  • At what level of detail must the data be captured? Even though the users tell you that all they need is summary data, don't believe them. When they see strange results in the summary data, they will want to see the detail that was the source of the summarizations.
  • What is the periodicity of the updates or loads (real time, daily, weekly)? Real time or near real time usually means the data warehouse is being used for operational purposes, not just for decision support.
  • What are your expectations for the number of concurrent users? It is always the number of concurrent users (not the number of named users or seats) that will affect performance requirements.
  • What are your expectations for the volume of data?
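
The data profiling checks mentioned in the list above can be tried on a small extract before any tool is purchased. What follows is only a minimal sketch in Python: the sample rows, the field names, the list of valid states and the "balance must not be negative" business rule are all invented for illustration, and a real profiling effort would run against your own sources and rules.

```python
from collections import Counter

# Hypothetical extract from a customer source. Every field name and value
# here is invented for illustration only.
ROWS = [
    {"customer_id": 1, "state": "CA", "signup": "2004-01-15", "balance": 120.0},
    {"customer_id": 2, "state": "", "signup": "2004-02-01", "balance": -35.0},
    {"customer_id": 2, "state": "XX", "signup": "2004-02-01", "balance": 10.0},
    {"customer_id": 3, "state": "NY", "signup": "2004-03-09", "balance": 75.5},
    {"customer_id": 3, "state": "NY", "signup": "2004-03-09", "balance": 75.5},
]

VALID_STATES = {"CA", "NY", "TX"}  # assumed set of valid values


def profile(rows, key_field, domain_field, valid_values, rule):
    """Count the kinds of data quality problems named in the text."""
    # Missing data: any empty string or None in any field.
    missing = sum(1 for r in rows for v in r.values() if v in ("", None))

    # Data outside the valid values (missing values are counted above).
    out_of_domain = sum(
        1
        for r in rows
        if r[domain_field] not in ("", None)
        and r[domain_field] not in valid_values
    )

    # Uniqueness: key values that appear on more than one row.
    key_counts = Counter(r[key_field] for r in rows)
    duplicate_keys = sum(1 for n in key_counts.values() if n > 1)

    # Redundancy: rows that are exact copies of an earlier row.
    row_counts = Counter(tuple(sorted(r.items())) for r in rows)
    redundant_rows = sum(n - 1 for n in row_counts.values())

    # Business rules: rows for which the supplied rule returns False.
    rule_violations = sum(1 for r in rows if not rule(r))

    return {
        "missing_values": missing,
        "out_of_domain": out_of_domain,
        "duplicate_keys": duplicate_keys,
        "redundant_rows": redundant_rows,
        "rule_violations": rule_violations,
    }


# Assumed business rule for the example: a balance may not be negative.
report = profile(
    ROWS, "customer_id", "state", VALID_STATES, lambda r: r["balance"] >= 0
)
for check, count in report.items():
    print(f"{check}: {count}")
```

On the invented sample this reports one missing value, one out-of-domain state, two duplicated keys, one redundant row and one rule violation. Counts like these, gathered per source, give you something concrete to put in front of vendors when data quality comes up.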

All plans for performance and capacity should allow for a three-times multiplying factor. In other words, plan for three times your anticipated end-state data volumes and three times your anticipated end-state number of concurrent users. When data warehouses are successful, more users become active, the queries become more creative and more complex, new applications emerge, more detail is required, new data sources are identified, SLAs become more demanding, and new governmental regulations require maintaining more and more data and being able to report on it.

Now that you have fully defined your environment and validated your results, it is appropriate to share any nonconfidential information with prospective vendors, although future strategic plans are probably off the table. The vendors want to market to you appropriately, so information that helps them do that will save time and effort for both of you.
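
The three-times planning factor and the availability targets discussed earlier come down to simple arithmetic, and it helps to have the numbers in hand before the first vendor meeting. The sketch below is illustrative only: the function names are made up for this example, and the 2 TB and 50-user figures are hypothetical stand-ins for your own estimates.

```python
def planned_capacity(anticipated, factor=3):
    """Scale an anticipated end-state figure by the planning factor."""
    return anticipated * factor


def allowed_downtime_hours(hours_per_day, days_per_week, availability_pct):
    """Weekly downtime budget, in hours, within the scheduled window."""
    scheduled_hours = hours_per_day * days_per_week
    return scheduled_hours * (100 - availability_pct) / 100


# Hypothetical end-state estimates: 2 TB of data and 50 concurrent users.
print("Plan for", planned_capacity(2), "TB")
print("Plan for", planned_capacity(50), "concurrent users")

# Downtime budgets for the schedules and SLA levels mentioned in the text.
for label, hours, days, pct in [
    ("18x6 at 98 percent", 18, 6, 98),
    ("24x7 at 99.99 percent", 24, 7, 99.99),
]:
    budget = allowed_downtime_hours(hours, days, pct)
    print(f"{label}: {budget:.4f} hours of downtime per week")
```

An 18x6 schedule at 98 percent availability leaves a little over two hours of downtime per week, while 24x7 at 99.99 percent leaves roughly one minute. That gap is a fair indication of why the higher target costs more.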
