Continue in 2 seconds

We are building a data warehouse which will start at 100GB and will reach the terabyte range or more.

By
  • Douglas Hackney, Sid Adelman, Les Barbusinski
Published
  • March 13 2002, 1:00am EST

Q:

We are building a data warehouse which will start at 100GB and will reach the terabyte range or more. Can Oracle scale to meet this database? And if so, do we denormalize the data schema?

A:

Sid Adelman’s Answer: You are getting into an area where, if you are wrong, you have cost your organization dearly and you have wasted precious time and credibility. You must be sure that your data warehouse will scale. Take everything the vendors say with a grain of salt. Make them back up their statements with real live customer databases that are at least as large as yours and have at least the number of users and the level of complexity of the queries you intend to run. Be careful of benchmarks; they can be misleading. Talk with organizations running data warehouses with multiple terabytes. Find out what problems they encountered and how they solved those problems and ask them about vendor support; it becomes critical when – not if – you run into problems. Doug Hackney’s Answer: Any of the leading database products can scale to support a terabyte system, or at least their marketing and sales teams will claim they can. As in any situation, talk to real teams that have implemented real systems at the size you think you will be at in the foreseeable future. Also, question them closely, as many vendors claim multiterabyte systems but much of this size is actually indexes, temp space, etc.

Clay Rehm’s Answer: First off, any major RDBMS vendor can handle the data. This includes Oracle, DB2, SQL Server and NCR Teradata. Your hardware (CPUs, disk, RAM, etc) will need to scale, and any hardware purchases you make now will impact your scalability. If your initial database size is only 100 GB, why purchase hardware now? You may be better off putting your DW on a loaner server or share it with another server in the company. This way you can see how fast the database is growing and what type of hardware you will need without making the initial investment up front and possibly making the wrong initial investment.

Should you denormalize? I think it takes more thought that that. It is not just a simple decision to normalize or denormalize. Even when you consider dimensional modeling, the fact tables are normalized and the dimension tables are denormalized. What I am saying is that there is not a one size method that fits all. Keep in mind that performing pure denormalization will use more disk space that a normalized model.

Most importantly, each data element that the users need, needs to be represented in your data model. So your logical data model can be a third normal form, but your physical model could turn out to be normalized, denormalized, dimensional or a hybrid.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access