Is there a standard that can be used as a guideline for sizing and capacity? I've been reading many sources of information regarding database and cube sizing, but I am not clear on exactly how this is calculated and measured in regards to server and database resources. For example, please review the following:
Dim_Time: 144 rows × 132-byte record length = 0.019 MB
Is the database taking up 100.031 MB of space? How is the cube space calculated? What impact is there when there are concurrent users? How is this measured?
Les Barbusinski's Answer: In your example, the actual size of your database is probably closer to 200 MB. The reason is that you have to factor in a) block size, b) free space, c) extent size and d) indexes.
Block size is the physical size of the data block that contains the rows. RDBMSs do not store records individually...they group them into blocks (i.e., "n" rows per block) that are then written to - or retrieved from - the hard disk as a unit. Block size is fixed at the time a database is created and is usually set at 16K, 32K or 64K.

Free space is the percentage of each block set aside by the DBA for future insert/update activity. If, for example, free space is set at 20 percent, each block will initially be loaded to 80 percent of its capacity.

Extents are the physical units of disk space that are allocated by the operating system (i.e., UNIX or Windows) to specific tables and/or indexes. Extents can be "primary" or "secondary." Primary extents are usually sized to contain all of the rows currently in a table or index with some allowance for short-term growth (say, six months). Secondary extents are incremental blocks of disk space that are allocated to a table or index as the primary extent is exceeded.

Indexes are similar to tables in that each requires its own space... based on its row length, free space, and extent settings.
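To make the block-packing arithmetic concrete, here is a minimal sketch using the figures already in this example (32 KB blocks, 20 percent free space, Dim_Time's 132-byte rows). The exact formulas vary by RDBMS, so treat this as an illustration rather than a vendor formula:

```python
import math

# Example figures from the question and answer above (assumptions, not
# vendor-specific formulas -- your RDBMS documentation has the real ones).
BLOCK_SIZE = 32 * 1024      # 32 KB data block
FREE_SPACE_PCT = 0.20       # 20% of each block reserved for inserts/updates
ROW_LENGTH = 132            # bytes per Dim_Time row

usable_bytes = BLOCK_SIZE * (1 - FREE_SPACE_PCT)   # bytes usable per block
rows_per_block = int(usable_bytes // ROW_LENGTH)   # whole rows per block
blocks_needed = math.ceil(144 / rows_per_block)    # blocks for 144 rows

print(rows_per_block)   # 198 rows fit per 32 KB block at 80% fill
print(blocks_needed)    # the whole 144-row table fits in 1 block
```

Note that even a tiny 0.019 MB table still consumes at least one full block on disk, which is one reason actual allocated space exceeds raw data size.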
Now let's apply all this knowledge to your example using the following assumptions: 1) block size equals 32K, 2) 20 percent free space and 3) 10 percent short-term growth. Let's further assume that each dimension table has a single 10-byte numeric primary key and the fact table has four indexes: a primary index consisting of the three foreign keys from the dimensions and three individual FK indexes. Given these assumptions, your calculations should look something like this:
Data Space Requirements
Index Space Requirements
So the total amount of space you must allocate for your database is 174.91 MB (i.e., 134.46 + 40.45). Granted, this is an oversimplified example, but you get the idea (hope I got the math right). Your DBA can provide you with specific formulas for estimating space requirements in your shop.
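The per-table procedure above can be sketched as a small function under the stated assumptions (32 KB blocks, 20 percent free space, 10 percent short-term growth). Only Dim_Time's row count and record length appear in the question, so the call below uses those; plugging in your other tables and indexes is what produces totals like the 174.91 MB figure:

```python
import math

# Assumptions stated in the answer above -- adjust for your shop.
BLOCK_SIZE = 32 * 1024   # bytes per data block
FREE_SPACE = 0.20        # fraction of each block held back
GROWTH = 0.10            # short-term growth allowance

def table_space_mb(rows, row_length):
    """Rough allocated space for one table or index, in MB."""
    rows_per_block = int(BLOCK_SIZE * (1 - FREE_SPACE) // row_length)
    blocks = math.ceil(rows * (1 + GROWTH) / rows_per_block)
    return blocks * BLOCK_SIZE / 1_000_000

# Dim_Time: 144 rows, 132-byte records -- one block, so ~0.033 MB allocated
print(round(table_space_mb(144, 132), 3))
```

This is deliberately oversimplified, like the example it mirrors; your DBA's formulas will also account for block headers, extent rounding and per-column overhead.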
As for estimating cube sizes, that varies by vendor and by platform. Cubes are proprietary and internal architectures differ, so there is no standard formula. Check with your BI vendor.
The number of concurrent users on a database has no effect on storage requirements. It does, however, impact server capacity...as measured by total CPU horsepower (i.e., CPU count x MHz), and RAM. Estimating this impact on server capacity is an art form, not a science. Performance varies by server platform, operating system and RDBMS. The easiest way to estimate your capacity requirements is to benchmark the performance of similar servers in your shop. Observe the server's performance while asking yourself the following questions:
Once you've ascertained how your chosen server is likely to react to various loads, it's time to estimate your DW application's "load:"
Ultimately, when you apply your estimated "loads" against your server's performance curves, you should be able to make a fairly accurate SWAG at your server capacity requirements.
Michael Jennings' Answer: The table below (see attached spreadsheet also) provides a simple method for calculating the size of a dimensional model plus forecasting annual growth. I have inserted the dimension table information you provided in your question into the spreadsheet (the number of rows for the fact table was erroneous). Using the row size, estimated initial rows for dimension tables, predicted annual growth and index size (if used), the spreadsheet will provide sizing estimates for the maximum possible fact table size, dimension tables, and totals for the database. As with all tools, your own specific knowledge and experience should be applied to the final database size estimates.
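The spreadsheet's logic can be approximated in a few lines. The function below is a hypothetical recreation, not the spreadsheet itself: the growth rate, years horizon and index multiplier are placeholder parameters you would replace with your own model's values.

```python
def forecast_size_mb(row_bytes, initial_rows, annual_growth, years,
                     index_factor=1.0):
    """Estimated table size after `years` of compound growth, in MB.

    index_factor is the index overhead as a multiple of the data size
    (1.0 means indexes roughly equal the data -- an assumption, not a rule).
    """
    rows = initial_rows * (1 + annual_growth) ** years
    return rows * row_bytes * (1 + index_factor) / 1_000_000

# Dim_Time from the question: 144 rows x 132 bytes. A time dimension is
# typically preloaded, so zero annual growth is assumed here.
print(round(forecast_size_mb(132, 144, 0.0, years=3), 3))  # -> 0.038
```

Running the same function over the fact table with its predicted annual row growth is what drives the "maximum possible fact table size" estimate the spreadsheet produces.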
Chuck Kelley's Answer: Sorry, but I don't know of any guideline for sizing and capacity. However, the calculations will depend on the products you use. On the database side of the equation (assuming you are doing this in a relational database), your calculations are almost correct. The raw data is 100.031 MB, but the database will take up more space. There is overhead for each page in the database, each file of the database, system tables and indexes, database logs/journals, and other such items. Generally this is not much; 10+ MB should do. Next you need to consider temporary space. Depending on the number of users, I think 100 percent extra space is reasonable (for smaller databases), though when you get into the hundreds of gigabytes, the accountants will start screaming. In your case, it probably won't be much of an issue. Next you need to add space for indexes (and surrogate keys, if you use them and they were not in your original raw data estimate). I use the "seat of the pants" rule of doubling the amount of raw data, so in your case that will be another 100 MB. Again, this will depend on the database vendor and whether you choose indexing strategies other than B-tree indexes (the most common index type in databases).
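Putting these rules of thumb together gives a quick back-of-the-envelope total. All figures below come from the answer above (raw data, ~10 MB overhead, 100 percent temporary space, indexes equal to raw data); none of them are vendor formulas:

```python
# Seat-of-the-pants total for the 100.031 MB example database.
raw_mb = 100.031            # raw data size from the question
overhead_mb = 10            # page/file/system-table/log overhead ("10+ MB")
temp_mb = raw_mb            # 100% extra temporary space (small database)
index_mb = raw_mb           # "double the raw data" rule: another ~100 MB

total_mb = raw_mb + overhead_mb + temp_mb + index_mb
print(round(total_mb, 3))   # -> 310.093
```

So even before aggregates and staging space, the 100 MB of raw data plausibly translates into roughly 300+ MB of allocated storage.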
Another thing you will need to consider is whether you will have aggregate (or summary) tables. Given the size of your data, they will probably not be needed. But if you decide to build them, you will have to do the same calculation for those tables as you did for the raw data tables.
Then you will want to consider the staging area. The staging area is where the data comes into your system. Sometimes this is in the database; sometimes this is flat files. Either way, there will be some space that you will need in order to run the ETL products, so make sure you have sufficient space for all of this. Depending on the types of transformations and source feeds you get, this could be quite significant.
Now as for cube sizing, that will be highly dependent on the product that you use and the density of your cube. Your best bet is to take some sample data and have the vendors give you a "seat of the pants" calculation.
Joe Oates' Answer: I don't know what database you will be using, so I can only offer some general observations. I do not see where you have allowed for indexes, sort space, staging area space, etc. You must consider these, and perhaps others, when you do database sizing.