How to Select Relevant KPIs: Here are a seven ways to determine key performance indicators that are relevant for the individual as well as the organization and improve initiative adoption. The result will be a company better positioned to effectively monitor and improve business performance.
How to Select Relevant KPIs: Here are a seven ways to determine key performance indicators that are relevant for the individual as well as the organization and improve initiative adoption. The result will be a company better positioned to effectively monitor and improve business performance.
How to Select Relevant KPIs: Here are a seven ways to determine key performance indicators that are relevant for the individual as well as the organization and improve initiative adoption. The result will be a company better positioned to effectively monitor and improve business performance.
How to Select Relevant KPIs: Here are a seven ways to determine key performance indicators that are relevant for the individual as well as the organization and improve initiative adoption. The result will be a company better positioned to effectively monitor and improve business performance.
How to Select Relevant KPIs: Here are a seven ways to determine key performance indicators that are relevant for the individual as well as the organization and improve initiative adoption. The result will be a company better positioned to effectively monitor and improve business performance.
How to Select Relevant KPIs: Here are a seven ways to determine key performance indicators that are relevant for the individual as well as the organization and improve initiative adoption. The result will be a company better positioned to effectively monitor and improve business performance.
When planning for an enterprise data warehouse, what types of investment should be planned for in the way of technology infrastructure - mainly computer hardware and software platforms (servers, databases, storage, OS, etc...)? Let's say this is a 1 terabyte EDW to start, please estimate requirements and rough costs.
A:
Sid Adelman's Answer: Your cost will be based on many more factors other than just the size of the data warehouse. Your cost will be based on:
The functions you are delivering
The number and characteristics of the source files, levels of quality, and requirement for quality.
The size of the database
The number of users, both concurrent and total
What software you buy and support (ETL, BI, data cleansing, DBMS, etc.)
Your need for consultants and contractors - this will be a function of your skills levels and the schedule for implementation
Your service level agreements for performance and availability.
The architecture of your EDW, use of ODS, staging areas, data marts, metadata
How well you plan to support your users
What training you intend to provide for your users
The training required for your IT staff
It's important to include all these costs as you establish a budget for the EDW. If the other costs are not included, they will not be allocated and you will have a substandard EDW or, if the moneys are allocated, you will be criticized for going over budget.
Les Barbusinski's Answer: Whoa! That's a tall order, but here goes. Exact infrastructure requirements and costs are dependent on a lot of factors including:
The purpose of the data warehouse (e.g., BI vs. data integration vs. compliance, etc.) as well as the functionality that is required (e.g., 24x7 availability, real-time vs. batch data acquisition, etc.)
Refresh frequencies and latency requirements
Projected transaction volumes and concurrency requirements
Service level agreements
How much of the required HW/SW is already in house
Shop standards
The degree of competition between potential vendors
Hence, I can't give you exact system configurations or cost estimates. However, I can give you a list of infrastructure components that you'll need to assess in order to lay the groundwork for your terabyte data warehouse. This includes:
Hardware
Application Servers: You'll need a collection of different application servers to host the layers of application services that your data warehouse will need including ETL, Business Intelligence, Reporting, Messaging, Web Portal and Web Services, Business Rules Engines, etc. Each application will have an optimal configuration that you'll need to address. You'll need to size the server(s) for each application layer (i.e. number of servers, number of processors, RAM, etc.) in order to handle the expected transaction volumes. You'll also have to address the need (if any) for clustering, active/passive failover, 24x7 availability, etc. Also, don't forget to plan for different environments (e.g. development, test, and production). Some of these servers will be UNIX, while others will be Windows ... depending on the application.
Database Servers: You'll need a collection of database servers to handle all of the database layers inherent in a data warehouse (e.g. historical data store, operational data stores, data marts, extract areas, etc.). First, you'll need to decide on how best to distribute your database nodes (e.g. all centralized, de-centralized data marts, de-centralized ODS instances, etc.), then size each node (i.e. CPUs and RAM per node, servers per node, etc.). As with the app servers, you'll have to address such issues as failover, clustering, 24x7 availability, etc. ... as well as make provisions for handling multiple environments. Keep in mind that your choice of RDBMS will - in large part - dictate your database server requirements.
Disk (SAN) Space: You've already stated that you need to store roughly one terabyte of usable data. Now you need to plan the space requirements for such things as indexes, logs, sort-work areas, staging areas, distributed data marts, extract areas, etc. You also have to make adequate provisions for replication, mirroring, and database backup (i.e. most backups are done to disk then spooled to tape asynchronously). Finally, you'll have to adjust all these estimates to handle the expected growth rate (taking into account whatever archiving scheme you have).
Tape Drives: Database and file backups from terabyte-plus sized data warehouses are usually spooled to streaming RAID-5 tape arrays. You'll need to assess the frequency and volume of these backups, choose an appropriate model of tape array, plan for an adequate supply of tapes, and insure adequate bandwidth between the servers being backed up and the tape arrays.
Archiving Devices: The growth of a data warehouse must be controlled in order to assure adequate performance. Aged data that is no longer used, must be periodically culled and archived from the database. Normally, this function is performed with tape drives. However, you may want to consider archiving to a CD-ROM "jukebox" if there is a chance that the data will have to be (a) restored, or (b) retrieved at some point in the future.
Networks: Obviously, you need to assess your company's LANs and WANs to insure that you have adequate bandwidth for (a) pulling data from source systems into the data warehouse, (b) replicating tables or refreshing materialized views to distributed nodes, and (c) handling query and reporting traffic from end users. You also need to insure that your network is sufficiently redundant to insure that the loss of a single trunk line will not disable your whole system. Also, if any portion of your data warehouse will be exposed to the Internet, you need to assess your need for security devices such as firewalls.
Information Management discussion: Business Intelligence
Twitter
Facebook
LinkedIn
Looking to move your initiatives and career forward? Information Management's LinkedIn group can help you connect and collaborate with a network of your peers.
Be the first to comment on this post using the section below.