Continue in 2 seconds

Enterprise Architecture View

  • December 01 2004, 1:00am EST

How are you detecting and measuring performance of your data warehouse?

The complexities and cross dependencies of today's IT infrastructures pose continuous challenges for maintaining acceptable and consistent performance in the data warehouse environment. Data warehouse processing activities (reports; analytics; extract, transform and load [ETL]; queries; etc.) that initially performed optimally may begin to degrade over time. The operational impact to the data warehouse environment may not be readily apparent due to the rate of change or the seemingly sporadic occurrence of the these types of issues. These changes in the environment can be caused by a variety of factors that are directly or indirectly associated with the warehouse.

This column is the first portion of a two-part series examining various data and infrastructure issues that may affect performance in the data warehouse environment. Techniques for measuring and/or detecting performance issues in the data warehouse environment will be examined in part two.

Environment Factors

You deployed your data warehouse to the production environment for user access after extensive testing of all processing activities. Stress testing was performed against the data warehouse environment to ensure that adequate performance would be maintained during periods of high demand. What other factors could affect warehouse performance? To start, it is likely that none of your testing was done on an exact copy of your production infrastructure. Typically, due to cost, the testing environments do not mirror the infrastructure components found in production. The number of server tiers (Web, application, database, firewalls) or the redundancy of parallel data centers for load balancing and failover is not available in the testing infrastructure. Additionally, there may be differences in network bandwidth between the testing and production environments.

How complete is your view of processing activity occurring on the data warehouse infrastructure? What effect does your ETL processing cycle have on front-end performance of the data warehouse? Did you measure front-end performance of the data warehouse during an ETL processing cycle to gauge its impact? ETL processing typically consumes significant CPU and disk I/O resources from the servers. If your ETL product resides on the same application and/or database servers servicing user queries, there can be an increase in processing times for these requests.

Other database activities in the environment can also affect warehouse performance. If other databases reside with the data warehouse, their activities will consume server resources. Ideally, transactional applications should be on separate servers from the warehouse to ensure optimal performance of both environments. If transactional and data warehouse databases must be on the same server, they should be in separate database instances in order to take advantage of DBMS and database tuning parameters specific to their use. Depending on growth and usage changes in the data warehouse, the DBMS instance and database may require additional tuning in order to maintain performance.

Database and log file backups can also be potential factors affecting overall performance of your warehouse environment. These activities should be scheduled during off-peak hours to minimize impact on the environment.

Shared Infrastructure Factors

Many data warehouse environments are taking advantage of the various shared storage device options available today. These shared storage devices allow for data growth for all servers on the network, eliminating the need and maintenance of dedicated disk for specific servers. Application and database server resources are freed from data storage processing activities because data does not reside on them. Network attached storage (NAS) and storage area network (SAN) are examples of these shared storage devices. While these storage options provide overall maintenance and growth advantages to the entire enterprise, they can affect performance of the data warehouse in ways that are not easily detectable. Because a shared device server is shared in the enterprise, processing activity from a variety of application servers can be requesting resources from the same storage server that the data warehouse accesses. During periods of heavy data processing from multiple applications, the shared storage server can reach saturation points that affect the processing performance provided to the data warehouse and other applications. From a data warehouse perspective, the degraded performance will not be apparent or measurable on the warehouse servers because of the use of shared storage. The degraded performance may be sporadic and reached only during certain periods when the combined application processing requests all simultaneously hit the shared storage server.

In data warehouse environments that are deployed across multiple data centers for availability and load balancing of application processing, the interconnect infrastructure, sometimes referred to as the fabric, can cause performance issues with the data warehouse that are not easy to detect. Degraded data transfer media and/or interconnect components can affect data processing activities of the warehouse, resulting in slower performance. Again, the cause of the performance issue on the data warehouse will not be apparent due to the use of this shared infrastructure service of the enterprise.

As you can see, many direct and indirect factors may affect overall performance of your data warehouse environment. The impact these factors will have on data warehouse performance cannot typically be measured during development and testing phases. Getting a complete picture of all processing that can impact performance of the warehouse will require a total understanding of all data warehouse administration, maintenance, tuning and shared services activities. Next month, we will investigate various methods for detecting and measuring performance of the data warehouse for these types of issues.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access