Choices for Minimizing Data Contention and Unleashing Performance
They're the heart and soul of business-critical applications. Enterprise databases carry the freight in an organization, running almost every major enterprise information system. Yet as a successful organization begins to expand applications, implement Web-based commerce or deliver real-time services, growth and success can be too much of a good thing for a database management system. More users mean more contention in the database, leading to database locking that slows business processes - and commerce - to a crawl.
This article will discuss the problem of database locking - why it occurs, how it affects performance, which database functions are most susceptible and new choices that database managers have for minimizing or avoiding it. We'll help you identify locking problems, troubleshoot them and circumvent them with options ranging from adjusting database design, limiting the number of concurrent users and restricting users' activities, to implementing new high-speed databases that can stand alone or function as "helpers" to existing relational database management systems (RDBMSs). With a better understanding of contention, you will be able to choose the best available options.
A Discussion Deadlock
Database locking is a core mechanism for data integrity, concurrency control and transaction management. It is so important that ideological "wars" have raged in journals over locking granularity - page level or row level? - because locking is so critical to performance and multi-user access. For example, SAP's internal transaction manager depends upon more granular row-level locking in order to provide high performance, and any database management system (DBMS) that supported less granular page-level locking was not able to participate in the SAP boom over the last several years. As enterprise applications are pushed to accommodate Web commerce and deliver real-time updates, database locking becomes a real, measurable obstacle to achieving corporate performance objectives. Update-intensive applications - as well as roll-ups of summary information, real-time running totals and inventory updates - cause the most critical locking problems. However, even "read-mostly" databases can suffer serious contention and locking problems.
What is Database Locking?
Database locking is often misdiagnosed because it is poorly understood. To begin a discussion of locking, we must assume that existing database servers are properly installed and that the operating system is properly configured to maximize hardware capabilities. We must also assume that memory, swap space and other resources have been properly tuned.
All databases have locking capabilities. At its most simple, locking is a mechanism that is designed to maintain database internal structures and enforce the database's "rules of engagement" - or how changes can be made by users or applications. These locks can be caused by explicit operations of an application's code, such as a SELECT for UPDATE. Locking may result from internal RDBMS engine behaviors, built-in by the vendor and outside of your control. For example, most RDBMSs automatically take out coarse-grained locks when updates are made to an indexed table. The operating system and associated hardware drivers may also lock resources, but these locks are quite large (e.g., entire files) and beyond the scope of this article. Operating system utilities and "perfmon" tools can provide indications of locking conditions, but resolving operating system (OS) locking problems usually requires rebooting and often requires the addition of hardware or reconfiguration of software.
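The effect of an explicitly held write lock can be felt even in an embedded engine. Below is a minimal sketch using Python's bundled sqlite3 module; sqlite3 has no SELECT for UPDATE, so BEGIN IMMEDIATE stands in for an explicit write lock, and the table name and values are purely illustrative:

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database: two connections to the same file will contend.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
# isolation_level=None -> transactions are managed explicitly;
# timeout=0 -> fail immediately instead of waiting on a lock.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
other = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

# The first connection takes the write lock explicitly and holds it open.
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

# The second connection now runs into that lock.
try:
    other.execute("BEGIN IMMEDIATE")
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

writer.execute("COMMIT")  # releasing the lock unblocks other writers
print(blocked)  # True
```

The same pattern - one session holding a lock while others queue behind it - is what scales up into the contention problems this article describes.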
Suspect Locking When ...
Assuming that your RDBMS is installed and tuned properly, database-locking problems tend to first appear as performance anomalies - transient aberrations or intermittent processing stalls. Locking will not crash the database or operating system, but you will notice severe responsiveness and throughput issues. Suspect locking problems when:
- Adding users, clients, threads, processors or disks to a system does not improve performance and may even degrade it further.
- Reducing the number of connections does not decrease aggregate database throughput.
- Adding a reporting or analytic application brings the system to its knees.
- Starting an application that performs roll-ups, running totals or real-time statistics dramatically changes total throughput. Applications such as Black-Scholes portfolio optimization place extreme stress on RDBMS engines.
- Parallelizing your application causes no improvement. For example, you add another Web server running stateless scripts, yet database throughput goes down.
- Optimizing application code causes total throughput to drop.
Many of these problems may occur because the "improved" applications present more workload that further strains the database bottleneck.
Is it the Database?
If you're experiencing any of these symptoms, you know there's a problem - but how do you isolate the real issue to the RDBMS? Troubleshooting is easier if you temporarily shut down all other resource-hogs that are competing for the OS and hardware - run only the RDBMS processes.
Examine OS Resources. First, look at the following operating system resources to identify constraints and general hot spots:
- Memory paging statistics - confirm that no operating system paging or swapping is occurring.
- Disk I/O operations per second - look for totals below roughly 300 reads/sec per RAID controller, or a few dozen per disk drive.
- Disk I/O queue depths - look for shallow OS I/O queue depths.
- TCP/IP packet throughput - check for adequate network bandwidth and that few packets are dropped or resent.
- Ethernet collisions - check for their absence.
- CPU utilization per process - confirm it is "normal" (note: some RDBMS engines always saturate the CPU, leaving no idle cycles even when there is no work to do).
If some of these checks fail, address the operating system or hardware constraint first. If all of them pass yet performance is still poor, the database engine is the likely culprit.
Check Database Configuration. Next, check the database configuration to ensure that there aren't constraints affecting performance that have nothing to do with locking (for specific guidance on tuning your RDBMS, check the documentation). These constraints could include:
- Insufficient logging threads
- Not allowing multiple insert pages
- Insufficient client "listener" processes
- Too few CPUs and disks dedicated to the RDBMS
- Insufficient memory allocation
Gauge Symptom Severity. Next, examine the symptoms and magnitude of the performance anomalies you are experiencing and try to understand what makes them occur. The telltale sign of an RDBMS bottleneck is a radical decrease in performance with only a minor change to the workload. While the symptoms will be repeatable, they will not be very predictable. Therefore, it's essential to carefully control conditions and take thorough notes along the way. To begin narrowing possibilities, see if you can cause performance anomalies just with read-only workloads. If performance is satisfactory and predictable on any combination of SELECT operations, try to run a reporting or analytic application to see if its long-running queries bog down the system. Halt the reporting application and add writing operations. Vary read and write workloads, the number of concurrent users and the number of applications that are hitting the database. When you encounter a 50 percent or greater decrease in performance, try to perturb the system more to understand where the sensitivities are. You'll know you're onto something when performance has decreased by 90 percent or more.
What a Problem Lock Looks Like. The locks that most commonly create problems are memory locks - logical locks on database internal structures and resources. The most severe are write locks on indices, such as those taken by operations that update primary keys; serialized read locks, which are required for roll-ups or statistics; and locks taken while generating unique identifiers. If your application updates entries in an indexed column, it will cause very high-impact locks.
The second-most severe locks are write locks on tables. These are rarely caused by application code; they typically occur when utilities demand them or when bulk operations run - such as a command that deletes 100,000 rows.
Third-most severe locking issues involve write locks on pages or rows. These are caused by your application as multiple threads contend for a defined number of rows.
The least severe locking in online transaction processing (OLTP) applications involves read locks, but read-only applications can still cause database structures to be locked. This is because database internal "housekeeping" operations typically lock structures, such as work-queues. These locks are not because of something your application has requested.
Take Some Measurements. Use your RDBMS's statistics commands to query "number of locks open" and I/O queue depth. Take measurements under light workload and compare them to measurements when the system is swamped. RDBMS statistics usually include:
- How many locks are open - the more locks open, the greater the likelihood for contention and performance impact. Keep track of read versus write locks, and the level at which the lock is occurring (e.g., row vs. table).
- How many lock conflicts are occurring - the more lock conflicts, the greater the contention and impact on overall performance.
- How many users are waiting on locks - the more users are involved with lock contention, the "easier" it is to make the problem temporarily disappear (if logging off 10 percent of the users alleviates lock contention, application code changes are less likely to make a fundamental improvement).
- How many simultaneous threads are being executed - as with the number of users waiting on locks, the number of threads can be easily manipulated in test situations to isolate problem areas.
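The counters above can be turned into a rough contention gauge by comparing light- and heavy-load snapshots. A sketch, assuming a hypothetical statistics format - the counter names here are illustrative, though real sources exist (e.g., PostgreSQL's pg_locks view, SQL Server's sys.dm_tran_locks):

```python
# Hypothetical snapshots of RDBMS lock statistics taken under two workloads.
light_load = {"locks_open": 40, "lock_conflicts": 2, "waiters": 1}
heavy_load = {"locks_open": 900, "lock_conflicts": 310, "waiters": 85}

def contention_ratio(stats):
    """Fraction of open locks that are in conflict - a rough contention gauge."""
    return stats["lock_conflicts"] / stats["locks_open"]

for label, stats in (("light", light_load), ("heavy", heavy_load)):
    print(f"{label}: ratio {contention_ratio(stats):.2f}, "
          f"{stats['waiters']} users waiting")
```

A ratio that climbs sharply between the two snapshots - here from 0.05 to roughly 0.34 - is exactly the "radical decrease in performance with only a minor change to the workload" signature described earlier.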
Profile Application Operations. Profile your application to understand what SQL operations are constrained. In contrast to compiler-level profiling (which examines the most time-consuming CPU/memory operations), database application profiling examines the timing, sequence and "expense" of database statements. Profiling tools are available if you are running packaged applications, and even more detailed profiling can be done by inserting logging statements into code for the applications that you write.
Ideally, applications should be tested to identify their scalability limits before they are deployed, and that baseline information may be available to provide you with greater insight. However, making test runs with the application in question will likely be necessary. During testing, vary one factor at a time to identify the limits of overall workloads and different transaction mixes. This will help you uncover stressful operations and perturbing conditions.
Check the Client Side. Finally, examine the database client code, checking the following items:
- ODBC/JDBC drivers must be set to allow multithreading.
- The application must be properly creating/destroying threads and properly preparing and committing transactions.
- The application must be handling threads properly under all conditions, which you can detect through multithreaded debugging.
Unlocking RDBMS Potential
In treating database locking problems, you have options, and your treatment of choice will depend on many nontechnical factors such as anticipated application growth, business objectives and IT budgets.
Recode. For short-term fixes, you can remove constraints or minimize their impact by enforcing shorter transactions, committing more frequently, partitioning data into multiple tables to reduce concurrency and locking hot spots, carefully double-checking the threaded portions of code, and verifying that ODBC/JDBC client drivers are configured so that the driver is truly running multithreaded.
Avoid. To assure that you don't introduce new locking problems into existing systems:
- Do not try to do transaction management in application code.
- Avoid bean-managed persistence in EJBs.
- Do not throw more hardware at the problem in an attempt to reduce server workload (obviously this is expensive and, worse, it becomes an administrative nightmare).
Replace. If the RDBMS is not mandated by your organization, choosing a different database engine may dramatically change results. Several important innovations have recently been made in "lock free" databases that, like asynchronous CPUs, change the rules of pipelining to radically reduce the impact of contention and bottlenecks.
Hire a "Helper." An option for improving database performance is to add a "helper" database to offload high-overhead functions and performance-critical tables from the main database. A new generation of ultra-high performance RDBMSs is designed to rarely - if ever - lock, enabling you to increase throughput. Using a helping database such as a cache can keep frequently used or recently accessed data readily available to users while sending the database of record (DOR) updates as a background function.
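The helper-plus-DOR arrangement can be sketched as a read-through cache with write-behind updates. This is an in-process toy to show the data flow, not a product design; `read`, `write` and `drain` are illustrative names, the DOR is played by sqlite3, and a real helper would flush its queue from a background thread rather than on demand:

```python
import queue
import sqlite3

# Database of record (DOR): the "slow" system everything ultimately trusts.
dor = sqlite3.connect(":memory:")
dor.execute("CREATE TABLE prices (sku TEXT PRIMARY KEY, price REAL)")
dor.execute("INSERT INTO prices VALUES ('A1', 9.99)")
dor.commit()

cache = {}               # the "helper": hot data served from memory
pending = queue.Queue()  # updates drained to the DOR in the background

def read(sku):
    """Read-through: serve from the cache, fall back to the DOR once."""
    if sku not in cache:
        row = dor.execute("SELECT price FROM prices WHERE sku = ?",
                          (sku,)).fetchone()
        cache[sku] = row[0]
    return cache[sku]

def write(sku, price):
    """Write-behind: update the cache now, queue the DOR update for later."""
    cache[sku] = price
    pending.put((sku, price))

def drain():
    """Flush queued updates to the DOR (here run synchronously for clarity)."""
    while not pending.empty():
        sku, price = pending.get()
        dor.execute("UPDATE prices SET price = ? WHERE sku = ?", (price, sku))
        dor.commit()

write("A1", 12.50)  # fast path: no DOR round-trip on the critical path
print(read("A1"))   # 12.5, served from the cache
drain()             # the DOR catches up as a background function
```

Users see cache-speed reads and writes, while the database of record absorbs updates off the critical path - the pattern the article describes for keeping frequently used data readily available.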
Helper databases can dramatically increase application performance - up to 20 times current application performance on the same installed hardware - while simultaneously reducing hardware, application development and overall system maintenance costs.
It's Nice to Have Options
Record contention and locking will always exist in large RDBMSs, especially when your organization is pushing the performance envelope. You can combine multiple tactics to minimize locking impact, cost effectively add significant "oomph" to processing power with a helper, or, with the choice of the right database, avoid locking problems altogether.