15 January 2010

Biased Locking - Cliff Click

Some interesting comments on biased locking at Cliff Click Jr.’s Blog
Recently I re-did HotSpot's internal locking mechanism for Azul's JVM. The old locking mechanism is approaching 15 years old and features a number of design decisions that are now out-dated:

1. Recursion counts are kept as a NULL word on the stack for every recursion depth (i.e., counting in Base 1 math) in order to save a few instructions and a few bits of memory. Both are now in vast plentiful supply. On the 1st lock of an object, it's header is moved into the stack word instead of a NULL and this means that GC or other locking threads (or threads installing a hash code) all need to find and update the header word - which can now be "displaced". This mechanism is complex, racey and error prone.
2. The existing mechanism requires a strong memory fence after a Compare-And-Swap (CAS) op, but on most machines the CAS also includes a memory fence. I.e., HotSpot ends up fencing *twice* for each lock acquire, once to CAS the header and again moving the displaced header to the stack. Each memory fence costs about a cache-miss on most X86 CPUs.
3. The existing mechanism uses "Thin Locks" to optimize for the very common case of a locked object never being contended. New in Java7, +UseBiasedLocking is on by default. This optimizes the common case even more by not using any fencing for locks which have never (yet) changed threads. (See this nice IBM paper on how to do it). The downside in the OpenJDK implementation is that when an object DOES have to change thread-ownership, the cost is so high that Sun has choosen to disable biased locking for whole classes of locks to avoid future thread-ownership-change costs.
4. When a lock does see contention it "inflates" and then the "inflated" lock is much more expensive than a fast-path "thin lock". So even the smallest bit of contention will cause a lock to be much more expensive than the good case.
5. JVM internal locks and locked Java objects use 2 utterly different code bases. This adds a lot of complexity to an already complex system. The two classes of locks are used in slightly different ways and do have different requirements, BUT they both fundamentally implement a fast-path locking protocol over the OS provided locking abstraction. At Azul Systems, we found that these two locking systems have a lot more in common than they do in difference.