Test and test-and-set

In computer architecture, the test-and-set CPU instruction (or instruction sequence) is designed to implement mutual exclusion in multiprocessor environments. Although a correct lock can be implemented with test-and-set, the test and test-and-set optimization lowers resource contention caused by bus locking, especially cache coherency protocol overhead on contended locks.

Given a lock:

boolean locked := false // shared lock variable

the entry protocol is:

procedure EnterCritical() {
  do {
    while ( locked == true )
         skip // spin using normal instructions until the lock is free
  } while ( TestAndSet(locked) == true ) // attempt actual atomic locking using the test-and-set instruction
}

and the exit protocol is:

procedure ExitCritical() {
  locked := false
}

The difference to the simple test-and-set protocol is the additional spin-loop (the test in test and test-and-set) at the start of the entry protocol, which utilizes ordinary load instructions. The load in this loop executes with less overhead compared to an atomic operation (resp. a load-exclusive instruction). E.g., on a system utilizing the MESI cache coherency protocol, the cache line being loaded is moved to the Shared state, whereas a test-and-set instruction or a load-exclusive instruction moves it into the Exclusive state.

This is particularly advantageous if multiple processors are contending for the same lock: whereas an atomic instruction or load-exclusive instruction requires a coherency-protocol transaction to give that processor exclusive access to the cache line (causing that line to ping-pong between the involved processors), ordinary loads on a line in Shared state require no protocol transactions at all: processors spinning in the inner loop operate purely locally.

Cache-coherency protocol transactions are used only in the outer loop, after the initial check has ascertained that they have a reasonable likelihood of success.

If the programming language used supports short-circuit evaluation, the entry protocol could be implemented as:

 procedure EnterCritical() {
   while ( locked == true or TestAndSet(locked) == true )
     skip // spin until locked
 }

Caveat

Although this optimization is useful in system programming, test-and-set is to be avoided in high-level concurrent programming: spinning in applications deprives the operating system scheduler the knowledge of who is blocking on what. Consequently, the scheduler will have to guess on how to allocate CPU time among the threads -- typically just allowing the threads to use up their timing quota. Threads will end up spinning unproductively, waiting for threads that are not scheduled.

By using operating-system provided lock objects, such as mutexes, the OS can schedule exactly the unblocked threads.

References

Gregory R. Andrews, Foundations of Multithreaded, Parallel, and Distributed Programming, pp. 100–101. Addison-Wesley, 2000. ISBN 0-201-35752-6.

Caveat

See also

References