I just read the chapter on locks in the book Operating Systems: Three Easy Pieces, and it discusses an implementation of locks (see in Figure 28.9) that involves a test-and-set hardware instruction.
The textbook emphasizes that both hardware support and OS support are necessary to build an efficient locking mechanism. Wikipedia echoes this sentiment:
Locks typically require hardware support for efficient implementation. This support usually takes the form of one or more atomic instructions such as "test-and-set", "fetch-and-add" or "compare-and-swap". These instructions allow a single process to test if the lock is free, and if free, acquire the lock in a single atomic operation.
I think I've come up with an alternative lock implementation that doesn't require hardware support. It seems to have relatively similar performance overhead, while guaranteeing mutual exclusion, so making this post to find out if I'm missing something.
Textbook Lock Implementation
I'll first summarize the textbook implementation. For setup, we need to:
- Maintain a single flag (set to 0 or 1) and a queue (of thread IDs).
- Maintain a separate flag for a "guard" lock, which will be a simple spin lock. Because the guard lock is meant to protect a relatively small critical section, the performance overhead of a spin lock won't be as pronounced.
When a thread wants to acquire a lock, it will do the following:
- First, spin with the test-and-set instruction until you acquire the guard lock.
- Once acquired, check if the flag (of the main lock) is set to 0. If it is, simply set it to 1, and the lock has been acquired! If the flag is set to 1, proceed to step 3.
- Push the current thread ID onto the queue.
- "Park" the current thread. This is an instruction in Solaris that basically puts a thread to sleep.
On the other hand, when a thread wants to release a lock, it will do the following:
- Spin until you acquire the guard lock.
- If the queue is empty, set the flag to 0, and you're done. If the queue is nonempty, proceed to step 3.
- Pop a thread ID from the queue, and "unpark" the corresponding thread (wake it up).
My Implementation (without hardware support)
First, some setup:
- Every lock must be initialized with some unique lock ID.
- Inside the OS, maintain a "lock table." For each lock ID, the lock table will maintain a flag (0 or 1) and a queue (of thread IDs).
Then, when you want to acquire the lock:
- Make a system call (which might be called something like
lock
) and pass in the lock ID as an argument. - Enter the OS. Since the OS manages scheduling, we guarantee mutual exclusion for the following steps.
- Search the lock table for the specified lock ID. If the flag is 0, set it to 1. Then return; the lock has been acquired! If the flag is 1, proceed to step 4.
- Push the calling thread to a queue.
- Block the calling thread, so for now, it cannot be scheduled.
To release a lock:
- Again, make a system call with the lock ID as an argument.
- Entering the OS (again guaranteeing mutual exclusion), search the lock table for the specified lock ID. If the queue is empty, set the flag to 0, you're done. If not, proceed to step 3.
- Pop a thread from the queue, and unblock it. It should be scheduled soon.
Note that when we are acquiring a lock, a context switch is not necessary if the lock is available, so the performance overhead should be minimal.
It's likely that I'm missing a small detail that will make me look stupid, but that's why I'm asking this question in the first place.