
I am a junior developer working on writing an update for software that receives data from a third-party solution, stores it in a database, and then conditions the data for use by another third-party solution. Our software runs as a Windows service.

Looking at the code from a previous version, I see this:

        static Object _workerLocker = new object();
        static int _runningWorkers = 0;
        int MaxSimultaneousThreads = 5;

        foreach (int SomeObject in ListOfObjects)
        {
            lock (_workerLocker)
            {
                while (_runningWorkers >= MaxSimultaneousThreads)
                {
                    Monitor.Wait(_workerLocker);
                }
            }

            // check to see if the service has been stopped. If yes, then exit
            if (this.IsRunning() == false)
            {
                break;
            }

            lock (_workerLocker)
            {
                _runningWorkers++;
            }

            ThreadPool.QueueUserWorkItem(SomeMethod, SomeObject);

        }

The logic seems clear: Wait for room in the thread pool, make sure the service hasn't been stopped, then increment the thread counter and queue the work. _runningWorkers is decremented inside SomeMethod() inside a lock statement that then calls Monitor.Pulse(_workerLocker).

My question is: Is there any benefit in grouping all the code inside a single lock, like this:

        static Object _workerLocker = new object();
        static int _runningWorkers = 0;
        int MaxSimultaneousThreads = 5;

        foreach (int SomeObject in ListOfObjects)
        {
            // Is doing all the work inside a single lock better?
            lock (_workerLocker)
            {
                // wait for room in ThreadPool
                while (_runningWorkers >= MaxSimultaneousThreads) 
                {
                    Monitor.Wait(_workerLocker);
                }
                // check to see if the service has been stopped.
                if (this.IsRunning())
                {
                    ThreadPool.QueueUserWorkItem(SomeMethod, SomeObject);
                    _runningWorkers++;                  
                }
                else
                {
                    break;
                }
            }
        }

It seems like it may cause a little more waiting for other threads, but locking repeatedly in a single logical block also seems somewhat time-consuming. However, I'm new to multi-threading, so I'm assuming there are other concerns here that I'm unaware of.

The only other places where _workerLocker gets locked is in SomeMethod(), and only for the purpose of decrementing _runningWorkers, and then outside the foreach to wait for the number of _runningWorkers to go to zero before logging and returning.

Thanks for any help.

EDIT 4/8/15

Thanks to @delnan for the recommendation to use a semaphore. The code becomes:

        static int MaxSimultaneousThreads = 5;
        static Semaphore WorkerSem = new Semaphore(MaxSimultaneousThreads, MaxSimultaneousThreads);

        foreach (int SomeObject in ListOfObjects)
        {
            // wait for an available thread
            WorkerSem.WaitOne();

            // check if the service has stopped
            if (this.IsRunning())
            {
                ThreadPool.QueueUserWorkItem(SomeMethod, SomeObject);
            }
            else
            {
                break;
            }
        }

WorkerSem.Release() is called inside SomeMethod().

  • If the whole block is locked, how will SomeMethod obtain the lock to decrement _runningWorkers? Commented Apr 7, 2015 at 17:58
  • @RussellatISC: ThreadPool.QueueUserWorkItem calls SomeMethod asynchronously, the "lock" section above will be left before or at least shortly after the new thread with SomeMethod starts to run.
    – Doc Brown
    Commented Apr 7, 2015 at 18:15
  • Good point. It is my understanding that the purpose of Monitor.Wait() is to release and re-acquire the lock so another resource (SomeMethod, in this case) can use it. On the other end, SomeMethod obtains the lock, decrements the counter, and then calls Monitor.Pulse() which returns the lock to the method in question. Again, this is my own understanding.
    – Joseph
    Commented Apr 7, 2015 at 18:16
  • @Doc, missed that, but still... seems like SomeMethod would need to start before the foreach locked on the next iteration or it would still be hung against the lock held "while (_runningWorkers >= MaxSimultaneousThreads)". Commented Apr 7, 2015 at 18:32
  • @RussellatISC: as Joseph already stated: Monitor.Wait releases the lock. I recommend having a look at the docs.
    – Doc Brown
    Commented Apr 7, 2015 at 19:50

3 Answers


This is not a question of performance. It is first and foremost a question of correctness. If you have two lock statements, you cannot guarantee atomicity for operations that are spread between them, or partially outside the lock statement. Tailored for the old version of your code, this means:

Between the end of the while (_runningWorkers >= MaxSimultaneousThreads) and the _runningWorkers++, anything at all may happen, because the code surrenders and re-acquires the lock in between. For example, thread A might acquire the lock for the first time, wait until some other thread exits, and then break out of the loop and the lock. It is then preempted, and thread B enters the picture, also waiting for room in the thread pool. Because said other thread quit, there is room, so it doesn't wait very long at all. Both thread A and thread B now go on in some order, each incrementing _runningWorkers and starting their work.

Now, there are no data races as far as I can see, but logically it's wrong, since there are now more than MaxSimultaneousThreads workers running. The check is (occasionally) ineffective because the task of taking a slot in the thread pool is not atomic. This should concern you more than small optimizations around lock granularity! (Note that conversely, locking too early or for too long can easily lead to deadlocks.)

The second snippet fixes this problem, as far as I can see. A less invasive change to fix the problem might be putting the _runningWorkers++ right after the while loop, inside the first lock statement.

Now, correctness aside, what about performance? This is hard to tell. Generally locking for a longer time ("coarsely") inhibits concurrency, but as you say, this needs to be balanced against the overhead from the additional synchronization of fine-grained locking. Generally the only solution is benchmarking and being aware that there are more options than "lock everything everywhere" and "lock only the bare minimum". There is a wealth of patterns and concurrency primitives and thread-safe data structures available. For example, this seems like the very application semaphores were invented for, so consider using one of those instead of this hand-rolled hand-locked counter.
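A minimal semaphore version of the same throttle (again sketched in Python with invented names; `BoundedSemaphore` additionally throws if released more times than acquired, which catches bookkeeping bugs):

```python
import threading
import time

MAX_SIMULTANEOUS = 3
sem = threading.BoundedSemaphore(MAX_SIMULTANEOUS)

def some_method(obj):
    # Hypothetical worker: the release in `finally` mirrors calling
    # the semaphore's Release() at the end of the real SomeMethod.
    try:
        time.sleep(0.01)  # stand-in for real work
    finally:
        sem.release()

def queue_all(objects):
    threads = []
    for obj in objects:
        sem.acquire()  # blocks while MAX_SIMULTANEOUS workers are running
        t = threading.Thread(target=some_method, args=(obj,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

The semaphore makes the "wait for a slot and take it" step a single atomic operation, which is exactly the atomicity the two-lock version lacked.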


IMHO you are asking the wrong question - you should not care so much about efficiency trade-offs, but more about correctness.

The first variant makes sure _runningWorkers is only accessed during a lock, but it misses the case where _runningWorkers might be changed by another thread in the gap between the first lock and the second. Honestly, the code looks to me as if someone blindly put locks around all access points of _runningWorkers without thinking about the implications and the potential errors. Maybe the author had some superstitious fear about executing the break statement inside the lock block, but who knows?

So you should actually use the second variant, not because it's more or less efficient, but because it's (hopefully) more correct than the first one.

  • On the flip side, holding a lock while undertaking a task that may require acquiring another lock can cause a deadlock that can hardly be termed "correct" behavior. One should ensure that all code which needs to be done as a unit is surrounded by a common lock, but one should move outside that lock things that don't need to be part of that unit, especially things that may require acquisition of other locks.
    – supercat
    Commented Apr 7, 2015 at 23:09
  • @supercat: this is not the case here, please read the comments below the original question.
    – Doc Brown
    Commented Apr 8, 2015 at 5:36

The other answers are quite good and clearly address the correctness concerns. Let me address your more general question:

How much work should I place inside a lock statement?

Let's start with the standard advice, that you allude to and delnan alludes to in the final paragraph of the accepted answer:

  • Do as little work as possible while locking a particular object. Locks that are held for a long time are subject to contention, and contention is slow. Note that this implies that the total amount of code in a particular lock and the total amount of code in all lock statements that lock on the same object are both relevant.

  • Have as few locks as possible, to make the likelihood of deadlocks (or livelocks) lower.

The clever reader will note that these are opposites. The first point suggests breaking up big locks into many smaller, finer-grained locks to avoid contention. The second suggests consolidating distinct locks into the same lock object to avoid deadlocks.
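To make the second point concrete: the classic deadlock arises when thread 1 takes lock A then B while thread 2 takes B then A, and each ends up holding one lock while waiting forever for the other. The standard discipline is a global acquisition order, sketched here in Python with invented names:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    # Both workers acquire the locks in the SAME order (A before B).
    # If this one took A->B while the other took B->A, the two threads
    # could each hold one lock and block forever on the other.
    with lock_a:
        with lock_b:
            pass  # work that needs both locks goes here

def worker_2():
    with lock_a:   # same order, so no cycle in the wait-for graph
        with lock_b:
            pass
```

Fewer distinct locks means fewer orderings to get wrong, which is why the second bullet pulls in the opposite direction from the first.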

What can we conclude from the fact that the best standard advice is thoroughly contradictory? We get to actually good advice:

  • Don't go there in the first place. If you are sharing memory between threads, you're opening yourself up to a world of pain.

My advice is, if you want concurrency, use processes as your unit of concurrency. If you cannot use processes then use application domains. If you cannot use application domains, then have your threads managed by the Task Parallel Library and write your code in terms of high-level tasks (jobs) rather than low-level threads (workers).

If you absolutely positively must use low-level concurrency primitives like threads or semaphores, then use them to build a higher-level abstraction that captures what you really need. You will likely find that the higher-level abstraction is something like "perform a task asynchronously that can be cancelled by the user", and hey, the TPL already supports that, so you don't need to roll your own. You will likely find that you need something like thread-safe lazy initialization; don't roll your own, use Lazy<T>, which was written by experts. Use thread-safe collections (immutable or otherwise) written by experts. Move the level of abstraction up as high as possible.
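For instance, the entire hand-rolled throttle from the question collapses to a few lines with a task-based pool (Python's concurrent.futures shown as an illustration of the idea; in .NET the TPL plays the same role, and the worker body here is invented):

```python
from concurrent.futures import ThreadPoolExecutor

def some_method(obj):
    # Hypothetical worker body; no counter, no lock, no pulse needed.
    return obj * 2

objects = list(range(10))

# max_workers plays the role of MaxSimultaneousThreads; the pool itself
# does the "wait for room" bookkeeping that the lock/counter did by hand.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(some_method, objects))
```

All of the waiting, counting, and signalling lives inside the pool, so there is simply no shared counter left to protect.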
