15

Looking at the code for Contains in the HashSet<T> class in the .NET source code, I cannot find any reason why Contains is not thread safe?

I am loading a HashSet<T> with values ahead of time, and then checking Contains in a multi threaded .AsParallel() loop.

Is there any reason why this would not be safe. I am loath to use ConcurrentDictionary when I don't actually require storing values.

9
  • Are you writing to it once, and then only reading from it in multiple threads? Commented Mar 10, 2015 at 9:53
  • 4
    Contains is thread safe as long as you don't add/remove anything from the set (while you are using contains)
    – nafas
    Commented Mar 10, 2015 at 9:54
  • 3
    Why not read the manual? msdn.microsoft.com/en-us/library/bb359438.aspx: Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
    – user57508
    Commented Mar 10, 2015 at 9:55
  • 2
    MSDN does not say it is "not thread safe". They just don´t guarantee that. The reason might be that it is not tested or it might change with some future versions
    – Ondra
    Commented Mar 10, 2015 at 9:57
  • 1
    @nafas There is another problem... You must be sure that after the last write there was a MemoryBarrier, otherwise the read could read some incomplete data
    – xanatos
    Commented Mar 10, 2015 at 10:02

3 Answers 3

14

Normally (normally) collections that are used only for reading are "unofficially" thread safe (there is no collection in .NET that I know that modifies itself during reading). There are some caveats:

  • The items themselves could not be thread safe (but with an HashSet<T> this problem should be minimized, because you can't extract items from it. Still the GetHashCode() and the Equals() must be thread-safe. If, for example, they access lazy objects that are loaded on-demand, they could be not-thread safe, or perhaps they cache/memoize some data to speed-up subsequent operations)
  • You must be sure that after the last write there is a Thread.MemoryBarrier() (done in the same thread as the write) or equivalent, otherwise a read on another thread could read incomplete data
  • You must be sure that in each thread (different from the one where you did a write), before doing the first read there is a Thread.MemoryBarrier(). Note that if the HashSet<T> was "prepared" (with the Thread.MemoryBarrier() at the end) before creating/starting the other threads, then the Thread.MemoryBarrier() isn't necessary, because the threads can't have a stale read of the memory (because they didn't exist). Various operations cause an implicit Thread.MemoryBarrier(). For example if the threads where created before the HashSet<T> was filled, entered a Wait() and were un-Waited after the HashSet<T> was filled (plus its Thread.MemoryBarrier()), exiting a Wait() causes an implicit Thread.MemoryBarrier()

A simple example of a class that uses memoization/lazy loading/whatever you want to call it and in that way can break the thread safety.

public class MyClass
{
    private long value2;

    public int Value1 { get; set; }

    // Value2 is lazily loaded in a very primitive
    // way (note that Lazy<T> *can* be used thread-safely!)
    public long Value2
    {
        get
        {
            if (value2 == 0)
            {
                // value2 is a long. If the .NET is running at 32 bits,
                // the assignment of a long (64 bits) isn't atomic :)
                value2 = LoadFromServer();

                // If thread1 checks and see value2 == 0 and loads it,
                // and then begin writing value2 = (value), but after
                // writing the first 32 bits of value2 we have that
                // thread2 reads value2, then thread2 will read an
                // "incomplete" data. If this "incomplete" data is == 0
                // then a second LoadFromServer() will be done. If the
                // operation was repeatable then there won't be any 
                // problem (other than time wasted). But if the 
                // operation isn't repeatable, or if the incomplete 
                // data that is read is != 0, then there will be a
                // problem (for example an exception if the operation 
                // wasn't repeatable, or different data if the operation
                // wasn't deterministic, or incomplete data if the read
                // was != 0)
            }

            return value2;
        }
    }

    private long LoadFromServer()
    {
        // This is a slow operation that justifies a lazy property
        return 1; 
    }

    public override int GetHashCode()
    {
        // The GetHashCode doesn't use Value2, because it
        // wants to be fast
        return Value1;
    }

    public override bool Equals(object obj)
    {
        MyClass obj2 = obj as MyClass;

        if (obj2 == null)
        {
            return false;
        }

        // The equality operator uses Value2, because it
        // wants to be correct.
        // Note that probably the HashSet<T> doesn't need to
        // use the Equals method on Add, if there are no
        // other objects with the same GetHashCode
        // (and surely, if the HashSet is empty and you Add a
        // single object, that object won't be compared with
        // anything, because there isn't anything to compare
        // it with! :-) )

        // Clearly the Equals is used by the Contains method
        // of the HashSet
        return Value1 == obj2.Value1 && Value2 == obj2.Value2;
    }
}
3
  • +1 though I'd emphasise the "unofficially" a bit more. It's perfectly possible for reading operations to not be thread-safe due to memoisation, for example, though that isn't the case in this particular case, AFAICT.
    – Jon Hanna
    Commented Mar 10, 2015 at 10:12
  • @JonHanna I had added a Still the GetHashCode() and the Equals() must be thread-safe. If they, for example, access lazy objects that are loaded on-demand, they could be not-thread safe, or perhaps cache data to speed-up subsequent operations
    – xanatos
    Commented Mar 10, 2015 at 10:13
  • Yeah, I'm thinking more of the generalisation; nothing you say here is remotely wrong, but I can totally see someone reading this, and skipping past the "normally". One really needs to examine the code in question to be sure your answer applies.
    – Jon Hanna
    Commented Mar 10, 2015 at 10:22
6

Given that you are loading your set with values ahead of time, you can use the ImmutableHashSet<T> from the System.Collections.Immutable library. The immutable collections advertise themselves as thread safe, so we don't have to worry about the "unofficial" thread safety of the HashSet<T>.

var builder = ImmutableHashSet.CreateBuilder<string>(); // The builder is not thread safe

builder.Add("value1");
builder.Add("value2");

ImmutableHashSet<string> set = builder.ToImmutable();

...

if (set.Contains("value1")) // Thread safe operation
{
 ...
}
1
  • yes, that's good advice, and what I would use now. if memory serves they were not around 4 years ago. it was a read once check many scenario, and contains worked out of the box. I would not recommend it nowdays - there are better options. I read the source code, and it was actually ok though.
    – Jim
    Commented Dec 3, 2019 at 15:07
4

From Microsoft: Thread-Safe Collections

The .NET Framework 4 introduces the System.Collections.Concurrent namespace, which includes several collection classes that are both thread-safe and scalable. Multiple threads can safely and efficiently add or remove items from these collections, without requiring additional synchronization in user code. When you write new code, use the concurrent collection classes whenever multiple threads will write to the collection concurrently. If you are only reading from a shared collection, then you can use the classes in the System.Collections.Generic namespace. We recommend that you do not use 1.0 collection classes unless you are required to target the .NET Framework 1.1 or earlier runtime.

Since Contains does not modify the collection, it is merely a reading operation, and since HashSet is in System.Collections.Generic, calling Contains concurrently is absolutely fine.

Not the answer you're looking for? Browse other questions tagged or ask your own question.