
I have 60k items that need to be checked against a 20k lookup list. Is there a collection object (like List or Hashtable) that provides an exceptionally fast Contains() method? Or will I have to write my own? In other words, does the default Contains() method just scan each item, or does it use a better search algorithm?

foreach (Record item in LargeCollection)
{
    if (LookupCollection.Contains(item.Key))
    {
       // Do something
    }
}

Note: the lookup list is already sorted.

  • Contains on a List doesn't work for a list of objects because it compares references.
    – Fiur
    Commented Jun 17, 2009 at 19:40
  • Sorted data? Binary search - see @Mark's answer. Commented Jun 17, 2009 at 19:46
  • Hashtable beats anything up to 2M items in my experience
    – Chris S
    Commented Jun 17, 2009 at 20:38
  • As an aside, if your elements are in a meaningful order and are pretty evenly distributed, you can do a binary search much faster by having your first guesses be within an estimated range of your item. This may or may not have any meaning for your specific application.
    – Brian
    Commented Jun 17, 2009 at 21:54
  • Don't forget about System.Collections.Generic.SortedList<TKey, TValue> if you want to simplify this stuff but avoid a hashset.
    – Brian
    Commented Jun 17, 2009 at 21:58

10 Answers


In the most general case, consider System.Collections.Generic.HashSet<T> as your default "Contains" workhorse data structure, because evaluating Contains takes constant time.

The actual answer to "What is the fastest searchable collection?" depends on your specific data size, orderedness, cost of hashing, and search frequency.
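Applied to the question's loop, a minimal sketch might look like the following. The Record type and string keys here are assumptions standing in for the question's unspecified types:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the question's item type.
record Record(string Key);

static class HashSetDemo
{
    static void Process(List<Record> largeCollection, IEnumerable<string> lookupKeys)
    {
        // Build the set once: O(m). Each Contains probe is then O(1) on average.
        var lookupSet = new HashSet<string>(lookupKeys);

        foreach (Record item in largeCollection)
        {
            if (lookupSet.Contains(item.Key))
            {
                // Do something
            }
        }
    }
}
```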

  • 41
Note: Do not forget to override GetHashCode (and Equals). For added performance, pregenerate your hash code in your constructor.
    – Brian
    Commented Jun 17, 2009 at 19:51
  • @Brian: good point. I was assuming (baselessly) that Record.Key was a builtin type of some kind.
    – Jimmy
    Commented Jun 17, 2009 at 19:59
  • @Brian: instead of pregenerating, I prefer to store the generated hash the first time it's requested; why slow down the constructor with something you don't know will ever be used?
    – jmservera
    Commented Jun 17, 2009 at 20:48
  • FYI: Performance test - I created a comparison between List<T> and HashSet<T> for strings. I found that HashSet was about 1000 times faster than List.
    – Quango
    Commented Sep 5, 2010 at 19:29
  • @Quango: 3 years later, but really, if you don't specify the size of your data set this performance comparison means nothing: hash sets have O(1) search, lists have O(n) search, so the performance ratio is proportional to n.
    – Clément
    Commented Apr 22, 2013 at 15:34
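The comment thread's idea (override Equals/GetHashCode, and cache the hash lazily rather than in the constructor) could be sketched like this; RecordKey and its field are hypothetical:

```csharp
using System;

// Sketch of a key type that overrides Equals/GetHashCode and caches the
// hash the first time it is requested (the lazy variant from the comments).
sealed class RecordKey : IEquatable<RecordKey>
{
    private readonly string _value;
    private int _hash; // 0 means "not computed yet"

    public RecordKey(string value) => _value = value;

    public bool Equals(RecordKey other) =>
        other is not null && _value == other._value;

    public override bool Equals(object obj) => Equals(obj as RecordKey);

    public override int GetHashCode()
    {
        if (_hash == 0)
            _hash = _value.GetHashCode() | 1; // force non-zero so 0 can mean "unset"
        return _hash;
    }
}
```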

If you don't need ordering, try HashSet<Record> (new in .NET 3.5)

If you do, use a List<Record> and call BinarySearch.

  • Or, in .NET >= 4, use SortedSet. Commented Oct 19, 2012 at 18:18
  • Or better yet, ImmutableSortedSet from System.Collections.Immutable. Commented Aug 5, 2018 at 15:23

Have you considered List.BinarySearch(item)?

You said that your lookup list is already sorted, so this seems like the perfect opportunity. A hash would definitely be the fastest, but it brings its own problems and requires a lot more storage overhead.
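As a small sketch (the list contents are illustrative), BinarySearch on a sorted List<T> returns a non-negative index on a hit and the bitwise complement of the insertion point on a miss:

```csharp
using System;
using System.Collections.Generic;

static class BinarySearchDemo
{
    static void Main()
    {
        // The list must already be sorted for BinarySearch to be valid.
        var lookup = new List<string> { "apple", "cherry", "plum" };

        // O(log n) per probe; a negative return value means "not found".
        Console.WriteLine(lookup.BinarySearch("cherry") >= 0);  // True
        Console.WriteLine(lookup.BinarySearch("banana") >= 0);  // False
    }
}
```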

  • 1
    You are right, a hash may bring some undesirable problems when using mutable objects as a key.
    – jmservera
    Commented Jun 17, 2009 at 20:44

You should read this blog post, which speed-tested several different collection types and methods using both single- and multi-threaded techniques.

According to the results, BinarySearch on a List and SortedList were the top performers, consistently running neck-and-neck, when looking something up as a "value".

When using a collection that allows for "keys", Dictionary, ConcurrentDictionary, HashSet, and Hashtable performed the best overall.


I've put a test together:

  • First - all possible 3-character combinations of A-Z and 0-9
  • Fill each of the collections mentioned here with those strings
  • Finally - search each collection for a random string (the same string for each collection) and time it.

This test simulates a lookup when there is guaranteed to be a result.

[FullCollection results image]

Then I changed the initial collection from all possible combinations to 10,000 random 3-character combinations. This should give roughly a 1-in-4.6 hit rate for a random 3-character lookup, making it a test where a result is not guaranteed. I ran the test again:

[PartialCollection results image]

IMHO Hashtable, although fastest, isn't always the most convenient when working with objects. But HashSet is so close behind that it's probably the one to recommend.

Just for fun (you know, FUN) I ran it with 1.68M rows (4 characters): [BiggerCollection results image]


Keep both lists x and y in sorted order.

If x == y, do your action; if x < y, advance x; if y < x, advance y; repeat until either list is empty.

The running time of this intersection is proportional to min(size(x), size(y)).

Don't run a .Contains() loop: that is proportional to size(x) * size(y), which is much worse.
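A minimal sketch of the two-pointer walk described above, assuming both lists are sorted and keys are distinct ints (the names are illustrative):

```csharp
using System.Collections.Generic;

static class SortedIntersect
{
    // Walks both sorted lists once, advancing whichever pointer is behind.
    // At most size(x) + size(y) comparisons; assumes distinct keys.
    static IEnumerable<int> Intersect(IReadOnlyList<int> x, IReadOnlyList<int> y)
    {
        int i = 0, j = 0;
        while (i < x.Count && j < y.Count)
        {
            if (x[i] == y[j]) { yield return x[i]; i++; j++; }
            else if (x[i] < y[j]) i++; // advance x
            else j++;                  // advance y
        }
    }
}
```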

  • +1 for the more efficient algorithm. Even if the lists are currently unsorted, it would be more efficient to first sort them and then run this algorithm.
    – Matt Boehm
    Commented Jun 18, 2009 at 17:03
  • Wouldn't the runtime be proportional to max(size(x),size(y)) in the worst case scenario though? Example: int[] x = {99,100}; int[] y = {0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
    – Matt Boehm
    Commented Jun 18, 2009 at 17:14
  • No because once you complete the smaller set, you can append the remaining elements from the larger set because they are already sorted. I think this process is similar to Merge Sort.
    – Brig Lamoreaux
    Commented Jun 19, 2009 at 13:24

If it's possible to sort your items then there is a much faster way to do this than key lookups into a hashtable or b-tree. Though if your items aren't sortable you can't really put them into a b-tree anyway.

Anyway, if they are sortable, sort both lists; then it's just a matter of walking the lookup list in order.

Walk lookup list
   While items in check list <= lookup list item
     if check list item = lookup list item do something
   Move to next lookup list item
  • Yes, so true. If you have two sorted lists you only need to traverse each once.
    – denver
    Commented Jun 18, 2015 at 19:56

If you're using .NET 3.5, you can write cleaner code using:

foreach (Record item in LookupCollection.Intersect(LargeCollection))
{
  //dostuff
}

I don't have .NET 3.5 here, so this is untested. It relies on an extension method. Note that LookupCollection.Intersect(LargeCollection) is probably not the same as LargeCollection.Intersect(LookupCollection) ... the latter is probably much slower.

This assumes LookupCollection is a HashSet.
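For reference, Enumerable.Intersect builds a set from its argument internally and streams its source, so either ordering ends up set-based. A minimal sketch with illustrative data:

```csharp
using System;
using System.Linq;

static class IntersectDemo
{
    static void Main()
    {
        string[] large = { "a", "b", "c", "d" };
        string[] lookup = { "b", "d", "x" };

        // Intersect yields elements of the first sequence, in order,
        // that also appear in the second.
        foreach (var key in large.Intersect(lookup))
            Console.WriteLine(key); // prints "b" then "d"
    }
}
```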


If you aren't worried about squeezing out every last bit of performance, the suggestion to use a HashSet or binary search is solid. Your datasets just aren't large enough for this to be a problem 99% of the time.

But if this is just one of thousands of times you are going to do this, and performance is critical (and proven to be unacceptable using HashSet/binary search), you could certainly write your own algorithm that walks the sorted lists, doing comparisons as you go. Each list would be walked at most once, and even the pathological cases wouldn't be bad. (Once you went this route, you'd probably find that the comparison, assuming it's a string or other non-integral value, is the real expense, and that optimizing it is the next step.)


For .NET 8, you might also consider using System.Buffers.SearchValues<T>:

https://learn.microsoft.com/en-us/dotnet/api/system.buffers.searchvalues-1?view=net-8.0

and also for .NET 8 you might have a look at System.Collections.Frozen.FrozenSet<T> or FrozenDictionary<TKey,TValue>:

https://learn.microsoft.com/en-us/dotnet/api/system.collections.frozen.frozenset-1?view=net-8.0
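A minimal FrozenSet sketch (the keys here are illustrative): construction is relatively expensive, but repeated Contains calls on the never-changing set are fast.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

static class FrozenDemo
{
    static void Main()
    {
        IEnumerable<string> lookupKeys = new[] { "alpha", "beta", "gamma" };

        // .NET 8+: build once (slower), then read many times (faster).
        FrozenSet<string> lookup = lookupKeys.ToFrozenSet();

        Console.WriteLine(lookup.Contains("beta"));  // True
        Console.WriteLine(lookup.Contains("delta")); // False
    }
}
```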