
Today I needed a method to replace all occurrences of a given value in a collection. Since there is no such method in .NET by default, I wrote my own:

[Pure]
public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue)
{
    if (source == null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    List<T> newValues = new List<T>();
    using (var enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            newValues.Add(enumerator.Current.Equals(oldValue) ? newValue : enumerator.Current);
        }
    }
    return newValues;
}

It's pretty short and gets the job done in a reasonable amount of time.
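For reference, a typical call looks like this. The sample values are made up; the `Replace` method is repeated inside the snippet only so it compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerableExtensions
{
    // Copy of the Replace method above, so this sample is self-contained.
    public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        var newValues = new List<T>();
        foreach (var item in source)
        {
            newValues.Add(item.Equals(oldValue) ? newValue : item);
        }
        return newValues;
    }
}

static class Demo
{
    static void Main()
    {
        var numbers = new[] { 1, 2, 3, 2, 4 };
        // Every 2 becomes 9; all other values pass through unchanged.
        var replaced = numbers.Replace(2, 9);
        Console.WriteLine(string.Join(", ", replaced)); // prints "1, 9, 3, 9, 4"
    }
}
```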

I also tested a few other solutions:

[Pure]
public static IEnumerable<T> Replace1<T>(this IEnumerable<T> source, T oldValue, T newValue)
{
    if (source == null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    using (var enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            yield return enumerator.Current.Equals(oldValue) ? newValue : enumerator.Current;
        }
    }
}

[Pure]
public static IEnumerable<T> Replace2<T>(this IEnumerable<T> source, T oldValue, T newValue)
{
    if (source == null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    List<T> newValues = new List<T>(source);
    for (int i = 0; i < newValues.Count; i++)
    {
        if (newValues[i].Equals(oldValue))
        {
            newValues[i] = newValue;
        }
    }
    return newValues;
}

But they are all slower than the first one. The iterator-based version has the advantage of being a bit shorter, but it's actually the slowest of the three.


Here are my benchmark results for each method running 1 million times:

Replace
  Slowest: 207 ms
  Average: 177.8 ms
  Fastest: 161 ms

Replace1
  Slowest: 353 ms
  Average: 293.9 ms
  Fastest: 253 ms

Replace2
  Slowest: 344 ms
  Average: 276.4 ms
  Fastest: 252 ms


Feel free to comment on anything you feel like can be improved.

Here are some of my questions:

  1. Are there any flaws in my function?
  2. Are there any improvements that can be made?
  3. Why didn't Microsoft include this in the .NET framework?
  • Your List<T> constructing version has a terrible performance downside: SomeGiantSequence.Replace("bad", "good").Take(5);
    – Caleth
    Commented Jun 19, 2017 at 7:35

3 Answers


I cannot comment on the speed because I didn't benchmark it, but there are two obvious improvements:

  • add an IEqualityComparer<T> as a third parameter so that you can use a custom one if necessary (e.g. for your own types)
  • use foreach instead of the enumerator

Example:

public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue, IEqualityComparer<T> comparer = null)
{       
    if (source == null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    comparer = comparer ?? EqualityComparer<T>.Default;

    foreach (var item in source)
    {
        yield return
            comparer.Equals(item, oldValue)
                ? newValue
                : item;
    }
}
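For instance, the comparer parameter makes a case-insensitive replacement over strings possible by passing StringComparer.OrdinalIgnoreCase. The sample data is made up, and the method is repeated so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerableExtensions
{
    // The comparer-aware Replace from above, repeated for a self-contained demo.
    public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue,
        IEqualityComparer<T> comparer = null)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        comparer = comparer ?? EqualityComparer<T>.Default;

        foreach (var item in source)
        {
            yield return comparer.Equals(item, oldValue) ? newValue : item;
        }
    }
}

static class Demo
{
    static void Main()
    {
        var words = new[] { "Foo", "bar", "FOO" };
        // Both "Foo" and "FOO" match "foo" under the ordinal-ignore-case comparer.
        var result = words.Replace("foo", "baz", StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(" ", result)); // prints "baz bar baz"
    }
}
```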

What I don't like about the versions that use lists internally is that they return IEnumerable<T> instead of those lists, or at least IList<T>. When I see an IEnumerable<T>, I assume the result is deferred, so I'll most likely call .ToList() on it and the items will be enumerated again. If I get something concrete back, I know all the items are already there and no further enumeration has to happen.


There is actually one more way to do it, using Select in query syntax:

return 
    from item in source
    select comparer.Equals(item, oldValue) ? newValue : item;

With LINQ you can get the job done quickly. Is it sometimes slower? Maybe, but who cares? Most probably no one will ever notice unless you test it with an insane number of elements, and your production code will never see that many.

If you need to optimize something, run a profiler first. Why? Suppose for a moment that a poor comparer is used here: if you only work on optimizing the loops, you'll never find out that the comparer is what's actually making your code slow, no matter how hard you try. People say LINQ is slow, so you focus on optimizing its queries rather than the comparer bottleneck. Usually something else kills the performance, such as reflection, poor comparers, string concatenation, or other calculations; only rarely is it LINQ.


Why didn't Microsoft include this in the .NET framework?

Probably because it can be realized in one line with Select:

source.Select(x => x.Equals(oldValue) ? newValue : x)

or with two lines as an extension method if the function is needed frequently:

public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue)
    => source.Select(x => x.Equals(oldValue) ? newValue : x);

That's the nice thing about enumerable methods: only a handful of discrete methods are available, but they can be parameterized and combined to cover most use cases. I really like the compactness and elegance of the LINQ API :).
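As a small sketch of that composability (the sample data is made up), the Select-based replacement chains naturally with other LINQ operators:

```csharp
using System;
using System.Linq;

static class Demo
{
    static void Main()
    {
        var values = new[] { 0, 1, 0, 2, 0, 3 };
        var result = values
            .Select(x => x == 0 ? -1 : x)  // the "Replace" step: every 0 becomes -1
            .Where(x => x > 0)             // then drop the placeholder values again
            .ToList();
        Console.WriteLine(string.Join(",", result)); // prints "1,2,3"
    }
}
```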

  • Thanks for answering one of the more interesting questions I had. I hadn't thought of this usage of the Select function.
    – Denis
    Commented Jun 18, 2017 at 20:28
  • @Denis: It helps to keep in mind that Select is called map in other programming languages. You can use Select every time you want to apply some kind of transformation to all items.
    – Heinzi
    Commented Jun 19, 2017 at 11:33

Maybe a late answer, but we all focused on that manual iteration versus foreach and almost forgot something very important: you're optimizing the wrong piece of code.

You're invoking the non-generic Object.Equals(Object) method because you didn't add any constraint on your generic type. Object.Equals(Object) has to cast the argument with as T to determine whether it's the correct type, then compare it with null; only then can the normal comparison start (in most cases probably delegating the job to T.Equals(T)). Now imagine you're working with IEnumerable<int> (or any other value type): every comparison will box and unbox the values, and you can imagine the relative overhead of that.

public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue)
    where T : IEquatable<T>

Now the Equals call binds to IEquatable<T>.Equals(T), avoiding the boxing and that (relatively) huge performance penalty. Unfortunately, in C# we can't have two overloads that differ only by their generic parameter constraints...
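A complete version with that constraint might look like this (a sketch; the iterator body mirrors the earlier foreach versions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerableExtensions
{
    // With the IEquatable<T> constraint, item.Equals(oldValue) binds to
    // IEquatable<T>.Equals(T), so value types are compared without boxing.
    public static IEnumerable<T> Replace<T>(this IEnumerable<T> source, T oldValue, T newValue)
        where T : IEquatable<T>
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        foreach (var item in source)
        {
            yield return item.Equals(oldValue) ? newValue : item;
        }
    }
}

static class Demo
{
    static void Main()
    {
        // int implements IEquatable<int>, so no boxing occurs per comparison.
        var result = new[] { 1, 2, 2, 3 }.Replace(2, 0);
        Console.WriteLine(string.Join(",", result)); // prints "1,0,0,3"
    }
}
```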

Note that the version accepting an external equality comparer does not suffer from this problem. Is it slower? Yes, by a few nanoseconds. Does it matter? See below...


Also, we're talking about an optimization of 50-100 ms for a list of 1,000,000 elements... before dropping foreach (or LINQ) because of this, I'd consider providing specialized overloads. IEnumerable<T> input is the generic case, but I can't believe you're working with such performance-sensitive code while your data structure is a plain enumeration...

For example, you resize your list multiple times, and this (together with the equality comparison) greatly hurts performance. If your input is IList<T>, then you can pre-allocate the list with the correct capacity (try this and compare):

public static IList<T> Replace<T>(this IList<T> source, T oldValue, T newValue)
{
    if (source == null)
        throw new ArgumentNullException(nameof(source));

    var newValues = new List<T>(source.Count);
    foreach (var value in source)
        newValues.Add(Object.Equals(value, oldValue) ? newValue : value);

    return newValues;
}

Do you have a T[]? Even better: you don't need an enumerator at all, and a plain for loop will work.
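An array overload along those lines might look like this (a sketch; note it returns a new array rather than mutating the input in place):

```csharp
using System;

static class ArrayExtensions
{
    // Plain for loop over an array: no enumerator allocation, direct indexing,
    // and the result is allocated at the right size because the length is known up front.
    public static T[] Replace<T>(this T[] source, T oldValue, T newValue)
        where T : IEquatable<T>
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        var result = new T[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            result[i] = source[i].Equals(oldValue) ? newValue : source[i];
        }
        return result;
    }
}

static class Demo
{
    static void Main()
    {
        var replaced = new[] { "a", "b", "a" }.Replace("a", "x");
        Console.WriteLine(string.Join(" ", replaced)); // prints "x b x"
    }
}
```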

As Caleth noted, if you're going to use something like Take(n) on the returned sequence, then a lazy enumeration is a much, much better choice (outside of a fictional performance test).

Why didn't Microsoft include this in the .NET framework?

It's just my guess (maybe Eric Lippert will give us an authoritative explanation), but I suppose it's because it's pretty easy to do in one line with Select() (see JanDotNet's answer), and another function to write/document/test wouldn't add any value beyond saving 20 characters (for a not-so-common task, at least in imperative, non-functional languages, where you more often need to modify the collection in place).


A note about the benchmark: a quick rough test gave me pretty different results. Can you show your test code? Caching and list size may play an important role; at the very least you should check performance for, say, 10, 1,000, and 100,000 items (and more). Branch prediction will also play a role in such a small function, so the test should be repeated multiple times with randomized inputs (compared against some well-known patterns). You should also unload the process after each test and ignore the first invocation because of JIT compilation.
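A minimal Stopwatch-based harness along those lines might look like this. The sizes, run count, and seed are arbitrary choices, and a library such as BenchmarkDotNet would handle warm-up and statistics far more rigorously:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class Benchmark
{
    static void Main()
    {
        foreach (var size in new[] { 10, 1_000, 100_000 })
        {
            var rng = new Random(42); // fixed seed so runs are repeatable
            var data = Enumerable.Range(0, size).Select(_ => rng.Next(0, 10)).ToArray();

            // Warm-up run so JIT compilation is excluded from the timings.
            data.Select(x => x == 5 ? -1 : x).Sum();

            var sw = Stopwatch.StartNew();
            long checksum = 0; // consume the results so the work can't be optimized away
            for (int run = 0; run < 100; run++)
            {
                checksum += data.Select(x => x == 5 ? -1 : x).Sum();
            }
            sw.Stop();
            Console.WriteLine($"size {size}: {sw.ElapsedMilliseconds} ms (checksum {checksum})");
        }
    }
}
```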

  • Good catch on the Object.Equals(Object). I was also hoping to see Eric Lippert's comment on this question, but I got really good feedback from the current answers anyway. I usually add one more overload for extra IEqualityComparer<T> injection but I don't know why I skipped that this time.
    – Denis
    Commented Jun 19, 2017 at 15:46
