149

I have the following item set from an XML:

id           category

5            1
5            3
5            4
5            3
5            3

I need a distinct list of these items:

5            1
5            3
5            4

How can I distinct for Category AND Id too in LINQ?

8 Answers 8

239

Are you trying to be distinct by more than one field? If so, just use an anonymous type and the Distinct operator and it should be okay:

var query = doc.Elements("whatever")
               .Select(element => new {
                             id = (int) element.Attribute("id"),
                             category = (int) element.Attribute("cat") })
               .Distinct();

If you're trying to get a distinct set of values of a "larger" type, but only looking at some subset of properties for the distinctness aspect, you probably want DistinctBy as implemented in MoreLINQ in DistinctBy.cs:

 public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
     this IEnumerable<TSource> source,
     Func<TSource, TKey> keySelector,
     IEqualityComparer<TKey> comparer)
 {
     HashSet<TKey> knownKeys = new HashSet<TKey>(comparer);
     foreach (TSource element in source)
     {
         if (knownKeys.Add(keySelector(element)))
         {
             yield return element;
         }
     }
 }

(If you pass in null as the comparer, it will use the default comparer for the key type.)

2
  • Oh so by "larger type" you may mean I still want all properties in the result even though I only want to compare a few properties to determine distinctness? Commented Sep 20, 2016 at 14:31
  • @TheRedPea: Yes, exactly.
    – Jon Skeet
    Commented Sep 20, 2016 at 14:46
36

Just use the Distinct() with your own comparer.

http://msdn.microsoft.com/en-us/library/bb338049.aspx

0
31

In addition to Jon Skeet's answer, you can also use the group by expressions to get the unique groups along w/ a count for each groups iterations:

var query = from e in doc.Elements("whatever")
            group e by new { id = e.Key, val = e.Value } into g
            select new { id = g.Key.id, val = g.Key.val, count = g.Count() };
1
  • 6
    You wrote "in addition to Jon Skeet's answer"... I don't know if such a thing is possible. ;) Commented Dec 12, 2018 at 21:50
13

For any one still looking; here's another way of implementing a custom lambda comparer.

public class LambdaComparer<T> : IEqualityComparer<T>
    {
        private readonly Func<T, T, bool> _expression;

        public LambdaComparer(Func<T, T, bool> lambda)
        {
            _expression = lambda;
        }

        public bool Equals(T x, T y)
        {
            return _expression(x, y);
        }

        public int GetHashCode(T obj)
        {
            /*
             If you just return 0 for the hash the Equals comparer will kick in. 
             The underlying evaluation checks the hash and then short circuits the evaluation if it is false.
             Otherwise, it checks the Equals. If you force the hash to be true (by assuming 0 for both objects), 
             you will always fall through to the Equals check which is what we are always going for.
            */
            return 0;
        }
    }

you can then create an extension for the linq Distinct that can take in lambda's

   public static IEnumerable<T> Distinct<T>(this IEnumerable<T> list,  Func<T, T, bool> lambda)
        {
            return list.Distinct(new LambdaComparer<T>(lambda));
        }  

Usage:

var availableItems = list.Distinct((p, p1) => p.Id== p1.Id);
2
  • Looking at the reference source, Distinct uses a hash set to store elements it has already yielded. Always returning the same hash code means that every previously returned element is examined every time. A more robust hash code would speed things up because it would only compare against elements in the same hash bucket. Zero is a reasonable default, but it might be worth supporting a second lambda for the hash code.
    – Darryl
    Commented Jul 1, 2017 at 2:50
  • Good point! I will try edit when I get time, if you are working in this domain at the moment, feel free to edit Commented Jul 1, 2017 at 3:44
9

I'm a bit late to the answer, but you may want to do this if you want the whole element, not only the values you want to group by:

var query = doc.Elements("whatever")
               .GroupBy(element => new {
                             id = (int) element.Attribute("id"),
                             category = (int) element.Attribute("cat") })
               .Select(e => e.First());

This will give you the first whole element matching your group by selection, much like Jon Skeets second example using DistinctBy, but without implementing IEqualityComparer comparer. DistinctBy will most likely be faster, but the solution above will involve less code if performance is not an issue.

4
// First Get DataTable as dt
// DataRowComparer Compare columns numbers in each row & data in each row

IEnumerable<DataRow> Distinct = dt.AsEnumerable().Distinct(DataRowComparer.Default);

foreach (DataRow row in Distinct)
{
    Console.WriteLine("{0,-15} {1,-15}",
        row.Field<int>(0),
        row.Field<string>(1)); 
}
2

Since we are talking about having every element exactly once, a "set" makes more sense to me.

Example with classes and IEqualityComparer implemented:

 public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public Product(int x, string y)
        {
            Id = x;
            Name = y;
        }
    }

    public class ProductCompare : IEqualityComparer<Product>
    {
        public bool Equals(Product x, Product y)
        {  //Check whether the compared objects reference the same data.
            if (Object.ReferenceEquals(x, y)) return true;

            //Check whether any of the compared objects is null.
            if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
                return false;

            //Check whether the products' properties are equal.
            return x.Id == y.Id && x.Name == y.Name;
        }
        public int GetHashCode(Product product)
        {
            //Check whether the object is null
            if (Object.ReferenceEquals(product, null)) return 0;

            //Get hash code for the Name field if it is not null.
            int hashProductName = product.Name == null ? 0 : product.Name.GetHashCode();

            //Get hash code for the Code field.
            int hashProductCode = product.Id.GetHashCode();

            //Calculate the hash code for the product.
            return hashProductName ^ hashProductCode;
        }
    }

Now

List<Product> originalList = new List<Product> {new Product(1, "ad"), new Product(1, "ad")};
var setList = new HashSet<Product>(originalList, new ProductCompare()).ToList();

setList will have unique elements

I thought of this while dealing with .Except() which returns a set-difference

1

Since .NET 6 DistinctBy from the framework itself can be used. Something along these lines (leveraging value tuples):

var query = doc.Elements("whatever")
   .DistinctBy(e => ((int) e.Attribute("id"), (int) e.Attribute("cat")))

Not the answer you're looking for? Browse other questions tagged or ask your own question.