151

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():
...     if count > 1:
...         pass  # do something

Can I do better?


19 Answers

233

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
    pass  # do something

Probably won't make much of a difference for short lists.
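
For example, with the list from the question (a quick illustration, not part of the original answer):

>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> len(x) > len(set(x))   # True means the list DOES contain duplicates
True
>>> len([1, 2, 3]) > len(set([1, 2, 3]))
False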

5
  • This is what I do as well. Probably not efficient for large lists though.
    – tkerwin
    Commented Mar 11, 2011 at 20:49
  • Not necessarily, that will execute the body of the conditional if the list has repeating elements (the "#do something" in the example).
    – yan
    Commented Mar 11, 2011 at 20:49
  • 2
    Fair enough, good solution. I am handling barely < 500 elements, so this should do what I want.
    – user225312
    Commented Mar 11, 2011 at 20:54
  • 4
    For those worried about efficiency with long lists, this is efficient for long lists that are actually unique (where all elements need checking). Early exit solutions take longer (roughly 2x longer in my tests) for actually unique lists. So... if you expect most of your lists to be unique, use this simple set length checking solution. If you expect most of your lists to NOT be unique, use an early exit solution. Which one to use depends on your use case.
    – Russ
    Commented Oct 17, 2017 at 14:08
  • 6
    This answer is nice. However, let's be careful here: len(x) > len(set(x)) is True when the elements in x are NOT unique. This question's title asks exactly the opposite: "Checking if all elements in a list are unique"
    – WhyWhat
    Commented Apr 25, 2020 at 20:54
120
+50

Here is a two-liner that will also do early exit (it works because set.add() returns None, which is falsy, so the or expression is only truthy when i has already been seen):

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
3
  • 10
    +1 clean and doesn't iterate through the whole list if not needed.
    – Kos
    Commented Nov 29, 2012 at 15:49
  • 1
    @paul-mcguire: Would you be willing to license this code snippet under an Apache 2.0-compatible license (e.g., Apache 2, 2/3-line BSD, MIT, X11, zlib). I'd like to use it in an Apache 2.0 project I'm using, and because StackOverflow's licensing terms are fubar, I'm asking you as the original author. Commented Sep 25, 2016 at 1:22
  • I've put out other code using MIT license, so that works for me for this snippet. Anything special I need to do?
    – PaulMcG
    Commented Sep 25, 2016 at 4:32
21

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small inputs, or if early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.

4
  • I accepted the other answer as I was not particularly looking for optimization.
    – user225312
    Commented Mar 11, 2011 at 21:00
  • 2
    You can shorten this by putting the following line after s = set()... return not any(s.add(x) if x not in s else True for x in g) Commented Mar 11, 2011 at 21:42
  • Could you explain why you would expect len(x) != len(set(x)) to be faster than this if early-exiting is not common? Aren't both operations O(len(x))? (where x is the original list) Commented May 12, 2012 at 14:55
  • Oh, I see: your method is not O(len(x)) because you check if x in s inside of the O(len(x)) for loop. Commented May 12, 2012 at 14:58
18

For speed, using NumPy:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
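
Note that np.unique sorts its input and returns the distinct values, so the comparison reads "number of distinct values == number of values". A quick check with the question's list (my addition, not part of the original answer):

import numpy as np

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(np.unique(x).size == len(x))      # False: x contains duplicates
print(np.unique([4, 5, 6]).size == 3)   # True: all values are distinct
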
15

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)
2
  • 1
    Answered one second after yan, ouch. Short and sweet. Any reasons why not to use this solution? Commented Sep 19, 2017 at 21:25
  • Not all sequences (generators especially) support len(). (A generator-friendly early-exit sketch follows these comments.)
    – PaulMcG
    Commented Apr 18, 2018 at 12:42
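
For generators (or any other one-pass iterable without len()), an early-exit check that consumes items one at a time works instead. A minimal sketch, my addition rather than part of the answer above:

def all_unique_iter(iterable):
    seen = set()
    for item in iterable:
        if item in seen:
            return False
        seen.add(item)
    return True

all_unique_iter(i % 5 for i in range(3))   # True
all_unique_iter(i % 5 for i in range(10))  # False
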
8

As an alternative to a set, you can use a dict:

len({}.fromkeys(x)) == len(x)
1
  • 14
    I see absolutely no advantage to using a dict over a set. Seems to unnecessarily complicate things. Commented Feb 21, 2017 at 8:14
8

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution when the items are all unique. If there are early duplicates in the list, the early-exit solutions are preferable, since they can return almost immediately.

[Benchmark plot: "All items unique"]

[Benchmark plot: "All items equal"]


Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()
3

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.

3
  • hashing is faster than sorting
    – IceArdor
    Commented Oct 23, 2014 at 3:26
  • Came here to post the same solution using groupby and found this answer. I find this most elegant, since this a single expression and works with the built-in tools without requiring any extra variable or loop-statement. Commented Nov 21, 2019 at 21:10
  • 1
    If your list contains arbitrary objects which are not sortable, you can use the id() function to sort them as this is a prerequisite for groupby() to work: groupby(sorted(seq), key=id) Commented Nov 21, 2019 at 21:19
3

Here is a recursive early-exit function:

def distinct(L):
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    else:
        return distinct(T)

It's fast enough for me without using weird (slow) conversions, while having a functional-style approach.

3
  • 1
    H in T does a linear search, and T = L[1:] copies the sliced part of the list, so this will be much slower than the other solutions that have been suggested on big lists. It is O(N^2) I think, while most of the others are O(N) (sets) or O(N log N) (sorting based solutions).
    – Blckknght
    Commented Apr 28, 2013 at 17:36
  • @Blckknght is the slice part a view?
    – qwr
    Commented Mar 20 at 13:04
  • @qwr: It depends on what type of container L is. If it's a list, it makes a new list as a copy, not a view. If it's a numpy array, you do get a view (but individual item access is relatively less efficient compared to other kinds of numpy operations).
    – Blckknght
    Commented Mar 20 at 21:37
3

Here is a recursive O(N²) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True
2

All the answers above are good, but I prefer to use the all_unique example from 30 seconds of python.

It uses set() on the given list to remove duplicates and compares its length with the length of the list.

def all_unique(lst):
    return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False
1

How about this:

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1] == 1
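
For example (my addition, not part of the original answer):

is_unique([])            # True
is_unique([1, 2, 3])     # True
is_unique([1, 2, 2, 3])  # False
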
1

If you already have the data processing library pandas among your dependencies, there's an already-implemented solution which gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique
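
A quick illustration (my addition, assuming pandas is installed):

import pandas as pd

pd.Series([1, 2, 3]).is_unique     # True
pd.Series([1, 2, 2, 3]).is_unique  # False
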
0

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

1
  • this f5 function will be slower than using set which is better optimized for speed. This code starts to break when the list gets really large due to the expensive "append" operation. with large lists like x = range(1000000) + range(1000000), running set(x) is faster than f5(x). Order is not a requirement in the question but even running sorted(set(x)) is still faster than f5(x)
    – Okezie
    Commented Aug 16, 2014 at 21:50
0

Using a similar approach on a pandas DataFrame, to test whether the contents of a column contain unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dataframe containing over a million rows.
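
As a side note, pandas also exposes this check directly as a Series property, so the comparison above can be written more compactly (a sketch using the same tempDF from the answer):

if tempDF['var1'].is_unique:
    print("Unique")
else:
    print("Not unique")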

0

It does not fully fit the question, but if you google the task I had, this question comes up first, and it might be of interest as an extension of the question. If you want to check, for each list element, whether it is unique or not, you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e,i) for i,e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one    
    isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
               [[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])] 
                for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
               [[sort[-1][1],sort[-1][0]!=sort[-2][0]]]     
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
     return [mylist.count(item)==1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, get_unique_using_count, as suggested in some answers, is fast. But if your list is longer than about 100 elements, the count function takes quite long. Thus the approach shown in get_unique is much faster, although it looks more complicated.
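
The count-based version is slow because list.count() rescans the whole list for every element, giving O(n²) overall. If the elements are hashable, counting them once with collections.Counter gives the same per-element result in O(n). A minimal sketch, my addition rather than part of the original answer:

from collections import Counter

def get_unique_using_counter(mylist):
    # count every element once, then look the counts up per element
    counts = Counter(mylist)
    return [counts[item] == 1 for item in mylist]

get_unique_using_counter([1, 1, 2, 3])  # [False, False, True, True]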

0

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
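
An equivalent formulation that avoids the index arithmetic, pairing each element with its successor via zip (a sketch, my addition, under the same assumption that the list is already sorted):

def all_unique_sorted(sorted_list):
    # in a sorted list, any duplicates must be adjacent
    return all(a != b for a, b in zip(sorted_list, sorted_list[1:]))

all_unique_sorted([1, 2, 3, 4])  # True
all_unique_sorted([1, 2, 2, 4])  # False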

0

Sometimes you need not just to check whether all items are unique, but also to get the first non-unique item. You might even need an indication, for each item, of whether it is unique or not.

In that case the following function might help (it provides two modes; see the comments above the function):

# if extMode is False (default) -> returns index of 1st item from in_data, which is not unique (i.e. repeats some previous item), or -1 if no duplicates found
# if extMode is True -> returns dict with key = index of item from in_data, value = True if the item is unique (otherwise False)
# Note: in_data should be iterable
def checkDuplicates(in_data, extMode = False):
    if in_data is None:
        return {} if extMode else -1 # depending on your needs here could also return None instead of {}
    r = {}
    s = set()
    c = -1
    for i in in_data:
        c += 1
        if i in s: # duplicate found
            if not extMode:
                return c
            r[c] = False
        else: # not duplicating item
            if extMode:
                r[c] = True
            s.add(i)
    if extMode:
        return r
    return -1
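
Example usage, illustrating both modes (my addition):

checkDuplicates([1, 2, 2, 3])                # 2   (index of the first repeated item)
checkDuplicates([1, 2, 3])                   # -1  (no duplicates)
checkDuplicates([1, 2, 2, 3], extMode=True)  # {0: True, 1: True, 2: False, 3: True}
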
-3

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
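
A slightly tighter variant of the same brute-force idea, still O(n²) but comparing each pair only once (my addition; as the comment below notes, a set-based check is what you would normally reach for):

def all_different(s):
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return False
    return True
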
1
  • I like this answer, just because it shows quite well what code you don't have to write when using a set. I wouldn't label it "for beginners", as I believe beginners should learn to do it the right way up front; but I met some inexperienced developers who were used to writing such code in other languages.
    – cessor
    Commented Jan 9, 2020 at 16:26
