151

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():
...     if count > 1:
...         pass  # do something

Can I do better?


19 Answers

233

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
    pass  # do something

Probably won't make much of a difference for short lists.
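
For example, with the list from the question (a quick illustration, not part of the original answer):

>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> len(x) > len(set(x))   # True means the list DOES contain duplicates
True
>>> len([1, 2, 3]) > len(set([1, 2, 3]))
False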

5
  • This is what I do as well. Probably not efficient for large lists though.
    – tkerwin
    Commented Mar 11, 2011 at 20:49
  • Not necessarily, that will execute the body of the conditional if the list has repeating elements (the "#do something" in the example).
    – yan
    Commented Mar 11, 2011 at 20:49
  • 2
    Fair enough, good solution. I am handling barely < 500 elements, so this should do what I want.
    – user225312
    Commented Mar 11, 2011 at 20:54
  • 4
    For those worried about efficiency with long lists, this is efficient for long lists that are actually unique (where all elements need checking). Early exit solutions take longer (roughly 2x longer in my tests) for actually unique lists. So... if you expect most of your lists to be unique, use this simple set length checking solution. If you expect most of your lists to NOT be unique, use an early exit solution. Which one to use depends on your use case.
    – Russ
    Commented Oct 17, 2017 at 14:08
  • 6
    This answer is nice. However, let's be careful here: len(x) > len(set(x)) is True when the elements in x are NOT unique. This question's title asks exactly the opposite: "Checking if all elements in a list are unique"
    – WhyWhat
    Commented Apr 25, 2020 at 20:54
120
+50

Here is a two-liner that will also do early exit (it works because set.add() returns None, which is falsy, so the or expression is only truthy when i has already been seen):

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
3
  • 10
    +1 clean and doesn't iterate through the whole list if not needed.
    – Kos
    Commented Nov 29, 2012 at 15:49
  • 1
    @paul-mcguire: Would you be willing to license this code snippet under an Apache 2.0-compatible license (e.g., Apache 2, 2/3-line BSD, MIT, X11, zlib). I'd like to use it in an Apache 2.0 project I'm using, and because StackOverflow's licensing terms are fubar, I'm asking you as the original author. Commented Sep 25, 2016 at 1:22
  • I've put out other code using MIT license, so that works for me for this snippet. Anything special I need to do?
    – PaulMcG
    Commented Sep 25, 2016 at 4:32
21

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small inputs, or if early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.

4
  • I accepted the other answer as I was not particularly looking for optimization.
    – user225312
    Commented Mar 11, 2011 at 21:00
  • 2
    You can shorten this by putting the following line after s = set()... return not any(s.add(x) if x not in s else True for x in g) Commented Mar 11, 2011 at 21:42
  • Could you explain why you would expect len(x) != len(set(x)) to be faster than this if early-exiting is not common? Aren't both operations O(len(x))? (where x is the original list) Commented May 12, 2012 at 14:55
  • Oh, I see: your method is not O(len(x)) because you check if x in s inside of the O(len(x)) for loop. Commented May 12, 2012 at 14:58
18

For speed, using NumPy:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
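
Note that np.unique sorts its input and returns the distinct values, so the comparison reads "number of distinct values == number of values". A quick check with the question's list (my addition, not part of the original answer):

import numpy as np

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(np.unique(x).size == len(x))      # False: x contains duplicates
print(np.unique([4, 5, 6]).size == 3)   # True: all values are distinct
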
15

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)
2
  • 1
    Answered one second after yan, ouch. Short and sweet. Any reasons why not to use this solution? Commented Sep 19, 2017 at 21:25
  • Not all sequences (generators especially) support len(). (A generator-friendly early-exit sketch follows these comments.)
    – PaulMcG
    Commented Apr 18, 2018 at 12:42
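
For generators (or any other one-pass iterable without len()), an early-exit check that consumes items one at a time works instead. A minimal sketch, my addition rather than part of the answer above:

def all_unique_iter(iterable):
    seen = set()
    for item in iterable:
        if item in seen:
            return False
        seen.add(item)
    return True

all_unique_iter(i % 5 for i in range(3))   # True
all_unique_iter(i % 5 for i in range(10))  # False
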
8

As an alternative to a set, you can use a dict:

len({}.fromkeys(x)) == len(x)
1
  • 14
    I see absolutely no advantage to using a dict over a set. Seems to unnecessarily complicate things. Commented Feb 21, 2017 at 8:14
8

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution when the items are all unique. If there are early duplicates in the list, the early-exit solutions are preferable, since they can return almost immediately.

[Benchmark plot: "All items unique"]

[Benchmark plot: "All items equal"]


Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()
3

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.

3
  • hashing is faster than sorting
    – IceArdor
    Commented Oct 23, 2014 at 3:26
  • Came here to post the same solution using groupby and found this answer. I find this most elegant, since this a single expression and works with the built-in tools without requiring any extra variable or loop-statement. Commented Nov 21, 2019 at 21:10
  • 1
    If your list contains arbitrary objects which are not sortable, you can use the id() function to sort them as this is a prerequisite for groupby() to work: groupby(sorted(seq), key=id) Commented Nov 21, 2019 at 21:19
3

Here is a recursive early-exit function:

def distinct(L):
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    else:
        return distinct(T)

It's fast enough for me without using weird (slow) conversions, while having a functional-style approach.

3
  • 1
    H in T does a linear search, and T = L[1:] copies the sliced part of the list, so this will be much slower than the other solutions that have been suggested on big lists. It is O(N^2) I think, while most of the others are O(N) (sets) or O(N log N) (sorting based solutions).
    – Blckknght
    Commented Apr 28, 2013 at 17:36
  • @Blckknght is the slice part a view?
    – qwr
    Commented Mar 20 at 13:04
  • @qwr: It depends on what type of container L is. If it's a list, it makes a new list as a copy, not a view. If it's a numpy array, you do get a view (but individual item access is relatively less efficient compared to other kinds of numpy operations).
    – Blckknght
    Commented Mar 20 at 21:37
3

Here is a recursive O(N²) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True
2

All the answers above are good, but I prefer to use the all_unique example from 30 seconds of python.

It uses set() on the given list to remove duplicates and compares its length with the length of the list.

def all_unique(lst):
    return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False
1

How about this:

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1] == 1
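
For example (my addition, not part of the original answer):

is_unique([])            # True
is_unique([1, 2, 3])     # True
is_unique([1, 2, 2, 3])  # False
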
1

If you already have the data processing library pandas among your dependencies, there's an already-implemented solution which gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique
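
A quick illustration (my addition, assuming pandas is installed):

import pandas as pd

pd.Series([1, 2, 3]).is_unique     # True
pd.Series([1, 2, 2, 3]).is_unique  # False
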
0

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

1
  • this f5 function will be slower than using set which is better optimized for speed. This code starts to break when the list gets really large due to the expensive "append" operation. with large lists like x = range(1000000) + range(1000000), running set(x) is faster than f5(x). Order is not a requirement in the question but even running sorted(set(x)) is still faster than f5(x)
    – Okezie
    Commented Aug 16, 2014 at 21:50
0

Using a similar approach on a pandas DataFrame, to test whether the contents of a column contain unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dataframe containing over a million rows.
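
As a side note, pandas also exposes this check directly as a Series property, so the comparison above can be written more compactly (a sketch using the same tempDF from the answer):

if tempDF['var1'].is_unique:
    print("Unique")
else:
    print("Not unique")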

0

It does not fully fit the question, but if you google the task I had, this question comes up first, and it might be of interest as an extension of the question. If you want to check, for each list element, whether it is unique or not, you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e,i) for i,e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one    
    isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
               [[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])] 
                for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
               [[sort[-1][1],sort[-1][0]!=sort[-2][0]]]     
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
     return [mylist.count(item)==1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, get_unique_using_count, as suggested in some answers, is fast. But if your list is longer than about 100 elements, the count function takes quite long. Thus the approach shown in get_unique is much faster, although it looks more complicated.
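
The count-based version is slow because list.count() rescans the whole list for every element, giving O(n²) overall. If the elements are hashable, counting them once with collections.Counter gives the same per-element result in O(n). A minimal sketch, my addition rather than part of the original answer:

from collections import Counter

def get_unique_using_counter(mylist):
    # count every element once, then look the counts up per element
    counts = Counter(mylist)
    return [counts[item] == 1 for item in mylist]

get_unique_using_counter([1, 1, 2, 3])  # [False, False, True, True]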

0

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
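
An equivalent formulation that avoids the index arithmetic, pairing each element with its successor via zip (a sketch, my addition, under the same assumption that the list is already sorted):

def all_unique_sorted(sorted_list):
    # in a sorted list, any duplicates must be adjacent
    return all(a != b for a, b in zip(sorted_list, sorted_list[1:]))

all_unique_sorted([1, 2, 3, 4])  # True
all_unique_sorted([1, 2, 2, 4])  # False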

0

Sometimes you need not just to check whether all items are unique, but also to get the first non-unique item. You might even need an indication, for each item, of whether it is unique or not.

In that case the following function might help (it provides two modes; see the comments above the function):

# if extMode is False (default) -> returns index of 1st item from in_data, which is not unique (i.e. repeats some previous item), or -1 if no duplicates found
# if extMode is True -> returns dict with key = index of item from in_data, value = True if the item is unique (otherwise False)
# Note: in_data should be iterable
def checkDuplicates(in_data, extMode = False):
    if in_data is None:
        return {} if extMode else -1 # depending on your needs here could also return None instead of {}
    r = {}
    s = set()
    c = -1
    for i in in_data:
        c += 1
        if i in s: # duplicate found
            if not extMode:
                return c
            r[c] = False
        else: # not duplicating item
            if extMode:
                r[c] = True
            s.add(i)
    if extMode:
        return r
    return -1
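
Example usage, illustrating both modes (my addition):

checkDuplicates([1, 2, 2, 3])                # 2   (index of the first repeated item)
checkDuplicates([1, 2, 3])                   # -1  (no duplicates)
checkDuplicates([1, 2, 2, 3], extMode=True)  # {0: True, 1: True, 2: False, 3: True}
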
-3

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
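
A slightly tighter variant of the same brute-force idea, still O(n²) but comparing each pair only once (my addition; as the comment below notes, a set-based check is what you would normally reach for):

def all_different(s):
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return False
    return True
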
1
  • I like this answer, just because it shows quite well what code you don't have to write when using a set. I wouldn't label it "for beginners", as I believe beginners should learn to do it the right way up front; but I met some inexperienced developers who were used to writing such code in other languages.
    – cessor
    Commented Jan 9, 2020 at 16:26
