92

How can I test if a list contains another list (ie. it's a contiguous subsequence). Say there was a function called contains:

contains([1,2], [-1, 0, 1, 2]) # Returns [2, 3] (contains returns [start, end])
contains([1,3], [-1, 0, 1, 2]) # Returns False
contains([1, 2], [[1, 2], 3]) # Returns False
contains([[1, 2]], [[1, 2], 3]) # Returns [0, 0]

Edit:

contains([2, 1], [-1, 0, 1, 2]) # Returns False
contains([-1, 1, 2], [-1, 0, 1, 2]) # Returns False
contains([0, 1, 2], [-1, 0, 1, 2]) # Returns [1, 3]
7
  • 6
    For what it's worth, returning [start, end+1] is more pythonic as it looks like a slice -- (end+1)-start gives the length of what is found. Commented Oct 2, 2010 at 21:01
  • 16
    This looks like a bad design - sometimes the function returns a bool, sometimes it returns a list. That makes it very hard to use since you have to check the return type before you can do anything with the result. IMHO a function called "contains" should only return True or False.
    – Dave Kirby
    Commented Oct 2, 2010 at 21:01
  • 2
    It's kinda sad that lists don't have the needed functionality built-in, but strings do (str.find). Commented Oct 2, 2010 at 21:12
  • 4
    Why would this, for any reason, return a list and not a tuple!?
    – Grant Paul
    Commented Oct 3, 2010 at 0:11
  • 1
    related: Best Way To Determine if a Sequence is in another sequence in Python
    – jfs
    Commented Mar 23, 2017 at 23:20

19 Answers 19

102

If all items are unique, you can use sets.

>>> items = set([-1, 0, 1, 2])
>>> set([1, 2]).issubset(items)
True
>>> set([1, 3]).issubset(items)
False
11
  • 34
    that's not what op is looking for Commented Oct 2, 2010 at 20:24
  • 6
    It still won't work, even if all items are unique, unless they are also in the same order, with no intermingled items. e.g. finding [1, 3] or [2, 1] in [1, 2, 3] will give a false positive. Assuming that we're looking for the sequence itself rather than just the values contained in the sequence.
    – intuited
    Commented Oct 2, 2010 at 20:34
  • 8
    I think it's better to show how an idea can be wrong (so as to avoid it in future), rather than erasing it entirely.
    – Thomas O
    Commented Oct 2, 2010 at 22:03
  • 31
    Yes, especially since this is the correct solution for some people (like me) who find this page, who do not care about sequence.
    – mk12
    Commented Feb 26, 2012 at 5:18
  • 2
    Note to viewer: the results provided in this example is not accurate. That being said, this example helped me out.
    – pangyuteng
    Commented Oct 10, 2014 at 17:04
54

There's an all() and any() function to do this. To check if big contains ALL elements in small

result = all(elem in big for elem in small)

To check if small contains ANY elements in big

result = any(elem in big for elem in small)

the variable result would be boolean (TRUE/FALSE).

4
  • 5
    If I am not mistaken all(elem in list1 for elem in list2) is testing all items in list2 against items in list1 so it is other way around. This checks list2 contains all items in list1
    – rifle2000
    Commented Oct 8, 2019 at 8:04
  • 4
    all(elem in big for elem in small) would return true regardless of position, which isn't what OP wants. They asked for "contiguous subsequence". Commented Jul 27, 2021 at 11:05
  • In case anyone confused , here is example ColumnToCheck = [1,2,3,4] ListofItems =[1,2,3,4,251,54,8,65,5,85,25] #if all items from ColumnToCheck exist in ListofItems print(all(x in ListofItems for x in ColumnToCheck)) Commented Jan 20, 2022 at 14:00
  • 2
    Seems to have answered the wrong question. Commented Oct 13, 2022 at 13:38
43

Here is my version:

def contains(small, big):
    for i in xrange(len(big)-len(small)+1):
        for j in xrange(len(small)):
            if big[i+j] != small[j]:
                break
        else:
            return i, i+len(small)
    return False

It returns a tuple of (start, end+1) since I think that is more pythonic, as Andrew Jaffe points out in his comment. It does not slice any sublists so should be reasonably efficient.

One point of interest for newbies is that it uses the else clause on the for statement - this is not something I use very often but can be invaluable in situations like this.

This is identical to finding substrings in a string, so for large lists it may be more efficient to implement something like the Boyer-Moore algorithm.

Note: If you are using Python3, change xrange to range.

7
  • +1 for the note about efficient string searching algorithms. One disadvantage of yours is the addition of an interpreted inner loop (the slice comparison is, I imagine, faster, although the copy might offset that). I'm going to try a performance comparison.
    – Tim Yates
    Commented Oct 2, 2010 at 21:47
  • 1
    After further tests, yours is the best so far for large subsequences. I would pick this, even despite the small disadvantage is has on smaller data sets.
    – Tim Yates
    Commented Oct 2, 2010 at 22:52
  • 3
    +1: Didn't know about for's else clause! Just today I created an awkward construct involving setting a boolean to do exactly this.
    – mk12
    Commented Feb 26, 2012 at 5:16
  • I think it's simpler to use slicing, similar to martineau's answer: if big[i:i+len(small)] == small: return i, i+len(small).
    – wjandrea
    Commented May 2, 2021 at 16:50
  • 2
    Amazing that there are two higher-voted answers that solve completely unrelated problems. I guess the accept button has some value.... Commented Aug 8, 2022 at 1:38
6

May I humbly suggest the Rabin-Karp algorithm if the big list is really big. The link even contains almost-usable code in almost-Python.

5

This works and is fairly fast since it does the linear searching using the builtin list.index() method and == operator:

def contains(sub, pri):
    M, N = len(pri), len(sub)
    i, LAST = 0, M-N+1
    while True:
        try:
            found = pri.index(sub[0], i, LAST) # find first elem in sub
        except ValueError:
            return False
        if pri[found:found+N] == sub:
            return [found, found+N-1]
        else:
            i = found+1
2
  • Not sure if there is a flaw can you matineau double check. I tried print (contains((1,2,3),(1,2,3))) and I got [0,2] instead of [0,1,2] Commented May 4, 2020 at 17:16
  • @user-asterix: The function is effectively returning [start, end] as designed (and OP wants).
    – martineau
    Commented May 4, 2020 at 17:58
5

I've Summarized and evaluated Time taken by different techniques

Used methods are:

def containsUsingStr(sequence, element:list):
    return str(element)[1:-1] in str(sequence)[1:-1]


def containsUsingIndexing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        for j in range(lE):
            if sequence[i+j] != element[j]: break
        else: return True
    return False


def containsUsingSlicing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        if sequence[i : i+lE] == element: return True
    return False


def containsUsingAny(sequence:list, element:list):
    lE = len(element)
    return any(element == sequence[i:i+lE] for i in range(len(sequence)-lE+1))

Code for Time analysis (averaging over 1000 iterations):

from time import perf_counter

functions = (containsUsingStr, containsUsingIndexing, containsUsingSlicing, containsUsingAny)
fCount = len(functions)


for func in functions:
    print(str.ljust(f'Function : {func.__name__}', 32), end='   ::   Return Values: ')
    print(func([1,2,3,4,5,5], [3,4,5,5]) , end=', ')
    print(func([1,2,3,4,5,5], [1,3,4,5]))



avg_times = [0]*fCount
for _ in range(1000):
    perf_times = []
    for func in functions:
        startTime = perf_counter()
        func([1,2,3,4,5,5], [3,4,5,5])
        timeTaken = perf_counter()-startTime
        perf_times.append(timeTaken)
        

    for t in range(fCount): avg_times[t] += perf_times[t]

minTime = min(avg_times)
print("\n\n Ratio of Time of Executions : ", ' : '.join(map(lambda x: str(round(x/minTime, 4)), avg_times)))

Output:

Output

Conclusion: In this case, Slicing operation proves to be the fastest

4

If we refine the problem talking about testing if a list contains another list with as a sequence, the answer could be the next one-liner:

def contains(subseq, inseq):
    return any(inseq[pos:pos + len(subseq)] == subseq for pos in range(0, len(inseq) - len(subseq) + 1))

Here unit tests I used to tune up this one-liner:

https://gist.github.com/anonymous/6910a85b4978daee137f

2
  • I think this is the answer for me! nice one-liner, looks right.
    – hwjp
    Commented Dec 27, 2016 at 12:06
  • 2
    incidentally though, I think return True might pass all those unit tests.
    – hwjp
    Commented Dec 28, 2016 at 17:37
3

After OP's edit:

def contains(small, big):
    for i in xrange(1 + len(big) - len(small)):
        if small == big[i:i+len(small)]:
            return i, i + len(small) - 1
    return False
7
  • But it fails with contains([1,2], [-1, 0, 1, 1, 2]) which returns [2,4] instead of what I assume is the expected [3,4] Commented Oct 2, 2010 at 20:40
  • 2
    This is going to be horribly inefficient for big lists, since it is constantly creating and destroying temporary lists every time it does big[i:i+len(small)]
    – Dave Kirby
    Commented Oct 2, 2010 at 21:22
  • @Dave: did you timeit this against your solution? Commented Oct 2, 2010 at 21:41
  • 1
    According to my tests, this has slightly better performance than Dave Kirby's solution, even on large lists (1 million elements, with the matching subset at the end): 4.1s for 10 repetitions versus 5.6s for Dave's. I would love to post my test code, but there isn't an easy way to do that.
    – Tim Yates
    Commented Oct 2, 2010 at 22:20
  • 1
    UPDATE: I spoke too soon--my small lists were too small. This algorithm exploded once I increased their size to 1000, while the others stayed constant. It looks like Dave Kirby's wins for large lists after all. pastebin.com/NZwU6PUx
    – Tim Yates
    Commented Oct 2, 2010 at 22:46
1

Here's a straightforward algorithm that uses list methods:

#!/usr/bin/env python

def list_find(what, where):
    """Find `what` list in the `where` list.

    Return index in `where` where `what` starts
    or -1 if no such index.

    >>> f = list_find
    >>> f([2, 1], [-1, 0, 1, 2])
    -1
    >>> f([-1, 1, 2], [-1, 0, 1, 2])
    -1
    >>> f([0, 1, 2], [-1, 0, 1, 2])
    1
    >>> f([1,2], [-1, 0, 1, 2])
    2
    >>> f([1,3], [-1, 0, 1, 2])
    -1
    >>> f([1, 2], [[1, 2], 3])
    -1
    >>> f([[1, 2]], [[1, 2], 3])
    0
    """
    if not what: # empty list is always found
        return 0
    try:
        index = 0
        while True:
            index = where.index(what[0], index)
            if where[index:index+len(what)] == what:
                return index # found
            index += 1 # try next position
    except ValueError:
        return -1 # not found

def contains(what, where):
    """Return [start, end+1] if found else empty list."""
    i = list_find(what, where)
    return [i, i + len(what)] if i >= 0 else [] #NOTE: bool([]) == False

if __name__=="__main__":
    import doctest; doctest.testmod()
1

Smallest code:

def contains(a,b):
    str(a)[1:-1].find(str(b)[1:-1])>=0
1

Here is my answer. This function will help you to find out whether B is a sub-list of A. Time complexity is O(n).

`def does_A_contain_B(A, B): #remember now A is the larger list
    b_size = len(B)
    for a_index in range(0, len(A)):
        if A[a_index : a_index+b_size]==B:
            return True
    else:
        return False`
0

I tried to make this as efficient as possible.

It uses a generator; those unfamiliar with these beasts are advised to check out their documentation and that of yield expressions.

Basically it creates a generator of values from the subsequence that can be reset by sending it a true value. If the generator is reset, it starts yielding again from the beginning of sub.

Then it just compares successive values of sequence with the generator yields, resetting the generator if they don't match.

When the generator runs out of values, i.e. reaches the end of sub without being reset, that means that we've found our match.

Since it works for any sequence, you can even use it on strings, in which case it behaves similarly to str.find, except that it returns False instead of -1.

As a further note: I think that the second value of the returned tuple should, in keeping with Python standards, normally be one higher. i.e. "string"[0:2] == "st". But the spec says otherwise, so that's how this works.

It depends on if this is meant to be a general-purpose routine or if it's implementing some specific goal; in the latter case it might be better to implement a general-purpose routine and then wrap it in a function which twiddles the return value to suit the spec.

def reiterator(sub):
    """Yield elements of a sequence, resetting if sent ``True``."""
    it = iter(sub)
    while True:
        if (yield it.next()):
            it = iter(sub)

def find_in_sequence(sub, sequence):
    """Find a subsequence in a sequence.

    >>> find_in_sequence([2, 1], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([-1, 1, 2], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([0, 1, 2], [-1, 0, 1, 2])
    (1, 3)
    >>> find_in_sequence("subsequence",
    ...                  "This sequence contains a subsequence.")
    (25, 35)
    >>> find_in_sequence("subsequence", "This one doesn't.")
    False

    """
    start = None
    sub_items = reiterator(sub)
    sub_item = sub_items.next()
    for index, item in enumerate(sequence):
        if item == sub_item:
            if start is None: start = index
        else:
            start = None
        try:
            sub_item = sub_items.send(start is None)
        except StopIteration:
            # If the subsequence is depleted, we win!
            return (start, index)
    return False
3
  • A valiant effort, but this has worse performance than either eumiro or Dave Kirby's solutions. 8.2s on the benchmark I described in the other comments.
    – Tim Yates
    Commented Oct 2, 2010 at 22:21
  • Interesting to see the speed difference for native code. It seems like this algorithm would be relatively faster for longer subsequences.. how long was/were the subsequence(s) you used in the test?
    – intuited
    Commented Oct 2, 2010 at 22:37
  • You're right. This performed much better than eumiro's solution with larger subsequences, but it still didn't perform as well as Dave's.
    – Tim Yates
    Commented Oct 2, 2010 at 22:50
0

I think this one is fast...

def issublist(subList, myList, start=0):
    if not subList: return 0
    lenList, lensubList = len(myList), len(subList)
    try:
        while lenList - start >= lensubList:
            start = myList.index(subList[0], start)
            for i in xrange(lensubList):
                if myList[start+i] != subList[i]:
                    break
            else:
                return start, start + lensubList - 1
            start += 1
        return False
    except:
        return False
0
a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([0,5,4]))

It provides true output.

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([1,3]))

It provides false output.

1
  • This seems version specific. For example, it fails on LeetCode. Commented Jul 27, 2021 at 11:42
0

Dave answer is good. But I suggest this implementation which is more efficient and doesn't use nested loops.

def contains(small_list, big_list):
    """
    Returns index of start of small_list in big_list if big_list
    contains small_list, otherwise -1.
    """
    loop = True
    i, curr_id_small= 0, 0
    while loop and i<len(big_list):
        if big_list[i]==small_list[curr_id_small]:
            if curr_id_small==len(small_list)-1:
                loop = False
            else:
                curr_id_small += 1
        else:
            curr_id_small = 0
        i=i+1
    if not loop:
        return i-len(small_list)
    else:
        return -1
0

Here's a simple and efficient function to check whether big list contains a small one in matching order:

def contains(big, small):
    i = 0
    for value in big:
        if value == small[i]:
            i += 1
            if i == len(small):
                return True
        else:
            i = 1 if value == small[0] else 0
    return False

Usage:

"""
>>> contains([1,2,3,4,5], [2,3,4])
True
>>> contains([4,2,3,2,4], [2,3,4])
False
>>> contains([1,2,3,2,3,2,2,4,3], [2,4,3])
True
"""
-1

The problem of most of the answers, that they are good for unique items in list. If items are not unique and you still want to know whether there is an intersection, you should count items:

from collections import Counter as count

def listContains(l1, l2):
  list1 = count(l1)
  list2 = count(l2)

  return list1&list2 == list1

print( listContains([1,1,2,5], [1,2,3,5,1,2,1]) ) # Returns True
print( listContains([1,1,2,8], [1,2,3,5,1,2,1]) ) # Returns False

You can also return the intersection by using ''.join(list1&list2)

-1

Here a solution with less line of code and easily understandable (or at least I like to think so).

If you want to keep order (match only if the smaller list is found in the same order on the bigger list):

def is_ordered_subset(l1, l2):
    # First check to see if all element of l1 are in l2 (without checking order)
    if not set(l1).issubset(l2): 
        return False

    length = len(l1)
    # Make sublist of same size than l1
    list_of_sublist = [l2[i:i+length] for i, x in enumerate(l2)]
    #Check if one of this sublist is l1
    return l1 in list_of_sublist 
-1

You can use numpy:

def contains(l1, l2):
   """ returns True if l2 conatins l1 and False otherwise """

   if len(np.intersect1d(l1,l2))==len(l1):
      return = True
   else:
      return = False

Not the answer you're looking for? Browse other questions tagged or ask your own question.