How to test if a list contains another list as a contiguous subsequence?

Question

How can I test if a list contains another list (ie. it's a contiguous subsequence). Say there was a function called contains:

contains([1,2], [-1, 0, 1, 2]) # Returns [2, 3] (contains returns [start, end])
contains([1,3], [-1, 0, 1, 2]) # Returns False
contains([1, 2], [[1, 2], 3]) # Returns False
contains([[1, 2]], [[1, 2], 3]) # Returns [0, 0]

Edit:

contains([2, 1], [-1, 0, 1, 2]) # Returns False
contains([-1, 1, 2], [-1, 0, 1, 2]) # Returns False
contains([0, 1, 2], [-1, 0, 1, 2]) # Returns [1, 3]

For what it's worth, returning [start, end+1] is more pythonic as it looks like a slice -- (end+1)-start gives the length of what is found. — Andrew Jaffe, Commented Oct 2, 2010 at 21:01
This looks like a bad design - sometimes the function returns a bool, sometimes it returns a list. That makes it very hard to use since you have to check the return type before you can do anything with the result. IMHO a function called "contains" should only return True or False. — Dave Kirby, Commented Oct 2, 2010 at 21:01
It's kinda sad that lists don't have the needed functionality built-in, but strings do (str.find). — Jochen Ritzel, Commented Oct 2, 2010 at 21:12
Why would this, for any reason, return a list and not a tuple!? — Grant Paul, Commented Oct 3, 2010 at 0:11
related: Best Way To Determine if a Sequence is in another sequence in Python — jfs, Commented Mar 23, 2017 at 23:20

SiHa · Accepted Answer · 2017-11-13 12:38:53Z

102

If all items are unique, you can use sets.

>>> items = set([-1, 0, 1, 2])
>>> set([1, 2]).issubset(items)
True
>>> set([1, 3]).issubset(items)
False

edited Nov 13, 2017 at 12:38

SiHa

8,28313 gold badges37 silver badges46 bronze badges

answered Oct 2, 2010 at 20:22

Thomas O

6,18212 gold badges44 silver badges61 bronze badges

34

that's not what op is looking for
– SilentGhost
Commented Oct 2, 2010 at 20:24
6

It still won't work, even if all items are unique, unless they are also in the same order, with no intermingled items. e.g. finding [1, 3] or [2, 1] in [1, 2, 3] will give a false positive. Assuming that we're looking for the sequence itself rather than just the values contained in the sequence.
– intuited
Commented Oct 2, 2010 at 20:34
8

I think it's better to show how an idea can be wrong (so as to avoid it in future), rather than erasing it entirely.
– Thomas O
Commented Oct 2, 2010 at 22:03
31

Yes, especially since this is the correct solution for some people (like me) who find this page, who do not care about sequence.
– mk12
Commented Feb 26, 2012 at 5:18
2

Note to viewer: the results provided in this example is not accurate. That being said, this example helped me out.
– pangyuteng
Commented Oct 10, 2014 at 17:04

| Show 6 more comments

renan-eccel · Accepted Answer · 2020-06-16 19:21:39Z

54

There's an all() and any() function to do this. To check if big contains ALL elements in small

result = all(elem in big for elem in small)

To check if small contains ANY elements in big

result = any(elem in big for elem in small)

the variable result would be boolean (TRUE/FALSE).

edited Jun 16, 2020 at 19:21

renan-eccel

18212 bronze badges

answered May 31, 2018 at 23:00

ericyan3000

6215 silver badges6 bronze badges

5

If I am not mistaken all(elem in list1 for elem in list2) is testing all items in list2 against items in list1 so it is other way around. This checks list2 contains all items in list1
– rifle2000
Commented Oct 8, 2019 at 8:04
4

all(elem in big for elem in small) would return true regardless of position, which isn't what OP wants. They asked for "contiguous subsequence".
– Abhijit Sarkar
Commented Jul 27, 2021 at 11:05
In case anyone confused , here is example ColumnToCheck = [1,2,3,4] ListofItems =[1,2,3,4,251,54,8,65,5,85,25] #if all items from ColumnToCheck exist in ListofItems print(all(x in ListofItems for x in ColumnToCheck))
– user1586957
Commented Jan 20, 2022 at 14:00
2

Seems to have answered the wrong question.
– Jason R. Coombs
Commented Oct 13, 2022 at 13:38

Add a comment |

wjandrea · Accepted Answer · 2021-05-02 16:46:34Z

43

Here is my version:

def contains(small, big):
    for i in xrange(len(big)-len(small)+1):
        for j in xrange(len(small)):
            if big[i+j] != small[j]:
                break
        else:
            return i, i+len(small)
    return False

It returns a tuple of (start, end+1) since I think that is more pythonic, as Andrew Jaffe points out in his comment. It does not slice any sublists so should be reasonably efficient.

One point of interest for newbies is that it uses the else clause on the for statement - this is not something I use very often but can be invaluable in situations like this.

This is identical to finding substrings in a string, so for large lists it may be more efficient to implement something like the Boyer-Moore algorithm.

Note: If you are using Python3, change xrange to range.

edited May 2, 2021 at 16:46

wjandrea

31.7k9 gold badges67 silver badges88 bronze badges

answered Oct 2, 2010 at 21:23

Dave Kirby

26.4k5 gold badges68 silver badges84 bronze badges

+1 for the note about efficient string searching algorithms. One disadvantage of yours is the addition of an interpreted inner loop (the slice comparison is, I imagine, faster, although the copy might offset that). I'm going to try a performance comparison.
– Tim Yates
Commented Oct 2, 2010 at 21:47
1

After further tests, yours is the best so far for large subsequences. I would pick this, even despite the small disadvantage is has on smaller data sets.
– Tim Yates
Commented Oct 2, 2010 at 22:52
3

+1: Didn't know about for's else clause! Just today I created an awkward construct involving setting a boolean to do exactly this.
– mk12
Commented Feb 26, 2012 at 5:16
I think it's simpler to use slicing, similar to martineau's answer: if big[i:i+len(small)] == small: return i, i+len(small).
– wjandrea
Commented May 2, 2021 at 16:50
2

Amazing that there are two higher-voted answers that solve completely unrelated problems. I guess the accept button has some value....
– Karl Knechtel
Commented Aug 8, 2022 at 1:38

| Show 2 more comments

9000 · Accepted Answer · 2010-10-03 11:31:38Z

6

May I humbly suggest the Rabin-Karp algorithm if the big list is really big. The link even contains almost-usable code in almost-Python.

answered Oct 3, 2010 at 11:31

9000

40.6k9 gold badges67 silver badges108 bronze badges

Add a comment |

martineau · Accepted Answer · 2020-11-25 23:47:31Z

5

This works and is fairly fast since it does the linear searching using the builtin list.index() method and == operator:

def contains(sub, pri):
    M, N = len(pri), len(sub)
    i, LAST = 0, M-N+1
    while True:
        try:
            found = pri.index(sub[0], i, LAST) # find first elem in sub
        except ValueError:
            return False
        if pri[found:found+N] == sub:
            return [found, found+N-1]
        else:
            i = found+1

edited Nov 25, 2020 at 23:47

answered Oct 2, 2010 at 21:33

martineau

122k29 gold badges176 silver badges307 bronze badges

Not sure if there is a flaw can you matineau double check. I tried print (contains((1,2,3),(1,2,3))) and I got [0,2] instead of [0,1,2]
– user-asterix
Commented May 4, 2020 at 17:16
@user-asterix: The function is effectively returning [start, end] as designed (and OP wants).
– martineau
Commented May 4, 2020 at 17:58

Add a comment |

HIMANSHU PANDEY · Accepted Answer · 2021-11-22 09:46:19Z

I've Summarized and evaluated Time taken by different techniques

Used methods are:

def containsUsingStr(sequence, element:list):
    return str(element)[1:-1] in str(sequence)[1:-1]


def containsUsingIndexing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        for j in range(lE):
            if sequence[i+j] != element[j]: break
        else: return True
    return False


def containsUsingSlicing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        if sequence[i : i+lE] == element: return True
    return False


def containsUsingAny(sequence:list, element:list):
    lE = len(element)
    return any(element == sequence[i:i+lE] for i in range(len(sequence)-lE+1))

Code for Time analysis (averaging over 1000 iterations):

from time import perf_counter

functions = (containsUsingStr, containsUsingIndexing, containsUsingSlicing, containsUsingAny)
fCount = len(functions)


for func in functions:
    print(str.ljust(f'Function : {func.__name__}', 32), end='   ::   Return Values: ')
    print(func([1,2,3,4,5,5], [3,4,5,5]) , end=', ')
    print(func([1,2,3,4,5,5], [1,3,4,5]))



avg_times = [0]*fCount
for _ in range(1000):
    perf_times = []
    for func in functions:
        startTime = perf_counter()
        func([1,2,3,4,5,5], [3,4,5,5])
        timeTaken = perf_counter()-startTime
        perf_times.append(timeTaken)
        

    for t in range(fCount): avg_times[t] += perf_times[t]

minTime = min(avg_times)
print("\n\n Ratio of Time of Executions : ", ' : '.join(map(lambda x: str(round(x/minTime, 4)), avg_times)))

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

Oleksiy · Accepted Answer · 2014-07-08 14:41:09Z

4

If we refine the problem talking about testing if a list contains another list with as a sequence, the answer could be the next one-liner:

def contains(subseq, inseq):
    return any(inseq[pos:pos + len(subseq)] == subseq for pos in range(0, len(inseq) - len(subseq) + 1))

Here unit tests I used to tune up this one-liner:

https://gist.github.com/anonymous/6910a85b4978daee137f

answered Jul 8, 2014 at 14:41

Oleksiy

911 silver badge6 bronze badges

I think this is the answer for me! nice one-liner, looks right.
– hwjp
Commented Dec 27, 2016 at 12:06
2

incidentally though, I think return True might pass all those unit tests.
– hwjp
Commented Dec 28, 2016 at 17:37

Add a comment |

Grant Paul · Accepted Answer · 2010-10-03 00:11:25Z

3

After OP's edit:

def contains(small, big):
    for i in xrange(1 + len(big) - len(small)):
        if small == big[i:i+len(small)]:
            return i, i + len(small) - 1
    return False

edited Oct 3, 2010 at 0:11

Grant Paul

5,8822 gold badges34 silver badges36 bronze badges

answered Oct 2, 2010 at 20:26

eumiro

211k35 gold badges304 silver badges262 bronze badges

But it fails with contains([1,2], [-1, 0, 1, 1, 2]) which returns [2,4] instead of what I assume is the expected [3,4]
– Andrew Dalke
Commented Oct 2, 2010 at 20:40
2

This is going to be horribly inefficient for big lists, since it is constantly creating and destroying temporary lists every time it does big[i:i+len(small)]
– Dave Kirby
Commented Oct 2, 2010 at 21:22
@Dave: did you timeit this against your solution?
– SilentGhost
Commented Oct 2, 2010 at 21:41
1

According to my tests, this has slightly better performance than Dave Kirby's solution, even on large lists (1 million elements, with the matching subset at the end): 4.1s for 10 repetitions versus 5.6s for Dave's. I would love to post my test code, but there isn't an easy way to do that.
– Tim Yates
Commented Oct 2, 2010 at 22:20
1

UPDATE: I spoke too soon--my small lists were too small. This algorithm exploded once I increased their size to 1000, while the others stayed constant. It looks like Dave Kirby's wins for large lists after all. pastebin.com/NZwU6PUx
– Tim Yates
Commented Oct 2, 2010 at 22:46

| Show 2 more comments

jfs · Accepted Answer · 2010-10-02 23:44:58Z

Here's a straightforward algorithm that uses list methods:

#!/usr/bin/env python

def list_find(what, where):
    """Find `what` list in the `where` list.

    Return index in `where` where `what` starts
    or -1 if no such index.

    >>> f = list_find
    >>> f([2, 1], [-1, 0, 1, 2])
    -1
    >>> f([-1, 1, 2], [-1, 0, 1, 2])
    -1
    >>> f([0, 1, 2], [-1, 0, 1, 2])
    1
    >>> f([1,2], [-1, 0, 1, 2])
    2
    >>> f([1,3], [-1, 0, 1, 2])
    -1
    >>> f([1, 2], [[1, 2], 3])
    -1
    >>> f([[1, 2]], [[1, 2], 3])
    0
    """
    if not what: # empty list is always found
        return 0
    try:
        index = 0
        while True:
            index = where.index(what[0], index)
            if where[index:index+len(what)] == what:
                return index # found
            index += 1 # try next position
    except ValueError:
        return -1 # not found

def contains(what, where):
    """Return [start, end+1] if found else empty list."""
    i = list_find(what, where)
    return [i, i + len(what)] if i >= 0 else [] #NOTE: bool([]) == False

if __name__=="__main__":
    import doctest; doctest.testmod()

Fabio Lamanna · Accepted Answer · 2016-02-04 16:23:36Z

1

Smallest code:

def contains(a,b):
    str(a)[1:-1].find(str(b)[1:-1])>=0

edited Feb 4, 2016 at 16:23

Fabio Lamanna

21.3k24 gold badges92 silver badges124 bronze badges

answered Feb 4, 2016 at 15:31

Bart Mensfort

1,03412 silver badges22 bronze badges

Add a comment |

Akila D. Perera · Accepted Answer · 2018-06-25 17:01:29Z

1

Here is my answer. This function will help you to find out whether B is a sub-list of A. Time complexity is O(n).

`def does_A_contain_B(A, B): #remember now A is the larger list
    b_size = len(B)
    for a_index in range(0, len(A)):
        if A[a_index : a_index+b_size]==B:
            return True
    else:
        return False`

edited Jun 25, 2018 at 17:01

answered Jan 4, 2017 at 5:29

Akila D. Perera

711 silver badge8 bronze badges

Add a comment |

intuited · Accepted Answer · 2010-10-02 22:22:53Z

I tried to make this as efficient as possible.

It uses a generator; those unfamiliar with these beasts are advised to check out their documentation and that of yield expressions.

Basically it creates a generator of values from the subsequence that can be reset by sending it a true value. If the generator is reset, it starts yielding again from the beginning of sub.

Then it just compares successive values of sequence with the generator yields, resetting the generator if they don't match.

When the generator runs out of values, i.e. reaches the end of sub without being reset, that means that we've found our match.

Since it works for any sequence, you can even use it on strings, in which case it behaves similarly to str.find, except that it returns False instead of -1.

As a further note: I think that the second value of the returned tuple should, in keeping with Python standards, normally be one higher. i.e. "string"[0:2] == "st". But the spec says otherwise, so that's how this works.

It depends on if this is meant to be a general-purpose routine or if it's implementing some specific goal; in the latter case it might be better to implement a general-purpose routine and then wrap it in a function which twiddles the return value to suit the spec.

def reiterator(sub):
    """Yield elements of a sequence, resetting if sent ``True``."""
    it = iter(sub)
    while True:
        if (yield it.next()):
            it = iter(sub)

def find_in_sequence(sub, sequence):
    """Find a subsequence in a sequence.

    >>> find_in_sequence([2, 1], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([-1, 1, 2], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([0, 1, 2], [-1, 0, 1, 2])
    (1, 3)
    >>> find_in_sequence("subsequence",
    ...                  "This sequence contains a subsequence.")
    (25, 35)
    >>> find_in_sequence("subsequence", "This one doesn't.")
    False

    """
    start = None
    sub_items = reiterator(sub)
    sub_item = sub_items.next()
    for index, item in enumerate(sequence):
        if item == sub_item:
            if start is None: start = index
        else:
            start = None
        try:
            sub_item = sub_items.send(start is None)
        except StopIteration:
            # If the subsequence is depleted, we win!
            return (start, index)
    return False

A valiant effort, but this has worse performance than either eumiro or Dave Kirby's solutions. 8.2s on the benchmark I described in the other comments. — Tim Yates, Commented Oct 2, 2010 at 22:21
Interesting to see the speed difference for native code. It seems like this algorithm would be relatively faster for longer subsequences.. how long was/were the subsequence(s) you used in the test? — intuited, Commented Oct 2, 2010 at 22:37
You're right. This performed much better than eumiro's solution with larger subsequences, but it still didn't perform as well as Dave's. — Tim Yates, Commented Oct 2, 2010 at 22:50

ChessMaster · Accepted Answer · 2012-04-05 23:26:08Z

0

I think this one is fast...

def issublist(subList, myList, start=0):
    if not subList: return 0
    lenList, lensubList = len(myList), len(subList)
    try:
        while lenList - start >= lensubList:
            start = myList.index(subList[0], start)
            for i in xrange(lensubList):
                if myList[start+i] != subList[i]:
                    break
            else:
                return start, start + lensubList - 1
            start += 1
        return False
    except:
        return False

edited Apr 5, 2012 at 23:26

answered Oct 3, 2010 at 7:10

ChessMaster

5491 gold badge4 silver badges12 bronze badges

Add a comment |

Tanjim Ahmed Khan · Accepted Answer · 2020-05-08 03:07:44Z

0

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([0,5,4]))

It provides true output.

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([1,3]))

It provides false output.

edited May 8, 2020 at 3:07

Tanjim Ahmed Khan

6961 gold badge9 silver badges22 bronze badges

answered Apr 23, 2020 at 18:26

Sihat Afnan

8628 silver badges15 bronze badges

This seems version specific. For example, it fails on LeetCode.
– Abhijit Sarkar
Commented Jul 27, 2021 at 11:42

Add a comment |

HLeb · Accepted Answer · 2020-11-25 23:36:31Z

Dave answer is good. But I suggest this implementation which is more efficient and doesn't use nested loops.

def contains(small_list, big_list):
    """
    Returns index of start of small_list in big_list if big_list
    contains small_list, otherwise -1.
    """
    loop = True
    i, curr_id_small= 0, 0
    while loop and i<len(big_list):
        if big_list[i]==small_list[curr_id_small]:
            if curr_id_small==len(small_list)-1:
                loop = False
            else:
                curr_id_small += 1
        else:
            curr_id_small = 0
        i=i+1
    if not loop:
        return i-len(small_list)
    else:
        return -1

Granitosaurus · Accepted Answer · 2022-05-13 06:13:53Z

0

Here's a simple and efficient function to check whether big list contains a small one in matching order:

def contains(big, small):
    i = 0
    for value in big:
        if value == small[i]:
            i += 1
            if i == len(small):
                return True
        else:
            i = 1 if value == small[0] else 0
    return False

Usage:

"""
>>> contains([1,2,3,4,5], [2,3,4])
True
>>> contains([4,2,3,2,4], [2,3,4])
False
>>> contains([1,2,3,2,3,2,2,4,3], [2,4,3])
True
"""

edited May 13, 2022 at 6:13

answered May 11, 2022 at 6:57

Granitosaurus

21.2k5 gold badges62 silver badges86 bronze badges

Add a comment |

mafonya · Accepted Answer · 2018-09-03 19:24:44Z

The problem of most of the answers, that they are good for unique items in list. If items are not unique and you still want to know whether there is an intersection, you should count items:

from collections import Counter as count

def listContains(l1, l2):
  list1 = count(l1)
  list2 = count(l2)

  return list1&list2 == list1

print( listContains([1,1,2,5], [1,2,3,5,1,2,1]) ) # Returns True
print( listContains([1,1,2,8], [1,2,3,5,1,2,1]) ) # Returns False

You can also return the intersection by using ''.join(list1&list2)

thibaultbl · Accepted Answer · 2020-09-23 09:19:54Z

Here a solution with less line of code and easily understandable (or at least I like to think so).

If you want to keep order (match only if the smaller list is found in the same order on the bigger list):

def is_ordered_subset(l1, l2):
    # First check to see if all element of l1 are in l2 (without checking order)
    if not set(l1).issubset(l2): 
        return False

    length = len(l1)
    # Make sublist of same size than l1
    list_of_sublist = [l2[i:i+length] for i, x in enumerate(l2)]
    #Check if one of this sublist is l1
    return l1 in list_of_sublist

Atila Abdula · Accepted Answer · 2021-05-07 08:22:21Z

-1

You can use numpy:

def contains(l1, l2):
   """ returns True if l2 conatins l1 and False otherwise """

   if len(np.intersect1d(l1,l2))==len(l1):
      return = True
   else:
      return = False

answered May 7, 2021 at 8:22

Atila Abdula

1

Add a comment |

Collectives™ on Stack Overflow

How to test if a list contains another list as a contiguous subsequence?

19 Answers 19

Used methods are:

Code for Time analysis (averaging over 1000 iterations):

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

Not the answer you're looking for? Browse other questions tagged
python
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

19 Answers 19

Used methods are:

Code for Time analysis (averaging over 1000 iterations):

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

Not the answer you're looking for? Browse other questions tagged python or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
or ask your own question.