Check if multiple strings exist in another string

Question

How can I check if any of the strings in an array exists in another string?

For example:

a = ['a', 'b', 'c']
s = "a123"
if a in s:
    print("some of the strings found in s")
else:
    print("no strings found in s")

How can I replace the if a in s: line to get the appropriate result?

I'm surprised there aren't (yet) any answers comparing to a compiled regex in terms of perf, especially compared to size of the string and number of "needles" to search for. — Pat, Commented Apr 22, 2015 at 23:21
@Pat I am not surprised. The question is not about performance. Today most programmers care more for getting it done and readability. The performance question is valid, but a different question. — guettli, Commented Jul 13, 2016 at 6:42
regex [abc] also works perfectly well and will be faster if there are more than a couple of candidates to test. But if the strings are arbitrary and you don't know them in advance to construct a regex, you will have to use the any(x in str for x in a) approach. — smci, Commented Jan 8, 2020 at 13:15
This problem is a special case of stackoverflow.com/questions/1342601. The standard approach is to use any, as seen in the top answers; however, some string-specific optimizations may be possible. — Karl Knechtel, Commented Aug 2, 2022 at 23:42
Well I did look up this question for performance. Otherwise that’s a bachelor-grade algorithmic problem. — Maëlan, Commented Aug 19, 2023 at 21:46

Kelly Bundy · Accepted Answer · 2023-10-27 14:06:30Z

1273

You can use any:

a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

if any(x in a_string for x in matches):
    print("found a match")

Similarly to check if all the strings from the list are found, use all instead of any.
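As a minimal runnable sketch of the difference between any and all, using the same a_string and matches as above (the helper names contains_any and contains_all are made up for illustration):

```python
def contains_any(haystack, needles):
    """Return True if at least one needle is a substring of haystack."""
    return any(needle in haystack for needle in needles)


def contains_all(haystack, needles):
    """Return True only if every needle is a substring of haystack."""
    return all(needle in haystack for needle in needles)


a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

print(contains_any(a_string, matches))  # True: "more" is present
print(contains_all(a_string, matches))  # False: "wholesome" and "milk" are missing
```

Note the edge case with an empty list of needles: any returns False and all returns True.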

edited Oct 27, 2023 at 14:06

Kelly Bundy


answered Aug 2, 2010 at 16:15

Mark Byers


17

any() takes an iterable. I am not sure which version of Python you are using but in 2.6 you will need to put [] around your argument to any(). any([x in str for x in a]) so that the comprehension returns an iterable. But maybe later versions of Python already do this.
– emispowder
Commented Mar 27, 2013 at 1:06
8

@Mark Byers: Sorry for the late comment, but is there a way to print the string that was found? How would you do this. Thank you.
– Shankar Kumar
Commented Aug 1, 2013 at 1:26
4

Not sure I understand, if a is the list, and str is the thing to match against, what is the x? Python newbie ftw. :)
– red
Commented Nov 13, 2013 at 14:01
7

@emispowder It works fine for me as-is in Python 2.6.9.
– MPlanchard
Commented Jul 10, 2015 at 18:25
7

@emispowder: Generator expressions were introduced in 2.4.
– zondo
Commented Apr 22, 2017 at 3:07


mirekphd · Accepted Answer · 2023-06-08 09:51:07Z

119

any() is by far the best approach if all you want is True or False, but if you want to know specifically which string/strings match, you can use a couple things.

If you want the first match (with False as a default):

match = next((x for x in a if x in a_string), False)

If you want to get all matches (including duplicates):

matches = [x for x in a if x in a_string]

If you want to get all non-duplicate matches (disregarding order):

matches = {x for x in a if x in a_string}

If you want to get all non-duplicate matches in the right order:

matches = []
for x in a:
    if x in a_string and x not in matches:
        matches.append(x)
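
For the last case, order-preserving de-duplication can also be sketched with dict.fromkeys, which relies on dicts keeping insertion order (guaranteed from Python 3.7). The sample values of a and a_string below are assumptions chosen for illustration.

```python
a = ["b", "a", "b", "z", "a"]
a_string = "a123b"

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving first-seen order.
matches = list(dict.fromkeys(x for x in a if x in a_string))
print(matches)  # ['b', 'a']

# First match only, with None as the default when nothing matches.
first = next((x for x in a if x in a_string), None)
print(first)  # b
```

Using None rather than False as the default avoids confusing "no match" with a needle that happens to be falsy.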

edited Jun 8, 2023 at 9:51

mirekphd


answered May 23, 2016 at 22:10

zondo


please add example for the last match too
– Oleg Kokorin
Commented Apr 2, 2018 at 21:46
@OlegKokorin: It creates a list of matching strings in the same order it finds them, but it keeps only the first one if two are the same.
– zondo
Commented Apr 4, 2018 at 0:35
Using an OrderedDict is probably more performant than a list. See this answer on "Removing duplicates in lists"
– wjandrea
Commented May 18, 2020 at 0:11
Can you provide an example?
– Herwini
Commented Nov 16, 2020 at 14:18
1

One nice thing about the first option (using next) is it allows for short-circuit evaluation, so if you just want to know, for example, when a failure happened, if it was any of a certain class of failures, you don't have to generate truth values for each comparison, just all of them until the first match is found.
– hlongmore
Commented Apr 5, 2023 at 4:29


jbernadas · Accepted Answer · 2010-08-02 19:04:58Z

67

You should be careful if the strings in a or str get longer. The straightforward solutions take O(S*(A^2)), where S is the length of str and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).
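
To make the idea concrete, here is a compact pure-Python sketch of an Aho-Corasick style automaton. It is an illustration only, not a tuned implementation; the function names build_automaton and find_all are made up for this example, and a dedicated library is likely to be much faster in practice.

```python
from collections import deque


def build_automaton(needles):
    """Build the trie, failure links and output sets for the needles."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state of the longest proper suffix
    out = [set()]  # out[state] -> needles that end at this state

    for needle in needles:
        if not needle:
            continue  # skip empty needles; they would match everywhere
        state = 0
        for ch in needle:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(needle)

    # Breadth-first traversal, so shallower states are finished
    # before the deeper states that depend on them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(haystack, needles):
    """Return the set of needles that occur anywhere in haystack."""
    goto, fail, out = build_automaton(needles)
    found = set()
    state = 0
    for ch in haystack:
        # Follow failure links until a transition on ch exists.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


print(sorted(find_all("helloworld", ["he", "or", "low", "xyz"])))
# ['he', 'low', 'or']
```

The haystack is scanned once from left to right, and matches are reported wherever they occur in it, not only at the start.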

answered Aug 2, 2010 at 19:04

jbernadas


can Aho-Corasick also find substrings instead of prefixes ?
– RetroCode
Commented Sep 26, 2016 at 19:58
3

Some python Aho-Corasick libraries are here and here
– vorpal
Commented Sep 27, 2017 at 10:54
is there a library for that?
– user313032
Commented Jan 16 at 19:48


Shankar ARUL · Accepted Answer · 2016-05-23 21:45:58Z

43

Just to add some diversity with regex:

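import re

# a_string is the string being searched (named str in the question,
# which shadows the built-in type and is best avoided)
if any(re.findall(r'a|b|c', a_string, re.IGNORECASE)):
    print('possible matches thanks to regex')
else:
    print('no matches')

Or, if your list is too long to write out by hand: any(re.findall('|'.join(a), a_string, re.IGNORECASE))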

answered May 23, 2016 at 21:45

Shankar ARUL


1

This works for the given use case of the question. If you search for ( or *, this fails, since the search strings need to be escaped for the regex syntax.
– guettli
Commented Jul 12, 2016 at 10:13
5

You can escape it if necessary with '|'.join(map(re.escape, strings_to_match)). You should probably re.compile('|'.join(...)) as well.
– Artyer
Commented Nov 4, 2017 at 21:50
3

And what's the time complexity?
– DachuanZhao
Commented Apr 30, 2021 at 1:51
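
Putting the escaping and compiling suggestions from these comments together, a minimal sketch could look like this (compile_any and its ignore_case parameter are hypothetical names, not part of the re module):

```python
import re


def compile_any(needles, ignore_case=False):
    """Compile one pattern that matches any of the literal needles."""
    # Sort longest first so a longer needle wins over its own prefix.
    ordered = sorted(needles, key=len, reverse=True)
    pattern = "|".join(re.escape(n) for n in ordered)
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)


pattern = compile_any(["(", "*", "a.b"])
print(bool(pattern.search("f(x)")))  # True: the literal "(" is found
print(bool(pattern.search("a-b")))   # False: "." is not a wildcard here
```

One caveat: with an empty list the pattern is the empty string, which matches every input, whereas any() over an empty list returns False. Guard against that case if the list can be empty.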


mirekphd · Accepted Answer · 2023-06-08 09:52:38Z

19

A surprisingly fast approach is to use set:

a = ['a', 'b', 'c']
a_string = "a123"
if set(a) & set(a_string):
    print("some of the strings found in a_string")
else:
    print("no strings found in a_string")

This works if a does not contain any multiple-character values (in which case use any as listed above). If so, it's simpler to specify a as a string: a = 'abc'.
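
A short sketch of that limitation, with sample values assumed for illustration: the set approach compares needles against the individual characters of a_string, so it only gives the right answer for single-character needles.

```python
a_string = "a123"

# Single-character needles: both approaches agree.
chars = ["a", "b", "c"]
print(bool(set(chars) & set(a_string)))     # True
print(any(x in a_string for x in chars))    # True

# isdisjoint accepts any iterable, so no second set is needed.
print(not set(chars).isdisjoint(a_string))  # True

# Multi-character needles: the set of characters of a_string can
# never contain "12", so the set approach misses a real substring.
words = ["12", "zz"]
print(bool(set(words) & set(a_string)))     # False (wrong answer)
print(any(x in a_string for x in words))    # True
```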

edited Jun 8, 2023 at 9:52

mirekphd


answered Mar 19, 2019 at 15:26

Berislav Lopac


mirekphd · Accepted Answer · 2023-06-08 09:53:06Z

14

You need to iterate on the elements of a.

a = ['a', 'b', 'c']
a_string = "a123"
found_a_string = False
for item in a:    
    if item in a_string:
        found_a_string = True

if found_a_string:
    print("found a match")
else:
    print("no match found")

edited Jun 8, 2023 at 9:53

mirekphd


answered Aug 2, 2010 at 16:15

Seamus Campbell


3

Yes i knew how to do that but compared to Marks answer, that's horrible code.
– jahmax
Commented Aug 2, 2010 at 16:24
14

Only if you understand Mark's code. The problem you were having is that you weren't examining the elements of your array. There are a lot of terse, pythonic ways to accomplish what you want that would hide the essence of what was wrong with your code.
– Seamus Campbell
Commented Aug 2, 2010 at 16:38
14

It may be 'horrible code' but it's exactly what any() does. Also, this gives you the actual string that matched, whereas any() just tells you there is a match.
– alldayremix
Commented Apr 1, 2013 at 15:21


Domi W · Accepted Answer · 2017-07-20 20:48:07Z

4

jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.

Here is one way to use it in Python:

Download aho_corasick.py from here
Put it in the same directory as your main Python file and name it aho_corasick.py

Try the algorithm with the following code:

from aho_corasick import aho_corasick #(string, keywords)

print(aho_corasick(string, ["keyword1", "keyword2"]))

Note that the search is case-sensitive

edited Jul 20, 2017 at 20:48

answered Jul 20, 2017 at 20:23

Domi W


This would be better as a comment on, or edit to that answer.
– Karl Knechtel
Commented Aug 2, 2022 at 23:45


balki · Accepted Answer · 2020-11-09 15:29:21Z

4

The regex module, recommended in the Python docs, supports this with named lists:

import regex

words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)

output:

['he', 'low', 'or']

Some details on implementation: link

edited Nov 9, 2020 at 15:29

answered Nov 9, 2020 at 15:21

balki


I can't find any documentation on \L. Can you point me to it?
– Danilo Souza Morães
Commented Nov 29, 2021 at 19:39
3

@DaniloSouzaMorães github.com/mrabarnett/mrab-regex#named-lists-hg-issue-11
– balki
Commented Dec 7, 2021 at 0:47


mluebke · Accepted Answer · 2010-08-02 16:16:40Z

3

a = ['a', 'b', 'c']
a_string = "a123"

a_match = [True for match in a if match in a_string]

if True in a_match:
    print("some of the strings found in a_string")
else:
    print("no strings found in a_string")

answered Aug 2, 2010 at 16:16

mluebke


Jerald Cogswell · Accepted Answer · 2021-01-12 04:27:50Z

3

A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than list comprehension in large sets or lists.

>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring)  # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>

answered Jan 12, 2021 at 4:27

Jerald Cogswell


The input is specified as a list of strings vs. a longer string that might contain them as a substring, not as two lists of strings.
– Karl Knechtel
Commented Aug 2, 2022 at 23:44


Aurélien Pierre · Accepted Answer · 2023-04-26 18:32:37Z

I needed to do that in a performance-critical environment, so I benchmarked all the possible variants I could find and think of with Python 3.11. Here are the results:

import re

words = ['test', 'èk', 'user_me', '<markup>', '[^1]']

def find_words(words):
    for word in words:
        if "_" in word or "<" in word or ">" in word or "^" in word:
            pass

def find_words_2(words):
    for word in words:
        for elem in [">", "<", "_", "^"]:
            if elem in word:
                pass

def find_words_3(words):
    for word in words:
        if re.search(r"\_|\<|\>|\^", word):
            pass

def find_words_4(words):
    for word in words:
        if re.match(r"\S*(\_|\<|\>|\^)\S*", word):
            pass

def find_words_5(words):
    for word in words:
        if any(elem in word for elem in [">", "<", "_", "^"]):
            pass

def find_words_6(words):
    for word in words:
        if any(map(word.__contains__, [">", "<", "_", "^"])):
            pass

> %timeit find_words(words)
351 ns ± 6.24 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit find_words_2(words)
689 ns ± 15.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit find_words_3(words)
2.42 µs ± 43.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

> %timeit find_words_4(words)
2.75 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

> %timeit find_words_5(words)
2.65 µs ± 176 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

> %timeit find_words_6(words)
1.64 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

The naive chained or approach wins (function 1)
The basic iteration over each element to test (function 2) is at least 50% faster than using any(), and even a regex search is faster than the basic any() without map(), so I don't get why it exists at all. Not to mention, the syntax is purely algorithmic so any programmer will understand what it does, even without Python background.
re.match() only searches for patterns starting at the beginning of the line (which is confusing if you come from PHP/Perl regex), so to make it work like PHP/Perl, you need to use re.search() or to tweak the regex to include characters before, which comes with a performance penalty.
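
The difference between re.match() and re.search() can be seen in a small sketch (the sample word is taken from the list above):

```python
import re

word = "user_me"

# re.match only tries to match at the start of the string.
print(re.match(r"_", word))                # None
# re.search scans the whole string for the first match.
print(re.search(r"_", word).group())       # _
# re.fullmatch requires the whole string to match the pattern.
print(re.fullmatch(r"\w+", word).group())  # user_me
```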

If the list of substrings to search for is known at programming time, the ugly chained or is definitely the way to go. Otherwise, use the basic for loop over the list of substrings to search. any() and regex are a loss of time in this context.

For a more down-to-earth application (searching if a file is an image by looking for its extension in a list):

def is_image(word: str ) -> bool:
  if  ".bmp" in word or \
      ".jpg" in word or \
      ".jpeg" in word or \
      ".jpe" in word or \
      ".jp2" in word or \
      ".j2c" in word or \
      ".j2k" in word or \
      ".jpc" in word or \
      ".jpf" in word or \
      ".jpx" in word or \
      ".png" in word or \
      ".ico" in word or \
      ".svg" in word or \
      ".webp" in word or \
      ".heif" in word or \
      ".heic" in word or \
      ".tif" in word or \
      ".tiff" in word or \
      ".hdr" in word or \
      ".exr" in word or \
      ".ppm" in word or \
      ".pfm" in word or \
      ".nef" in word or \
      ".rw2" in word or \
      ".cr2" in word or \
      ".cr3" in word or \
      ".crw" in word or \
      ".dng" in word or \
      ".raf" in word or \
      ".arw" in word or \
      ".srf" in word or \
      ".sr2" in word or \
      ".iiq" in word or \
      ".3fr" in word or \
      ".dcr" in word or \
      ".ari" in word or \
      ".pef" in word or \
      ".x3f" in word or \
      ".erf" in word or \
      ".raw" in word or \
      ".rwz" in word:
    return True
  return False

IMAGE_PATTERN = re.compile(r"\.(bmp|jpg|jpeg|jpe|jp2|j2c|j2k|jpc|jpf|jpx|png|ico|svg|webp|heif|heic|tif|tiff|hdr|exr|ppm|pfm|nef|rw2|cr2|cr3|crw|dng|raf|arw|srf|sr2|iiq|3fr|dcr|ari|pef|x3f|erf|raw|rwz)")

extensions = [".bmp", ".jpg", ".jpeg", ".jpe", ".jp2", ".j2c", ".j2k", ".jpc", ".jpf", ".jpx", ".png", ".ico", ".svg", ".webp", ".heif", ".heic", ".tif", ".tiff", ".hdr", ".exr", ".ppm", ".pfm", ".nef", ".rw2", ".cr2", ".cr3", ".crw", ".dng", ".raf", ".arw", ".srf", ".sr2", ".iiq", ".3fr", ".dcr", ".ari", ".pef", ".x3f", ".erf", ".raw", ".rwz"]

(Note that the extensions are declared in the same order in all variants).

> %timeit is_image("DSC_blablabla_001256.nef") # found
536 ns ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit is_image("DSC_blablabla_001256.noop") # not found
923 ns ± 43.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit IMAGE_PATTERN.search("DSC_blablabla_001256.nef")
221 ns ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit IMAGE_PATTERN.search("DSC_blablabla_001256.noop") # not found
207 ns ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit any(ext in "DSC_blablabla_001256.nef" for ext in extensions) # found
1.53 µs ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

> %timeit any(ext in "DSC_blablabla_001256.noop" for ext in extensions) # not found
2.2 µs ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

With a lot more options to test, regexes are actually faster and more legible (for once…) than the chained or. any() is still the worst.

Empirical tests show that the performance threshold is at 9 elements to test:

below 9 elements, chained or is faster,
above 9 elements, regex search() is faster,
at exactly 9 elements, both run around 225 ns.

Which of these allow you to know which word was found?
– Pod
Commented Dec 5, 2023 at 8:32
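
On the question of which word was found: a chained or and any() only yield a boolean, while a regex match object or a next() over a generator can return the matching item. A minimal sketch, assuming a shortened extension list and hypothetical helper names which_regex and which_loop:

```python
import re

# A shortened, hypothetical version of the extension list above.
extensions = [".jpeg", ".jpg", ".png", ".nef"]
pattern = re.compile("|".join(map(re.escape, extensions)))


def which_regex(name):
    """Return the matched extension via the match object, or None."""
    m = pattern.search(name)
    return m.group(0) if m else None


def which_loop(name):
    """Return the first extension in list order found in name, or None."""
    return next((ext for ext in extensions if ext in name), None)


print(which_regex("DSC_blablabla_001256.nef"))   # .nef
print(which_regex("DSC_blablabla_001256.noop"))  # None
print(which_loop("DSC_blablabla_001256.nef"))    # .nef
```

The two helpers can disagree when several candidates occur in the same string: the regex reports the leftmost match in the string, while the loop reports the first candidate in list order.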

Nilesh Birari · Accepted Answer · 2018-06-25 13:51:48Z

2

Just some more info on how to get all the list elements available in the string:

a = ['a', 'b', 'c']
str = "a123" 
list(filter(lambda x:  x in str, a))

answered Jun 25, 2018 at 13:51

Nilesh Birari


Rolf Carlson · Accepted Answer · 2022-03-28 15:44:17Z

If you want exact matches of words then consider word tokenizing the target string. I use the recommended word_tokenize from nltk:

from nltk.tokenize import word_tokenize

Here is the tokenized string from the accepted answer:

a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']

The accepted answer gets modified as follows:

matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]

As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.

matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]

Using word tokenization, "mo" is no longer matched:

[x in tokens for x in matches_2]
Out[44]: [False, False, False]

That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
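
If installing nltk is not an option, a rough standard-library approximation is to split on word characters with re.findall. This is a sketch under that assumption, not a replacement for a real tokenizer: unlike word_tokenize, it drops punctuation instead of returning it as tokens.

```python
import re

a_string = "A string is more than its parts!"

# \w+ grabs runs of letters, digits and underscores.
tokens = set(re.findall(r"\w+", a_string))

print([x in tokens for x in ["more", "wholesome", "milk"]])  # [True, False, False]
print([x in tokens for x in ["mo", "wholesome", "milk"]])    # [False, False, False]
```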

Trinadh Koya · Accepted Answer · 2016-11-30 05:17:45Z

1

It depends on the context. If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:

original_word = "hackerearcth"
if 'h' in original_word:
    print("YES")

If you want to check whether any of the characters of original_word appears in the input, use any:

if any(your_required in yourinput for your_required in original_word):
    print("YES")

If you want to check that all of the characters of original_word appear in the input, simply use all:

original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

answered Nov 30, 2016 at 5:17

Trinadh Koya


What would be yourinput? I can recognise two things: the sentence where I'm looking for something. The array of words I'm looking for. But you describe three variables and I can't get what the third one is.
– mayid
Commented May 19, 2019 at 22:43


Stephen Rauch · Accepted Answer · 2017-11-27 00:27:49Z

1

strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
# The with block closes the file even if an error occurs.
with open('test.txt', 'r') as flog:
    for line in flog:
        for fstr in strlist:
            if fstr in line:
                print('found')
                res = True


if res:
    print('res true')
else: 
    print('res false')

edited Nov 27, 2017 at 0:27

Stephen Rauch♦


answered Nov 26, 2017 at 23:58

LeftSpace


1

Don't add screenshots, your code is proof enough.
– ucczs
Commented Jul 18, 2023 at 20:33


Ivan Mikhailov · Accepted Answer · 2018-01-25 13:48:10Z

1

I would use this kind of function for speed:

def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

answered Jan 25, 2018 at 13:48

Ivan Mikhailov


sjd · Accepted Answer · 2020-09-09 15:16:28Z

1

Yet another solution, this time a one-liner using set intersection. Note that it matches whole words only, since the text is split on whitespace.

subset = {"some" ,"words"} 
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
   print("All values present in text")

if subset & set(text.split()):
   print("At least one value present in text")
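
The same two checks can also be sketched with the set comparison operators, which some may find easier to read than comparing lengths (tokens is a name introduced here for illustration):

```python
subset = {"some", "words"}
text = "some words to be searched here"
tokens = set(text.split())

print(subset <= tokens)                # True: every word is present
print(not subset.isdisjoint(tokens))   # True: at least one word is present
print({"some", "missing"} <= tokens)   # False: "missing" is absent
```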

answered Sep 9, 2020 at 15:16

sjd


cwa · Accepted Answer · 2023-06-08 21:36:34Z

I found this question from a link from another closed question: Python: How to check a string for substrings from a list? but don't see an explicit solution to that question in the above answers.

Given a list of substrings and a list of strings, return a unique list of strings that have any of the substrings.

substrings = ['hello','world','python']
strings = ['blah blah.hello_everyone','this is a-crazy_world.here',
       'one more string','ok, one more string with hello world python']
# one-liner
list(set([strings_of_interest for strings_of_interest in strings for substring in substrings if substring in strings_of_interest]))
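
A variant that keeps the original order of strings can be sketched with any() inside the comprehension, so each string is emitted at most once per occurrence in the input:

```python
substrings = ['hello', 'world', 'python']
strings = ['blah blah.hello_everyone', 'this is a-crazy_world.here',
           'one more string', 'ok, one more string with hello world python']

# Keep every string that contains at least one of the substrings.
hits = [s for s in strings if any(sub in s for sub in substrings)]
print(hits)
```

Unlike the set-based one-liner, this keeps duplicates if strings itself contains the same entry twice; wrapping the result in list(dict.fromkeys(hits)) would remove them while preserving order.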

Robert I · Accepted Answer · 2018-06-15 21:17:27Z

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']


# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))

# still fine, multiple if statements
if ('firstName' not in data or
    'lastName' not in data or
    'age' not in data):
    print("Error, missing a req field")

# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if missing_fields:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))
