How to get a string after a specific substring?

Question

How can I get a string after a specific substring?

For example, I want to get the string after "world" in

my_string="hello python world, I'm a beginner"

...which in this case is: ", I'm a beginner"

JayRizzo · Accepted Answer · 2022-05-20 07:05:26Z

617

The easiest way is probably just to split on your target word:

my_string="hello python world , i'm a beginner"
print(my_string.split("world",1)[1])

split takes the word (or character) to split on and optionally a limit to the number of splits.

In this example, split on "world" and limit it to only one split.
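Note that indexing with [1] raises an IndexError when the target word is absent, because split then returns a single-element list. A minimal sketch of a guarded version follows; the helper name after_split is hypothetical and not part of the original answer:

```python
def after_split(text, target):
    """Return the text after the first occurrence of target, or None if absent."""
    # maxsplit=1 stops after the first match, so at most two parts come back.
    parts = text.split(target, 1)
    return parts[1] if len(parts) == 2 else None


print(after_split("hello python world, I'm a beginner", "world"))
print(after_split("hello python", "world"))
```

Returning None is just one choice; returning the original string or an empty string may suit other callers better.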

edited May 20, 2022 at 7:05

JayRizzo


answered Sep 24, 2012 at 20:27

Joran Beasley


3

If I need to split a text on the word 'low' and the text contains the word 'lower' before it, this will not work!
– Leonardo Hermoso
Commented Jan 12, 2017 at 3:48
3

You would simply split twice: target.split('lower',1)[-1].split('low',1)[-1]
– Joran Beasley
Commented May 27, 2017 at 19:02
what if the sentence was "hello python Megaworld world , i'm a beginner ". How can I make it look at the whole word and not part of another as 'Megaworld'? Thanks
– pbou
Commented Dec 25, 2018 at 14:50
1

Then the string you search for is " world " ... or use a regex with word boundaries.
– Joran Beasley
Commented Dec 27, 2018 at 5:24
17

my_string.partition("world")[-1] (or ...[2]) is faster.
– Martijn Pieters
Commented Jul 16, 2019 at 15:47


JayRizzo · Accepted Answer · 2022-05-20 07:23:14Z

102

I'm surprised nobody mentioned partition.

def substring_after(s, delim):
    return s.partition(delim)[2]

s1="hello python world, I'm a beginner"
substring_after(s1, "world")

# ", I'm a beginner"

IMHO, this solution is more readable than @arshajii's. Other than that, I think @arshajii's is the best for being the fastest -- it does not create any unnecessary copies/substrings.

edited May 20, 2022 at 7:23

JayRizzo


answered May 23, 2013 at 11:35

shx2


2

This is a nice solution, and handles the case where the substring is not part of the base string nicely.
– mattmc3
Commented May 4, 2014 at 1:47
You get distinct ids (that are separated by several thousand) ... I'm not sure you don't create unnecessary substrings with this (and I'm too lazy to properly profile it).
– Joran Beasley
Commented Dec 27, 2018 at 5:28
1

@JoranBeasley, it clearly does create unnecessary substrings. I think you misread my answer.
– shx2
Commented May 25, 2019 at 9:13
(so does arshajii's, I think ... )
– Joran Beasley
Commented May 25, 2019 at 15:09
4

Moreover, this is faster than str.split(..., 1).
– Martijn Pieters
Commented Jul 16, 2019 at 15:48


JayRizzo · Accepted Answer · 2022-05-20 07:06:07Z

79

s1 = "hello python world , i'm a beginner"
s2 = "world"

print(s1[s1.index(s2) + len(s2):])

If you want to deal with the case where s2 is not present in s1, then use s1.find(s2) as opposed to index. If the return value of that call is -1, then s2 is not in s1.
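The find-based variant described above could be sketched as follows. The function name after_or_none is hypothetical, and returning None for a missing substring is an assumption about what the caller wants:

```python
def after_or_none(s1, s2):
    """Return the part of s1 after the first s2, or None if s2 is not in s1."""
    pos = s1.find(s2)  # find() returns -1 instead of raising ValueError
    if pos == -1:
        return None
    # Skip past the matched substring itself before slicing.
    return s1[pos + len(s2):]


print(after_or_none("hello python world , i'm a beginner", "world"))
print(after_or_none("hello python", "world"))
```

Checking for -1 explicitly matters: without the check, the slice would start at len(s2) - 1 and silently return the wrong text.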

edited May 20, 2022 at 7:06

JayRizzo


answered Sep 24, 2012 at 20:27

arshajii


You get distinct ids (that are separated by several thousand) ... I'm not sure you don't create unnecessary substrings with this.
– Joran Beasley
Commented Dec 27, 2018 at 5:28
@JoranBeasley, we only call index(), len() and slice. There is no reason for index() and len() to create substrings, and if they do (I find it hard to believe), that's just an unnecessary implementation detail. Same for slice -- there is no reason for it to create substrings other than the one returned.
– shx2
Commented Nov 8, 2019 at 18:27
@shx2 print( s1[s1.index(s2) + len(s2):] is s1[s1.index(s2) + len(s2):])
– Joran Beasley
Commented Nov 9, 2019 at 7:00
@JoranBeasley what point are you trying to make with this snippet? That on multiple calls different objects are returned? By "unnecessary substrings" I mean substrings other than the one returned, i.e. substrings which are not necessary to create in order to derive the result.
– shx2
Commented Nov 9, 2019 at 16:14


Martijn Pieters · Accepted Answer · 2019-07-16 19:24:29Z

You want to use str.partition():

>>> my_string.partition("world")[2]
" , i'm a beginner "

because this option is faster than the alternatives.

Note that this produces an empty string if the delimiter is missing:

>>> my_string.partition("Monty")[2]  # delimiter missing
''

If you want to have the original string, then test if the second value returned from str.partition() is non-empty:

prefix, success, result = my_string.partition(delimiter)
if not success: result = prefix

You could also use str.split() with a limit of 1:

>>> my_string.split("world", 1)[-1]
" , i'm a beginner "
>>> my_string.split("Monty", 1)[-1]  # delimiter missing
"hello python world , i'm a beginner "

However, this option is slower. In the best-case scenario, str.partition() is roughly 15% faster than str.split():

                                missing        first         lower         upper          last
      str.partition(...)[2]:  [3.745 usec]  [0.434 usec]  [1.533 usec]  <3.543 usec>  [4.075 usec]
str.partition(...) and test:   3.793 usec    0.445 usec    1.597 usec    3.208 usec    4.170 usec
      str.split(..., 1)[-1]:  <3.817 usec>  <0.518 usec>  <1.632 usec>  [3.191 usec]  <4.173 usec>
            % best vs worst:         1.9%         16.2%          6.1%          9.9%          2.3%

This shows timings per execution with inputs where the delimiter is either missing (worst-case scenario), placed first (best-case scenario), or in the lower half, upper half or last position. The fastest time is marked with [...] and <...> marks the worst.

The above table is produced by a comprehensive time trial for all three options, reproduced below. I ran the tests on Python 3.7.4 on a 2017 model 15" MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB RAM.

This script generates random sentences with and without the randomly selected delimiter present, and if present, at different positions in the generated sentence, runs the tests in random order with repeats (producing the fairest results accounting for random OS events taking place during testing), and then prints a table of the results:

import random
from itertools import product
from operator import itemgetter
from pathlib import Path
from timeit import Timer

setup = "from __main__ import sentence as s, delimiter as d"
tests = {
    "str.partition(...)[2]": "r = s.partition(d)[2]",
    "str.partition(...) and test": (
        "prefix, success, result = s.partition(d)\n"
        "if not success: result = prefix"
    ),
    "str.split(..., 1)[-1]": "r = s.split(d, 1)[-1]",
}

placement = "missing first lower upper last".split()
delimiter_count = 3

wordfile = Path("/usr/dict/words")  # Linux
if not wordfile.exists():
    # macos
    wordfile = Path("/usr/share/dict/words")
words = [w.strip() for w in wordfile.open()]

def gen_sentence(delimiter, where="missing", l=1000):
    """Generate a random sentence of length l

    The delimiter is incorporated according to the value of where:

    "missing": no delimiter
    "first":   delimiter is the first word
    "lower":   delimiter is present in the first half
    "upper":   delimiter is present in the second half
    "last":    delimiter is the last word

    """
    possible = [w for w in words if delimiter not in w]
    sentence = random.choices(possible, k=l)
    half = l // 2
    if where == "first":
        # best case, at the start
        sentence[0] = delimiter
    elif where == "lower":
        # lower half
        sentence[random.randrange(1, half)] = delimiter
    elif where == "upper":
        sentence[random.randrange(half, l)] = delimiter
    elif where == "last":
        sentence[-1] = delimiter
    # else: worst case, no delimiter

    return " ".join(sentence)

delimiters = random.choices(words, k=delimiter_count)
timings = {}
sentences = [
    # where, delimiter, sentence
    (w, d, gen_sentence(d, w)) for d, w in product(delimiters, placement)
]
test_mix = [
    # label, test, where, delimiter, sentence
    (*t, *s) for t, s in product(tests.items(), sentences)
]
random.shuffle(test_mix)

for i, (label, test, where, delimiter, sentence) in enumerate(test_mix, 1):
    print(f"\rRunning timed tests, {i:2d}/{len(test_mix)}", end="")
    t = Timer(test, setup)
    number, _ = t.autorange()
    results = t.repeat(5, number)
    # best time for this specific random sentence and placement
    timings.setdefault(
        label, {}
    ).setdefault(
        where, []
    ).append(min(dt / number for dt in results))

print()

scales = [(1.0, 'sec'), (0.001, 'msec'), (1e-06, 'usec'), (1e-09, 'nsec')]
width = max(map(len, timings))
rows = []
bestrow = dict.fromkeys(placement, (float("inf"), None))
worstrow = dict.fromkeys(placement, (float("-inf"), None))

for row, label in enumerate(tests):
    columns = []
    worst = float("-inf")
    for p in placement:
        timing = min(timings[label][p])
        if timing < bestrow[p][0]:
            bestrow[p] = (timing, row)
        if timing > worstrow[p][0]:
            worstrow[p] = (timing, row)
        worst = max(timing, worst)
        columns.append(timing)

    scale, unit = next((s, u) for s, u in scales if worst >= s)
    rows.append(
        [f"{label:>{width}}:", *(f" {c / scale:.3f} {unit} " for c in columns)]
    )

colwidth = max(len(c) for r in rows for c in r[1:])
print(' ' * (width + 1), *(p.center(colwidth) for p in placement), sep="  ")
for r, row in enumerate(rows):
    for c, p in enumerate(placement, 1):
        if bestrow[p][1] == r:
            row[c] = f"[{row[c][1:-1]}]"
        elif worstrow[p][1] == r:
            row[c] = f"<{row[c][1:-1]}>"
    print(*row, sep="  ")

percentages = []
for p in placement:
    best, worst = bestrow[p][0], worstrow[p][0]
    ratio = ((worst - best) / worst)
    percentages.append(f"{ratio:{colwidth - 1}.1%} ")

print("% best vs worst:".rjust(width + 1), *percentages, sep="  ")

great answer! especially because you provide the real reason this is better :P — Joran Beasley, Commented Nov 9, 2019 at 7:02

Community · Accepted Answer · 2015-12-18 20:17:52Z

25

If you want to do this using regex, you could simply use a non-capturing group to match the word "world" and then grab everything after it, like so:

(?:world).*

The example string is tested here
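One caveat: with the pattern above, the overall match still starts at "world", since a non-capturing group only avoids creating a numbered group. A minimal sketch that returns just the trailing text, using a capturing group as in RaduS's comment, might look like this (the helper name after_regex and the use of re.escape and re.DOTALL are assumptions, not part of the original answer):

```python
import re


def after_regex(text, target):
    """Return everything after the first literal occurrence of target, or None."""
    # re.escape treats target as plain text, not as regex syntax.
    # re.DOTALL lets "." also match newlines, so multi-line tails are kept.
    match = re.search(re.escape(target) + r"(.*)", text, flags=re.DOTALL)
    return match.group(1) if match else None


print(after_regex("hello python world, I'm a beginner", "world"))
```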

edited Dec 18, 2015 at 20:17

CommunityBot


answered Sep 24, 2012 at 20:31

Tadgh


35

Some people, when faced with a problem, think "I know, I'll use a regular expression." ... now you have 2 problems...
– Joran Beasley
Commented Sep 24, 2012 at 20:32
3

haha, my mistake, I thought this was tagged regex so I tried to give a regex answer. Oh well, it's there now.
– Tadgh
Commented Sep 24, 2012 at 20:36
2

It's all good ... it's certainly one way of skinning this cat... overkill for this problem though (imho)
– Joran Beasley
Commented Sep 24, 2012 at 20:37
The non-capturing group link is no longer pointing to the right thing.
– Apteryx
Commented Dec 17, 2015 at 18:22
4

For those interested. Here is the full code result = re.search(r"(?:world)(.*)", "hello python world , i'm a beginner ").group(1)
– RaduS
Commented Jul 1, 2019 at 8:23


gntskn · Accepted Answer · 2020-06-10 15:21:25Z

11

Python 3.9 added a new removeprefix method, which strips the given substring only when the string starts with it:

>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'BaseTestCase'.removeprefix('Test')
'BaseTestCase'

Documentation: https://docs.python.org/3.9/library/stdtypes.html#str.removeprefix
Announcement: https://docs.python.org/3.9/whatsnew/3.9.html
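Because removeprefix only acts on the very start of the string, it covers the question's case only when everything up to and including the target is known in advance. A small sketch of the difference, assuming Python 3.9 or newer:

```python
# Requires Python 3.9+ for str.removeprefix.
s = "hello python world, I'm a beginner"

# "world" is not at the start, so the string comes back unchanged.
unchanged = s.removeprefix("world")

# Works only if the full leading text is supplied as the prefix.
stripped = s.removeprefix("hello python world")

# str.partition finds the substring wherever it occurs.
via_partition = s.partition("world")[2]

print(unchanged == s)
print(stripped)
print(via_partition)
```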

answered Jun 10, 2020 at 15:21

gntskn


was going to add this but found this answer
– Brian
Commented Jul 30, 2023 at 20:43


Hadij · Accepted Answer · 2021-01-20 11:54:09Z

7

You can use the package called substring. Just install using the command pip install substring. You can get the substring by just mentioning the start and end characters/indices.

For example:

import substring
s = substring.substringByChar("abcdefghijklmnop", startChar="d", endChar="n")
print(s)

Output:

# s = defghijklmn

edited Jan 20, 2021 at 11:54

Hadij


answered Jun 26, 2018 at 4:11

Sriram Veturi


JayRizzo · Accepted Answer · 2022-05-20 07:12:27Z

6

Try this general approach:

import re

my_string="hello python world , i'm a beginner"
p = re.compile("world(.*)")
print(p.findall(my_string))

# [" , i'm a beginner"]

edited May 20, 2022 at 7:12

JayRizzo


answered Feb 27, 2020 at 23:36

Hadij


JayRizzo · Accepted Answer · 2022-05-20 07:16:48Z

6

It's an old question, but I faced the very same scenario: I needed to split a string using the word "low" as the delimiter. The problem for me was that the same string also contained the words "below" and "lower".

I solved it using the re module this way:

import re

string = '...below...as higher prices mean lower demand to be expected. Generally, a high reading is seen as negative (or bearish), while a low reading is seen as positive (or bullish) for the Korean Won.'

# use re.split with regex to match the exact word
stringafterword = re.split('\\blow\\b',string)[-1]

print(stringafterword)
# ' reading is seen as positive (or bullish) for the Korean Won.'

# the generic code is:
re.split('\\bTHE_WORD_YOU_WANT\\b',string)[-1]
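The generic form above takes the text after the last whole-word match, and it breaks if the word contains regex metacharacters. A minimal sketch of a reusable variant that returns the text after the first whole-word match is shown below; the function name after_whole_word is hypothetical, and it assumes the word starts and ends with a letter, digit, or underscore so that \b behaves as expected:

```python
import re


def after_whole_word(text, word):
    """Return the text after the first whole-word match of word, or None."""
    # re.escape guards against regex metacharacters in word;
    # \b on both sides rejects matches inside longer words.
    parts = re.split(rf"\b{re.escape(word)}\b", text, maxsplit=1)
    return parts[1] if len(parts) == 2 else None


print(after_whole_word("below and lower, but a low reading is positive", "low"))
print(after_whole_word("hello python Megaworld world , i'm a beginner", "world"))
```

The second call covers the "Megaworld" case raised in the comments on the accepted answer.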

Hope this can help someone!

edited May 20, 2022 at 7:16

JayRizzo


answered Jan 13, 2017 at 3:15

Leonardo Hermoso


1

Perhaps you could also just use: string.partition(" low ")[2]? (Note the spaces on either side of low.)
– Mtl Dev
Commented Feb 8, 2017 at 15:49


Tim Sharapov · Accepted Answer · 2023-03-26 12:17:38Z

0

If you prefer to use only Python's re (regular expressions) module, you can combine the Match.string attribute with the Match.end() method of the Match object:

import re

my_string="hello python world, I'm a beginner"

match = re.search("world", my_string)

if match:
    print(match.string[match.end():])
    # , I'm a beginner

edited Mar 26, 2023 at 12:17

answered Mar 26, 2023 at 12:12

Tim Sharapov
