Split a filename python on underscore

Question

I have a filename such as "Planning_Group_20180108.ind" and I want only Planning_Group out of it. The filename can also be like Soldto_20180108; in that case the output should be Soldto only.

A solution without regex is preferable, as it is easier to read for someone who hasn't used regex yet.

Have a look at the split function: "Planning_Group_20180108.ind".split('_') — jan-seins, Commented Jan 30, 2018 at 15:27
I have tried split but was not able to get to a complete solution. I don't know why everybody is downvoting it — user7422128, Commented Jan 30, 2018 at 15:28
Your question is unclear. You don't seem to want to split on any underscore, only the last one, is that correct? — Tim Pietzcker, Commented Jan 30, 2018 at 15:29
Will there always be at least one underscore? If not, what should be the result? — Tim Pietzcker, Commented Jan 30, 2018 at 15:31

jan-seins · Accepted Answer · 2018-01-30 15:34:30Z

13

The following should work for you

s="Planning_Group_20180108.ind"
'_'.join(s.split('_')[:-1])

This creates a list by splitting the string at each _. The slice [:-1] drops the last element, and '_'.join() combines the remaining elements back into a single string.
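A minimal sketch of the approach above, wrapped in a hypothetical helper named `strip_last_part` (the name is an assumption, not part of the answer). It also shows the case raised in the comments where the name has no underscore: the slice leaves an empty list, so the result is an empty string.

```python
def strip_last_part(name):
    """Drop everything after the last underscore (hypothetical helper)."""
    parts = name.split('_')        # 'A_B_C' -> ['A', 'B', 'C']
    return '_'.join(parts[:-1])    # rejoin all pieces except the last one


print(strip_last_part("Planning_Group_20180108.ind"))  # Planning_Group
print(strip_last_part("Soldto_20180108"))              # Soldto
print(repr(strip_last_part("Soldto")))                 # '' (no underscore)
```

If names without an underscore can occur, the empty result probably needs to be handled explicitly.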

edited Jan 30, 2018 at 15:34

answered Jan 30, 2018 at 15:30

jan-seins


Marek · Accepted Answer · 2018-01-30 16:06:49Z

First I would extract the filename itself by splitting it from the extension. The easy way is:

path = "Planning_Group_20180108.ind"
filename, ext = path.split(".")

This assumes that path is only a filename and extension, and that it contains exactly one period. To stay safe and platform independent, I'd use the os module instead:

import os
fullpath = "this/could/be/a/full/path/Planning_Group_20180108.ind"
path, filename = os.path.split(fullpath)

And then extract "root" and extension:

root, ext = os.path.splitext(filename)

That should leave me with Planning_Group_20180108 as root. To discard "_20180108", we need to split the string on the "_" delimiter, starting from the right end, and do it only once. I would use the string's .rsplit() method, which lets me specify the delimiter and the maximum number of splits.

what_i_want, the_rest = root.rsplit("_", 1)

what_i_want should contain the part of Planning_Group_20180108 to the left of the first "_" counting from the right, so it should be Planning_Group.

A more compact way of writing the same thing, though not as easy to read, would be:

what_i_want = os.path.splitext(os.path.split("/my/path/to/Planning_Group_20180108.ind")[1])[0].rsplit("_", 1)[0]

PS. You may skip the step of extracting root and extension if you're sure that the extension will not contain an underscore. If you're unsure, this step is necessary. You also need to think about names with multiple extensions, like /path/to/file/which_has_a.lot.of.periods.and_extentions. In that case, would you like to get which_has_a.lot.of.periods.and, or which_has? Think about it while planning your app. If you need the latter, you may want to extract the root with filename.split(".", 1)[0] instead of os.path.splitext().
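To make the multiple-extension case concrete, here is a small sketch that applies each option to the example name from the paragraph above. The variable names are chosen for illustration only.

```python
import os

filename = "which_has_a.lot.of.periods.and_extentions"

# Skipping the extension step: cut at the last underscore of the whole name.
no_ext_step = filename.rsplit("_", 1)[0]

# os.path.splitext() cuts at the last period only.
root_last_dot = os.path.splitext(filename)[0]

# str.split(".", 1) cuts at the first period instead.
root_first_dot = filename.split(".", 1)[0]

print(no_ext_step)                       # which_has_a.lot.of.periods.and
print(root_last_dot)                     # which_has_a.lot.of.periods
print(root_first_dot)                    # which_has_a
print(root_last_dot.rsplit("_", 1)[0])   # which_has
print(root_first_dot.rsplit("_", 1)[0])  # which_has
```

For this particular name both ways of extracting the root end at which_has after the final rsplit. They can differ when an underscore appears between the first and the last period.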

reference:

os.path.split(path)

os.path.splitext(path)

str.rsplit(sep=None, maxsplit=-1)

Ajax1234 · Accepted Answer · 2018-01-30 15:34:44Z

3

You can use re:

import re
s = ["Planning_Group_20180108.ind", 'Soldto_20180108']
new_s = list(map(lambda x: re.findall(r'[a-zA-Z_]+(?=_\d)', x)[0], s))

Output:

['Planning_Group', 'Soldto']
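Note that `re.findall(...)[0]` raises an IndexError when a name contains no "_" followed by a digit. A hedged variant, assuming that such names should be returned unchanged, uses `re.match` (anchored at the start of the string) instead of `re.findall`. The helper name `prefix_before_digits` is hypothetical.

```python
import re

# Same pattern as above: letters/underscores that are followed by "_<digit>".
PATTERN = re.compile(r'[a-zA-Z_]+(?=_\d)')


def prefix_before_digits(name):
    """Return the matched prefix, or the name unchanged if nothing matches."""
    match = PATTERN.match(name)
    return match.group(0) if match else name


names = ["Planning_Group_20180108.ind", "Soldto_20180108", "Soldto"]
print([prefix_before_digits(n) for n in names])
# ['Planning_Group', 'Soldto', 'Soldto']
```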

answered Jan 30, 2018 at 15:34

Ajax1234


romph · Accepted Answer · 2018-01-30 15:44:34Z

3

print("Planning_Group_20180108.ind".rsplit("_", 1)[0])
print("Soldto_20180108".rsplit("_", 1)[0])

rsplit lets you split at most X times, starting from the end, wherever "_" is found. In your case it splits the string into a list of two strings, ["Planning_Group", "20180108.ind"], and you just need to take the first element with [0] (http://python-reference.readthedocs.io/en/latest/docs/str/rsplit.html).
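A short sketch contrasting rsplit with split for the same maxsplit of 1, plus the no-underscore case asked about in the comments. The variable names are for illustration only.

```python
name = "Planning_Group_20180108.ind"

from_right = name.rsplit("_", 1)   # splits once, at the last underscore
from_left = name.split("_", 1)     # splits once, at the first underscore

print(from_right)   # ['Planning_Group', '20180108.ind']
print(from_left)    # ['Planning', 'Group_20180108.ind']

# With no underscore, rsplit returns a one-element list,
# so [0] is simply the whole string.
no_underscore = "Soldto".rsplit("_", 1)[0]
print(no_underscore)   # Soldto
```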

edited Jan 30, 2018 at 15:44

answered Jan 30, 2018 at 15:37

romph


Evan · Accepted Answer · 2018-01-30 15:38:52Z

1

Using regex here is pretty pythonic.

import re
newname = re.sub(r'_[0-9]+', '', 'Planning_Group_20180108.ind')

Results in:

'Planning_Group.ind'

And the same regex produces 'Soldto' from 'Soldto_20180108'.
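The substitution above keeps the extension and removes every "_" followed by digits, wherever it appears. If the goal is the bare prefix asked for in the question, one possible sketch anchors the pattern at the end of the name and also consumes an optional extension. The pattern and the helper name `strip_date_suffix` are assumptions for illustration, not part of the answer.

```python
import re

# Hypothetical pattern: "_<digits>" plus an optional ".ext", anchored at the end.
TRAILING_DATE = re.compile(r'_[0-9]+(\.[^.]*)?$')


def strip_date_suffix(name):
    """Remove a trailing "_<digits>" and optional extension (hypothetical helper)."""
    return TRAILING_DATE.sub('', name)


print(strip_date_suffix("Planning_Group_20180108.ind"))  # Planning_Group
print(strip_date_suffix("Soldto_20180108"))              # Soldto
```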

answered Jan 30, 2018 at 15:38

Evan

