4

I have a filename such as "Planning_Group_20180108.ind" and I only want Planning_Group out of it. The filename can also look like Soldto_20180108; in that case the output should be just Soldto.

A solution that does not use regex is preferable, as it is easier to read for someone who hasn't used regex yet.

6
  • 1
    Have a look at the split function: "Planning_Group_20180108.ind".split('_')
    – jan-seins
    Commented Jan 30, 2018 at 15:27
  • 1
    I have tried split but could not get to a complete solution. I don't know why everybody is downvoting it. Commented Jan 30, 2018 at 15:28
  • 1
    Your question is unclear. You don't seem to want to split on any underscore, only the last one, is that correct? Commented Jan 30, 2018 at 15:29
  • 2
    Will there always be at least one underscore? If not, what should be the result? Commented Jan 30, 2018 at 15:31
  • 2
    filename.rsplit('_', 1)[0]
    – Matthias
    Commented Jan 30, 2018 at 15:33

5 Answers

13

The following should work for you

s="Planning_Group_20180108.ind"
'_'.join(s.split('_')[:-1])

This splits the string at each _ into a list. The [:-1] slice drops the last part, and '_'.join() combines the remaining list elements back into a single string.
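As a quick check of the split-and-join approach, here is a minimal sketch wrapped in a helper. The function name drop_last_part is made up for illustration and is not part of the answer. It also shows what happens when the name has no underscore at all, a case the question does not specify.

```python
def drop_last_part(name):
    # Split on every underscore, drop the final piece, and glue the rest back together.
    return "_".join(name.split("_")[:-1])

print(drop_last_part("Planning_Group_20180108.ind"))  # Planning_Group
print(drop_last_part("Soldto_20180108"))              # Soldto
# With no underscore the list has one element, so dropping it leaves an empty string.
print(repr(drop_last_part("Soldto")))                 # ''
```

If names without an underscore can occur, this empty result is probably not what you want, so that case would need separate handling.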

4

First I would extract the filename itself by splitting it from the extension. The easy way is:

path = "Planning_Group_20180108.ind"
filename, ext = path.split(".")

This assumes that path is only a filename and an extension, with exactly one period. To stay safe and platform independent, I'd use the os module for that:

import os

fullpath = "this/could/be/a/full/path/Planning_Group_20180108.ind"
path, filename = os.path.split(fullpath)

And then extract "root" and extension:

root, ext = os.path.splitext(filename)

That should leave me with Planning_Group_20180108 as root. To discard "_20180108" we need to split the string on the "_" delimiter, starting from the right end, and do it only once. I would use the string's .rsplit() method, which lets me specify the delimiter and the maximum number of splits.

what_i_want, the_rest = root.rsplit("_", 1)

what_i_want should contain the left side of Planning_Group_20180108, cut at the first "_" counting from the right, so it should be Planning_Group.
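Putting the steps together, a minimal sketch might look like the following. The helper name name_without_suffix is hypothetical. It indexes the rsplit result with [0] instead of unpacking two values, so a name without any underscore is returned unchanged rather than raising a ValueError.

```python
import os

def name_without_suffix(fullpath):
    # Separate the directory part from the filename.
    _directory, filename = os.path.split(fullpath)
    # Separate the extension (the text after the last period) from the root.
    root, _ext = os.path.splitext(filename)
    # Cut once at the rightmost underscore and keep the left side.
    return root.rsplit("_", 1)[0]

print(name_without_suffix("this/could/be/a/full/path/Planning_Group_20180108.ind"))  # Planning_Group
print(name_without_suffix("Soldto_20180108"))  # Soldto
```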

A more compact, though less readable, way of writing the same thing would be:

what_i_want = os.path.splitext(os.path.split("/my/path/to/Planning_Group_20180108.ind")[1])[0].rsplit("_", 1)[0]

PS. You may skip the step that extracts root and extension if you're sure the extension will not contain an underscore. If you're unsure, this step is necessary. You also need to think about names with multiple extensions, like /path/to/file/which_has_a.lot.of.periods.and_extensions. In that case, would you like to get which_has_a.lot.of.periods.and, or which_has? Think about it while planning your app. If you need the latter, you may want to extract root with filename.split(".", 1)[0] instead of using os.path.splitext().
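To make the multiple-period case concrete, this small sketch prints what each way of finding the root gives for a filename like the one above. The variable names are illustrative only.

```python
import os

name = "which_has_a.lot.of.periods.and_extensions"

# No extension handling at all: only the text after the last "_" is dropped.
no_ext_step = name.rsplit("_", 1)[0]

# os.path.splitext treats only the part after the last period as the extension.
root_last_dot = os.path.splitext(name)[0]

# split(".", 1) treats everything after the first period as the extension.
root_first_dot = name.split(".", 1)[0]

print(no_ext_step)      # which_has_a.lot.of.periods.and
print(root_last_dot)    # which_has_a.lot.of.periods
print(root_first_dot)   # which_has_a
```

Applying rsplit("_", 1)[0] to either of the two roots then yields which_has for this particular name.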

reference:

os.path.split(path),

os.path.splitext(path)

str.rsplit(sep=None, maxsplit=-1)

3

You can use re:

import re
s = ["Planning_Group_20180108.ind", 'Soldto_20180108']
new_s = list(map(lambda x: re.findall(r'[a-zA-Z_]+(?=_\d)', x)[0], s))

Output:

['Planning_Group', 'Soldto']

3

print("Planning_Group_20180108.ind".rsplit("_", 1)[0])
print("Soldto_20180108".rsplit("_", 1)[0])

rsplit lets you split up to a given number of times, starting from the end, wherever "_" is found. In your case, it splits the string into a list of two strings, ["Planning_Group", "20180108.ind"], and you just need to take the first element with [0] (http://python-reference.readthedocs.io/en/latest/docs/str/rsplit.html)
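For comparison, here is a short sketch of how split and rsplit differ when limited to one split, and of what rsplit returns when the name contains no underscore. The variable names are illustrative only.

```python
name = "Planning_Group_20180108.ind"

# split counts from the left, so the cut lands on the first underscore.
from_left = name.split("_", 1)
# rsplit counts from the right, so the cut lands on the last underscore.
from_right = name.rsplit("_", 1)
# With no underscore, rsplit returns a one-element list and [0] is the whole name.
no_underscore = "Soldto".rsplit("_", 1)[0]

print(from_left)      # ['Planning', 'Group_20180108.ind']
print(from_right)     # ['Planning_Group', '20180108.ind']
print(no_underscore)  # Soldto
```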

1

Using regex here is pretty pythonic.

import re
newname = re.sub(r'_[0-9]+', '', 'Planning_Group_20180108.ind')

Results in:

'Planning_Group.ind'

And the same regex produces 'Soldto' from 'Soldto_20180108'.
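Note that this pattern removes every underscore followed by digits, wherever it occurs, and keeps the extension. If only a trailing numeric suffix should go, together with the extension, one possible variant is sketched below. The pattern and the name strip_stamp are my own assumptions, not part of the answer, and the optional extension part only covers extensions made of word characters.

```python
import re

# Assumed pattern: an underscore and digits, plus an optional extension, at the very end.
TRAILING_STAMP = re.compile(r"_[0-9]+(\.\w+)?$")

def strip_stamp(name):
    # Remove the trailing "_digits" (and extension, if present); leave everything else alone.
    return TRAILING_STAMP.sub("", name)

print(strip_stamp("Planning_Group_20180108.ind"))  # Planning_Group
print(strip_stamp("Soldto_20180108"))              # Soldto
print(strip_stamp("Group_2_20180108.ind"))         # Group_2

# The unanchored pattern also removes the "_2" in the middle of the name.
print(re.sub(r"_[0-9]+", "", "Group_2_20180108.ind"))  # Group.ind
```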
