106

I have a path which looks like

/First/Second/Third/Fourth/Fifth

and I would like to remove the First from it, thus obtaining

Second/Third/Fourth/Fifth

The only idea I could come up with is to use recursively os.path.split but this does not seem optimal. Is there a better solution?

5 Answers

136

There really is nothing in the os.path module to do this. Every so often, someone suggests creating a splitall function that returns a list (or iterator) of all of the components, but it never gained enough traction.

Partly this is because every time anyone suggested adding new functionality to os.path, it re-ignited the long-standing dissatisfaction with the general design of the library, leading to someone proposing a new, more OO-like API for paths to deprecate the old, clunky API. In 3.4, that finally happened, with pathlib. And it's already got functionality that wasn't in os.path. So:

>>> import pathlib
>>> p = pathlib.Path('/First/Second/Third/Fourth/Fifth')
>>> p.parts[2:]
('Second', 'Third', 'Fourth', 'Fifth')
>>> pathlib.Path(*p.parts[2:])
PosixPath('Second/Third/Fourth/Fifth')

Or… are you sure you really want to remove the first component, rather than do this?

>>> p.relative_to(*p.parts[:2])
PosixPath('Second/Third/Fourth/Fifth')
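
The difference between the two can be sketched as follows. This is only an illustration: it uses PurePosixPath so that it behaves the same on any OS, and the '/Other' prefix is a made-up example. Slicing parts drops whatever the first component happens to be, while relative_to insists that the path really starts with the prefix you name.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/First/Second/Third/Fourth/Fifth')

# Slicing parts removes the root and the first component, whatever it is called.
by_slice = PurePosixPath(*p.parts[2:])

# relative_to only succeeds when the path actually lies under the given prefix.
by_prefix = p.relative_to('/First')

print(by_slice)   # Second/Third/Fourth/Fifth
print(by_prefix)  # Second/Third/Fourth/Fifth

try:
    p.relative_to('/Other')  # hypothetical prefix that does not match
except ValueError as exc:
    print('not under /Other:', exc)
```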

If you need to do this in 2.6-2.7 or 3.2-3.3, there's a backport of pathlib.

Of course, you can use string manipulation, as long as you're careful to normalize the path and use os.path.sep, and to make sure you handle the fiddly details with non-absolute paths or with systems with drive letters, and…
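
A rough sketch of what that string manipulation might look like, assuming you only want the first directory component gone and do not care about keeping the drive letter. The helper name drop_first_component is made up for illustration, and there are certainly edge cases (UNC paths, for one) that it does not try to cover.

```python
import os.path

def drop_first_component(path):
    """Return path without its first directory component (hypothetical helper)."""
    # Collapse redundant separators and '.' / '..' segments first.
    path = os.path.normpath(path)
    # Set aside a drive letter such as 'C:' (always empty on POSIX).
    _drive, rest = os.path.splitdrive(path)
    # Ignore leading separators so absolute and relative paths behave alike.
    parts = rest.lstrip(os.path.sep).split(os.path.sep)
    # Everything after the first component; '' if there was only one.
    return os.path.sep.join(parts[1:])

print(drop_first_component('/First/Second/Third/Fourth/Fifth'))
```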

Or you can just wrap up your recursive os.path.split. What exactly is "non-optimal" about it, once you wrap it up? It may be a bit slower, but we're talking nanoseconds here, many orders of magnitude faster than even calling stat on a file. It will have recursion-depth problems if you have a filesystem that's 1000 directories deep, but have you ever seen one? (If so, you can always turn it into a loop…) It takes a few minutes to wrap it up and write good unit tests, but that's something you just do once and never worry about again. So, honestly, if you don't want to use pathlib, that's what I'd do.
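
For what it's worth, here is one way the wrapped-up version could look, written as a loop so that recursion depth is not an issue. The name splitall is just the name people tend to use for this; nothing by that name exists in os.path, and this is a sketch rather than a vetted implementation.

```python
import os.path

def splitall(path):
    """Split path into a list of components via repeated os.path.split (hypothetical helper)."""
    parts = []
    while True:
        head, tail = os.path.split(path)
        if head == path:
            # Reached a root such as '/' that split() cannot reduce any further.
            if head:
                parts.append(head)
            break
        if tail:
            parts.append(tail)
        if not head:
            # A relative path has been fully consumed.
            break
        path = head
    # Components were collected from the end, so put them back in order.
    parts.reverse()
    return parts

components = splitall('/First/Second/Third/Fourth/Fifth')
print(components)                       # ['/', 'First', 'Second', 'Third', 'Fourth', 'Fifth']
print(os.path.join(*components[2:]))
```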

6
  • performance-wise you are totally right: we are talking about nanoseconds; it is more me trying to learn the best way / other ways of doing it
    – meto
    Commented Nov 3, 2014 at 22:48
  • @Hackaholic: As the answer explains in detail, pathlib comes with Python 3.4+, and you can install the backport for 2.6-2.7 or 3.2-3.3.
    – abarnert
    Commented Nov 3, 2014 at 22:52
  • @meto: Yeah, that's perfectly reasonable. It's just that often when people say "optimize" or "efficient" they really are asking about (time) performance, in cases where it doesn't actually matter, so it's better to be sure of what people are asking for…
    – abarnert
    Commented Nov 3, 2014 at 22:53
  • @abarnert as a serious question: what would be the best way of asking stuff like this, in your opinion?
    – meto
    Commented Nov 4, 2014 at 0:12
  • @meto: In Python, usually asking what's "most Pythonic" is a good bet—people will interpret that as not just "most idiomatic in Python", but also "simplest" and "all-around best".
    – abarnert
    Commented Nov 4, 2014 at 0:39
26

A bit like another answer, taking advantage of os.path:

os.path.join(*(x.split(os.path.sep)[2:]))

... assuming your string starts with a separator.
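
One caveat: if the path has too few components, the slice is empty and os.path.join raises a TypeError because it needs at least one argument. A minimal sketch of a guarded version, assuming an empty string is an acceptable result in that case (strip_first is a made-up name):

```python
import os

def strip_first(path):
    """Drop the first component of a path that starts with a separator (hypothetical helper)."""
    rest = path.split(os.path.sep)[2:]
    # os.path.join() requires at least one argument, so handle the empty case explicitly.
    return os.path.join(*rest) if rest else ''

x = os.path.sep + os.path.join('First', 'Second', 'Third')
print(strip_first(x))
```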

3
  • 1
    Can you explain a bit about the use of the "*" here?
    – Luke
    Commented Feb 24, 2017 at 17:14
  • 1
    @Luke The * unpacks the list produced by (x.split(os.path.sep)[2:]) into separate positional arguments (like *args). However, this will not work if the path is too short, since the argument list will be completely empty
    – asdf
    Commented Oct 13, 2017 at 3:07
  • Works great for me without needing to import (install) any package. Thanks @amyrit!
    – Hamfry
    Commented Jul 29, 2022 at 7:37
23

You can try:

os.path.relpath(your_path, '/First')
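
A short illustration of how this behaves, using posixpath (the POSIX flavour of os.path) so the output is the same on any OS. The '/Other' start directory is made up. Note that when the path is not under the given start, relpath does not raise; it climbs out with '..' instead.

```python
import posixpath  # same functions as os.path on a POSIX system

your_path = '/First/Second/Third/Fourth/Fifth'

inside = posixpath.relpath(your_path, '/First')
print(inside)   # Second/Third/Fourth/Fifth

# Hypothetical start directory that is not a prefix of the path.
outside = posixpath.relpath(your_path, '/Other')
print(outside)  # ../First/Second/Third/Fourth/Fifth
```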
1
  • Good for py2/py3 std lib compatibility.
    – chadlagore
    Commented Oct 28, 2022 at 15:34
20

A simple approach

a = '/First/Second/Third/Fourth/Fifth'
"/".join(a.strip("/").split('/')[1:])

output:

Second/Third/Fourth/Fifth

In the code above, I split the string on "/" and then joined the pieces back together, leaving out the first element.

Using itertools.dropwhile:

>>> import itertools
>>> a = '/First/Second/Third/Fourth/Fifth'
>>> "".join(list(itertools.dropwhile(str.isalnum, a.strip("/")))[1:])
'Second/Third/Fourth/Fifth'
3
  • At first I thought this wouldn't work on paths which start with the path separator, because you seem to wantonly strip the first character from the string, but upon further review, what does the first character matter if you are just removing the first segment? +1, but maybe something in the answer which says this (or maybe a comment by someone you helped)
    – iLoveTux
    Commented Oct 12, 2015 at 19:06
  • @iLoveTux made it more efficient
    – Hackaholic
    Commented Oct 13, 2015 at 10:54
  • 2
    You should replace "/" with os.sep
    Commented Feb 18, 2022 at 16:38
0

I was looking for a native way to do it, but it seems there isn't one.

I know this topic is old, but this is what I did to get to the best solution. There were basically two approaches: using split() and using len(). Both had to use slicing.

1) Using split()

import time

start_time = time.time()

path = "/folder1/folder2/folder3/file.zip"
for i in range(500000):  # xrange on Python 2
    new_path = "/" + "/".join(path.split("/")[2:])

print("--- %s seconds ---" % (time.time() - start_time))

Result: --- 0.420122861862 seconds ---

*Removing the char "/" in the line new_path = "/" + "/".... didn't improve the performance much.

2) Using len(). This method only works if you provide the folder you would like to remove

import time

start_time = time.time()

path = "/folder1/folder2/folder3/file.zip"
folder = "/folder1"
for i in range(500000):  # xrange on Python 2
    if path.startswith(folder):
        a = path[len(folder):]

print("--- %s seconds ---" % (time.time() - start_time))

Result: --- 0.199596166611 seconds ---

*Even with that "if" to check whether the path starts with the folder name, it was about twice as fast as the first method.

In summary: each method has a pro and a con. If you are absolutely sure about the folder you want to remove, use method 2; otherwise I recommend method 1, which people here have mentioned previously.
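
If you want to repeat the comparison, the timeit module is probably a more reliable way to do it than time.time(). A sketch, with the two methods wrapped in functions whose names (with_split, with_len) are made up here; no timings are shown because they depend on the machine and the Python version.

```python
import timeit

path = "/folder1/folder2/folder3/file.zip"
folder = "/folder1"

def with_split():
    # Method 1: split on "/" and rejoin everything after the first folder.
    return "/" + "/".join(path.split("/")[2:])

def with_len():
    # Method 2: slice off a known prefix if the path starts with it.
    return path[len(folder):] if path.startswith(folder) else path

# Both approaches should agree before their timings are compared.
assert with_split() == with_len() == "/folder2/folder3/file.zip"

for func in (with_split, with_len):
    seconds = timeit.timeit(func, number=500000)
    print("%s: %.3f seconds" % (func.__name__, seconds))
```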

1
  • What about path.split(os.path.sep,2)[ 2 ] if there is a leading separator, or path.split(os.path.sep,1)[ 1 ] if there is not?
    – Victoria
    Commented Apr 7, 2020 at 4:18
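
The bounded split suggested in the comment above could be wrapped up roughly like this. The helper name drop_first is made up, and returning an empty string when nothing is left is an assumption, not something the comment specifies.

```python
import os

def drop_first(path):
    """Drop the first component using a single bounded split (hypothetical helper)."""
    sep = os.path.sep
    if path.startswith(sep):
        # '/First/Second/...' splits into ['', 'First', 'Second/...']
        pieces = path.split(sep, 2)
        return pieces[2] if len(pieces) > 2 else ''
    # 'First/Second/...' splits into ['First', 'Second/...']
    pieces = path.split(sep, 1)
    return pieces[1] if len(pieces) > 1 else ''

absolute = os.path.sep + os.path.join('First', 'Second', 'Third')
relative = os.path.join('First', 'Second', 'Third')
print(drop_first(absolute))
print(drop_first(relative))
```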
