44

In this other question, the votes clearly show that the os.path.splitext function is preferred over the simple .split('.')[-1] string manipulation. Does anyone have a moment to explain exactly why that is? Is it faster, or more accurate, or what? I'm willing to accept that there's something better about it, but I can't immediately see what it might be. Might importing a whole module to do this be overkill, at least in simple cases?

EDIT: The OS specificity is a big win that's not immediately obvious; but even I should've seen the "what if there isn't a dot" case! And thanks to everybody for the general comments on library usage.

1
  • 7
    Because this is not Perl and there is only one way to do things. ;)
    – Nick T
    Commented Jan 20, 2012 at 5:30

11 Answers

43

Well, there are separate implementations for separate operating systems. This means that if the logic to extract a file's extension on Mac differs from that on Linux, the platform-specific implementation will handle the distinction. I don't know of any such distinction, so there might be none.


Edit: @Brian comments that an example like /directory.ext/file would of course not work with a simple .split('.') call. To get it right yourself, you would have to know both that directory names can contain extensions and that on some operating systems the forward slash is a valid directory separator.

This just emphasizes the "use a library routine unless you have a good reason not to" part of my answer.

Thanks @Brian.


Additionally, where a file doesn't have an extension, you would have to build in logic to handle that case. And what if the thing you try to split is a directory name ending with a backslash? Then there is neither a filename nor an extension.

The rule should be that unless you have a specific reason not to use a library function that does what you want, use it. This saves you from having to maintain and debug code for a problem that others have already solved perfectly well.
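As a minimal sketch of the /directory.ext/file case mentioned above, the snippet below compares the naive split with the library call. It uses posixpath directly (an assumption made here so the result does not depend on the machine running it); on a POSIX system os.path is the same module.

```python
import posixpath

path = "/directory.ext/file"

# Naive approach: take everything after the last dot,
# even though that dot belongs to a directory name.
naive_ext = path.split(".")[-1]

# Library approach: only a dot in the final path component counts.
root, ext = posixpath.splitext(path)

print(repr(naive_ext))  # 'ext/file'
print(repr(ext))        # ''
```

The naive result is not an extension at all, while splitext reports that the file has none.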

2
  • 2
    The platform distinction is very important when dealing with full paths. E.g. with "./some.dir/file" you obviously need to know that "/" characters are path separators in order to work out the correct extension. split('.')[-1] would give "dir/file" as the extension.
    – Brian
    Commented Feb 12, 2009 at 21:42
  • I think a useful addition would be links to relevant source: here's the generic splitting logic, which is used by posix-specific and nt-specific extension splitting functions. Commented Sep 7, 2011 at 4:22
17

os.path.splitext will correctly handle the situation where the file has no extension and return an empty string for the extension. .split('.')[-1] will return the whole file name.
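A short illustration of that difference, using "Makefile" as an assumed example name:

```python
import os.path

name = "Makefile"

# With no dot in the name, split() yields a single-element list,
# so [-1] is the whole name.
naive_ext = name.split(".")[-1]

# splitext() returns a (root, ext) pair with an empty ext.
root, ext = os.path.splitext(name)

print(repr(naive_ext))  # 'Makefile'
print(repr(ext))        # ''
```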

13

splitext() does a reverse search for '.' and returns the extension portion as soon as it finds it. split('.') does a forward search for all '.' characters and is therefore almost always slower. In other words, splitext() is written specifically to return the extension, unlike split().

(see posixpath.py in the Python source if you want to examine the implementation).
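The reverse search can be sketched roughly as follows. This is a simplified model written for illustration, not the actual CPython source; the function name sketch_splitext and its parameters are hypothetical.

```python
import posixpath


def sketch_splitext(path, sep="/", extsep="."):
    """Hypothetical, simplified model of a reverse-search extension split."""
    sep_index = path.rfind(sep)     # last directory separator, or -1
    dot_index = path.rfind(extsep)  # last dot, or -1
    if dot_index > sep_index:
        # The dot is in the final component. Skip any leading dots so that
        # names like ".bashrc" are not treated as a bare extension.
        name_index = sep_index + 1
        while name_index < dot_index:
            if path[name_index] != extsep:
                return path[:dot_index], path[dot_index:]
            name_index += 1
    return path, ""


# Compare the sketch with the library function on a few POSIX-style paths.
for sample in ["a.txt", "archive.tar.gz", ".bashrc", "/dir.ext/file"]:
    print(sample, sketch_splitext(sample), posixpath.splitext(sample))
```

Only two rfind calls and a short scan are needed, whereas split('.') builds a list of every dot-separated piece.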

0
6

There exist operating systems that do not use ‘.’ as an extension separator.

(Notably, RISC OS by convention uses ‘/’, since ‘.’ is used there as a path separator.)

6
  • True, but os.path.splitext() only splits on a period according to the python docs, so it'd still fail in that case.
    – Jay
    Commented Feb 12, 2009 at 18:18
  • 3
    Jay: this suggests that either the documentation is wrong or that there is not (yet!) an implementation of Python for RISC OS or similar systems. Either way, splitext would be future-proof against such changes, while the other method would not be. Commented Feb 12, 2009 at 18:28
  • May I suggest that RISC OS doesn't fit at all, since the question was about "at least in simple cases". Commented Feb 12, 2009 at 18:30
  • doc bug then, on platform ‘riscos’, os.path.splitext splits on slash. Wouldn't really make any sense otherwise!
    – bobince
    Commented Feb 12, 2009 at 18:31
  • There are other OS peculiarities to be considered as well. For example, posixpath.splitext() looks for / as the directory separator, while macpath.splitext() looks for :
    – user3850
    Commented Feb 12, 2009 at 19:09
2

A clearly defined and documented method to get the file extension should always be preferred over splitting a string willy-nilly, because ad hoc splitting is more fragile for various reasons.

Edit: This is not language specific.

2
  1. Right tool for the right job
  2. Already thoroughly debugged and tested as part of the Python standard library - no bugs introduced by mistakes in your hand-rolled version (e.g. what if there is no extension, or the file is a hidden file on UNIX like '.bashrc', or there are multiple extensions?)
  3. Being designed for this purpose, the function has useful return values (basename, ext) for the filename passed, which can be more useful in certain cases versus having to split the path manually (again, edge cases could be a concern when figuring out the basename - ext)

The only reason to worry about importing the module is concern for overhead - that's not likely to be a concern in the vast majority of cases, and if it is that tight then it's likely other overhead in Python will be a bigger problem before that.
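The edge cases listed above can be tried directly. This is a small sketch with assumed example names, showing what current Python 3 returns next to the hand-rolled split:

```python
import os.path

names = ["README", ".bashrc", "archive.tar.gz"]

# Map each name to the (root, ext) pair from the standard library.
library = {name: os.path.splitext(name) for name in names}

# Map each name to what the hand-rolled split reports as the extension.
naive = {name: name.split(".")[-1] for name in names}

for name in names:
    print(name, library[name], repr(naive[name]))
# README ('README', '') 'README'
# .bashrc ('.bashrc', '') 'bashrc'
# archive.tar.gz ('archive.tar', '.gz') 'gz'
```

Note that splitext only strips the last extension, so handling ".tar.gz" as a unit would still need extra logic.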

1

The first and most obvious difference is that the split call has no logic in it to default when there is no extension.

This can also be accomplished with a regex, as a one-liner that avoids importing os.path (though it still needs re) and returns an empty string if the extension isn't there.
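One possible regex is sketched below. The pattern and the helper name regex_ext are illustrative assumptions, not something taken from the answer, and the behavior differs from os.path.splitext for names with a leading dot.

```python
import re

# A dot followed by characters that are not dots or path separators,
# anchored at the end of the string.
_EXT_RE = re.compile(r"\.([^./\\]+)$")


def regex_ext(path):
    """Return the extension without the dot, or '' if there is none."""
    match = _EXT_RE.search(path)
    return match.group(1) if match else ""


print(repr(regex_ext("photo.jpg")))      # 'jpg'
print(repr(regex_ext("Makefile")))       # ''
print(repr(regex_ext("/dir.ext/file")))  # ''
print(repr(regex_ext(".bashrc")))        # 'bashrc' (splitext would give '')
```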

Also, the path library can handle contexts where paths use different separators for folders.

0

In the comment to the answer that provided this solution:

"If the file has no extension this incorrectly returns the file name instead of an empty string."

Not every file has an extension.

2
  • 1
    Not every file has a basename, but splitext() prefers to ignore a leading dot. Commented Feb 12, 2009 at 18:23
  • 1
    Because on Unix a file with a leading dot probably means it's a hidden file, not a lone extension.
    – user3850
    Commented Feb 13, 2009 at 3:03
0

Besides being standard and therefore guaranteed to be available, os.path.splitext:

  • Handles edge cases - like that of a missing extension.
  • Offers guarantees - besides correctly returning the extension if one exists, it guarantees that root + ext will always equal the full path.
  • Is cross-platform - in the Python source there are actually three different versions of os.path, and the appropriate one is selected based on which operating system Python thinks you are on.
  • Is more readable - consider that your version requires readers to know that lists can be indexed with negative numbers.

btw, it should not be any faster.
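The root + ext guarantee mentioned above is easy to check. A minimal sketch, with sample paths chosen here purely for illustration:

```python
import os.path

samples = ["report.pdf", "Makefile", ".bashrc", "archive.tar.gz", "dir.d/file", ""]


def roundtrips(path):
    """Return True if joining the two parts reproduces the input exactly."""
    root, ext = os.path.splitext(path)
    return root + ext == path


print(all(roundtrips(path) for path in samples))  # True
```

A hand-rolled split('.') drops the dot, so reassembling the original path needs extra bookkeeping.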

0

1) A simple split('.')[-1] won't work correctly for a path such as C:\foo.bar\Makefile, so you need to extract the basename first with os.path.basename(), and even then it will fail to split a file without an extension correctly. os.path.splitext does this under the hood.

2) Although os.path.splitext is a cross-platform solution, it's not ideal. Consider special files with a leading dot, e.g. .cvsignore, .bzrignore, .hgignore (common in version control systems). Before Python 2.6, os.path.splitext returned the entire file name as the extension, which does not seem right to me because the name without the extension is then an empty string. Since Python 2.6, a leading dot is treated as part of the root, so the extension comes back empty instead. Either way, this is the intended behavior of the Python standard library, but it may not be what the user actually wants.
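The Windows path from point 1 can be examined on any machine by calling the platform-specific modules explicitly. This is a sketch for illustration; normally you would just call os.path.splitext and let Python pick the module.

```python
import ntpath
import posixpath

path = r"C:\foo.bar\Makefile"

# Hand-rolled split: picks up the dot in the directory name.
naive_ext = path.split(".")[-1]

# Windows rules: backslash is a separator, so the file has no extension.
nt_root, nt_ext = ntpath.splitext(path)

# POSIX rules: backslash is an ordinary character, so the whole string
# is treated as a single file name containing a dot.
posix_root, posix_ext = posixpath.splitext(path)

print(repr(naive_ext))  # 'bar\\Makefile'
print(repr(nt_ext))     # ''
print(repr(posix_ext))  # '.bar\\Makefile'
```

The same string splits differently under each set of rules, which is why the separator logic belongs in the library rather than in ad hoc string code.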

0

I am not sure Python has been ported to the VMS platform, but assuming it has (*):

  • Filenames are typically of the format: $device-dir-subdir$filename.$type;$version (**)

I hope you realize that using a narrow-scope method, shaped only by the systems you have been exposed to, isn't optimal for long-term code maintainability, and that such practices are particularly detrimental when mixing and matching disparate software components in larger software projects.

Essentially, in the latter case the success probability (dependability) is akin to

R(t)=1-(1-Ri)^n

and you can now see how poor or incomplete software implementations result in buggy programs. More broadly speaking, porting software is difficult precisely because of such bugs.

(*) Hm, a quick search turned up: https://www.vmspython.org
(**) Check out the regex wars over here: https://stackoverflow.com/a/4465456/1574494
