2
\$\begingroup\$

I have a folder containing thousands of files and directories. I would like to move all the files that are not contained in a sub directory into a directory and name it the file name, while leaving the files already inside of directories alone, ie:

create 'newdir' ; move 123.mp4 ; into 'newdir' ; change name 'newdir' to '123' ; repeat .

In response to the mentioned question, I posted an answer with the following script.

#%%
import os

# %%
source = "/Users/me/Desktop/folder" # notice the missing last "/"
filenames = next(os.walk(source))[2]
filenames = [ x for x in filenames if not ".DS_Store" in x]
#%%
dirpaths = [os.path.splitext(file)[0] for file in filenames]

# %%
def making(directory,source=source):
   directory = os.path.join(source,directory+"/")
   os.makedirs(directory)

a = [making(directory) for directory in dirpaths]     #list comprehension could be faster than loops.

# %%
def movingfiles(onefile, source=source):
   os.rename(source + "/" + onefile, source + "/" + os.path.splitext(onefile)[0] + "/"  + onefile ) 

b = [movingfiles(onefile) for onefile in filenames]       #useless list again.

I want to know whether list comprehensions are better the way I used them. Also, I know that I can avoid the directory list and making a lot of empty directories at once by putting them in one function (and make "the" directory on the spot), but can anything else be improved? For e.g., I had to put a note to avoid putting / in source. Also, is this way of using lists to perform a function actually helpful?

\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

Like you noticed, no need to use multiple loops when you can do everything in one function and loop only once.

Aside from that, there are many small things that can be improved with this script:

  • This script doesn't work if the script is not in the same directory as the source directory. Correct if me I am wrong, but this is what I encountered when I tested the code. To allow for this flexibility (of using this script at any arbritary location), it is necessary to use absolute paths.
  • We should probably check if the directory we're about to create already exists. It may be possible that two images may have the same basename but different extensions. For this case, using os.path.exists will save us down the road and is simple to implement.
  • I noticed you excluded .DS_Store from filenames. I took it a step further and excluded all hidden files and files that started with __ (like Python build directories/files).
  • When using a loop comprehension for the sake of the for loop and not for the results, there's no need to store the output. For example, when using a = [making(directory) for directory in dirpaths], it is more useful to only write [making(directory) for directory in dirpaths] on that line. Storing it as a result implies that the output is being used somewhere later in the script, when in this case, it is not useful in the final version.
  • It is better to use shutil.move for the movingfiles function. Under the hood, Python will decide whether to use os.rename or if a copy-remove operation is neccesary (see shutil.move for more information).
  • Using os.makedirs is overkill for this example here. os.makedirs is meant for recursively creating directories, not for single-level directory creation. The similiarily named os.mkdir is more appropriate here and probably will be more performant.
  • It is not necessary to check if the source directory string contains a '/'. On the documentation, os.path.join verifies that the result will have "exactly one directory seperator". Therefore, we don't have worry about extra logic.
  • Normally, I wouldn't bother with optimization when the context of the script is for running a few files. In the case of the thousands of objects, however, it is crucial to reduce how many loops we run. In the code below, you see implicit use of generators. Specifically, we will lazily evaluate our list of filenames upto the point we process each filename. This means we only run through our list of filenames once(!)
  • On the topic of performance, when we have a custom function defined (read: not lambda), it almost always better to use map when we don't require storing the output. On Python2, the advantage was neglible, but on Python3, the difference is noticable. So we will use map below. (map is very much faster when using builtin Python functions that call on C code. Why? Because map also reaches down to C, meaning you get a barebones C loop for those operations. Looping in Python is unintuitively bloated compared to C.)
  • The use os.path.splitext is correct for this script. However, there is a slight performance cost to using this function, and so I opted for the leaner .split('.', 1)[0] which is faster. As long as the filenames don't contain '.' other than at the end as an extension, we'll be fine.
#!/usr/bin/env python
import os
import shutil
import argparse
import functools


def mkdir_move_file(file, source):
    directory = file.split('.', 1)[0]  
    if not os.path.exists(directory):
        os.mkdir(directory)
    shutil.move(file, directory)


def mkdir_move_all_files(source):
    if not os.path.exists(args.source):
        raise Exception('Source folder cannot be found or does not exist!')
    filenames = (os.path.join(source, x) for x in next(os.walk(source))[2]
                 if not x.startswith('.') and not x.startswith('__'))

    # basically the same as [mkdir_move_file(file, source) for file in filenames]
    map(functools.partial(mkdir_move_file, source=source), filenames)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Make folders for loose files in folder.')
    parser.add_argument('source', action='store', type=str, help='The source directory to process.')

    args = parser.parse_args()

    mkdir_move_all_files(args.source)

Some further notes, beyond the initial scope of the question:

  • Python makes it simple to create scripts-- the key is using the argparse library. This library lets us parse arguments given through the command line, which can then be passed along the functions internally. Simply add parsing to your script and you can use the executable script from the command line!
  • If our mkdir_move_all_files function was more complicated/intensive, we could use asynchronous programming to speed up our overall program. That way we can fire off multiple evaluations without waiting for the result to complete. However, I don't see it being hugely beneficial for this case.

If you have any questions let me know.

\$\endgroup\$
0

Not the answer you're looking for? Browse other questions tagged or ask your own question.