I need to process all files in a directory tree recursively, but with a limited depth.

That means for example to look for files in the current directory and the first two subdirectory levels, but not any further. In that case, I must process e.g. ./subdir1/subdir2/file, but not ./subdir1/subdir2/subdir3/file.

How would I do this best in Python 3?

Currently I use os.walk to process all files up to infinite depth in a loop like this:

for root, dirnames, filenames in os.walk(args.directory):
    for filename in filenames:
        path = os.path.join(root, filename)
        # do something with that file...

One idea would be to count the directory separators (/) in root to determine the current file's depth, and to break out of the loop once that depth exceeds the desired maximum.
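For illustration only, here is a minimal sketch of that depth-counting idea. The function name walk_limited and its max_depth parameter are hypothetical, and the depth is derived from os.path.relpath rather than from the raw separator count in root, an assumption made so that the starting directory itself does not skew the count. Clearing dirnames in place is the documented way to stop a top-down os.walk from descending further.

```python
import os


def walk_limited(top, max_depth):
    """Yield (root, dirnames, filenames) like os.walk, but descend at most
    max_depth levels below top (0 means top only). Hypothetical helper."""
    for root, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(root, top)
        # Depth of root relative to top: "." is level 0, "a" is 1, "a/b" is 2.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying the list in place keeps os.walk from going deeper,
            # so dirnames is empty for directories at the maximum depth.
            dirnames[:] = []
        yield root, dirnames, filenames
```

With max_depth=2 this would visit ./subdir1/subdir2/file but not ./subdir1/subdir2/subdir3/file.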

This approach seems fragile to me, and probably quite inefficient when there is a large number of subdirectories to ignore. What would be the best approach here?

2 Answers

I think the easiest and most stable approach would be to copy the functionality of os.walk straight out of the source and insert your own depth-controlling parameter.

import os
import os.path as path

def walk(top, topdown=True, onerror=None, followlinks=False, maxdepth=None):
    islink, join, isdir = path.islink, path.join, path.isdir

    try:
        names = os.listdir(top)
    except OSError as err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs

    if maxdepth is None or maxdepth > 1:
        for name in dirs:
            new_path = join(top, name)
            if followlinks or not islink(new_path):
                yield from walk(new_path, topdown, onerror, followlinks,
                                None if maxdepth is None else maxdepth - 1)
    if not topdown:
        yield top, dirs, nondirs

for root, dirnames, filenames in walk(args.directory, maxdepth=2):
    pass  # ...

If you're not interested in all those optional parameters, you can pare down the function pretty substantially:

import os

def walk(top, maxdepth):
    dirs, nondirs = [], []
    for name in os.listdir(top):
        (dirs if os.path.isdir(os.path.join(top, name)) else nondirs).append(name)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for name in dirs:
            yield from walk(os.path.join(top, name), maxdepth - 1)

for x in walk(".", 2):
    print(x)
  • That's a pretty long piece of code for a small problem... I'd prefer a more compact solution if possible. And I think you mean for ... in walk(...): in the second last line instead of os.walk, don't you? Commented Feb 10, 2016 at 13:23
  • Funny, I was just composing a shorter version :-) and you're right about the errant os. on the penultimate line; fixed.
    – Kevin
    Commented Feb 10, 2016 at 13:30
  • That short version looks cool. I modified it to not return directories (as I only need files), and to compare if maxdepth != 0 so that 0 means only the current directory and I can use negative values to travel the entire directory structure. Commented Feb 10, 2016 at 13:43
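
The variant described in the last comment might look roughly like the sketch below. The name walk_files is hypothetical, and the details are an assumption reconstructed from the comment's description, not the commenter's actual code: it yields file paths only, stops descending when maxdepth reaches 0, and walks the entire tree for negative values because the counter never hits 0.

```python
import os


def walk_files(top, maxdepth):
    """Yield paths of non-directory entries under top.

    maxdepth == 0 yields only entries directly in top; a negative value
    never reaches 0 and therefore walks the whole tree.
    """
    dirs = []
    for name in os.listdir(top):
        path = os.path.join(top, name)
        if os.path.isdir(path):
            dirs.append(path)
        else:
            yield path
    if maxdepth != 0:
        for path in dirs:
            yield from walk_files(path, maxdepth - 1)
```

Note that os.path.isdir follows symbolic links, so with a negative maxdepth a link cycle could make this sketch recurse until Python's recursion limit is hit.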
Starting with Python 3.5, os.walk uses os.scandir instead of os.listdir, which makes it many times faster. I adapted @Kevin's sample accordingly:

import os

def walk(top, maxdepth):
    dirs, nondirs = [], []
    for entry in os.scandir(top):
        (dirs if entry.is_dir() else nondirs).append(entry.path)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for path in dirs:
            yield from walk(path, maxdepth - 1)

for x in walk(".", 2):
    print(x)
  • It's much faster on Windows. And there are backports (the scandir module) for Python < 3.5. Commented Dec 13, 2018 at 21:52
  • walkMaxDepth is not defined. Should it be walk?
    – pacukluka
    Commented Apr 7, 2021 at 18:49
  • It's funny that nobody noticed this mistake in two years. I took the code from two different places, and the copy-paste resulted in different names. This is recursion, so instead of walkMaxDepth it should be the name of the enclosing function, walk. I have fixed this in the code. Thank you for pointing it out; I suffer a lot myself when a finished snippet doesn't work.
    – Arty
    Commented Apr 8, 2021 at 21:30
  • To be sufficiently like os.walk, the nondirs list should consist of basenames only. In these solutions, it contains full paths. This is a tad ugly, but it would make walk and os.walk produce similar results: (dirs if entry.is_dir() else nondirs).append(entry.path if entry.is_dir() else os.path.basename(entry.path))
    – Brian K
    Commented Oct 29, 2021 at 23:19
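
Taking that last comment one step further, a sketch that mirrors os.walk more closely could store bare names for both lists and rebuild the path only when recursing. The name walk_basenames is hypothetical. The sketch relies on entry.name, which os.DirEntry provides, and on using os.scandir as a context manager, which requires Python 3.6 or later.

```python
import os


def walk_basenames(top, maxdepth):
    """Depth-limited walk yielding (top, dirnames, filenames) with bare
    names in both lists, as os.walk does. maxdepth=1 means top only."""
    dirs, nondirs = [], []
    with os.scandir(top) as entries:  # context manager needs Python 3.6+
        for entry in entries:
            (dirs if entry.is_dir() else nondirs).append(entry.name)
    yield top, dirs, nondirs
    if maxdepth > 1:
        for name in dirs:
            yield from walk_basenames(os.path.join(top, name), maxdepth - 1)
```

Callers would then build full paths with os.path.join(root, filename), exactly as in the loop from the question.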