I have a huge number of file pairs with the filename format <int>-<code>.txt in a single directory. I need to loop efficiently over the pairs of files that share the same <int> part. Because of the number of files involved, I would like to avoid building the full list in memory. You can assume that every file belongs to exactly one pair.

Example

0-A.txt
0-B.txt
1-A.txt
1-B.txt
7-A.txt
7-B.txt

The order is not important, as long as matching files are returned together. I've tried the following:

import glob
A_files = glob.iglob('*-A.txt')
B_files = glob.iglob('*-B.txt')
for A_file, B_file in zip(A_files, B_files):
  pass

However, glob has no specified order, so I don't receive matching pairs. Sorting the iterators results in huge lists. Is there an efficient way to loop over matching pairs of files?

2 Answers

2

If you know you always have a pair of files, one glob is all you need:

import glob

A_files = glob.iglob('*-A.txt')
# Derive each B filename from its A filename; the generator stays lazy.
file_pairs = ((file_a, file_a.replace("-A.txt", "-B.txt")) for file_a in A_files)
for file_A, file_B in file_pairs:
    pass

This assumes you don't have any *-B.txt files without a matching *-A.txt file, but since your example uses zip(), I assume that is the case.
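As a variation on the same idea, here is a minimal sketch that swaps glob.iglob for os.scandir, which also yields directory entries lazily. The function name iter_pairs_scandir and the directory parameter are hypothetical, and the sketch assumes the <int>-A.txt / <int>-B.txt naming from the question.

```python
import os


def iter_pairs_scandir(directory="."):
    """Lazily yield (A_path, B_path) for every '*-A.txt' entry in directory.

    Hypothetical helper: it assumes each A file has a matching B file
    and does not check that the B file exists.
    """
    suffix_a = "-A.txt"
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.endswith(suffix_a):
                # Keep everything before the '-A.txt' suffix, then append '-B.txt'.
                stem = entry.name[:-len(suffix_a)]
                yield entry.path, os.path.join(directory, stem + "-B.txt")
```

Slicing off the suffix instead of calling replace() only touches the end of the name, which avoids surprises if "-A.txt" ever appears elsewhere in a path.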

1

Since you know that there are pairs of A-B files, you could just iterate on A files and create B filenames:

import glob
A_files = glob.iglob('*-A.txt')
for A_file in A_files:
  B_file = A_file.partition("-")[0]+"-B.txt"

A_file.partition("-")[0] extracts the integer part before the first dash, so you can build the name of the other file from it. You could even check that the B file isn't missing (you can't do the reverse check for missing A files, obviously, since you only iterate over A files).
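The check for a missing B file could look roughly like the sketch below. The name iter_existing_pairs and the directory parameter are hypothetical, and raising FileNotFoundError is just one possible way to react. The partition is applied to the bare filename rather than the full path, so a dash in the directory name does not interfere.

```python
import glob
import os


def iter_existing_pairs(directory="."):
    """Lazily yield (A_path, B_path), raising if a B file is missing.

    Hypothetical helper; assumes the '<int>-A.txt' / '<int>-B.txt' naming.
    """
    # glob.escape keeps special characters in the directory name literal.
    pattern = os.path.join(glob.escape(directory), "*-A.txt")
    for a_path in glob.iglob(pattern):
        folder, a_name = os.path.split(a_path)
        # Everything before the first dash in the filename is the <int> part.
        b_path = os.path.join(folder, a_name.partition("-")[0] + "-B.txt")
        if not os.path.exists(b_path):
            raise FileNotFoundError("no B file for " + a_path)
        yield a_path, b_path
```

Because this is a generator, the existence check runs one pair at a time as the loop advances, so no full list is built.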

1
  • Can't believe I couldn't think of it. Thank you.
    – Aechlys
    Commented Jan 8, 2018 at 9:22
