Loop through pairs of files in Python

Question

I have a script that takes two files as input, builds a dictionary from the lines of each (alternating key and value lines), and finally overwrites the first file.

I am looking for a way to run this script on every pair of files in a folder, choosing which files become sys.argv[1] and sys.argv[2] based on a pattern in their names.

import re
import sys

datafile = sys.argv[1]
schemaseqs = sys.argv[2]

datafile_lines = []
d = {}
prev = None
with open(datafile, 'r') as f:
    i = 0
    for line in f:
        if i % 2 == 0:
            d[line.strip()]=0
            prev = line.strip()
        else:
            d[prev] = line.strip()
        i+=1

new_d = {}
with open(schemaseqs, 'r') as f:
    i=0
    prev = None
    for line in f:
        if i % 2 == 0:
            new_d[line.strip()]=0
            prev = line.strip()
        else:
            new_d[prev] = line.strip()
        i+=1

for key, value in d.items():
    if value in new_d:
        d[key] = new_d[value]

print(d)

with open(datafile, 'w') as filee:
    for k, v in d.items():
        # write() emits each string once; str() guards against the
        # placeholder 0 left behind when a key has no value line
        filee.write(str(k) + '\n')
        filee.write(str(v) + '\n')

I have hundreds of file pairs, all sharing the same pattern proteinXXXX, where XXXX is a number of up to four digits (e.g. 9, 99, 999 or 9999). So I have, for example, protein 555.txt and protein 555.fasta.

I've seen that I can use glob or os.listdir to list the files in a directory. However, I can't work out how to assign the two files of a pair to variables and process the directory one pair at a time.

Any help is appreciated.

are you reading files from single folder or multiple folders? — deadshot, Commented May 8, 2020 at 20:37

dlask · Accepted Answer · 2020-05-08 21:39:48Z

Just the concept.

Import the required libraries.

import glob
import os.path

Define a function that extracts only the basename (the part without the directory and the extension) from a filename.

def basename(fn):
    return os.path.splitext(os.path.basename(fn))[0]
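As a quick illustration of why this helper is useful for pairing, the sketch below shows that a .txt and a .fasta file for the same protein reduce to the same key. The `data/` directory is a made-up example path, not something from the question.

```python
import os.path


def basename(fn):
    # Drop the directory part, then drop the last extension.
    return os.path.splitext(os.path.basename(fn))[0]


# Both members of a pair map to the same basename, so it can serve as a key.
# "data/" is a hypothetical directory used only for illustration.
print(basename("data/protein555.txt"))    # protein555
print(basename("data/protein555.fasta"))  # protein555
```

Note that os.path.splitext removes only the last extension, so a name such as protein1.tar.gz would become protein1.tar.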

Create two sets, one with the basenames of the .txt files, the other with the basenames of the .fasta files.

t = {basename(fn) for fn in glob.glob("protein*.txt")}
f = {basename(fn) for fn in glob.glob("protein*.fasta")}

Calculate the intersection of these two sets, so that only basenames for which both a .txt and a .fasta file exist are kept. Then add the suffixes back and pass each pair to the existing code, assumed here to be wrapped in a function called process.

for bn in t.intersection(f):
    process(bn + ".txt", bn + ".fasta")
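The answer calls process without defining it. Below is a minimal sketch of how the pieces might fit together, assuming the question's script is wrapped in a function that takes the .txt file as datafile and the .fasta file as schemaseqs (the order used in the loop above). The names read_pairs, numeric_suffix and process_folder are hypothetical helpers introduced for this sketch. Storing an empty string for a key without a value line is also an assumption; the original script stores 0 in that case.

```python
import glob
import os.path
import re


def basename(fn):
    # File name without directory or extension, e.g. "protein555".
    return os.path.splitext(os.path.basename(fn))[0]


def read_pairs(path):
    # Read a file whose lines alternate key, value into a dict.
    # Assumption: a trailing key with no value line gets an empty string.
    with open(path) as f:
        lines = [line.strip() for line in f]
    pairs = {}
    for i in range(0, len(lines), 2):
        pairs[lines[i]] = lines[i + 1] if i + 1 < len(lines) else ""
    return pairs


def process(datafile, schemaseqs):
    # Same logic as the question's script: replace each value in datafile
    # by the entry it maps to in schemaseqs, then overwrite datafile.
    d = read_pairs(datafile)
    new_d = read_pairs(schemaseqs)
    for key, value in d.items():
        if value in new_d:
            d[key] = new_d[value]
    with open(datafile, "w") as out:
        for k, v in d.items():
            out.write(f"{k}\n{v}\n")


def numeric_suffix(bn):
    # Sort key: the trailing number, so protein9 comes before protein10.
    m = re.search(r"(\d+)$", bn)
    return int(m.group(1)) if m else -1


def process_folder(folder="."):
    # glob.escape keeps special characters in the folder name literal.
    prefix = os.path.join(glob.escape(folder), "protein*")
    t = {basename(fn) for fn in glob.glob(prefix + ".txt")}
    f = {basename(fn) for fn in glob.glob(prefix + ".fasta")}
    # Report files without a partner instead of skipping them silently.
    for bn in sorted(t ^ f):
        print("no matching pair for", bn)
    done = []
    # Sets are unordered, so sort the common basenames for a stable order.
    for bn in sorted(t & f, key=numeric_suffix):
        process(os.path.join(folder, bn + ".txt"),
                os.path.join(folder, bn + ".fasta"))
        done.append(bn)
    return done
```

Calling process_folder(".") would then handle every complete pair in the current directory. Because each .txt file is overwritten in place, it is probably worth trying this on a copy of the folder first.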
