I have a script that receives two files as input and creates a dictionary based on lines. Finally, it overwrites the first file.
I am looking for a way to run this script on all file pairs of a folder, choosing as sys.argv[1] and sys.argv[2] based on a pattern in the name.
import re
import sys
datafile = sys.argv[1]
schemaseqs = sys.argv[2]
datafile_lines = []
d = {}
prev = None
with open(datafile, 'r') as f:
i = 0
for line in f:
if i % 2 == 0:
d[line.strip()]=0
prev = line.strip()
else:
d[prev] = line.strip()
i+=1
new_d = {}
with open(schemaseqs, 'r') as f:
i=0
prev = None
for line in f:
if i % 2 == 0:
new_d[line.strip()]=0
prev = line.strip()
else:
new_d[prev] = line.strip()
i+=1
for key, value in d.items():
if value in new_d:
d[key] = new_d[value]
print(d)
with open(datafile,'w') as filee:
for k,v in d.items():
filee.writelines(k)
filee.writelines('\n')
filee.writelines(v)
filee.writelines('\n')
I have hundreds of file pairs all sharing the same pattern proteinXXXX (where XXXX is a number) This number can have up to four digits (e.g. 9,99,999 or 9999). So I have protein 555.txt and protein 555.fasta
I've seen I can use glob or os.listdir to read files from a directory. However, I cannot assign them to a variable and extract the lines one pair at a time in every pair of the directory.
Any help is appreciated.