Iterating through file multiple times (Python)

Question

I have a file that looks like this:

1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2

Output file should look like:

var1 1 4 6 
var2 2 8 5
var3 3 7 9

My code is quite complicated. It works, but it's very inefficient. Can this be done more efficiently?

def findv(var):
    with open(inputfile) as f:
        # first pass: find the line whose name is exactly var
        for line in f:
            elems=line.strip().split(',')
            name=elems[1]
            if var!=name:
                continue
            field=elems[0]
        f.seek(0)
        # second pass: find var_val1
        for line in f:
            elems2=line.strip().split(',')
            if elems2[1].endswith(var+'_val1'):
                first=elems2[0]
        f.seek(0)
        # third pass: find var_val2
        for line in f:
            elems3=line.strip().split(',')
            if elems3[1].endswith(var+'_val2'):
                second=elems3[0]
    return var,field,first,second

Main part of the code:

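with open(inputfile) as f:
    with open(outputfile, 'w') as fout:  # open the output file for writing
        for line in f:
            tmp=line.strip().split(',')
            # skip the varx_val1 / varx_val2 lines; findv looks those up itself
            if tmp[1].endswith('val1') or tmp[1].endswith('val2'):
                continue
            v=tmp[1]
            result=findv(v)
            fout.write(' '.join(result)+'\n')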

My function findv(var) is called each time a line in the input file contains a bare varx, and it then searches through the whole file several times until it finds the fields that correspond to varx_val1 and varx_val2.

EDIT: I need to preserve the order of the input file, so var1 has to appear first in the output file, then var2, then var3 etc.

Steven Rumbalski · Accepted Answer · 2015-05-07 06:12:32Z

4

Use a dictionary, with the keys being your labels and a list to store your values. This way, you only have to loop over your file once.

from collections import defaultdict

results = defaultdict(list)

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()  # drop the newline so 'var1\n' and 'var1' share a key
        if line:
            value, key = line.split(',')
            if '_' in key:
                key = key.split('_')[0]  # returns var1 from var1_val1
            results[key].append(value)

# items() works on both Python 2 and 3; iteritems() is Python 2 only
for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))

Here is a version that incorporates the suggestions from the comments below:

from collections import OrderedDict

results = OrderedDict()  # keeps keys in the order they are first inserted

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()
        if line:
            value, key = line.split(',')
            key = key.split('_')[0]  # returns var1 from var1_val1
            results.setdefault(key, []).append(value)

# items() works on both Python 2 and 3; iteritems() is Python 2 only
for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))
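To try the single-pass grouping without creating a file, here is a minimal sketch that runs the same logic on the sample data from the question. The function name group_lines and the SAMPLE string are made up for this illustration, and io.StringIO only stands in for the real input file.

```python
import io
from collections import OrderedDict

# Sample input copied from the question; io.StringIO stands in for the real file.
SAMPLE = """\
1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2
"""

def group_lines(lines):
    """Group the first field of each line under the label before the first '_'.

    Hypothetical helper for illustration; it makes one pass over `lines`.
    """
    results = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, key = line.split(',')
        key = key.split('_')[0]  # 'var1_val1' -> 'var1', 'var1' -> 'var1'
        results.setdefault(key, []).append(value)
    return results

grouped = group_lines(io.StringIO(SAMPLE))
for key, values in grouped.items():
    print('{} {}'.format(key, ' '.join(values)))
```

Note that the values come out in the order they appear in the file, so with this sample var2 is printed as "var2 2 5 8" rather than the "var2 2 8 5" shown in the question. If val1 always has to come before val2, the values need to be placed by their suffix instead of appended.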

edited May 7, 2015 at 6:12

Steven Rumbalski


answered May 7, 2015 at 5:41

Burhan Khalid


I clarified my question. I need to preserve the order of the input file, so the output file has to be in the order var1, var2, var3.
– Anastasia
Commented May 7, 2015 at 5:48
1

@Anastasia: Then make results be an OrderedDict. Change results[key].append(value) to results.setdefault(key, []).append(value).
– Steven Rumbalski
Commented May 7, 2015 at 5:55
Also, the variable names are words; they don't end with a number, so I can't simply reorder the file based on the numerical value.
– Anastasia
Commented May 7, 2015 at 5:56
No need to guard key = key.split('_')[0] with if '_' in key: because "nounderscore" == "nounderscore".split('_')[0].
– Steven Rumbalski
Commented May 7, 2015 at 5:57
@Anastasia: No need to sort the result. collections.OrderedDict preserves the insertion order of keys.
– Steven Rumbalski
Commented May 7, 2015 at 5:58
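The two points made in the comments above can be checked in isolation. This is only a small sketch: the labels "zeta" and "alpha" are invented here to show that the order comes from insertion, not from sorting.

```python
from collections import OrderedDict

# split('_') on a string without an underscore returns a one-element list,
# so taking [0] gives the original string back and no guard is needed.
assert "nounderscore".split('_')[0] == "nounderscore"
assert "var1_val1".split('_')[0] == "var1"

# setdefault() inserts an empty list the first time a key is seen and
# returns the existing list on later calls.
results = OrderedDict()
for key, value in [("zeta", "1"), ("alpha", "2"), ("zeta", "3")]:
    results.setdefault(key, []).append(value)

# Keys stay in first-seen order: zeta before alpha.
print(list(results.items()))
```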


chw21 · Accepted Answer · 2015-05-07 06:34:13Z

0

I have written a Python program that iterates over the file only once, reads all the important data into a dict, and then writes the dict to the output file.

#!/usr/bin/env python3
import collections

output = collections.OrderedDict()

with open(inputfile, 'r') as infile:
    for line in infile:
        dat, tmp = line.strip().split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key = tmp
            idx = 0
        output.setdefault(key, ["", "", ""])[idx] = dat

with open(outputfile, 'w') as outfile:
    for var in output:
        v = output[var]
        outfile.write('{} {}\n'.format(var, ' '.join(v)))

Update: modified according to comments
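As a rough check of the slot-based idea above, the following sketch runs it on the sample data from the question. The function name collect_by_suffix and the n_values parameter are assumptions added for this illustration, and io.StringIO replaces the real input file.

```python
import io
from collections import OrderedDict

# Sample input copied from the question.
SAMPLE = """\
1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2
"""

def collect_by_suffix(lines, n_values=2):
    """Place each number in a slot chosen by its '_valN' suffix.

    Slot 0 holds the bare variable line, slot N holds var_valN.
    n_values is a hypothetical parameter: the highest N expected.
    """
    output = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        dat, tmp = line.split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key, idx = tmp, 0
        output.setdefault(key, [""] * (n_values + 1))[idx] = dat
    return output

rows = ['{} {}'.format(var, ' '.join(slots))
        for var, slots in collect_by_suffix(io.StringIO(SAMPLE)).items()]
print('\n'.join(rows))
```

Because each value goes into the slot named by its suffix, the val1 number is always printed before the val2 number regardless of which line comes first in the file. For the sample this gives "var2 2 8 5", which matches the output asked for in the question.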

edited May 7, 2015 at 6:34

answered May 7, 2015 at 6:10

chw21


Don't use naked excepts. Use except ValueError. No need to specify 'r' mode, since this is the default for open(). Use dict.setdefault() to assign to keys that may have missing values.
– Joel Cornett
Commented May 7, 2015 at 6:16

