1

I have a file that looks like this:

1,var1
2,var2
3,var3
4,var1_val1
5,var2_val2
6,var1_val2
7,var3_val1
8,var2_val1
9,var3_val2

Output file should look like:

var1 1 4 6 
var2 2 8 5
var3 3 7 9

My code is quite complicated. It works, but it is very inefficient. Can this be done more efficiently?

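def findv(var):
    with open(inputfile) as f:
        # First pass: find the number on the bare "var" line
        for line in f:
            elems = line.strip().split(',')
            name = elems[1]
            if var != name:
                continue
            field = elems[0]
        f.seek(0)
        # Second pass: find the number for var_val1
        for line in f:
            elems2 = line.strip().split(',')
            if elems2[1].endswith(var + '_val1'):
                first = elems2[0]
        f.seek(0)
        # Third pass: find the number for var_val2
        for line in f:
            elems3 = line.strip().split(',')
            if elems3[1].endswith(var + '_val2'):
                second = elems3[0]
    return var, field, first, second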

Main part of the code:

with open(inputfile) as f:
    with open(outputfile, 'w') as fout:
        for line in f:
            tmp = line.strip().split(',')
            # Skip the _val lines; findv() looks them up itself
            if tmp[1].endswith('val1') or tmp[1].endswith('val2'):
                continue
            v = tmp[1]
            result = findv(v)
            fout.write(' '.join(result) + '\n')

My function findv(var) is called each time a line in the input file carries a bare varx label. It then scans the whole file several more times to find the fields that correspond to varx_val1 and varx_val2.

EDIT: I need to preserve the order of the input file, so var1 has to appear first in the output file, then var2, then var3 etc.

2 Answers

4

Use a dictionary, with the keys being your labels and a list to store your values. This way, you only have to loop over your file once.

from collections import defaultdict

results = defaultdict(list)

with open('somefile.txt') as f:
    for line in f:
        line = line.strip()  # drop the newline so 'var1' and 'var1_val1' share a key
        if line:
            value, key = line.split(',')
            if '_' in key:
                key = key.split('_')[0] # returns var1 from var1_val1
            results[key].append(value)

for k, v in results.items():
    print('{} {}'.format(k, ' '.join(v)))

Here is a version that incorporates the suggestions from the comments below:

from collections import OrderedDict

results = OrderedDict()

with open('somefile.txt') as f:
   for line in f:
      line = line.strip()
      if line:
         value, key = line.split(',')
         key = key.split('_')[0] # returns var1 from var1_val1
         results.setdefault(key, []).append(value)

for k,v in results.items():
    print('{} {}'.format(k, ' '.join(v)))
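
On Python 3.7 or later, where a plain dict is guaranteed to keep insertion order, the same single-pass idea can be sketched without OrderedDict. This is only a minimal sketch, not the answerer's code: the function name group_by_label and the in-memory sample list are assumptions made for illustration.

```python
def group_by_label(lines):
    """Group the number on each 'number,label' line under its base label."""
    results = {}  # plain dicts keep insertion order on Python 3.7+
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, key = line.split(',')
        key = key.split('_')[0]  # 'var1_val1' -> 'var1'
        results.setdefault(key, []).append(value)
    return results


# Hypothetical sample mirroring the input shown in the question
sample = [
    "1,var1", "2,var2", "3,var3",
    "4,var1_val1", "5,var2_val2", "6,var1_val2",
    "7,var3_val1", "8,var2_val1", "9,var3_val2",
]

grouped = group_by_label(sample)
for label, values in grouped.items():
    print('{} {}'.format(label, ' '.join(values)))
```

Values are appended in file order, so for var2 the number from var2_val2 (5) comes before the number from var2_val1 (8). That differs from the "var2 2 8 5" line in the question's desired output. If val1 must always precede val2, the numbers have to be placed by index rather than appended.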
8
  • I clarified my question. I need to preserve the order of input file, so the output file has to be in order var1, var2,var3
    – Anastasia
    Commented May 7, 2015 at 5:48
  • 1
    @Anastasia: Then make results be an OrderedDict. Change results[key].append(value) to results.setdefault(key, []).append(value). Commented May 7, 2015 at 5:55
  • Also, the names of variables are words, they don't end with a number, so I can't simply re-order the file based on the numerical value.
    – Anastasia
    Commented May 7, 2015 at 5:56
  • No need to guard key = key.split('_')[0] with if '_' in key: because "nounderscore" == "nounderscore".split('_')[0]. Commented May 7, 2015 at 5:57
  • @Anastasia: No need to sort the result. collections.OrderedDict preserves the insertion order of keys. Commented May 7, 2015 at 5:58
0

I have written a Python program that iterates over the file only once, reads all the important data into a dict, and then writes the dict into the output file.

#!/usr/bin/env python3
import collections

output = collections.OrderedDict()

with open(inputfile) as infile:
    for line in infile:
        dat, tmp = line.strip().split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key = tmp
            idx = 0
        output.setdefault(key, ["", "", ""])[idx] = dat

with open(outputfile, 'w') as outfile:
    for var in output:
        v = output[var]
        outfile.write('{} {}\n'.format(var, ' '.join(v)))

Update: modified according to comments
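
To check the slot-by-index idea against the sample from the question, it can be wrapped in a function and run on in-memory lines. This is a minimal sketch under stated assumptions: the name collect_slots, the max_vals parameter, and the sample list are illustrative and not part of the original answer.

```python
from collections import OrderedDict


def collect_slots(lines, max_vals=2):
    """Put each number in slot 0 (bare label) or slot N (label_valN)."""
    output = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        dat, tmp = line.split(',')
        if '_val' in tmp:
            key, idxstr = tmp.split('_val')
            idx = int(idxstr)
        else:
            key, idx = tmp, 0
        # One slot for the bare label plus one per _valN suffix
        output.setdefault(key, [""] * (max_vals + 1))[idx] = dat
    return output


# Hypothetical sample mirroring the input shown in the question
sample = [
    "1,var1", "2,var2", "3,var3",
    "4,var1_val1", "5,var2_val2", "6,var1_val2",
    "7,var3_val1", "8,var2_val1", "9,var3_val2",
]

rows = ['{} {}'.format(k, ' '.join(v)) for k, v in collect_slots(sample).items()]
print('\n'.join(rows))
```

Because each number is written to a fixed position instead of being appended, the val1 entry ends up before the val2 entry even when var2_val2 appears earlier in the file than var2_val1.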

1
  • Don't use naked excepts. Use except ValueError. No need to specify 'r' mode, since this is the default for open(). Use dict.setdefault() to assign to keys that may have missing values. Commented May 7, 2015 at 6:16
