
I am processing a text file with an irregular structure: a header followed by data in several sections. I want to walk through a list and jump to the next section once a certain character is encountered. I made a simple example below. What is an elegant way to deal with this problem?

lines = ['a','b','c','$', 1, 2, 3]

for line in lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

# Here, I start again, but I would like to continue with the actual
# state of the iterator, in order to only read the remaining elements.
for line in lines:
    print("Reading numbers")

  • Can't you just call lines.index('$') to get the separator position?
    – EdChum
    Commented Mar 16, 2018 at 13:11
  • The use case is more complex than that, I am just searching for a general way of doing this.
    – Chiel
    Commented Mar 16, 2018 at 13:34
  • It would be more productive to post your real problem rather than this trivial one, then.
    – EdChum
    Commented Mar 16, 2018 at 13:35
  • The real problem has a header first, then a separator, and then a set of arbitrary sections, each with a case-specific number of lines.
    – Chiel
    Commented Mar 16, 2018 at 13:42

4 Answers

3

You actually can have one iterator for both loops by creating your line iterator outside the for loop with the builtin function iter. This way it will be partially exhausted in the first loop and reusable in the next loop.

lines = ['a','b','c','$', 1, 2, 3]

iter_lines = iter(lines)  # This creates an iterator over lines

for line in iter_lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

for line in iter_lines:
    print("Reading numbers")

The above prints this result.

Reading letters
Reading letters
Reading letters
FOUND END OF HEADER
Reading numbers
Reading numbers
Reading numbers
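
Because the iterator remembers its position, the same idea can be extended to several sections. The following is only a minimal sketch: it assumes a hypothetical layout in which every section begins with a line giving its number of entries. That layout is an illustration and is not taken from the question.

```python
from itertools import islice

# Hypothetical layout: header, '$', then sections that each start with a count.
lines = ['a', 'b', 'c', '$', 2, 'x', 'y', 1, 'z']

it = iter(lines)

# Consume the header up to and including the separator.
header = []
for line in it:
    if line == '$':
        break
    header.append(line)

# Each remaining section: read the count, then take exactly that many items
# from the same iterator, which advances it past the section.
sections = []
for count in it:
    sections.append(list(islice(it, count)))

print(header)    # ['a', 'b', 'c']
print(sections)  # [['x', 'y'], ['z']]
```

itertools.islice pulls items from the shared iterator, so the outer loop resumes right after each section.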

  • This is the most Pythonic answer. +1
    Commented Mar 16, 2018 at 13:42
  • This is exactly what I was looking for
    – Chiel
    Commented Mar 16, 2018 at 13:43
1

You could use enumerate to keep track of where you are in the iteration:

lines = ['a','b','c','$', 1, 2, 3]

for i, line in enumerate(lines):
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")

print(lines[i+1:]) #prints [1,2,3]

But, unless you actually need to process the header portion, the idea of @EdChum to simply use index is probably better.
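
One caveat with the enumerate pattern: if lines is empty, the loop body never runs, i is never bound, and lines[i+1:] raises a NameError. A minimal sketch of a guard follows. The helper name split_header and the choice to return an empty body when no separator is found are assumptions made for illustration, not part of the answer above.

```python
def split_header(lines, sep='$'):
    """Return (header, body); body is empty if sep never appears."""
    for i, line in enumerate(lines):
        if line == sep:
            return lines[:i], lines[i + 1:]
    # Loop finished without finding sep (this also covers an empty list).
    return lines[:], []

print(split_header(['a', 'b', 'c', '$', 1, 2, 3]))  # (['a', 'b', 'c'], [1, 2, 3])
print(split_header([]))                              # ([], [])
```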

0

A simpler and perhaps more Pythonic way:

lines = ['a','b','c','$', 1, 2, 3]
print(lines[lines.index('$')+1:])
# [1, 2, 3]

If you want to read each element after $ to different variables, try this:

lines = ['a','b','c','$', 1, 2, 3]
a, b, c = lines[lines.index('$')+1:]
print(a, b, c)
# 1 2 3

Or if you are unaware of how many elements follow $, you could do something like this:

lines = ['a','b','c','$', 1, 2, 3, 4, 5, 6]
a, *b = lines[lines.index('$')+1:]
print(a, *b)
# 1 2 3 4 5 6
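
Note that list.index raises ValueError when the separator is absent. A small sketch of handling that case is shown here. Treating a missing separator as "no data" is an assumption; a real parser might prefer to report an error instead.

```python
lines = ['a', 'b', 'c']  # no '$' in this input

try:
    data = lines[lines.index('$') + 1:]
except ValueError:
    # list.index raises ValueError when the value is not present.
    data = []

print(data)  # []
```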
0

If you have more than one kind of separator, the most generic solution would be to build a mini state machine to parse your data:

def state0(line):
    pass  # processing function for state 0

def state1(line):
    pass  # processing function for state 1

def state2(line):
    pass  # processing function for state 2

# and so on...

states = (state0, state1, state2)  # tuple grouping all processing functions
separators = {'$': 1, '#': 2}      # maps each separator to the state it starts
state = 0                          # initial state

for line in text:                  # text: any iterable of lines
    if line in separators:
        print('Found separator', line)
        state = separators[line]   # change state
    else:
        states[state](line)        # process line with the current state's function

This solution can correctly process an arbitrary number of separators, in arbitrary order and with an arbitrary number of repetitions. The only constraint is that a given separator is always followed by the same kind of data, which can be processed by its associated function.
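
As a concrete illustration of the state-machine idea, here is a minimal runnable sketch. The sample data, the three result lists, and the use of list.append as the per-state handler are all assumptions chosen to keep the example short. Real handlers would do actual parsing.

```python
header, numbers, tags = [], [], []

# One handler per state; here each simply stores the line it receives.
handlers = (header.append, numbers.append, tags.append)
separators = {'$': 1, '#': 2}   # separator -> index of the state it starts
state = 0                       # start in the header state

# Hypothetical input: '$' appears twice to show that sections may repeat.
text = ['a', 'b', '$', 1, 2, '#', 't1', '$', 3]

for line in text:
    if line in separators:
        state = separators[line]    # switch state on a separator
    else:
        handlers[state](line)       # dispatch to the current state's handler

print(header)   # ['a', 'b']
print(numbers)  # [1, 2, 3]
print(tags)     # ['t1']
```

The second '$' switches back to state 1, so the trailing 3 is appended to the same list as 1 and 2.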
