def read_large_file(file_handler, block_size=10000):
    block = []
    for line in file_handler:
        block.append(line)
        if len(block) == block_size:
            yield block
            block = []

    # don't forget to yield the last block
    if block:
        yield block

with open(path) as file_handler:
    for block in read_large_file(file_handler):
        print(block)

I am reading the code above, which was written by someone else. For these lines:

if len(block) == block_size:
    yield block
    block = []

Does block = [] ever get a chance to execute? I had thought yield was like a return statement. Also, why is there an "if block:" check at the end?

  • Yes. You can say yield is a way to pause execution and resume it again from the next line. Place a print statement to check if it gets executed. Commented Feb 10, 2020 at 7:29
  • By adding yield to your function, the function becomes a generator function. For details you can check here
    – abc
    Commented Feb 10, 2020 at 7:30
  • BTW, the function will only yield if the size is exactly block_size. It might be better to use if len(block) >= block_size:.
    – Matthias
    Commented Feb 10, 2020 at 7:43
  • Does stackoverflow.com/questions/231767/… answer the question? Commented Feb 10, 2020 at 7:50

2 Answers

1

Yes, it will be executed when the function resumes on the next iteration. Remember, yield is like a pause button for a generator, and generators are usually consumed in a loop. The yield is sort of returning a value (I say "sort of" because yield is not the same as return), but when the generator is next asked for a value, it picks up right after that yield. The purpose of block = [] is to reset the block to a new empty list before the next go-around. (block.clear() would also empty it, but it mutates the same list object that was just handed to the caller, so it is only safe if the caller is finished with each block before requesting the next one.)
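To see the pause and resume for yourself, here is a minimal sketch that adds two print calls to the function from the question and drives it by hand with next(). The small block_size and the io.StringIO stand-in for a real file are assumptions made only to keep the example self-contained.

```python
import io

def read_large_file(file_handler, block_size=2):
    block = []
    for line in file_handler:
        block.append(line)
        if len(block) == block_size:
            print("pausing at yield")
            yield block
            # Execution resumes here when the caller asks for the next value.
            print("resumed after yield, resetting block")
            block = []

    # Yield whatever is left over once the file is exhausted.
    if block:
        yield block

# io.StringIO behaves like an open text file with three lines.
gen = read_large_file(io.StringIO("a\nb\nc\n"))

first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes, executes block = [], then yields the leftover

print(first)
print(second)
```

The second print message only appears during the second next() call, which shows that the line after yield is not skipped, just deferred.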

This code builds up blocks of lines from a file and hands each one back to the caller as soon as it reaches block_size lines. The final if block yields the last bit, in case some leftover lines didn't fill a complete block.
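A quick way to check the role of that final if is to look at the block sizes the generator produces. This is only a sketch: the function is the one from the question, while the sample input, the block_size of 3, and the io.StringIO stand-in for a file are assumptions chosen for illustration.

```python
import io

def read_large_file(file_handler, block_size=10000):
    block = []
    for line in file_handler:
        block.append(line)
        if len(block) == block_size:
            yield block
            block = []

    # Without this check, a trailing partial block would be silently dropped.
    if block:
        yield block

seven_lines = "".join(f"line {i}\n" for i in range(7))
six_lines = "".join(f"line {i}\n" for i in range(6))

# 7 lines in blocks of 3: two full blocks plus one leftover line.
sizes_with_leftover = [len(b) for b in read_large_file(io.StringIO(seven_lines), block_size=3)]
print(sizes_with_leftover)  # [3, 3, 1]

# 6 lines in blocks of 3: block is empty at the end, so nothing extra is yielded.
sizes_exact = [len(b) for b in read_large_file(io.StringIO(six_lines), block_size=3)]
print(sizes_exact)  # [3, 3]
```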

1

yield produces the next output of the generator and then allows it to continue generating values.

Here, lines are read into a block (a list of lines). Whenever a block holds block_size lines, it is yielded as the next value from the generator; the block is then re-initialized to an empty list, and reading continues.
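Re-initializing with a fresh list, rather than emptying the existing one in place, matters when the caller keeps the yielded blocks around. The sketch below compares the two; chunks_rebind and chunks_clear are hypothetical helper names, and they chunk a plain iterable instead of a file to keep the example short.

```python
def chunks_rebind(items, size):
    block = []
    for item in items:
        block.append(item)
        if len(block) == size:
            yield block
            block = []      # bind a new list; the yielded one is untouched
    if block:
        yield block

def chunks_clear(items, size):
    block = []
    for item in items:
        block.append(item)
        if len(block) == size:
            yield block
            block.clear()   # empties the very list the caller still holds
    if block:
        yield block

# list() keeps a reference to every yielded block.
kept_rebind = list(chunks_rebind(range(5), 2))
kept_clear = list(chunks_clear(range(5), 2))

print(kept_rebind)  # [[0, 1], [2, 3], [4]]
print(kept_clear)   # [[4], [4], [4]]
```

With clear(), every yielded value is the same list object, so stored blocks all end up showing its final contents. If each block is fully processed inside the loop before the next one is requested, both versions behave the same.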
