
My question is related to file-input in Python, using open(). I have a text file mytext.txt with 3 lines. I am trying to do two things with this file: print the lines, and print the number of lines.

I tried the following code:

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
for line in input_file:
    count_lines += 1
print 'number of lines:', count_lines

Result: it prints the 3 lines correctly, but prints "number of lines: 0" (instead of 3)


I found two ways to solve it, and get it to print 3:

1) I use one loop instead of two

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
    count_lines += 1
print 'number of lines:', count_lines

2) after the first loop, I define input_file again

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
input_file = open('mytext.txt', 'r')
for line in input_file:
    count_lines += 1
print 'number of lines:', count_lines

To me, it seems as if the definition input_file = ... is valid for only one loop, as though it were deleted after I use it in a loop. But I don't understand why; it is probably not 100% clear to me yet how variable = open(filename) is treated in Python.

By the way, I see that in this case it is better to use only one loop. However, I feel I have to get this question clear, since there might be cases when I can/must make use of it.

3
  • If you want to process lines, why not use readlines()?
    – tMC
    Commented Jul 30, 2012 at 17:20
  • readlines will make your machine thrash and possibly crash if you suddenly use it with a large file. It's usually better to read one line at a time.
    Commented Jul 30, 2012 at 17:32
  • You can also use readline(), which reads one line at a time. Or, to avoid thrashing/crashing with readlines, use the optional size hint parameter: readlines(sizehint). This returns only as many complete lines as fit into roughly sizehint bytes, rather than reading up to EOF.
    – ncultra
    Commented Jul 30, 2012 at 20:51
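
As a rough illustration of the size-hint idea from the comment above, here is a sketch for Python 3 (where the parameter is called hint and print is a function, unlike the Python 2 snippets in this thread). The temporary file, its contents, and the 1024 hint are assumptions chosen for the example:

```python
import os
import tempfile

# Throwaway file with 1000 short lines, standing in for a large input.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.writelines('line %d\n' % n for n in range(1000))

count_lines = 0
batches = 0
with open(path) as input_file:
    while True:
        batch = input_file.readlines(1024)  # roughly 1 KiB of whole lines
        if not batch:                       # an empty list means EOF
            break
        batches += 1
        count_lines += len(batch)

os.remove(path)
print(count_lines, batches)
```

Only one batch of lines is held in memory at a time, so the whole file is never loaded at once.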

4 Answers

26

The file handle is an iterator. After iterating over the file, the pointer is positioned at EOF (end of file), and the iterator raises StopIteration, which exits the loop. If you iterate again over a file whose pointer is already at EOF, it raises StopIteration immediately and the loop body never runs: that is why the second loop counts zero. You can rewind the file pointer with input_file.seek(0) without reopening the file.
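
A minimal sketch of this behaviour, written for Python 3 (print is a function there, unlike the Python 2 snippets in this thread). The temporary file stands in for mytext.txt, and the helper name count_lines is made up for the illustration:

```python
import os
import tempfile

def count_lines(file_obj):
    """Count the lines left between the current position and EOF."""
    return sum(1 for _ in file_obj)

# Create a throwaway three-line file standing in for mytext.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    path = tmp.name

with open(path) as input_file:
    first_pass = count_lines(input_file)   # consumes the file up to EOF
    second_pass = count_lines(input_file)  # iterator is exhausted, nothing left
    input_file.seek(0)                     # rewind to the beginning
    third_pass = count_lines(input_file)   # all lines are available again

os.remove(path)
print(first_pass, second_pass, third_pass)  # 3 0 3
```

The second pass sees no lines because the position is still at EOF; only the explicit seek(0) makes the lines readable again.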

That said, counting lines in the same loop is more I/O efficient; otherwise you have to read the whole file from disk a second time just to count the lines. This is a very common pattern:

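with open('filename.ext') as input_file:
    i = 0  # stays 0 if the file is empty, so the final print cannot fail
    for i, line in enumerate(input_file, 1):
        print line,  # trailing comma: the line already ends with a newline
print "{0} line(s) printed".format(i)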

Since Python 2.5, file objects implement __enter__ and __exit__, so they can be used with the with statement. The with form is syntactic sugar for something like:

input_file = open('filename.ext')
i = 0  # stays 0 if the file is empty
try:
    for i, line in enumerate(input_file, 1):
        print line,
finally:
    input_file.close()  # runs even if the loop raises
print "{0} line(s) printed".format(i)

I think CPython closes file handles when they are garbage collected, but I'm not sure this holds true for every implementation. IMHO it is better practice to close resource handles explicitly.
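
To check that both forms really close the file, you can inspect the file object's closed attribute. A small sketch for Python 3; the temporary file and its contents are assumptions made up for the example:

```python
import os
import tempfile

# Throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('one\ntwo\n')

with open(path) as input_file:
    closed_inside = input_file.closed   # False: still open inside the block
    lines = input_file.readlines()

closed_after = input_file.closed        # True: __exit__ closed the file

# The try/finally form closes the file too, even if the loop raises.
manual = open(path)
try:
    for line in manual:
        pass
finally:
    manual.close()
closed_manual = manual.closed

os.remove(path)
print(closed_inside, closed_after, closed_manual)  # False True True
```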

1
  • Now I got it. Thank you a lot! Btw, so I guess it is not something specific to Python, but probably most languages work like this. It is good to know. Thanks again.
    – user1563285
    Commented Jul 30, 2012 at 17:32
5

Is there some reason you could not use the following:

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
    count_lines += 1
print 'number of lines:', count_lines

The thing returned by open is a file object. File objects keep track of their own internal position as you loop over them, so to do what you tried first, you would have to rewind the file to the beginning manually; it won't do that by itself.
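
One way to watch that internal position move is tell(), which reports the current offset. This is a sketch for Python 3; the file is opened in binary mode here so that the offsets are plain byte counts, and the temporary file is made up for the example:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as tmp:
    tmp.write(b'ab\ncd\nef\n')   # three lines of 3 bytes each

positions = []
with open(path, 'rb') as input_file:
    positions.append(input_file.tell())   # 0: start of the file
    input_file.readline()
    positions.append(input_file.tell())   # 3: just after the first line
    for _ in input_file:                  # read the remaining lines
        pass
    positions.append(input_file.tell())   # 9: at EOF
    leftover = input_file.readline()      # b'': nothing left to read
    input_file.seek(0)                    # manual rewind
    positions.append(input_file.tell())   # 0 again

os.remove(path)
print(positions, leftover)  # [0, 3, 9, 0] b''
```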

1
  • As I mentioned in my post, I know this is a better way. The reason for my post is that I would like to understand the behavior of Python that makes the first option fail.
    – user1563285
    Commented Jul 30, 2012 at 17:23
2

Try adding an input_file.seek(0) call between the two loops. This rewinds the file to the beginning, so you can loop over it again.

0
0

I think the fileinput module is what you want.

Here is the link

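import fileinput

if __name__ == "__main__":
    # Reads the files named in sys.argv[1:], or stdin if none are given.
    for line in fileinput.input():
        if fileinput.isfirstline():
            print("current file: %s" % fileinput.filename())

        # lineno() counts across all files; filelineno() restarts for each file.
        print("line number: %d, line in current file: %d" %
              (fileinput.lineno(), fileinput.filelineno()))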