Python iterate over multiple files

Question

I have a series of files that are in the following format:

file_1991.xlsx
file_1992.xlsx
# there are some gaps in the file numbering sequence
file_1995.xlsx
file_1996.xlsx
file_1997.xlsx

For each file I want to do something like:

import pandas as pd
data_1995 = pd.read_excel(directory + 'file_1995.xlsx', sheet_name='Sheet1')

Then I do some work on the data and save it as another file:

with pd.ExcelWriter('output_1995.xlsx') as output_1995:
    data_1995.to_excel(output_1995, sheet_name='Sheet1')

Instead of writing this out for every single file, how can I iterate over all the files and repeat the same operation on each one? The file names mostly follow a numerical sequence, but there are some gaps in it.

Thanks for the help in advance.

umutto · Accepted Answer · 2017-03-04 07:51:21Z

You can use os.listdir or the glob module to list all files in a directory.

With os.listdir, you can filter the file names with fnmatch (a regex via the re module works too):

import fnmatch
import os

import pandas as pd

directory = 'my_directory'
for name in os.listdir(directory):
    if fnmatch.fnmatch(name, '*.xlsx'):
        # os.listdir returns bare names, so join them with the directory
        data = pd.read_excel(os.path.join(directory, name), sheet_name='Sheet1')
        # Do your thing with data

Or, with the glob module (essentially a shortcut for listdir + fnmatch), you can do the same like this. Note that glob patterns are shell-style wildcards, not regular expressions:

import glob
import pandas as pd

for path in glob.glob("/my_directory/*.xlsx"):
    # glob returns paths that already include the directory
    data = pd.read_excel(path, sheet_name='Sheet1')
    # Do your thing with data

John Zwinck · Accepted Answer · 2017-02-28 03:15:31Z

You should use Python's glob module: https://docs.python.org/3/library/glob.html

For example:

import glob
import pandas as pd

for path in glob.iglob(directory + "file_*.xlsx"):
    data = pd.read_excel(path)
    # ...

Thanks! Can I use the glob module to assign variable names? For instance, I need to read the file by assigning something like this: data_1995 = pd.read_excel(open('file_1995.xlsx'), sheetname = 'Sheet1')
– kfp_ny
Commented Feb 28, 2017 at 3:34
@kfp_ny Why would you do that? You need to rethink your program.
– Ali Gajani
Commented Feb 28, 2017 at 3:39
@kfp_ny No, you cannot. If you want to keep the data from every file, you can use a dictionary and name the keys after the file names to keep the relation. But I would recommend against that and suggest finding a way to keep it dynamic if you can, as otherwise every file will be loaded into memory and you'll run into the same problem.
– umutto
Commented Feb 28, 2017 at 3:52
@AliGajani Right. I think I got things mixed up. I'll try it again. Thanks!
– kfp_ny
Commented Feb 28, 2017 at 3:58
@umutto Thanks! I'll try to sort it out.
– kfp_ny
Commented Feb 28, 2017 at 3:58
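
The dictionary approach mentioned in the comments could be sketched roughly as follows. The year is parsed from the file name with a regular expression and used as the key, so frames[1995] plays the role of data_1995. The names year_from_name and load_by_year are hypothetical, and the loader argument stands in for whatever reads a file (for example pandas.read_excel).

```python
import glob
import os
import re

# Matches names such as 'file_1995.xlsx' and captures the four-digit year.
YEAR_PATTERN = re.compile(r"file_(\d{4})\.xlsx$")


def year_from_name(path):
    """Return the year embedded in 'file_YYYY.xlsx' as an int, or None if the name does not match."""
    match = YEAR_PATTERN.search(os.path.basename(path))
    return int(match.group(1)) if match else None


def load_by_year(directory, loader):
    """Build {year: loader(path)} for every file_YYYY.xlsx found in `directory`."""
    frames = {}
    for path in glob.glob(os.path.join(directory, "file_*.xlsx")):
        year = year_from_name(path)
        if year is not None:  # skip names like 'file_backup.xlsx'
            frames[year] = loader(path)
    return frames
```

A call such as load_by_year("my_directory", pd.read_excel) would then give access to frames[1995]. As noted in the comments, this keeps every file in memory at once, so processing files one at a time inside the loop is usually preferable.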

Ali Gajani · Accepted Answer · 2017-02-28 03:24:48Z

I would recommend glob.

glob.glob('file_*') returns a list that you can iterate over and work with.

glob.iglob('file_*') returns an iterator that yields the same paths one at a time, without building the whole list first.

The first one will give you something like this (the order is not guaranteed):

['file_1991.xlsx','file_1992.xlsx','file_1995.xlsx','file_1996.xlsx','file_1997.xlsx']
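
The difference between the two calls can be seen in a small, self-contained sketch. It creates empty placeholder files in a temporary directory (the names mirror the question, but the contents are not real spreadsheets) and lists them both ways. Since glob returns results in arbitrary order, the sketch sorts them before use.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    # Create empty placeholder files; note the gap between 1992 and 1995.
    for year in (1991, 1992, 1995, 1996, 1997):
        open(os.path.join(directory, "file_%d.xlsx" % year), "w").close()

    pattern = os.path.join(directory, "file_*")

    # glob.glob builds the whole list up front.
    as_list = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # glob.iglob yields the same paths lazily, one at a time.
    lazy = glob.iglob(pattern)
    from_iterator = sorted(os.path.basename(p) for p in lazy)

print(as_list)
print(from_iterator == as_list)
```

For a handful of yearly files the two are interchangeable; iglob mainly helps when a directory holds very many matches.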

gboffi · Accepted Answer · 2017-03-04 08:37:48Z

If you know how your file names are constructed, you can try to open each candidate file in read mode ('r'): open(..., 'r') raises FileNotFoundError if the file does not exist, so the missing years can simply be skipped.

yearly_data = {}

for year in range(1990,2018):
    try:
        f = open('file_%4.4d.xlsx'%year, 'r')
    except FileNotFoundError:
        continue # to the next year
    yearly_data[year] = ...
    f.close()
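
A variation on the same idea, sketched under the assumption that the names follow file_<year>.xlsx: instead of opening each candidate, check whether it exists and collect the paths that do. The function name existing_yearly_files and its parameters are illustrative, not part of the answer above.

```python
import os


def existing_yearly_files(directory, first_year, last_year):
    """Return {year: path} for each file_<year>.xlsx that exists in `directory`."""
    found = {}
    for year in range(first_year, last_year + 1):
        path = os.path.join(directory, "file_%04d.xlsx" % year)
        if os.path.isfile(path):  # silently skip the gaps in the sequence
            found[year] = path
    return found
```

The returned dictionary can then be looped over with for year, path in found.items(), reading each path as needed.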
