The whole project is quite large, involving parsing the CSV-files, validating them, exporting them to a database and such, but my code in question involves subset of the project: converting multiple excel files. The converter method can be seen below:
import glob
from openpyxl import load_workbook
import csv
import codecs
from zipfile import BadZipFile
def convert_excel(data, meta, meta_additional, sheet_number):
convert_xlsx_to_csv(data, sheet_number)
convert_xlsx_to_csv(meta, sheet_number)
convert_xlsx_to_csv(meta_additional, sheet_number)
def convert_xlsx_to_csv(directory, sheet_number):
excel_files = glob.glob(directory + '*.xlsx')
for excel_file in excel_files:
try:
workbook = load_workbook(excel_file)
sheets = workbook.get_sheet_names()
sheet = sheets[sheet_number]
out = excel_file.replace('.xlsx', '.csv')
write_as_csv(out, workbook, sheet)
# this exception will only occur if the excel file is corrupted.
except BadZipFile:
print("error in file: " + excel_file + " -- ignoring file")
def write_as_csv(output, workbook, sheet):
# utf-8 is used, because the parsing parts rely on that and the database
# itself is encoded as utf-8
with codecs.open(output, 'w', 'utf-8') as converted_file:
wr = csv.writer(converted_file, delimiter='\t')
for row in workbook[sheet].rows:
wr.writerow([cell.value for cell in row])
if __name__ == '__main__':
convert_excel('testdir1/', 'testdir2/', 'testdir3/', 0)
And here is how it's called from another module (the 'main'-module):
import convert_excel
convert_excel.convert_excel(**params)
where sheet number and directories are read from a config file that looks like this:
[parseparams]
data=path/to/dir/
meta=path/to/meta/
meta_additional=path/to/meta_additional/
sheet_number=0
and the config is parsed like this:
from configparser import ConfigParser
def config(filename='config/cfg.ini', section):
"""
parses the directory information
:param filename: path to the config file
:param section: which portion of the config will be parsed;
'parseparams', for parsing, 'dbase' for database config.
:return: returns config information as a dictionary
"""
parser = ConfigParser()
parser.read(filename)
params = {}
if parser.has_section(section):
parameters = parser.items(section)
for param in parameters:
params[param[0]] = param[1]
else:
raise Exception("no config was found!")
return params
I decided to make calling the module a one-liner (and thus add one method, convert_excel()
to the converter module), to keep the main module as short and readable as possible.