5
\$\begingroup\$

The whole project is quite large, involving parsing the CSV-files, validating them, exporting them to a database and such, but my code in question involves subset of the project: converting multiple excel files. The converter method can be seen below:

import glob
from openpyxl import load_workbook
import csv
import codecs
from zipfile import BadZipFile


def convert_excel(data, meta, meta_additional, sheet_number):
    convert_xlsx_to_csv(data, sheet_number)
    convert_xlsx_to_csv(meta, sheet_number)
    convert_xlsx_to_csv(meta_additional, sheet_number)


def convert_xlsx_to_csv(directory, sheet_number):
    excel_files = glob.glob(directory + '*.xlsx')
    for excel_file in excel_files:
        try:
            workbook = load_workbook(excel_file)
            sheets = workbook.get_sheet_names()
            sheet = sheets[sheet_number]
            out = excel_file.replace('.xlsx', '.csv')
            write_as_csv(out, workbook, sheet)
        # this exception will only occur if the excel file is corrupted.
        except BadZipFile:
            print("error in file: " + excel_file + " -- ignoring file")


def write_as_csv(output, workbook, sheet):
    # utf-8 is used, because the parsing parts rely on that and the database
    # itself is encoded as utf-8
    with codecs.open(output, 'w', 'utf-8') as converted_file:
        wr = csv.writer(converted_file, delimiter='\t')
        for row in workbook[sheet].rows:
            wr.writerow([cell.value for cell in row])

if __name__ == '__main__':
    convert_excel('testdir1/', 'testdir2/', 'testdir3/', 0)

And here is how it's called from another module (the 'main'-module):

import convert_excel
convert_excel.convert_excel(**params)

where sheet number and directories are read from a config file that looks like this:

[parseparams]
data=path/to/dir/
meta=path/to/meta/
meta_additional=path/to/meta_additional/
sheet_number=0

and the config is parsed like this:

from configparser import ConfigParser


def config(filename='config/cfg.ini', section):
    """
    parses the directory information
    :param filename: path to the config file
    :param section: which portion of the config will be parsed;
    'parseparams', for parsing, 'dbase' for database config. 
    :return: returns config information as a dictionary
    """
    parser = ConfigParser()
    parser.read(filename)
    params = {}

    if parser.has_section(section):
        parameters = parser.items(section)
        for param in parameters:
            params[param[0]] = param[1]
    else:
        raise Exception("no config was found!")
    return params

I decided to make calling the module a one-liner (and thus add one method, convert_excel() to the converter module), to keep the main module as short and readable as possible.

\$\endgroup\$

0

Browse other questions tagged or ask your own question.