Mapping column names and values of a csv using another csv

Question

I have two csv files, pricat.csv which contains objects I need to populate my DB with, and mapping.csv which specifies how the value in pricat.csv must be displayed in my DB, for ex: 'NW 17-18' in pricat.csv has to be 'Winter Collection 2017/2018' in my DB. Here the csvs, first row in both are the headers:

ean;supplier;brand;catalog_code;collection;season;article_structure_code;article_number;article_number_2;article_number_3;color_code;size_group_code;size_code;size_name;currency;price_buy_gross;price_buy_net;discount_rate;price_sell;material;target_area
8719245200978;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;38;38;EUR;;58.5;;139.95;Aviation;Woman Shoes
8719245200985;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;39;39;EUR;;58.5;;139.95;Aviation;Woman Shoes

source;destination;source_type;destination_type
winter;Winter;season;season
summer;Summer;season;season
NW 17-18;Winter Collection 2017/2018;collection;collection
EU;European sizes;size_group_code;size_group
EU|36;European size 36;size_group_code|size_code;size
EU|37;European size 37;size_group_code|size_code;size
EU|38;European size 38;size_group_code|size_code;size
EU|39;European size 39;size_group_code|size_code;size
EU|40;European size 40;size_group_code|size_code;size
EU|41;European size 41;size_group_code|size_code;size
EU|42;European size 42;size_group_code|size_code;size
4;Boot;article_structure_code;article_structure
5;Sneaker;article_structure_code;article_structure
6;Slipper;article_structure_code;article_structure
7;Loafer;article_structure_code;article_structure
8;Mocassin;article_structure_code;article_structure
9;Sandal;article_structure_code;article_structure
10;Pump;article_structure_code;article_structure
1;Nero;color_code;color
2;Marrone;color_code;color
3;Brandy Nero;color_code;color
4;Indaco Nero;color_code;color
5;Fucile;color_code;color
6;Bosco Nero;color_code;color

In my models.py in Django I have three models: Catalog --> Article --> Variation the attributes of my models are manually named as mapping.csv specifies, for ex: Variation will not have a color_code attribute but color. To populate the DB I've created a custom Django command which reads the rows in pricat.csv and create istances like this:

x = Catalog.objects.get_or_create(brand=info[2], supplier=info[1], catalog_code=info[3],
                                  collection=map_dict[info[4]],
                                  season=map_dict[info[5]], size_group=map_dict[info[11]],
                                  currency=info[14], target_area=info[20])
y = Article.objects.get_or_create(article_structure=map_dict[info[6]],
                                  article_number=info[7], catalog=x[0])
z = Variation.objects.get_or_create(ean=info[0], article=y[0], size_code=info[12], color=map_col[info[10]],
                                    material=info[19], price_buy_gross=info[15], price_buy_net=info[16],
                                    discount_rate=info[17], price_sell=info[18], size=f'{map_dict[info[11]]} {info[12]}')

info is a list of all the value in a pricat.csv row and map_dict and map_col are two dictionaries I create with two func() from the mapping.csv:

def mapping(map_file):
    with open(map_file, 'r') as f:
        f = [l.strip('\n') for l in f]
        map_dict = {}
        for l in f[1:19]:
            info = l.strip().split(';')
            source = info[0]
            destination = info[1]
            source_type = info[2]
            destination_type = info[3]
            map_dict[source] = destination
            map_dict[source_type] = destination_type
        return map_dict


def mapping_color(map_file):
    with open(map_file, 'r') as f:
        f = [l.strip('\n') for l in f]
        map_dict = {}
        for l in f[19:]:
            info = l.strip().split(';')
            source = info[0]
            destination = info[1]
            source_type = info[2]
            destination_type = info[3]
            map_dict[source] = destination
            map_dict[source_type] = destination_type
        return map_dict

map_dict = mapping('mapping.csv')
map_col = mapping_color('mapping.csv')

I had to create two dict because a single one would have duplicate keys.

The code works fine and the DB is populated as intended, but I feel the way I did the mapping is bad practice, also both my command and funcs relies on indeces so the values in my csvs have to be in that specific order to work. I would greatly appreciate any suggestion on how to improve my code or accomplish this task, I hope my explanation is clear.

EDIT:

class Catalog(models.Model):

    brand = models.CharField(max_length=255)
    supplier = models.CharField(max_length=255)
    catalog_code = models.CharField(max_length=255, default=1, blank=True)
    collection = models.CharField(max_length=255)
    season = models.CharField(max_length=255)
    size_group = models.CharField(max_length=2)
    currency = models.CharField(max_length=3)
    target_area = models.CharField(max_length=255)

    def __str__(self):
        return self.brand

    def get_articles(self):
        return Article.objects.filter(catalog=self.pk)


class Article(models.Model):

    article_structure = models.CharField(max_length=255)
    article_number = models.CharField(max_length=255)
    catalog = models.ForeignKey(Catalog, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.article_number} | {self.article_structure}'


class Variation(models.Model):

    ean = models.CharField(max_length=255)
    article = models.ForeignKey(Article, on_delete=models.CASCADE)
    size_code = models.IntegerField()
    size = models.CharField(max_length=255, default=0)
    color = models.CharField(max_length=255)
    material = models.CharField(max_length=255)
    price_buy_gross = models.CharField(max_length=255)
    price_buy_net = models.FloatField()
    discount_rate = models.CharField(max_length=255, default=0)
    price_sell = models.FloatField()

    def __str__(self):
        return f'Ean: {self.ean}, article: {self.article}'

I've created a new mapping()

def mapping(map_file):
    with open(map_file, 'r') as f:
        f = [l.strip('\n') for l in f]
        map_dict = {}
        for l in f[1:]:
            info = l.strip().split(';')
            source = info[0]
            destination = info[1]
            source_type = info[2]
            child_dict = {source: destination}
            map_dict[source_type] = map_dict.get(source_type, {source: destination})
            map_dict[source_type].update(child_dict)
        return map_dict

It returns a nested dict, I'm trying to finda solution using this single nested dict instead of 2 dicts like before.

Please show more of your code, particularly the model classes for Catalog --> Article --> Variation. — Reinderien, Commented Nov 21, 2020 at 17:13

PythonSherpa · Accepted Answer · 2020-11-22 20:40:00Z

You can use the built-in csv.DictReader to easily create dictionaries from CSV files. How about this?

import csv

def create_mapping(map_file):
    with open(map_file) as csvfile:
        reader = csv.DictReader(csvfile, delimiter=';')
        mapping = {row['source']: row['destination'] 
                   for row in reader 
                   if row['source_type'] != 'color_code'}
    return mapping

map_dict = create_mapping('mapping.csv')

We are using dictionary comprehension to create the dictionary. You can do something similar for colors, then you want to have all the rows where source_type equals color_code (so == instead of !=). But perhaps it is a better idea put the color mappings into a different file. Furthermore, if you process the pricat.csv in a similar fashion:

with open('pricat.csv') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';')
    for row in reader:
        # process row

You'll be able to use the rows as dictionaries:

{'ean': '8719245200985',
 'supplier': 'Rupesco BV',
 'brand': 'Via Vai',
 'catalog_code': '',
 'collection': 'NW 17-18',
 'season': 'winter',
 'article_structure_code': '10',
 'article_number': '15189-02',
 'article_number_2': '15189-02 Aviation Nero',
 'article_number_3': 'Aviation',
 'color_code': '1',
 'size_group_code': 'EU',
 'size_code': '39',
 'size_name': '39',
 'currency': 'EUR',
 'price_buy_gross': '',
 'price_buy_net': '58.5',
 'discount_rate': '',
 'price_sell': '139.95',
 'material': 'Aviation',
 'target_area': 'Woman Shoes'}

So you can do something like:

y = Article.objects.get_or_create(article_structure=map_dict[row['article_structure_code']],
                                  article_number=row['article_number'], catalog=x[0])

This can still be refactored a bit, but now you are no longer dependent on the column numbers.

Hi! I changed my mapping() (let me know what you think about it) but your use of csv.DictReader gave me a great insight and I think I'm going to come up with a very nice solution soon, if it works I'll accept your answer. — Adamantoisetortoise, Commented Nov 22, 2020 at 16:30

Vogel612 · Accepted Answer · 2020-11-22 23:26:04Z

class Command(BaseCommand):
    help = 'Create a catalog, accept csv as argument'

    def add_arguments(self, parser):
        parser.add_argument('file', nargs='+', type=str)
        parser.add_argument('map', nargs='+', type=str)

    def handle(self, *args, **options):

        map_dict = mapping(options['map'][0])

        with open(options['file'][0], 'r') as f:
            reader = csv.DictReader(f, delimiter=';')
            for row in reader:

                x = Catalog.objects.get_or_create(brand=row['brand'], supplier=row['supplier'],
                                                  catalog_code=row['catalog_code'],
                                                  collection=map_dict['collection'][row['collection']],
                                                  season=map_dict['season'][row['season']],
                                                  size_group=map_dict['size_group_code'][row['size_group_code']],
                                                  currency=row['currency'], target_area=row['target_area'])
                if x[1]:
                    logger_catalog.info(f'Created Catalog instance {x[0]}')
                y = Article.objects.get_or_create(article_structure=map_dict['article_structure_code'][row['article_structure_code']],
                                                  article_number=row['article_number'], catalog=x[0])
                if y[1]:
                    logger_catalog.info(f'Created Article instance {y[0]}')
                z = Variation.objects.get_or_create(ean=row['ean'], article=y[0], size_code=row['size_code'],
                                                    color=map_dict['color_code'][row['color_code']],
                                                    material=row['material'], price_buy_gross=row['price_buy_gross'],
                                                    price_buy_net=row['price_buy_net'],
                                                    discount_rate=row['discount_rate'], price_sell=row['price_sell'],
                                                    size=map_dict['size_group_code|size_code'][f"{row['size_group_code']}|{row['size_code']}"])
                if z[1]:
                    logger_catalog.info(f'Created Variation instance {z[0]}')

Finally I remake my Command, now it's indipendent from indeces so it will correctly populate the database even if the colums in the csv are in a different order.

My mapping() func (see question) returns a nested dict, the keys of the parent dict are the columns names that need to be mapped and the values are dicts with this structure:
{value_presented_in_csv: how_value_should_be_presented_in_DB}.

In my Command I iterate through each row of pricat.csv turning rows in dicts {colum_name: value_presented_in_csv}, if the data don't need to be mapped I get the value from my row dict like brand=row['brand'], if the data need to be mapped I get the value from my nested dict map_dict like this map_dict[column_name][value_presented_in_csv] (this gives me the value of the child dict that is how_value_should_be_presented_in_DB).

It is better because doesn't relies on indeces no more, my first implementation works correctly only if the columns in pricat.csv are in that precise order; with this new implementation the columns can be in any order and the DB would still be populate correctly.

Stack Exchange Network

Mapping column names and values of a csv using another csv

2 Answers 2

Not the answer you're looking for? Browse other questions tagged
python
csv
django
or ask your own question.

Hot Network Questions

Mapping column names and values of a csv using another csv

2 Answers 2

Not the answer you're looking for? Browse other questions tagged pythoncsvdjango or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
python
csv
django
or ask your own question.