5
\$\begingroup\$

I have two csv files, pricat.csv which contains objects I need to populate my DB with, and mapping.csv which specifies how the value in pricat.csv must be displayed in my DB, for ex: 'NW 17-18' in pricat.csv has to be 'Winter Collection 2017/2018' in my DB. Here the csvs, first row in both are the headers:

ean;supplier;brand;catalog_code;collection;season;article_structure_code;article_number;article_number_2;article_number_3;color_code;size_group_code;size_code;size_name;currency;price_buy_gross;price_buy_net;discount_rate;price_sell;material;target_area
8719245200978;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;38;38;EUR;;58.5;;139.95;Aviation;Woman Shoes
8719245200985;Rupesco BV;Via Vai;;NW 17-18;winter;10;15189-02;15189-02 Aviation Nero;Aviation;1;EU;39;39;EUR;;58.5;;139.95;Aviation;Woman Shoes

source;destination;source_type;destination_type
winter;Winter;season;season
summer;Summer;season;season
NW 17-18;Winter Collection 2017/2018;collection;collection
EU;European sizes;size_group_code;size_group
EU|36;European size 36;size_group_code|size_code;size
EU|37;European size 37;size_group_code|size_code;size
EU|38;European size 38;size_group_code|size_code;size
EU|39;European size 39;size_group_code|size_code;size
EU|40;European size 40;size_group_code|size_code;size
EU|41;European size 41;size_group_code|size_code;size
EU|42;European size 42;size_group_code|size_code;size
4;Boot;article_structure_code;article_structure
5;Sneaker;article_structure_code;article_structure
6;Slipper;article_structure_code;article_structure
7;Loafer;article_structure_code;article_structure
8;Mocassin;article_structure_code;article_structure
9;Sandal;article_structure_code;article_structure
10;Pump;article_structure_code;article_structure
1;Nero;color_code;color
2;Marrone;color_code;color
3;Brandy Nero;color_code;color
4;Indaco Nero;color_code;color
5;Fucile;color_code;color
6;Bosco Nero;color_code;color

In my models.py in Django I have three models: Catalog --> Article --> Variation the attributes of my models are manually named as mapping.csv specifies, for ex: Variation will not have a color_code attribute but color. To populate the DB I've created a custom Django command which reads the rows in pricat.csv and create istances like this:

x = Catalog.objects.get_or_create(brand=info[2], supplier=info[1], catalog_code=info[3],
                                  collection=map_dict[info[4]],
                                  season=map_dict[info[5]], size_group=map_dict[info[11]],
                                  currency=info[14], target_area=info[20])
y = Article.objects.get_or_create(article_structure=map_dict[info[6]],
                                  article_number=info[7], catalog=x[0])
z = Variation.objects.get_or_create(ean=info[0], article=y[0], size_code=info[12], color=map_col[info[10]],
                                    material=info[19], price_buy_gross=info[15], price_buy_net=info[16],
                                    discount_rate=info[17], price_sell=info[18], size=f'{map_dict[info[11]]} {info[12]}')

info is a list of all the value in a pricat.csv row and map_dict and map_col are two dictionaries I create with two func() from the mapping.csv:

def mapping(map_file):
    with open(map_file, 'r') as f:
        f = [l.strip('\n') for l in f]
        map_dict = {}
        for l in f[1:19]:
            info = l.strip().split(';')
            source = info[0]
            destination = info[1]
            source_type = info[2]
            destination_type = info[3]
            map_dict[source] = destination
            map_dict[source_type] = destination_type
        return map_dict


def mapping_color(map_file):
    with open(map_file, 'r') as f:
        f = [l.strip('\n') for l in f]
        map_dict = {}
        for l in f[19:]:
            info = l.strip().split(';')
            source = info[0]
            destination = info[1]
            source_type = info[2]
            destination_type = info[3]
            map_dict[source] = destination
            map_dict[source_type] = destination_type
        return map_dict

map_dict = mapping('mapping.csv')
map_col = mapping_color('mapping.csv')

I had to create two dict because a single one would have duplicate keys.

The code works fine and the DB is populated as intended, but I feel the way I did the mapping is bad practice, also both my command and funcs relies on indeces so the values in my csvs have to be in that specific order to work. I would greatly appreciate any suggestion on how to improve my code or accomplish this task, I hope my explanation is clear.

EDIT:

class Catalog(models.Model):

    brand = models.CharField(max_length=255)
    supplier = models.CharField(max_length=255)
    catalog_code = models.CharField(max_length=255, default=1, blank=True)
    collection = models.CharField(max_length=255)
    season = models.CharField(max_length=255)
    size_group = models.CharField(max_length=2)
    currency = models.CharField(max_length=3)
    target_area = models.CharField(max_length=255)

    def __str__(self):
        return self.brand

    def get_articles(self):
        return Article.objects.filter(catalog=self.pk)


class Article(models.Model):

    article_structure = models.CharField(max_length=255)
    article_number = models.CharField(max_length=255)
    catalog = models.ForeignKey(Catalog, on_delete=models.CASCADE)

    def __str__(self):
        return f'{self.article_number} | {self.article_structure}'


class Variation(models.Model):

    ean = models.CharField(max_length=255)
    article = models.ForeignKey(Article, on_delete=models.CASCADE)
    size_code = models.IntegerField()
    size = models.CharField(max_length=255, default=0)
    color = models.CharField(max_length=255)
    material = models.CharField(max_length=255)
    price_buy_gross = models.CharField(max_length=255)
    price_buy_net = models.FloatField()
    discount_rate = models.CharField(max_length=255, default=0)
    price_sell = models.FloatField()

    def __str__(self):
        return f'Ean: {self.ean}, article: {self.article}'

I've created a new mapping()

def mapping(map_file):
    with open(map_file, 'r') as f:
        f = [l.strip('\n') for l in f]
        map_dict = {}
        for l in f[1:]:
            info = l.strip().split(';')
            source = info[0]
            destination = info[1]
            source_type = info[2]
            child_dict = {source: destination}
            map_dict[source_type] = map_dict.get(source_type, {source: destination})
            map_dict[source_type].update(child_dict)
        return map_dict

It returns a nested dict, I'm trying to finda solution using this single nested dict instead of 2 dicts like before.

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Please show more of your code, particularly the model classes for Catalog --> Article --> Variation. \$\endgroup\$
    – Reinderien
    Commented Nov 21, 2020 at 17:13
  • \$\begingroup\$ classes added @Reinderien \$\endgroup\$ Commented Nov 21, 2020 at 17:24

2 Answers 2

3
\$\begingroup\$

You can use the built-in csv.DictReader to easily create dictionaries from CSV files. How about this?

import csv

def create_mapping(map_file):
    with open(map_file) as csvfile:
        reader = csv.DictReader(csvfile, delimiter=';')
        mapping = {row['source']: row['destination'] 
                   for row in reader 
                   if row['source_type'] != 'color_code'}
    return mapping

map_dict = create_mapping('mapping.csv')

We are using dictionary comprehension to create the dictionary. You can do something similar for colors, then you want to have all the rows where source_type equals color_code (so == instead of !=). But perhaps it is a better idea put the color mappings into a different file. Furthermore, if you process the pricat.csv in a similar fashion:

with open('pricat.csv') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';')
    for row in reader:
        # process row 

You'll be able to use the rows as dictionaries:

{'ean': '8719245200985',
 'supplier': 'Rupesco BV',
 'brand': 'Via Vai',
 'catalog_code': '',
 'collection': 'NW 17-18',
 'season': 'winter',
 'article_structure_code': '10',
 'article_number': '15189-02',
 'article_number_2': '15189-02 Aviation Nero',
 'article_number_3': 'Aviation',
 'color_code': '1',
 'size_group_code': 'EU',
 'size_code': '39',
 'size_name': '39',
 'currency': 'EUR',
 'price_buy_gross': '',
 'price_buy_net': '58.5',
 'discount_rate': '',
 'price_sell': '139.95',
 'material': 'Aviation',
 'target_area': 'Woman Shoes'}

So you can do something like:

y = Article.objects.get_or_create(article_structure=map_dict[row['article_structure_code']],
                                  article_number=row['article_number'], catalog=x[0])

This can still be refactored a bit, but now you are no longer dependent on the column numbers.

\$\endgroup\$
1
  • \$\begingroup\$ Hi! I changed my mapping() (let me know what you think about it) but your use of csv.DictReader gave me a great insight and I think I'm going to come up with a very nice solution soon, if it works I'll accept your answer. \$\endgroup\$ Commented Nov 22, 2020 at 16:30
0
\$\begingroup\$
class Command(BaseCommand):
    help = 'Create a catalog, accept csv as argument'

    def add_arguments(self, parser):
        parser.add_argument('file', nargs='+', type=str)
        parser.add_argument('map', nargs='+', type=str)

    def handle(self, *args, **options):

        map_dict = mapping(options['map'][0])

        with open(options['file'][0], 'r') as f:
            reader = csv.DictReader(f, delimiter=';')
            for row in reader:

                x = Catalog.objects.get_or_create(brand=row['brand'], supplier=row['supplier'],
                                                  catalog_code=row['catalog_code'],
                                                  collection=map_dict['collection'][row['collection']],
                                                  season=map_dict['season'][row['season']],
                                                  size_group=map_dict['size_group_code'][row['size_group_code']],
                                                  currency=row['currency'], target_area=row['target_area'])
                if x[1]:
                    logger_catalog.info(f'Created Catalog instance {x[0]}')
                y = Article.objects.get_or_create(article_structure=map_dict['article_structure_code'][row['article_structure_code']],
                                                  article_number=row['article_number'], catalog=x[0])
                if y[1]:
                    logger_catalog.info(f'Created Article instance {y[0]}')
                z = Variation.objects.get_or_create(ean=row['ean'], article=y[0], size_code=row['size_code'],
                                                    color=map_dict['color_code'][row['color_code']],
                                                    material=row['material'], price_buy_gross=row['price_buy_gross'],
                                                    price_buy_net=row['price_buy_net'],
                                                    discount_rate=row['discount_rate'], price_sell=row['price_sell'],
                                                    size=map_dict['size_group_code|size_code'][f"{row['size_group_code']}|{row['size_code']}"])
                if z[1]:
                    logger_catalog.info(f'Created Variation instance {z[0]}')

Finally I remake my Command, now it's indipendent from indeces so it will correctly populate the database even if the colums in the csv are in a different order.

My mapping() func (see question) returns a nested dict, the keys of the parent dict are the columns names that need to be mapped and the values are dicts with this structure:
{value_presented_in_csv: how_value_should_be_presented_in_DB}.

In my Command I iterate through each row of pricat.csv turning rows in dicts {colum_name: value_presented_in_csv}, if the data don't need to be mapped I get the value from my row dict like brand=row['brand'], if the data need to be mapped I get the value from my nested dict map_dict like this map_dict[column_name][value_presented_in_csv] (this gives me the value of the child dict that is how_value_should_be_presented_in_DB).

It is better because doesn't relies on indeces no more, my first implementation works correctly only if the columns in pricat.csv are in that precise order; with this new implementation the columns can be in any order and the DB would still be populate correctly.

\$\endgroup\$
0

Not the answer you're looking for? Browse other questions tagged or ask your own question.