22

I am using get_or_create to insert objects into the database, but doing 1000 at once takes too long.

I tried bulk_create, but it doesn't provide the functionality I need: it creates duplicates, ignores unique values, and doesn't trigger the post_save signals I rely on.

Is it even possible to do get_or_create in bulk via a customized SQL query?

Here is my example code:

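import json
import urllib2  # Python 2; on Python 3 use urllib.request instead

related_data = json.loads(urllib2.urlopen(final_url).read())

for item in related_data:
    kw = item['keyword']
    # One SELECT (plus an INSERT when the row is missing) per keyword
    e, c = KW.objects.get_or_create(KWuser=kw, author=author)
    # Add m2m to parent project
    e.project.add(id)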

related_data contains 1000 rows looking like this:

[{"cmp":0,"ams":3350000,"cpc":0.71,"keyword":"apple."},
{"cmp":0.01,"ams":3350000,"cpc":1.54,"keyword":"apple -10810"}......]

The KW model also sends a signal that I use to create another parent model:

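from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=KW)
def grepw(sender, **kwargs):
    # Only act on newly created rows, not on updates
    if kwargs.get('created', False):
        id = kwargs['instance'].id
        kww = kwargs['instance'].KWuser
        # Find the parent KeyO case-insensitively, or create it
        a, b = KeyO.objects.get_or_create(defaults={'keyword': kww}, keyword__iexact=kww)
        # update() writes KWF directly and does not fire post_save again
        KW.objects.filter(id=id).update(KWF=a.id)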

This works, but as you can imagine, doing thousands of rows at once takes a long time and even crashes my tiny server. What bulk options do I have?

3 Answers

11

As of Django 2.2, bulk_create has an ignore_conflicts flag. Per the docs:

On databases that support it (all but Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values
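
A minimal sketch of how this flag could stand in for a per-row get_or_create, assuming the model has a database-level unique constraint covering the lookup fields. The helper name bulk_get_or_create and the default batch size of 500 are illustrative choices, not part of Django. Because ignore_conflicts=True leaves primary keys unset on the returned instances, the sketch reads the rows back afterwards:

```python
def bulk_get_or_create(model, field, values, batch_size=500, **common):
    """Insert the missing rows in bulk, then return {value: instance}.

    `model` is a Django model class, `field` the name of the unique field,
    and `common` holds extra field values shared by every row
    (for example author=author). Hypothetical helper, not a Django API.
    """
    # De-duplicate the input while keeping its order.
    unique_values = list(dict.fromkeys(values))

    # Batched INSERTs; rows that violate a unique constraint are skipped
    # instead of raising IntegrityError (needs Django >= 2.2).
    model.objects.bulk_create(
        [model(**{field: value}, **common) for value in unique_values],
        batch_size=batch_size,
        ignore_conflicts=True,
    )

    # Primary keys are not set on the instances above, so fetch the rows
    # again, in chunks to keep each IN (...) list reasonably small.
    found = {}
    for start in range(0, len(unique_values), batch_size):
        chunk = unique_values[start:start + batch_size]
        for row in model.objects.filter(**{field + "__in": chunk}, **common):
            found[getattr(row, field)] = row
    return found
```

With the question's model this might be called as bulk_get_or_create(KW, "KWuser", [item["keyword"] for item in related_data], author=author). As the question already notes, bulk_create does not send post_save, so the KeyO linking done in the signal handler would have to be run explicitly on the returned rows.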

5

This post may be of use to you:

stackoverflow.com/questions/3395236/aggregating-saves-in-django

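Note that the answer recommends the commit_on_success decorator, which was deprecated in Django 1.6 and removed in 1.8. It is replaced by transaction.atomic, described in the Django documentation on database transactions. Wrapping the loop in a single transaction avoids a separate commit for every save: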

from django.db import transaction

@transaction.atomic
def lot_of_saves(queryset):
    for item in queryset:
        modify_item(item)
        item.save()
2

If I understand correctly, "get_or_create" means SELECT or INSERT on the Postgres side.

You have a table with a UNIQUE constraint or index and a large number of rows to either INSERT (if not yet there) and get the newly created ID, or otherwise SELECT the ID of the existing row. This is not as simple as it may seem from the outside. With concurrent write load, the matter is even more complicated.
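
The insert-then-select idea described above can be sketched with Python's built-in sqlite3 module, used here only as a stand-in so the example runs without a server. The table kw, its columns, and the function name bulk_select_or_insert are made up for the illustration. SQLite spells the conflict handling INSERT OR IGNORE; on Postgres 9.5 or later the comparable statement would be INSERT ... ON CONFLICT DO NOTHING. The sketch does not attempt to handle concurrent writers.

```python
import sqlite3


def bulk_select_or_insert(conn, keywords):
    """Return {keyword: id}, inserting keywords that are not stored yet."""
    # De-duplicate the input while keeping its order.
    unique = list(dict.fromkeys(keywords))
    ids = {}
    with conn:  # one transaction for the whole batch
        # Rows that would violate the UNIQUE constraint are skipped.
        conn.executemany(
            "INSERT OR IGNORE INTO kw (keyword) VALUES (?)",
            [(keyword,) for keyword in unique],
        )
        # Read back the ids of both the new and the pre-existing rows,
        # in chunks to stay under SQLite's bound-parameter limit.
        for start in range(0, len(unique), 500):
            chunk = unique[start:start + 500]
            marks = ",".join("?" * len(chunk))
            rows = conn.execute(
                f"SELECT keyword, id FROM kw WHERE keyword IN ({marks})",
                chunk,
            )
            ids.update(rows.fetchall())
    return ids


# Hypothetical table with one pre-existing row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kw (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO kw (keyword) VALUES ('apple.')")
conn.commit()

result = bulk_select_or_insert(conn, ["apple.", "apple -10810", "apple."])
```

The whole batch costs one bulk INSERT plus one SELECT per chunk, rather than a SELECT and possibly an INSERT for every keyword.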

And there are various parameters that need to be defined (how to handle conflicts exactly):
