22

Up until now, I've been storing my image filenames in a CharField and saving the actual file directly to S3. This was a fine solution for my own usage. I'd like to reconsider using an ImageField, since now there will be other users and file input validation would be appropriate.

I have a couple of questions that weren't exactly answered after reading the docs and the source code for FileField (which appears to be essentially ImageField minus the Pillow check and dimension field updating functionality).

1) Why use an ImageField at all? Or rather, why use a FileField? Sure, it's convenient for quick-and-easy forms and for inserting into Django templates. But are there any substantial reasons, e.g. is it demonstrably secured against exploits and malicious uploads?

2) How to write to the field file? If it is correct that the file can be read by instance.imagefield (or is it instance.imagefield.file?), if I want to write to it can I simply do the following?

@receiver(pre_save, sender=Image)
def pre_save_image(sender, instance, *args, **kwargs):
    instance.imagefield = process_image(instance.imagefield)

3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists? For example, this is what my code does right now; how can it be done with ImageField? I want to do it at the model layer, because if I retry at the view layer then the pre_save processing would run again, which is wasteful (even though a second try is unlikely to ever happen in the lifetime of the service).

for i in range(tries):
    try:
        name = generate_random_name()
        media_storage.save(name + '.jpg', ContentFile(final_bytes))
        break
    except:
        pass

4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request? i.e. I want to know if a new image is incoming to be saved, or if there is no image (some other field in the object is being updated and the image itself remains unchanged).

3
  • 1
    This really should be four different questions, you know. It's a wonder that you have not attracted any close votes! My suggestion is for you to close this question now. If you are unhappy with any of the answers you have got for a particular point (1, 2, 3, 4), please post another question with only that point. That way you will get better, more specific answers and you get to ask more specific questions.
    – e4c5
    Commented May 20, 2016 at 11:21
  • Thanks for your advice, your solution would work. I don't want to close this question just yet, still want to hear back from dkarchmer re: uploading directly to S3
    – davidtgq
    Commented May 20, 2016 at 11:33
  • So where did you go from here?
    – e4c5
    Commented May 27, 2016 at 4:14

3 Answers

27

I don't see any advantage of FileField or ImageField over what you are doing today. In fact, as I see it, the proper/modern/scalable way to deal with uploads is to have the client (browser) upload files directly to S3.

If done correctly (from a security standpoint), this scheme lets you scale enormously without adding more computing power on your side. As an example, consider 100 people uploading a picture at the same time. Your server would need to receive all of that data, only to upload it again to S3. On the other hand, you can have 1000 people upload directly at the same time, and I can assure you AWS can handle it. Your server only needs to handle the signing of the URL, which is a lot less work.

Take a look at fine-uploader, as a good technology to use to handle the efficient upload to s3 (loading in chunks, error checking, etc): http://docs.fineuploader.com/endpoint_handlers/amazon-s3.html. Google "django fineuploader" to find a sample application for Django.

In my case, I use a Model with a couple CharFields (bucket, key) plus a few other things specific to my application. My data flow is as follows:

  • Django serves a page with the fine-uploader widget, configured based on my settings.
  • Fine Uploader requests a signed URL from the Django server (endpoint), and uses that to upload to S3 directly.
  • When the upload is complete, Fine Uploader makes another request to my server to register the completion of the upload, at which time I create my object in the database. This way, if the upload fails, I never create an object in the database.
  • On the AWS side, S3 triggers a Lambda function, which I use to create a thumbnail and store it back to S3. So I don't even use my own CPU (e.g. Celery) for resizing. Not only can I have thousands of users uploading at the same time, but I can resize those thousand pictures in parallel, and for less than what an EC2 worker would cost me.
  • My Django Model is also used as a wrapper to manage the business logic (e.g. functions like get_original_url() and get_thumbnail_url()), so after the uploads, it is easy for my templates to get the signed read-only URLs.

In short, you can implement your own version of Fine Uploader if you want, or use one of the many alternatives. Assuming you follow the recommended security best practices on the AWS side (e.g. create a special IAM user with only write permission for the client, even if you are using signed URLs), this, IMO, is the best practice for dealing with uploads, especially if you are using S3 or similar to store these files.
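
To make the "server only signs" step a bit more concrete, here is a minimal sketch. It is not the Fine Uploader signature endpoint described above; it swaps in boto3's presigned POST as one possible way to do the signing. The function name build_upload_form, the 5 MB limit, and the bucket/key values are all hypothetical. The S3 client is passed in so the function itself has no import-time dependencies.

```python
def build_upload_form(s3_client, bucket, key, max_bytes=5 * 1024 * 1024, expires=300):
    """Return the URL and form fields a browser needs to POST one file to S3.

    s3_client is expected to be a boto3 S3 client (or anything exposing the
    same generate_presigned_post method). All default values are assumptions.
    """
    content_type = "image/jpeg"
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        # Fields are sent by the browser together with the file.
        Fields={"Content-Type": content_type},
        # Conditions are checked by S3; an upload that violates them is rejected.
        Conditions=[
            {"Content-Type": content_type},
            ["content-length-range", 1, max_bytes],
        ],
        ExpiresIn=expires,  # seconds the signature stays valid
    )


# Usage (needs boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   form = build_upload_form(boto3.client("s3"), "my-bucket", "uploads/abc123.jpg")
#   # Send form["url"] and form["fields"] to the browser.
```

The size and content-type conditions are the part that gives you some input validation back, since the file never passes through Django in this flow.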

Sorry if I am only really answering question 1, but questions 2 and 3 don't apply if you accept my answer for 1.

11
  • Can you provide more info on the AWS function that is creating your thumbs? Just asking as I'm currently using PIL to create a bunch of different image sizes before I push them to S3. BTW I also store each size in a different bucket.
    – WayBehind
    Commented May 20, 2016 at 4:05
  • 1
    There are several examples and packages out there, but here is the code I use: gist.github.com/dkarchmer/d68e20f6de36827d6c2f0f640bf151e1
    – dkarchmer
    Commented May 20, 2016 at 4:13
  • One of the reasons I go through EC2 is because I want to compress and optimize the image with mozjpeg (All it does is use Popen to pipe the image to mozjpeg and back). Is there something for this purpose with a JS library, or somehow to still do it if uploading directly to S3? Also, anything lacking just using CharFields compared to ImageField?
    – davidtgq
    Commented May 20, 2016 at 11:01
  • If I understood the docs correctly, ImageField is using a varchar in the database with a maximum length of 100. If that's the case, then would it be correct to say that the only difference between them is within Django, and I don't even need to makemigrations switching back and forth between ImageField, FileField, and CharField as long as the files are being uploaded correctly?
    – davidtgq
    Commented May 20, 2016 at 11:16
  • 1
    @david-tan You can do almost anything in JS. Check out imagemin-mozjpeg. But Lambda now supports Python, so you can use your existing function. Even if you wanted to do it on your own EC2, you would still be better off doing it on a background worker and not your front-end server. You can configure S3 to trigger an SQS message that your worker can pull. And yes, FileField and ImageField (django-storages) are just wrappers on top of a char field. You can write your own custom field if you want to be more Pythonic.
    – dkarchmer
    Commented May 20, 2016 at 17:25
8

1) Why use an ImageField at all? Or rather, why use a FileField?

It's convenient for quick-and-easy forms and convenient for inserting to Django templates.

But are there any substantial reasons, eg. Is it evidently secured against exploits and malicious uploads?

Yes. I daresay your own code probably does it too, but for a newbie, using FileField will probably ensure that important system files are not overwritten by a malicious upload.

2) How to write to the field file?

In your situation you would need to use a special storage backend that makes it possible to write directly to Amazon S3. As you know, the storage backends for FileField and ImageField are pluggable. Here is one example plugin: http://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html

There is sample code there which demonstrates how it can be written to, so I will not go into that.

3) How to try saving with a specific filename, then try again with a new filename if that randomly generated filename already exists?

ImageField and FileField take care of this for you automatically: a new filename is created if the old one already exists. The code in my answer here did that automatically when I called it over and over again. Here are some sample filenames it produced (the input being bada.png):

"4", "media/bada.png"
"5", "media/bada_aH0gV7t.png"
"7", "media/bada_XkzthgK.png"
"8", "media/bada_YzZuwDi.png"
"9", "media/bada_wpkasI3.png"

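For what it's worth, the renaming shown in the sample filenames can be approximated in plain Python. This is only a sketch of the idea (keep the extension, append an underscore and 7 random alphanumeric characters until the name is free), not Django's actual storage code. The names alternative_name and available_name, the exists callback, and max_tries are hypothetical.

```python
import os
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def alternative_name(name, length=7):
    """Insert '_<random chars>' before the extension: bada.png -> bada_aH0gV7t.png."""
    root, ext = os.path.splitext(name)
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{root}_{suffix}{ext}"


def available_name(name, exists, max_tries=10):
    """Return name if it is free, otherwise a randomized variant that is.

    exists is any callable that reports whether a name is already taken,
    e.g. a storage backend's exists() method.
    """
    candidate = name
    for _ in range(max_tries):
        if not exists(candidate):
            return candidate
        candidate = alternative_name(name)
    raise RuntimeError(f"no free name found for {name!r} after {max_tries} tries")
```

Note that check-then-save is not atomic, so two simultaneous uploads could still pick the same name. On S3 in particular, I believe django-storages has an AWS_S3_FILE_OVERWRITE setting that decides whether existing names are checked at all, so verify that against its docs before relying on this behavior.
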
4) In the models.py pre_save and post_save signals and in the actual model's save(), how can I tell if a file came in with the request?

In the pre_save signal, if this is a new image upload, your instance.pk will be None.

If this is a modification to an existing object, the PK will be set.
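
As the comments below point out, the pk check alone does not catch an existing object that is being updated with a new image. One possible workaround, sketched here under assumptions: Django's FieldFile keeps a private _committed flag that is False while a newly assigned file has not yet been written to storage. Since it is private, it is not a documented API and could change between versions. The helper name has_new_upload is hypothetical.

```python
def has_new_upload(field_file):
    """Best-effort check for a file that was assigned but not yet saved.

    field_file is expected to be what instance.imagefield returns.
    Relies on the private _committed attribute, so treat it as an assumption.
    """
    # An empty field has no name, so there is nothing incoming.
    if field_file is None or not getattr(field_file, "name", None):
        return False
    # Files already in storage are committed; fresh uploads are not.
    return not getattr(field_file, "_committed", True)


# Possible use inside the signal from the question (left as a comment because
# it needs a configured Django project):
#   @receiver(pre_save, sender=Image)
#   def pre_save_image(sender, instance, **kwargs):
#       if has_new_upload(instance.imagefield):
#           ...  # process only files that arrived with this request
```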

6
  • 2) That is what I'm currently using, but I mean in the signal, will trying to change instance.image work? Is that even the correct path, or is it something like instance.image.file? 3) But S3 defaults to overwriting if I'm not mistaken. With my own implementation, I simply use another instance of the storage class with write-only permission, but this doesn't seem to be workable with ImageField. 4) Yes, but this is leaky, updating an existing object with a new image would bypass all the processing.
    – davidtgq
    Commented May 20, 2016 at 10:53
  • 4) I mean, what path or variable or parameter does the stuff come into the model to be saved? I want to check that to see if there's an image file, and operate only on the image files coming in with the request (not any image files already in storage)
    – davidtgq
    Commented May 20, 2016 at 11:12
  • 2) I think your best bet, regardless of whether you use ImageField or not, is to use a hashed file name. And for saving, you shouldn't and need not use the signal. You should do it at the time the view processes the POST. I trust you are using a form? As soon as you confirm that the form is valid, save it there.
    – e4c5
    Commented May 20, 2016 at 11:19
  • 4) If the model instance has not been saved, that's a new upload. Simple as that. This complication arises because you are saving in the signal. Please refer to the comment above.
    – e4c5
    Commented May 20, 2016 at 11:19
  • 1
    yes, exactly what I was trying to say. You would then also have the opportunity of deleting the existing file.
    – e4c5
    Commented May 20, 2016 at 11:29
2

Took me forever to learn how to save an image using ImageField. Turns out it's crazy easy -- once you know how to do it, it is, at least. I mean, it all comes together sensibly after you see it.

So basically, you're working with a FileField. I already looked into the differences between ImageField and FileField:

  • ImageField takes every argument FileField takes, and additionally accepts optional height_field and width_field arguments. ImageField, unlike FileField, validates an upload, making sure it's an image.
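
To illustrate what "making sure it's an image" means in principle, here is a rough standard-library sketch that looks at the first bytes of the upload instead of trusting the filename. This is not how Django does it (Django's image validation relies on Pillow), and the function name sniff_image_type is hypothetical. It only covers three formats.

```python
# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' based on the leading bytes, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None
```

A matching header does not prove the rest of the file is a valid or harmless image, so treat a check like this as a first filter only.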

Using ImageField comes down to most of the same constructs as FileField does. The biggest things to remember:

request.FILES['name_of_field']

So a form is generated from something in forms.py (or wherever your forms are) like this:

imgfile = forms.ImageField(label='Choose your image',
                           help_text='The image should be cool.')

In the model, you might have this in correspondence:

imgfile = models.ImageField(upload_to='images/%m/%d')

So there will be a POST request from the user (when the user completes the form). That request will contain basically a dictionary of data. The dictionary holds the submitted files. To focus the request on the file from the field (in our case, an ImageField), you would use:

request.FILES['imgfile']

You would use that when you construct the model object (instantiating your model class):

newPic = ImageModel(imgfile = request.FILES['imgfile'])

To save that the simple way, you'd just use the save() method bestowed upon your object (because Django is that awesome):

if form.is_valid():
    newPic = ImageModel(imgfile=request.FILES['imgfile'])
    newPic.save()

Your image will be stored, by default, to the directory you indicate for MEDIA_ROOT in settings.py.

The tough part, which isn't really so tough when you catch on, is accessing the image.

In your template, you could have something like this:

<img src="{{ MEDIA_URL }}{{ image.imgfile.name }}">

Where {{ MEDIA_URL }} is something like /media/, as indicated in settings.py and {{ image.imgfile.name }} is the name of the file and the subdirectory you indicated in the model. "image" in this case is just the current image in a loop of images you might create to access each image in the database:

{% for image in images %}
    <li><img src="{{ MEDIA_URL }}{{ image.imgfile.name }}"></li>
{% endfor %}

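Make SURE you configure your urls properly to handle the image or the image won't work. Add this to your urls (development only; in production let the web server or storage backend serve media):

from django.conf import settings
from django.conf.urls.static import static

# patterns() and string view references were removed in Django 1.10;
# static() is the supported helper and only serves files when DEBUG is True.
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)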
    
    1
    • This answer allows to create ImageField from request.FILES directly.
      – unlut
      Commented Oct 30, 2020 at 14:59
