0

I would like to upload files of a given folder to a bucket using gsutil rsync. However, instead of uploading all files, I would like to exclude files that are below a certain size. The Unix rsync command offers the option --min-size=SIZE. Is there an equivalent for the gsutil tool? If not, is there an easy way of excluding small files?

2 Answers 2

1

Ok so the easiest solution I found is to move the small files into a subdirectory, and then use rsync (without the -r option). The code for moving the files:

def filter_images(source, limit):
    imgs = [img for img in glob(join(source, "*.tiff")) if (getsize(img)/(1024*1024.0)) < limit]
    if len(imgs) == 0:
        return

    filtered_dir = join(source, "filtered")
    makedirs(filtered_dir)
    for img in imgs:
        shutil.move(img, filtered_dir)
0

You don't have this option. You can perform manually by scripting this and send file by file. But it's not very efficient. I propose you this command:

find . -type f -size -4000c | xargs -I{} gsutil cp {} gs://my-bucket/path

Here only the file bellow 4k will be copied. Here the list of find unit

c for bytes
w for two-byte words
k for Kilobytes
M for Megabytes
G for Gigabytes

Not the answer you're looking for? Browse other questions tagged or ask your own question.