I would like to upload the files in a given folder to a bucket using gsutil rsync. However, instead of uploading all files, I would like to exclude files that are below a certain size. The Unix rsync command offers the option --min-size=SIZE. Is there an equivalent for the gsutil tool? If not, is there an easy way of excluding small files?
2 Answers
OK, so the easiest solution I found is to move the small files into a subdirectory and then use rsync (without the -r option). The code for moving the files:
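import shutil
from glob import glob
from os import makedirs
from os.path import getsize, join

def filter_images(source, limit):
    # Collect the TIFF files in source that are smaller than limit (in MiB).
    imgs = [img for img in glob(join(source, "*.tiff")) if getsize(img) / (1024 * 1024.0) < limit]
    if not imgs:
        return
    # Move them into a subdirectory, which a non-recursive rsync skips.
    filtered_dir = join(source, "filtered")
    makedirs(filtered_dir, exist_ok=True)
    for img in imgs:
        shutil.move(img, filtered_dir)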
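If moving the files is not desirable, a variation on the same idea is to leave them in place and pass their names to the -x option of gsutil rsync, which excludes paths matching a Python regular expression. The following is only a sketch under a few assumptions: the helper names build_exclude_pattern and rsync_command are made up for illustration, only top-level files are considered (matching the non-recursive rsync above), and the pattern is assumed to be matched against paths relative to the source directory.

```python
import os
import re


def build_exclude_pattern(source, min_bytes):
    """Return a regex matching top-level files in source smaller than min_bytes.

    Returns None when no file is small enough to be excluded.
    """
    small = []
    for entry in os.scandir(source):
        if entry.is_file() and entry.stat().st_size < min_bytes:
            # Escape the name so characters such as "." are matched literally.
            small.append(re.escape(entry.name))
    if not small:
        return None
    return "^(" + "|".join(sorted(small)) + ")$"


def rsync_command(source, destination, min_bytes):
    """Build the gsutil rsync argument list, adding -x only when needed."""
    cmd = ["gsutil", "rsync"]
    pattern = build_exclude_pattern(source, min_bytes)
    if pattern is not None:
        cmd += ["-x", pattern]
    return cmd + [source, destination]
```

The resulting list can be handed to subprocess.run(cmd, check=True). With a very large number of small files the pattern may exceed the command-line length limit, in which case moving the files as shown above is probably the more robust choice.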
gsutil rsync does not have this option. You can do it manually with a script that sends the files one by one, although that is not very efficient. I suggest this command:
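find . -type f -size +4000c | xargs -I{} gsutil cp {} gs://my-bucket/path
Here only files larger than 4000 bytes are copied, so the small files are left out (use -4000c to select files below that size instead). find accepts the following size units: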
c for bytes
w for two-byte words
k for kibibytes (units of 1024 bytes)
M for mebibytes (units of 1,048,576 bytes)
G for gibibytes (units of 1,073,741,824 bytes)
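The same selection can also be sketched in Python, which avoids starting one gsutil process per file: gsutil cp accepts -I to read the list of files to copy from stdin, and the top-level -m flag runs the copies in parallel. The function names files_at_least and upload_large_files are hypothetical, and the destination URL is whatever bucket path you use. Note that, as with the find command, every file is copied directly under the destination, so the directory structure is not preserved.

```python
import os
import subprocess


def files_at_least(source, min_bytes):
    """Yield paths of regular files under source that are at least min_bytes."""
    for root, _dirs, names in os.walk(source):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.getsize(path) >= min_bytes:
                yield path


def upload_large_files(source, destination, min_bytes):
    """Copy the selected files with a single gsutil invocation."""
    paths = list(files_at_least(source, min_bytes))
    if not paths:
        return 0
    # -I makes "gsutil cp" read the paths to copy from stdin, one per line;
    # -m performs the copies in parallel.
    subprocess.run(
        ["gsutil", "-m", "cp", "-I", destination],
        input="\n".join(paths) + "\n",
        text=True,
        check=True,
    )
    return len(paths)
```

For example, upload_large_files(".", "gs://my-bucket/path", 4000) would send every file of at least 4000 bytes under the current directory.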