
I accidentally ran a delete command on the wrong bucket. Object versioning is turned on, but I don't really understand what steps I should take to restore the files or, more importantly, how to do it in bulk, as I've deleted a few hundred of them.

Any help will be appreciated.

1 Answer


To restore hundreds of objects you could do something as simple as:

gsutil cp -AR gs://my-bucket gs://my-bucket

This will copy all objects (including deleted ones) to the live generation using metadata-only copying, i.e., it does not require copying the actual bytes. Caveats:

  1. It will leave the deleted generations in place, which will cost you extra storage.

  2. If your bucket isn't empty, this command will re-copy any live objects on top of themselves (ending up with an extra archived version of each of those as well, also costing you extra storage).

  3. If you want to restore a large number of objects, this simplistic approach would run too slowly - you'd want to parallelize the individual gsutil cp operations. You can't use the gsutil -m option in this case, because gsutil prevents that in order to preserve generation ordering (e.g., if there were several generations of objects with the same name, copying them in parallel could end up with the live generation coming from an unpredictable generation). If you only have one generation of each object, you could parallelize the copying by doing something like:

    gsutil ls -a gs://my-bucket/** | sed 's/\(.*\)\(#[0-9]*\)/gsutil cp \1\2 \1 \&/' > gsutil_script.sh

This generates a listing of all objects (including deleted ones) and transforms it into a sequence of gsutil cp commands that copy those objects (by generation-specific name) back to the live generation in parallel. If the list is long you'll want to break it into parts so you don't (for example) try to fork 100k processes to do the parallel copying (which would overload your machine).
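The "break it into parts" step above could be sketched like this (my own sketch, not from the original answer; it assumes gsutil_script.sh was generated by the sed command above, that each of its lines ends in `&`, and that ~100 parallel copies is a safe limit for your machine):

```shell
# Split the generated restore script into chunks of 100 commands each
# (restore_chunk_aa, restore_chunk_ab, ...).
split -l 100 gsutil_script.sh restore_chunk_

# Run one chunk at a time, waiting for each chunk's background copies
# to finish before starting the next.
for chunk in restore_chunk_*; do
  . ./"$chunk"   # source the chunk: launches up to 100 parallel gsutil cp jobs
  wait           # block until this chunk's copies complete
done
```

Sourcing each chunk (rather than running it with `sh`) keeps the background jobs in the current shell, so `wait` actually blocks until they finish; running a chunk in a subshell would orphan its background jobs and `wait` would return immediately.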

  • Thanks Mike for in-depth answer, wish documentation was that precise!
    – Pythonist
    Commented Apr 9, 2017 at 20:04
  • Actually, when I try to use your command for parallel copying, I get a sed: -e expression #1, char 35: invalid reference \2 on `s' command's RHS error.
    – Pythonist
    Commented Apr 11, 2017 at 6:57
  • Sorry about that - the backslash characters in the command got swallowed by github formatting, so I had to escape them. I updated the command to fix this - please try it again. Commented Apr 12, 2017 at 16:40
  • One other problem with your cp -AR command is that it will put all the versioned objects inside a "folder" called my-bucket, so you will end up with all of your objects placed in gs://my-bucket/my-bucket.
    – Ian H
    Commented May 17, 2017 at 21:35
  • You can use gsutil cp -AR gs://my-bucket/* gs://my-bucket to avoid creation of folder named "my-bucket". Commented May 18, 2022 at 7:48
