I'm trying to copy the contents of a large (~350 files, ~40MB total) directory from a Kubernetes pod to my local machine. I'm using the technique described here.
Sometimes it succeeds, but very frequently the standard output piped to the tar xf command on my host appears to get truncated. When that happens, I see errors like:
<some file in the archive being transmitted over the pipe>: Truncated tar archive
The files in the source directory don't change. The file in the error message is usually different (ie: it appears to be truncated in a different place).
For reference (copied from the document linked to above), this is the analog to what I'm trying to do (I'm using a different pod name and directory names):
kubectl exec -n my-namespace my-pod -- tar cf - /tmp/foo | tar xf - -C /tmp/bar
After running it, I expect the contents of my local /tmp/bar to be the same as those in the pod.
However, more often than not, it fails. My current theory (I have a very limited understanding of how kubectl works, so this is all speculation) is that when kubectl determines that the tar command has completed, it terminates -- regardless of whether or not there are remaining bytes in transit (over the network) containing the contents of standard output.
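One way to probe this theory is to skip extraction and just count the bytes that arrive on each run. This is a minimal sketch, not a definitive test: count_stream_bytes is a hypothetical helper name, and the kubectl invocation in the comment is the same one used in this question. If the totals differ between runs while the source files stay the same, the stream is being cut short somewhere in transit.

```shell
# Hypothetical helper: run a command several times and print how many
# bytes it wrote to standard output on each run.
# Usage: count_stream_bytes RUNS COMMAND [ARGS...]
count_stream_bytes() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        # wc -c counts bytes; tr strips the padding some wc builds add.
        bytes=$("$@" | wc -c | tr -d ' ')
        echo "run $i: $bytes bytes"
        i=$((i + 1))
    done
}

# Example (assumes the pod and paths used in this question):
# count_stream_bytes 5 kubectl exec -n my-namespace my-pod -- tar cf - -C /tmp/foo .
```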
I've tried various combinations of:
- stdbuf
- Changing tar's blocking factor
- Making the command take longer to run (by adding && sleep <x>)
I'm not going to list all combinations I've tried, but this is an example that uses everything:
kubectl exec -n my-namespace my-pod -- stdbuf -o 0 tar -b 1 -c -f - -C /tmp/foo . && sleep 2 | tar xf - -C /tmp/bar
There are combinations of that command that I can make work pretty reliably. For example, forgetting about stdbuf and -b 1 and just sleeping for 100 seconds, i.e.:
kubectl exec -n my-namespace my-pod -- tar -c -f - -C /tmp/foo . && sleep 100 | tar xf - -C /tmp/bar
But even more experimentation led me to believe that the block size of tar (512 bytes, I believe?) was still too large (the argument to -b is a count of blocks, not the size of those blocks). This is the command I'm using for now:
kubectl exec -n my-namespace my-pod -- bash -c 'dd if=<(tar cf - -C /tmp/foo .) bs=16 && sleep 10' | tar xf - -C /tmp/bar
And yes, I HAD to make bs that small and sleep "that big" to make it work. But this at least gives me two variables I can mess with. I did find that if I set bs=1, I didn't have to sleep... but it took a LONG time to move all the data (one byte at a time).
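For reference on the numbers involved: tar works in 512-byte blocks, and -b sets how many of those blocks go into each record it writes, so -b 1 means 512-byte records, while GNU tar's default blocking factor of 20 means 10240-byte records. The sketch below runs locally with no cluster involved and only illustrates that arithmetic; archive_size is a hypothetical helper, and the single test file it creates is made up for the example.

```shell
# Hypothetical helper: print the size in bytes of a small archive written
# to standard output with the given blocking factor.
# Usage: archive_size BLOCKING_FACTOR
archive_size() {
    factor=$1
    dir=$(mktemp -d)
    printf 'hello\n' > "$dir/a.txt"
    # -b N asks tar for records of N * 512 bytes.
    tar -b "$factor" -cf - -C "$dir" . | wc -c | tr -d ' '
    rm -rf "$dir"
}

# The printed size should be a multiple of the record size in each case.
echo "record size with -b 1:  $((1 * 512)) bytes, archive: $(archive_size 1) bytes"
echo "record size with -b 20: $((20 * 512)) bytes, archive: $(archive_size 20) bytes"
```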
So, I guess my questions are:
- Is my theory that kubectl truncates standard output after it determines the command given to exec has finished correct?
- Is there a better solution to this problem?
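On the second question, one workaround that does not depend on tuning bs and sleep would be to capture the stream to a temporary file, check that it lists cleanly, and retry when it does not. This is a sketch under stated assumptions rather than a fix for the underlying problem: fetch_archive is a hypothetical helper, and it treats a successful tar tf listing as evidence of a complete archive. That catches a stream cut off in the middle of a file, but it may miss one that happens to end exactly on a file boundary.

```shell
# Hypothetical helper: run COMMAND, which must write a tar archive to
# standard output, and extract it into DEST only if the archive lists
# cleanly. Retries up to MAX_TRIES times.
# Usage: fetch_archive DEST MAX_TRIES COMMAND [ARGS...]
fetch_archive() {
    dest=$1
    tries=$2
    shift 2
    tmp=$(mktemp)
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        # Capture the whole stream first, then validate it before extracting.
        if "$@" > "$tmp" && tar tf "$tmp" > /dev/null 2>&1; then
            tar xf "$tmp" -C "$dest"
            status=$?
            rm -f "$tmp"
            return "$status"
        fi
        echo "attempt $attempt: incomplete archive, retrying" >&2
        attempt=$((attempt + 1))
    done
    rm -f "$tmp"
    return 1
}

# Example (assumes the pod and paths used in this question):
# fetch_archive /tmp/bar 5 kubectl exec -n my-namespace my-pod -- tar cf - -C /tmp/foo .
```

Extracting only after validation also means a failed attempt never leaves a partially populated /tmp/bar behind.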
z to the tar's to compress? You don't describe your use-case but it may be preferable to mount a persistent volume (backed by NFS or cloud storage) into the Pod or have the Pod create the archive and then upload that to cloud storage. You may wish to file an issue on the kubectl repo.
--v=8 to get full log verbosity on the kubectl command.
kubectl exec -n my-namespace my-pod -- bash -c 'dd if=<(tar cf - -C /tmp/foo .) bs=16 && sleep 10' | tar xf - -C /tmp/bar
I'm seeing the exact same issues, and it's very hard to find anyone else that can reproduce this problem. What k8s distro or provider are you using?