
I'm trying to copy the contents of a large (~350 files, ~40MB total) directory from a Kubernetes pod to my local machine. I'm using the technique described here.

Sometimes it succeeds, but very frequently the standard output piped to the tar xf command on my host appears to get truncated. When that happens, I see errors like: <some file in the archive being transmitted over the pipe>: Truncated tar archive

The files in the source directory don't change. The file named in the error message usually differs from run to run (i.e., the archive appears to be truncated in a different place each time).

For reference (copied from the document linked to above), this is the analog of what I'm trying to do (I'm using a different pod name and directory names): kubectl exec -n my-namespace my-pod -- tar cf - /tmp/foo | tar xf - -C /tmp/bar

After running it, I expect the contents of my local /tmp/bar to be the same as those in the pod.
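To make "the same" checkable, here is a quick local sketch of how the two trees can be compared after a copy (purely illustrative; /tmp/foo and /tmp/bar stand in for the pod-side source and the local destination, and the sample file is made up):

```shell
# Checksum every file on each side and compare the sorted lists;
# any truncated or missing file makes the two digests differ.
mkdir -p /tmp/foo /tmp/bar
echo data > /tmp/foo/f && cp /tmp/foo/f /tmp/bar/f
src=$(cd /tmp/foo && find . -type f -exec md5sum {} + | sort | md5sum)
dst=$(cd /tmp/bar && find . -type f -exec md5sum {} + | sort | md5sum)
[ "$src" = "$dst" ] && echo "directories match"
```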

However, more often than not, it fails. My current theory (I have a very limited understanding of how kubectl works, so this is all speculation) is that when kubectl determines that the tar command has completed, it terminates -- regardless of whether or not there are remaining bytes in transit (over the network) containing the contents of standard output.

I've tried various combinations of:

  1. stdbuf
  2. Changing tar's blocking factor
  3. Making the command take longer to run (by adding && sleep <x>)

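As a local aside on item 2: tar's -b flag sets the record size in units of 512-byte blocks, and GNU tar pads the archive out to a whole number of records, which is easy to confirm without a cluster (sample directory and file are made up):

```shell
# -b 1 gives 512-byte records; the GNU default is -b 20 (10240-byte records).
# Either way the archive length is padded to a multiple of the record size.
mkdir -p /tmp/foo && echo hi > /tmp/foo/x
tar -b 1 -c -f - -C /tmp/foo . | wc -c    # a multiple of 512
tar -b 20 -c -f - -C /tmp/foo . | wc -c   # a multiple of 10240
```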
I'm not going to list all combinations I've tried, but this is an example that uses everything: kubectl exec -n my-namespace my-pod -- stdbuf -o 0 tar -b 1 -c -f - -C /tmp/foo . && sleep 2 | tar xf - -C /tmp/bar

There are combinations of that command that I can make work pretty reliably. For example, dropping stdbuf and -b 1 and just sleeping for 100 seconds, i.e.: kubectl exec -n my-namespace my-pod -- tar -c -f - -C /tmp/foo . && sleep 100 | tar xf - -C /tmp/bar

But even more experimentation led me to believe that the block size of tar (512 bytes, I believe?) was still too large (the arguments of -b are a count of blocks, not the size of those blocks). This is the command I'm using for now: kubectl exec -n my-namespace my-pod -- bash -c 'dd if=<(tar cf - -C /tmp/foo .) bs=16 && sleep 10' | tar xf - -C /tmp/bar

And yes, I HAD to make bs that small and sleep "that big" to make it work. But this at least gives me two variables I can mess with. I did find that if I set bs=1, I didn't have to sleep... but it took a LONG time to move all the data (one byte at a time).
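For what it's worth, the dd re-chunking itself doesn't alter the stream; you can verify locally (no cluster needed) that piping tar through dd with a small block size still yields an intact archive (directories and sample file are made up):

```shell
# dd with a small bs re-chunks the stream but leaves the bytes intact,
# so the extracted tree matches the source exactly.
mkdir -p /tmp/foo /tmp/bar
echo hello > /tmp/foo/a.txt
tar cf - -C /tmp/foo . | dd bs=16 2>/dev/null | tar xf - -C /tmp/bar
cmp /tmp/foo/a.txt /tmp/bar/a.txt && echo "copy intact"
```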

So, I guess my questions are:

  1. Is my theory correct that kubectl truncates standard output once it determines the command given to exec has finished?
  2. Is there a better solution to this problem?
  • Perhaps it would have been more accurate to say that my theory is: the last chunk of data tar writes to standard output (before it exits) appears to be in a race against time to get back to kubectl before kubectl learns that tar has finished. My command lets me keep the "chunks" small while independently controlling how long I wait for them. Commented Dec 7, 2022 at 0:50
  • Possibly networking related. Have you tried adding z to the tar flags to compress? You don't describe your use-case, but it may be preferable to mount a persistent volume (backed by NFS or cloud storage) into the Pod, or to have the Pod create the archive and then upload it to cloud storage. You may wish to file an issue on the kubectl repo.
    – DazWilkin
    Commented Dec 7, 2022 at 18:03
  • ...and possibly (though I suspect it won't be helpful in this case) add --v=8 to get full log verbosity on the kubectl command.
    – DazWilkin
    Commented Dec 7, 2022 at 18:04
  • This saved my day: kubectl exec -n my-namespace my-pod -- bash -c 'dd if=<(tar cf - -C /tmp/foo .) bs=16 && sleep 10' | tar xf - -C /tmp/bar I'm seeing the exact same issues, and it's very hard to find anyone else that can reproduce this problem. What k8s distro or provider are you using? Commented Nov 8, 2023 at 23:33
  • I am just running into the same issue, trying to pull a backup from a workload running in the Amazon Elastic Kubernetes Service. No matter what I try, the tar archive is always incomplete (zipping makes no difference). Luckily, the dd command actually worked, even though I don't like the "solution" of adding a sleep. Looks like a k8s issue to me. Actually, it is; it was apparently fixed in 1.30: github.com/kubernetes/kubernetes/issues/60140
    – Bluehorn
    Commented Jun 21 at 19:49

1 Answer


Maybe you haven't been specific enough regarding what the full command really is. There may be ambiguity about which process is responsible for the pipe: the "--" doesn't direct kubectl to treat everything after the pipe as part of the remote command; the pipe is being intercepted by your local shell.

Have you tried wrapping all of it in double quotes and running it under a shell in the pod?

CMD="tar cf - /tmp/foo | tar xf - -C /tmp/bar"

kubectl exec -n my-namespace my-pod -- sh -c "${CMD}"

(Without sh -c, kubectl would look for a single executable whose name is the entire string; a shell is needed to interpret the |.)

That way the write at the target is included in the work kubectl monitors for completion.
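As a purely local sketch of that quoting behavior (no cluster involved; directories and sample file are made up): an unquoted | is consumed by the calling shell, while sh -c "…" hands the whole pipeline to one shell to run as a unit.

```shell
# The entire pipeline lives inside the string, so sh -c runs it as one
# command; the calling shell never sees the pipe.
mkdir -p /tmp/foo /tmp/bar
echo hi > /tmp/foo/x
CMD='tar cf - -C /tmp/foo . | tar xf - -C /tmp/bar'
sh -c "${CMD}"
cat /tmp/bar/x
```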
