I want to create a tar archive with a hierarchical directory structure from Python, using strings for the contents of the files. I've read this question, which shows a way of adding strings as files, but not as directories. How can I add directories on the fly to a tar archive without actually creating them on disk?

Something like:

archive.tgz:
    file1.txt
    file2.txt
    dir1/
        file3.txt
        dir2/
            file4.txt

3 Answers

11

Extending the example given in the question linked, you can do it as follows:

# Python 2 (uses the StringIO module and the old 0755 octal literal)
import tarfile
import StringIO
import time

tar = tarfile.TarFile("test.tar", "w")

# File contents held in memory
string = StringIO.StringIO()
string.write("hello")
string.seek(0)

# Directory entry: no file object needed
info = tarfile.TarInfo(name='dir')
info.type = tarfile.DIRTYPE
info.mode = 0755
info.mtime = time.time()
tar.addfile(tarinfo=info)

# File entry inside the directory
info = tarfile.TarInfo(name='dir/foo')
info.size = len(string.buf)
info.mtime = time.time()
tar.addfile(tarinfo=info, fileobj=string)

tar.close()

Be careful with the mode attribute: the default value (0644) does not include execute permission for the owner of the directory, which is needed to change into it and list its contents.
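To see the permission issue concretely, here is a minimal sketch (Python 3 syntax; the entry name `dir` is just a placeholder, and nothing is written to disk). It inspects the mode a fresh `TarInfo` starts with and checks the owner-execute bit before and after setting `0o755`:

```python
import tarfile

# "dir" is a placeholder name; no archive is created here.
info = tarfile.TarInfo(name="dir")
info.type = tarfile.DIRTYPE

# A new TarInfo starts with mode 0o644 (rw-r--r--): no execute bit.
default_mode = info.mode
owner_can_enter_default = bool(default_mode & 0o100)

# 0o755 (rwxr-xr-x) lets the owner change into and list the directory.
info.mode = 0o755
owner_can_enter_fixed = bool(info.mode & 0o100)

print(oct(default_mode), owner_can_enter_default)
print(oct(info.mode), owner_can_enter_fixed)
```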

2
  • 1
    ... and you might want to set the TarInfo.mtime member to prevent GNU tar's implausibly old time stamp warning, but +1.
    – Fred Foo
    Commented Dec 27, 2011 at 20:41
  • @larsmans You're right. I've updated my answer to set TarInfo.mtime. Thanks for your comment.
    – jcollado
    Commented Dec 27, 2011 at 20:46
2

A slight modification to the helpful accepted answer so that it works with Python 3 as well as Python 2 (and matches the OP's example a bit more closely):

from io import BytesIO
import tarfile
import time

# create and open empty tar file
tar = tarfile.open("test.tgz", "w:gz")

# Add a file
file1_contents = BytesIO("hello 1".encode())
finfo1 = tarfile.TarInfo(name='file1.txt')
finfo1.size = len(file1_contents.getvalue())
finfo1.mtime = time.time()
tar.addfile(tarinfo=finfo1, fileobj=file1_contents)

# create directory in the tar file
dinfo = tarfile.TarInfo(name='dir')
dinfo.type = tarfile.DIRTYPE
dinfo.mode = 0o755
dinfo.mtime = time.time()
tar.addfile(tarinfo=dinfo)

# add a file to the new directory in the tar file
file2_contents = BytesIO("hello 2".encode())
finfo2 = tarfile.TarInfo(name='dir/file2.txt')
finfo2.size = len(file2_contents.getvalue())
finfo2.mtime = time.time()
tar.addfile(tarinfo=finfo2, fileobj=file2_contents)

tar.close()

In particular, I updated the octal syntax following PEP 3127 (Integer Literal Support and Syntax), switched to BytesIO from io, used getvalue instead of buf, and used open instead of TarFile to show gzip-compressed output as in the example. (Context manager usage (with ... as tar:) would also work in both Python 2 and Python 3, but cut and paste didn't work with my Python 2 REPL, so I didn't switch it.) Tested on Python 2.7.15+ and Python 3.7.3.
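For reference, the context manager form might look roughly like this. It is only a sketch: `add_bytes` and `add_dir` are hypothetical helper names I made up, and the archive is written to an in-memory `BytesIO` buffer rather than `test.tgz` so that it can be read back and checked without leaving a file behind.

```python
from io import BytesIO
import tarfile
import time

def add_bytes(tar, name, data):
    """Add `data` (bytes) to the open archive as a regular file `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.mtime = time.time()
    tar.addfile(tarinfo=info, fileobj=BytesIO(data))

def add_dir(tar, name):
    """Add a directory entry `name`; no file object is needed."""
    info = tarfile.TarInfo(name=name)
    info.type = tarfile.DIRTYPE
    info.mode = 0o755
    info.mtime = time.time()
    tar.addfile(tarinfo=info)

buffer = BytesIO()  # in-memory stand-in for "test.tgz"
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    add_bytes(tar, "file1.txt", b"hello 1")
    add_dir(tar, "dir")
    add_bytes(tar, "dir/file2.txt", b"hello 2")

# Read the archive back to confirm the layout.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    names = tar.getnames()
    content = tar.extractfile("dir/file2.txt").read()

print(names)
print(content)
```

The `with` block closes the archive even if one of the `addfile` calls raises, which is the main reason to prefer it over an explicit `tar.close()`.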

1

Looking at the tar file format it seems doable. The files that go in each subdirectory get the relative pathname (e.g. dir1/file3.txt) as their name.

The one thing to watch is ordering: define each directory before the files that go into it. (Extractors such as GNU tar will usually create missing parent directories themselves, but then you have no control over their mode or timestamp.) There is a special type flag you can use to mark a tar entry as a directory, but for legacy reasons tar also accepts file entries whose names end with / as representing directories, so you should be able to just add dir1/ as a file from a zero-length string using the same technique.
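The "directory first, then its files" ordering can be automated. The following is a rough sketch, not a tested library routine: `build_archive` is a hypothetical helper that takes a `{path: text}` mapping, uses the directory type flag (`tarfile.DIRTYPE`) rather than the trailing-slash convention, and emits each parent directory once, just before the first file inside it. It assumes Python 3.7+ so that the dict keeps insertion order.

```python
from io import BytesIO
import posixpath
import tarfile
import time

def build_archive(files):
    """Return gzip-compressed tar bytes for a {path: text} mapping."""
    buffer = BytesIO()
    seen_dirs = set()
    now = int(time.time())
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path, text in files.items():
            # Walk up from the file, collecting ancestors not yet emitted.
            parents = []
            parent = posixpath.dirname(path)
            while parent and parent not in seen_dirs:
                parents.append(parent)
                parent = posixpath.dirname(parent)
            # Emit them outermost first, so dir1 precedes dir1/dir2.
            for directory in reversed(parents):
                dinfo = tarfile.TarInfo(name=directory)
                dinfo.type = tarfile.DIRTYPE
                dinfo.mode = 0o755
                dinfo.mtime = now
                tar.addfile(dinfo)
                seen_dirs.add(directory)
            data = text.encode("utf-8")
            finfo = tarfile.TarInfo(name=path)
            finfo.size = len(data)
            finfo.mtime = now
            tar.addfile(finfo, BytesIO(data))
    return buffer.getvalue()

# The layout from the question, with made-up file contents.
archive = build_archive({
    "file1.txt": "one",
    "file2.txt": "two",
    "dir1/file3.txt": "three",
    "dir1/dir2/file4.txt": "four",
})

with tarfile.open(fileobj=BytesIO(archive), mode="r:gz") as tar:
    listing = [(m.name, m.isdir()) for m in tar.getmembers()]

for name, is_dir in listing:
    print(name + ("/" if is_dir else ""))
```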
