38

I'm trying to create a function in my script that zips the contents of a given source directory (src) to a zip file (dst). For example, zip('/path/to/dir', '/path/to/file.zip'), where /path/to/dir is a directory and /path/to/file.zip doesn't exist yet. I do not want to zip the directory itself; this makes all the difference in my case. I want to zip the files (and subdirs) in the directory. This is what I'm trying:

def zip(src, dst):
    zf = zipfile.ZipFile("%s.zip" % (dst), "w")
    for dirname, subdirs, files in os.walk(src):
        zf.write(dirname)
        for filename in files:
            zf.write(os.path.join(dirname, filename))
    zf.close()

This creates a zip that is essentially /. For example, if I zipped /path/to/dir, extracting the zip creates a directory with "path" in it, with "to" in that directory, etc.

Does anyone have a function that doesn't cause this problem?

I can't stress this enough: it needs to zip the files in the directory, not the directory itself.


3 Answers

57

The ZipFile.write() method takes an optional arcname argument that specifies what the name of the file should be inside the zip file.

You can use this to strip off the path to src at the beginning. Here I use os.path.abspath() to make sure that both src and the filename returned by os.walk() have a common prefix.

#!/usr/bin/env python2.7

import os
import zipfile

def zip(src, dst):
    zf = zipfile.ZipFile("%s.zip" % (dst), "w", zipfile.ZIP_DEFLATED)
    abs_src = os.path.abspath(src)
    for dirname, subdirs, files in os.walk(src):
        for filename in files:
            absname = os.path.abspath(os.path.join(dirname, filename))
            arcname = absname[len(abs_src) + 1:]
            print 'zipping %s as %s' % (os.path.join(dirname, filename),
                                        arcname)
            zf.write(absname, arcname)
    zf.close()

zip("src", "dst")

With a directory structure like this:

src
└── a
    ├── b
    │   └── bar
    └── foo

The script prints:

zipping src/a/foo as a/foo
zipping src/a/b/bar as a/b/bar

And the contents of the resulting zip file are:

Archive:  dst.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  01-28-13 11:36   a/foo
        0  01-28-13 11:36   a/b/bar
 --------                   -------
        0                   2 files
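
For Python 3, the same idea can be sketched with os.path.relpath instead of slicing the prefix off by hand, and with the ZipFile opened as a context manager so that it is closed even if an error occurs. This is only a sketch, not part of the answer above: the name zip_contents is made up here (it also avoids shadowing the built-in zip), and unlike the function above it uses dst exactly as given rather than appending ".zip".

```python
import os
import zipfile


def zip_contents(src, dst):
    """Zip everything inside src (but not src itself) into the file dst."""
    # Hypothetical helper name; dst is used as-is, no ".zip" is appended.
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirname, subdirs, files in os.walk(src):
            for filename in files:
                full_path = os.path.join(dirname, filename)
                # Name inside the archive: the path relative to src.
                arcname = os.path.relpath(full_path, src)
                zf.write(full_path, arcname)
```

Note that dst should live outside src; otherwise the walk may pick up the half-written archive itself.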
  • Looks promising (EDIT: works perfectly), but is there any reason to import os and os.path?
    – tkbx
    Commented Jan 28, 2013 at 18:52
  • Yes—os for os.walk(), and os.path for os.path.abspath() and os.path.join().
    – andrewdotn
    Commented Jan 28, 2013 at 18:55
  • 2
    @tkbx: from os import path puts path at the top level, so you can do path.join instead of os.path.join. This is usually not what you want to do (especially since everyone always has a variable named path somewhere in their code).
    – abarnert
    Commented Jan 28, 2013 at 19:49
  • 1
    @tkbx: No, you can't import sys.argv unless argv is a sub-module under sys. But argv isn't a module, it's just a list. But when you import sys—which is the normal thing you do most of the time—you then do script, vars = sys.argv. (Although really, you wouldn't write that very often, either, because you'll get a ValueError if there are 0 or 2 command-line arguments.)
    – abarnert
    Commented Jan 29, 2013 at 0:54
  • 2
    This function works fine, but it will not add empty folders to the zip file, which in most cases is the expected behavior. In other words, any sub-folder without a file in it will be ignored.
    – bobyuan
    Commented Oct 21, 2015 at 7:24
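
If empty folders do matter, one possible way to keep them (a Python 3 sketch, not something the answer itself does) is to write an explicit directory entry whenever os.walk reports a directory with no files and no subdirectories. The function name below is hypothetical.

```python
import os
import zipfile


def zip_contents_with_empty_dirs(src, dst):
    """Zip the contents of src into dst, keeping empty sub-folders too."""
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirname, subdirs, files in os.walk(src):
            if not subdirs and not files and dirname != src:
                # An empty directory: store an explicit directory entry.
                zf.write(dirname, os.path.relpath(dirname, src))
            for filename in files:
                full_path = os.path.join(dirname, filename)
                zf.write(full_path, os.path.relpath(full_path, src))
```

Non-empty directories need no entry of their own, because extracting a file recreates its parent folders.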
1

From what I can tell, you are close. You can use dirname and basename to split the source path into its parent directory and its final component:

>>> os.path.dirname("/path/to/src")
'/path/to'
>>> os.path.basename("/path/to/src")
'src'

Then, using chdir, you can make sure you are in the parent so that the paths are relative.

def zip(src, dst):
    parent = os.path.dirname(src)
    folder = os.path.basename(src)

    os.chdir(parent)
    for dirname, subdirs, filenames in os.walk(folder):
        ...

This creates:

src/a.txt
src/b
src/b/c.txt
...etc...

If you do not want to include the name "src", you can just do os.chdir(src) and then os.walk('.').
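
The last suggestion, changing into the directory and walking '.', could look roughly like this in full. This is a Python 3 sketch under a few assumptions: the function name is made up, dst is resolved to an absolute path before the chdir so that a relative dst still points where the caller expects, and the original working directory is restored afterwards because os.chdir affects the whole process.

```python
import os
import zipfile


def zip_with_chdir(src, dst):
    """Zip the contents of src into dst by walking '.' from inside src."""
    dst = os.path.abspath(dst)  # resolve before changing directory
    old_cwd = os.getcwd()
    os.chdir(src)
    try:
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
            for dirname, subdirs, filenames in os.walk("."):
                for filename in filenames:
                    # Paths look like "./b/c.txt"; normpath drops the "./".
                    zf.write(os.path.normpath(os.path.join(dirname, filename)))
    finally:
        os.chdir(old_cwd)  # restore the caller's working directory
```

Because the working directory is global state, passing arcname explicitly is usually the safer choice in threaded code or in a library.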

Hope that helps.

1

Use the arcname parameter to control the name/path in the zip file.

For example, for a zip file that contains only files, no directories:

zf.write(os.path.join(dirname, filename), arcname=filename)

Or to invent a new directory inside the zip file:

zf.write(os.path.join(dirname, filename), arcname=os.path.join("my_zip_dir", filename))
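
Put together, the flat variant might look like the sketch below (Python 3; the function name and the duplicate check are additions for illustration, not part of the answer). Flattening discards the directory part of each path, so two files with the same name in different sub-folders would end up under one archive name; this sketch raises an error in that case rather than storing two entries under the same name.

```python
import os
import zipfile


def zip_flat(src, dst):
    """Store every file under src at the top level of the archive."""
    seen = set()
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirname, subdirs, files in os.walk(src):
            for filename in files:
                if filename in seen:
                    # Flattening loses the directory, so names can clash.
                    raise ValueError("duplicate file name: %s" % filename)
                seen.add(filename)
                zf.write(os.path.join(dirname, filename), arcname=filename)
```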
