2

I have 2 GB of data in memory (for example data = b'ab' * 1000000000) that I would like to write in a encrypted ZIP or 7Z file.

How to do this without writing data to a temporary on-disk file?

Is it possible with only Python built-in tools (+ optionally 7z)?

I've already looked at this:

  • ZipFile.writestr writes from a in-memory string/bytes which is good but:

  • ZipFile.setpassword: only for read, and not write

  • How to create an encrypted ZIP file? : most answers use a file as input (and cannot work with in-memory data), especially the solutions with pyminizip and those with:

    subprocess.call(['7z', 'a', '-mem=AES256', '-pP4$$W0rd', '-y', 'myarchive.zip']... 
    

    Other solutions require to trust an implementation of cryptography by a third-party tool (see comments), so I would like to avoid them.

11
  • It seems like the only actual question here is about writing the output as encrypted? Unless you have working, native Python code that uses a temporary file? "most answers use a file as input" That doesn't appear to be the case to me; but anyway, did you try the answers that don't? For example, the ones referring to pyminizip, or pyzipper? Alternately, did you try to check if any of the tools you might subprocess.call, take the input data from standard input? If they do, do you see how to use pipes to solve the problem? Commented Mar 14, 2022 at 8:32
  • 1
    @KarlKnechtel Look at github.com/danifus/pyzipper/blob/master/pyzipper/zipfile_aes.py here they are doing crypto themselves, I would like to avoid to trust such a third-party project like this, and use built-in tools (ZipFile, or maybe 7z.exe if we can pipe binary data to it without writing to a temp file)
    – Basj
    Commented Mar 14, 2022 at 9:05
  • @KarlKnechtel pyminizip works from a file, not from in-memory data.
    – Basj
    Commented Mar 14, 2022 at 9:10
  • What do you plan to do with the zip file in memory, because it will be destroyed as soon as the process is completed? This is important, because any solution needs to consider this. Commented Mar 17, 2022 at 14:28
  • 1
    @Lifeiscomplex once it is encrypted (and only then), it will be written on disk.
    – Basj
    Commented Mar 17, 2022 at 14:47

5 Answers 5

2
+50

7z.exe has the -si flag, which lets it read data for a file from stdin. This way you could still use 7z's commandline from a subprocess even without an extra file:

from subprocess import Popen, PIPE

# inputs
szip_exe = r"C:\Program Files\7-Zip\7z.exe"  # ... get from registry maybe
memfiles = {"data.dat" : b'ab' * 1000000000}
arch_filename = "myarchive.zip"
arch_password = "Swordfish"

for filename, data in memfiles.items():
    args = [szip_exe, "a", "-mem=AES256", "-y", "-p{}".format(arch_password),
        "-si{}".format(filename), output_filename]
    proc = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    proc.stdin.write(data)
    proc.stdin.close()    # causes 7z to terminate 
    # proc.stdin.flush()  # instead of close() when on Mac, see comments
    proc.communicate()    # wait till it actually has

The write() takes somewhat above 40 seconds on my machine, which is quite a lot. I can't say though if that's due to any inefficiencies from piping the whole data through stdin or if it's just how long compressing and encrypting a 2GB file takes. EDIT: Packing the file from HDD took 47 seconds on my machine, which speaks for the latter.

6
  • I tried and it works even if data contains NULL bytes or other binary data! How does it work so that \x00 or other binary data won't be considered as end-of-stream/pipe/file @Jeronimo?
    – Basj
    Commented Mar 23, 2022 at 11:15
  • @Basj I just set the archive name as myarchive.7z on my Mac and got no errors. Commented Mar 23, 2022 at 13:43
  • @Lifeiscomplex Thanks for the report, I'll investigate on this separately (I removed my off-topic comment).
    – Basj
    Commented Mar 23, 2022 at 14:11
  • True, I get the same error when using a .7z extension (on Windows). I Can't find much more useful info on that flag and how it's to be used anywhere though. :(
    – Jeronimo
    Commented Mar 23, 2022 at 14:11
  • Regarding the binary data / nullbytes: I'm definitely not an expert here, but I assume it reads from stdin until it reaches an EOF, which is not an actual character (or sequence) in the stream but an option/flag set to the stdin file stream and which can be polled regularly, for example after a read() returned 0 bytes. And that's what's triggered by the stdin.close(). On the cmdline using the a.exe < b.dat schema, the flag should get set by cmd.exe after the last byte of the file has been written to the stdin of the process.
    – Jeronimo
    Commented Mar 23, 2022 at 15:29
2

ORIGINAL POST 03.19.2022

Here is one way to accomplish your use case using pyzipper

import fs
import pyzipper

# create in-memory file system
mem_fs = fs.open_fs('mem://')
mem_fs.makedir('hidden_dir')

# generate data 
data = b'ab' * 10

secret_password = b'super secret password'

# Create encrypted password protected ZIP file in-memory
with pyzipper.AESZipFile(mem_fs.open('/hidden_dir/password_protected.zip', 'wb'),
                         'w',
                         compression=pyzipper.ZIP_LZMA,
                         encryption=pyzipper.WZ_AES) as zf:
    zf.setpassword(secret_password)
    zf.writestr('data.txt', data)


# Read encrypted password protected ZIP file from memory
with pyzipper.AESZipFile(mem_fs.open('/hidden_dir/password_protected.zip', 'rb')) as zf:
    zf.setpassword(secret_password)
    my_secrets = zf.read('data.txt')
    print(my_secrets)
    # output 
    b'abababababababababab'

UPDATED 03.21.2022

Reading through our comments you continue to raise concerns about the cryptography components of modules, such as pyzipper, but not 7Z LIB/SDK. Here is an academic paper on 7Z LIB/SDK version 19 cryptography.

Based on your concerns have you considered encrypting your data in memory prior to writing it to a zipfile?

Here is an example for doing this and writing the encrypted data to a file in memory:

import os
import fs
import base64

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

mem_fs = fs.open_fs('mem://')
mem_fs.makedir('hidden_dir')

password = b"password"
salt = os.urandom(16)
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,
    iterations=390000,
)

key = base64.urlsafe_b64encode(kdf.derive(password))
f = Fernet(key)

data = b'ab' * 10

encrypted_message = f.encrypt(data)

with mem_fs.open('hidden_dir/encrypted.text', 'wb') as in_file_in_memory:
    in_file_in_memory.write(encrypted_message)
    in_file_in_memory.close()

with mem_fs.open('hidden_dir/encrypted.text', 'rb') as out_file_in_memory:
    raw_data = out_file_in_memory.read()
    decrypted_data = f.decrypt(raw_data)
    print(decrypted_data)
    # output
    b'abababababababababab'

Previously in the comments I mentioned key management, which is similar to maintaining a list of passwords for your zip archives.

I don't know your setup, but you could pregenerate keys in advance and stored them in a secure way for use in your code.

I don't have 7z installed on my Mac, so I could only give you pseudocode. The examples below aren't using 7z.

import os
import fs
import base64
import pyzipper
from zipfile import ZipFile
from cryptography.fernet import Fernet

mem_fs = fs.open_fs('mem://')
mem_fs.makedir('hidden_dir')

# pregenerate key
f = Fernet(b'-6_WO-GLrlXexdSbon_fKJoVOVBh66LdYrEM0Kvcwf0=')

data = b'ab' * 10

encrypted_message = f.encrypt(data)

with mem_fs.open('hidden_dir/encrypted.text', 'wb') as in_file_in_memory:
    in_file_in_memory.write(encrypted_message)
    in_file_in_memory.close()

# This uses standard ZIP with no password, but the data
# is encrypted 
with mem_fs.open('hidden_dir/encrypted.text', 'rb') as out_file_in_memory:
    raw_data = out_file_in_memory.read()
    with ZipFile('archive.zip', mode='w') as zip_file:
        zip_file.writestr('file.txt', raw_data)

# This uses pyzipper to create a password word protected 
# encrypted file, which stores the encrypted.text.
# overkill, because the data is already encrypted prior
with mem_fs.open('hidden_dir/encrypted.text', 'rb') as out_file_in_memory:
    raw_data = out_file_in_memory.read()
    secret_password = b'super secret password'
    # Create encrypted password protected ZIP file in-memory
    with pyzipper.AESZipFile('password_protected.zip',
                             'w',
                             compression=pyzipper.ZIP_LZMA,
                             encryption=pyzipper.WZ_AES) as zf:
        zf.setpassword(secret_password)
        zf.writestr('data.txt', raw_data)

I'm still looking into how to pipe this encrypted.text to subprocess 7-zip.

5
  • Thanks! Here github.com/danifus/pyzipper/blob/master/pyzipper/zipfile_aes.py they are doing their own crypto, which I wanted to avoid (this is very error-prone except if the author is a known crytography expert). Is there a way to call the official zip or 7z SDK/library and let this actual lib do the encryption part, instead of doing the crypto ourselves?
    – Basj
    Commented Mar 18, 2022 at 16:49
  • @Basj doesn't pyzipper and ` py7zr` use the same crypto, which is a fork of PyCrypto? Are you worried that the cryptography in these modules will be cracked or have an unknown weakness? Commented Mar 18, 2022 at 17:28
  • In fact I don't understand why don't these modules just call the ZIP or 7Z LIB/SDK which probably already contains the cryptography part? Do you know why @Lifeiscomplex? And yes, calling PyCryptodome and doing our own stuff (I have already done it too ;)) can always produce an unknown weakness :) I prefer to trust libgzip.dll or lib7z.dll (I don't remember the exact name) which is used by millions of people/projects than a new implementation of 7z encryption done by a lesser-known Python module.
    – Basj
    Commented Mar 18, 2022 at 17:43
  • The built-in ZIP module doesn't have encryption capabilities or a way to set a password for writing. So far I haven't found an update Python module that implements 7Z LIB/SDK. Commented Mar 19, 2022 at 0:12
  • I started looking into the cryptography part of pyzipper. The cryptography linked to this module is well used and has 10,000s of users. The cryptography module was first released in 2014 and has gone through around 47 updates. Could there be a hidden flaw? Yes, but any software can have security issues. Commented Mar 19, 2022 at 4:04
1

It would probably be simplest to use third-party applications such as RadeonRAMDisk to emulate disk operations in-memory, but you stated you prefer not to. Another possibility is to extend PyFilesystem to allow encrypted zip-file operations on a memory filesystem.

0

I don't know about Python, but it is possible to do this using the 7zip C++ interface. However, it is a lot of work. Here's an excerpt from my implementation that I'm using for a project that has to pack zip files:

class CArchiveUpdateCallback : public IArchiveUpdateCallback
{
public:
    STDMETHODIMP GetProperty(UInt32 Index, PROPID PropID, PROPVARIANT* PropValue)
    {
        const std::wstring& FilePath = m_FileList[Index].first;
        const std::wstring& ItemPath = m_FileList[Index].second;
        switch (PropID)
        {
        case kpidPath:
            V_VT(PropValue) = VT_BSTR;
            V_BSTR(PropValue) = SysAllocString(ItemPath.c_str());
            break;
        case kpidSize:
            V_VT(PropValue) = VT_UI8;
            PropValue->uhVal.QuadPart = Utils::GetSize(FilePath);
            break;
        }
        return S_OK;
    }
    STDMETHODIMP GetStream(UInt32 ItemIndex, ISequentialInStream** InStream)
    {
        const std::wstring& FilePath = m_FileList[ItemIndex].first;
        HRESULT hr = CInStream::Create(FilePath, IID_ISequentialInStream, (void**)InStream);
        return hr;
    }
protected:
    std::vector<std::pair<std::wstring, std::wstring>> m_FileList;
};

It currently works exclusively with on-disk files, but could be modified to accommodate in-memory buffers. For example, it could operate on a list of (in-memory) IInStream objects instead of a list of file paths.

0

I am not able able to understand why you don't want temporary on-disk file, as it would reduce the complexity.

And yes I have found few solution which requires only built in modules of python:

    • You can use subprocess to interact with powershell and create zip file using powershell command. You can either run the command or save the command in a .ps1 file and execute it. (This solution requires you to install 7zip software)
def run(self, cmd):
    completed = subprocess.run(["powershell", "-Command", cmd], capture_output=True)
    return completed
  • and the powershell code would be:
# Note this code is not written by me, link is provide to the actual owner
function Write-ZipUsing7Zip([string]$FilesToZip, [string]$ZipOutputFilePath, [string]$Password, [ValidateSet('7z','zip','gzip','bzip2','tar','iso','udf')][string]$CompressionType = 'zip', [switch]$HideWindow)
{
    # Look for the 7zip executable.
    $pathTo32Bit7Zip = "C:\Program Files (x86)\7-Zip\7z.exe"
    $pathTo64Bit7Zip = "C:\Program Files\7-Zip\7z.exe"
    $THIS_SCRIPTS_DIRECTORY = Split-Path $script:MyInvocation.MyCommand.Path
    $pathToStandAloneExe = Join-Path $THIS_SCRIPTS_DIRECTORY "7za.exe"
    if (Test-Path $pathTo64Bit7Zip) { $pathTo7ZipExe = $pathTo64Bit7Zip }
    elseif (Test-Path $pathTo32Bit7Zip) { $pathTo7ZipExe = $pathTo32Bit7Zip }
    elseif (Test-Path $pathToStandAloneExe) { $pathTo7ZipExe = $pathToStandAloneExe }
    else { throw "Could not find the 7-zip executable." }

    # Delete the destination zip file if it already exists (i.e. overwrite it).
    if (Test-Path $ZipOutputFilePath) { Remove-Item $ZipOutputFilePath -Force }

    $windowStyle = "Normal"
    if ($HideWindow) { $windowStyle = "Hidden" }

    # Create the arguments to use to zip up the files.
    # Command-line argument syntax can be found at: http://www.dotnetperls.com/7-zip-examples
    $arguments = "a -t$CompressionType ""$ZipOutputFilePath"" ""$FilesToZip"" -mx9"
    if (!([string]::IsNullOrEmpty($Password))) { $arguments += " -p$Password" }

    # Zip up the files.
    $p = Start-Process $pathTo7ZipExe -ArgumentList $arguments -Wait -PassThru -WindowStyle $windowStyle

    # If the files were not zipped successfully.
    if (!(($p.HasExited -eq $true) -and ($p.ExitCode -eq 0)))
    {
        throw "There was a problem creating the zip file '$ZipFilePath'."
    }
}
  1. Using powershell dependency 7zip4PowerShell and then interact with the shell using subprocess. (Link provided)
  • Launch PowerShell with administrative escalation.
  • Install the 7-zip module by entering the cmdlet below. It does query the PS gallery and uses a third-party repository to download the dependencies. If you’re OK with the security considerations, approve the installation to proceed:

Install-Module -Name 7zip4PowerShell -Verbose

  • Change directories to where you want the compressed file saved.
  • Create a secure string for your compressed file’s encryption by entering the cmdlet below:

$SecureString = Read-Host -AsSecureString

  • Enter the password you wish to use in PowerShell. The password will be obfuscated by asterisks. The plain text entered will be converted to $SecuresString, and you’ll use that in the next step.

  • Enter the following cmdlet to encrypt the resulting compressed file:

Compress-7zip -Path "\path ofiles" -ArchiveFileName "Filename.zip" -Format Zip -SecurePassword $SecureString

  • The resulting ZIP file will be saved to the chosen directory once the command has completed processing.

  • You can either follow the process in powershell terminal, or just interact with the terminal after installing the dependency using subprocess.

References:

Not the answer you're looking for? Browse other questions tagged or ask your own question.