92

I'm looking for a command line wrapper for the DEFLATE algorithm.

I have a file (git blob) that is compressed using DEFLATE, and I want to uncompress it. The gzip command does not seem to have an option to directly use the DEFLATE algorithm, rather than the gzip format.

Ideally I'm looking for a standard Unix/Linux tool that can do this.

edit: This is the output I get when trying to use gzip for my problem:

$ cat .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 | gunzip

gzip: stdin: not in gzip format
1

22 Answers

53

You can do this with the OpenSSL command line tool:

openssl zlib -d < $IN > $OUT

Unfortunately, at least on Ubuntu, the zlib subcommand is disabled in the default build configuration (--no-zlib --no-zlib-dynamic), so you would need to compile openssl from source to use it. But it is enabled by default on Arch, for example.

Edit: Seems like the zlib command is no longer supported on Arch either. This answer might not be useful anymore :(

5
  • 13
    Note that the zlib sub-command (and the -z option to the enc sub-command) is not available if your build of openssl was configured with the default options, which include --no-zlib and --no-zlib-dynamic. So this answer only works if your openssl was compiled with the no- prefix removed from one of those configure options. You can tell by looking for -DZLIB in the output from openssl version -f
    – Hercynium
    Commented May 13, 2014 at 16:02
  • @Hercynium thanks! In particular this is the case for Ubuntu 14.04 :( Commented Dec 14, 2014 at 9:43
  • Works on Mac as well.
    – Ben
    Commented Aug 28, 2018 at 13:27
  • 3
    Does not work on mac with LibreSSL 2.2.7. I get openssl:Error: 'zlib' is an invalid command. Commented Mar 25, 2019 at 5:24
  • 1
    This works on Windows as well, using the bundled openssl in the git bash shell.
    – codeape
    Commented Jun 15, 2021 at 11:44
53

Something like the following will print the raw content, including the "$type $length\0" header:

perl -MCompress::Zlib -e 'undef $/; print uncompress(<>)' \
     < .git/objects/27/de0a1dd5a89a94990618632967a1c86a82d577
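If you want the "$type $length\0" header separated from the content, here is a rough Python sketch of the same idea. The helper name parse_git_object is made up for illustration, and it assumes the input is a loose object stored as a zlib stream:

```python
import zlib


def parse_git_object(raw: bytes):
    """Inflate a loose git object and split off its header.

    Returns (type, content). The function name is hypothetical.
    """
    data = zlib.decompress(raw)  # loose objects are zlib streams
    header, _, content = data.partition(b"\x00")
    obj_type, _, length = header.partition(b" ")
    if int(length) != len(content):
        raise ValueError("length in header does not match content size")
    return obj_type.decode("ascii"), content


if __name__ == "__main__":
    # Stand-in for the bytes you would read from a file under .git/objects/
    raw = zlib.compress(b"blob 14\x00file foo.txt!\n")
    print(parse_git_object(raw))
```

Open the object file in binary mode ("rb") before passing its bytes in.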
2
  • 1
    [Incorrectly] Empty output and zero exit code on a raw deflate stream without the 78 marker and the final crc.
    – ulidtko
    Commented Apr 24, 2017 at 10:50
  • Works for me also with any data directly compressed in C using zlib, so awesome answer. And as usual: In the end, most world problems are solvable by a PERL one-liner ;)
    – Mecki
    Commented Oct 11, 2017 at 13:22
43

Pythonic one-liner (updated for Python 3's sharp distinction between text and binary data):

$> python -c "import zlib,sys;\
           sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < $IN
2
  • repr(...) seems to wrap everything in quotes ('...'), so I had to remove it (decompressing a zlib compressed JSON file). Commented Apr 9, 2014 at 11:37
  • 1
    Actually it's python -c "import zlib,sys;print(zlib.decompress(sys.stdin.buffer.read()).decode('utf8'))" < $IN, if you expect a utf8 file for instance in Python 3 Commented Jan 30, 2017 at 19:11
39

UPDATE: Mark Adler noted that git blobs are not raw DEFLATE streams, but zlib streams. These can be unpacked by the pigz tool, which comes pre-packaged in several Linux distributions:

$ cat foo.txt 
file foo.txt!

$ git ls-files -s foo.txt
100644 7a79fc625cac65001fb127f468847ab93b5f8b19 0   foo.txt

$ pigz -d < .git/objects/7a/79fc625cac65001fb127f468847ab93b5f8b19 
blob 14file foo.txt!

Edit by kriegaex: Git Bash for Windows users will notice that pigz is unavailable by default. You can find precompiled 32/64-bit versions here. I tried the 64-bit version and it works nicely. You can e.g. copy pigz.exe directly to c:\Program Files\Git\usr\bin in order to put it on the path.

Edit by mjaggard: Homebrew and Macports both have pigz available so you can install with brew install pigz or sudo port install pigz (if you do not have it already, you can install Homebrew by following the instructions on their website)


My original answer, kept for historical reasons:

If I understand the hint in the Wikipedia article mentioned by Marc van Kempen, you can use puff.c from zlib directly.

This is a small example:

#include <assert.h>
#include <string.h>
#include "puff.h"

int main( void ) {
    unsigned char dest[ 5 ];
    unsigned long destlen = 4;
    /* raw DEFLATE stream for "asdf" (no zlib header or trailer) */
    const unsigned char source[] = "\x4B\x2C\x4E\x49\x03\x00";
    unsigned long sourcelen = 6;
    assert( puff( dest, &destlen, source, &sourcelen ) == 0 );
    dest[ 4 ] = '\0';
    assert( strcmp( (const char *) dest, "asdf" ) == 0 );
    return 0;
}
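To make the difference between raw DEFLATE, zlib streams and gzip files concrete, here is a small Python sketch. It relies on the wbits parameter of zlib.decompress to select the framing; the helper name inflate_any is hypothetical, and trying the three framings in turn is just one possible approach:

```python
import gzip
import zlib


def inflate_any(data: bytes) -> bytes:
    """Try zlib, gzip, then raw DEFLATE framing (hypothetical helper)."""
    framings = (
        zlib.MAX_WBITS,        # zlib stream: 2-byte header, Adler-32 trailer
        zlib.MAX_WBITS | 16,   # gzip: 10-byte header, CRC-32 + size trailer
        -zlib.MAX_WBITS,       # raw DEFLATE: no header or trailer at all
    )
    for wbits in framings:
        try:
            return zlib.decompress(data, wbits=wbits)
        except zlib.error:
            continue
    raise ValueError("not a zlib, gzip, or raw DEFLATE stream")


if __name__ == "__main__":
    raw = bytes.fromhex("4b2c4e490300")  # the bytes from the puff example
    print(inflate_any(raw))
    print(inflate_any(zlib.compress(b"asdf")))
    print(inflate_any(gzip.compress(b"asdf")))
```

Raw DEFLATE is tried last because it has no header to check, so it is the framing most likely to accept garbage.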
6
  • 4
    Yeah, I looked at that. But I would definitely prefer a commonly packaged tool. Commented Jul 5, 2010 at 10:10
  • Ok, made a very late edit now with a working minimal example.
    – mkluwe
    Commented May 10, 2016 at 19:20
  • 4
    This will not work. git blobs are zlib streams, not raw deflate. This solution works on raw deflate. puff does not process the zlib header and trailer. If you want a utility, you can use pigz, which will decompress the zlib format with the -dz option, as well as generate the zlib format with -z.
    – Mark Adler
    Commented Nov 5, 2017 at 15:41
  • 1
    @MarkAdler -z, --zlib Compress to zlib (.zz) instead of gzip format. As of now this flag is relevant for just compressing, not decompressing. pigz -d < "infile" > "outfile" works just fine.
    – murla
    Commented Mar 18, 2020 at 4:55
  • @mkluwe, I hope you do not mind that I added info about pigz for Windows Git Bash users. This answer is still correct and was very useful for me, I just wanted to further improve it.
    – kriegaex
    Commented Oct 22, 2020 at 0:45
28

You can use zlib-flate, like this:

cat .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 \
    | zlib-flate -uncompress; echo

It's there by default on my machine, but if you need to install it, it's part of qpdf (tools for transforming and inspecting PDF files).

I've popped an echo on the end of the command, as it's easier to read the output that way.

1
  • 6
    No need for cat: zlib-flate -uncompress < .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 Commented May 15, 2017 at 20:26
28

Try the following command:

printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7 | gunzip

No external tools are needed.

Source: How to uncompress zlib data in UNIX? at unix SE
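For anyone wondering why the trick works, here is a rough Python sketch of what I believe is going on. The eight printed bytes are the start of a gzip header, and the two-byte zlib header of the object lands where gzip keeps its XFL and OS bytes, which completes the ten-byte header. The helper name is made up, and withholding the last four bytes is my own workaround for the trailer mismatch:

```python
import zlib

# The 8 bytes emitted by the printf in the answer: magic (1f 8b),
# method 8 (deflate), no flags, zero mtime.
GZIP_PREFIX = bytes.fromhex("1f8b080000000000")


def inflate_with_fake_gzip_header(zlib_stream: bytes) -> bytes:
    """Mimic the printf | gunzip trick (hypothetical helper name)."""
    # MAX_WBITS | 16 tells zlib to expect gzip framing.
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    # The 2-byte zlib header fills the XFL and OS slots of the gzip header.
    # The 4-byte Adler-32 trailer is withheld, because gzip expects a
    # CRC-32 plus a length field there instead.
    return d.decompress(GZIP_PREFIX + zlib_stream[:-4])


if __name__ == "__main__":
    stream = zlib.compress(b"blob 14\x00file foo.txt!\n")
    print(inflate_with_fake_gzip_header(stream))
```

The trailer mismatch (4 bytes of Adler-32 where gzip wants 8 bytes) is presumably what makes gunzip complain at the end of the input even though the data comes out fine.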

4
  • 1
    You end up with an "unexpected end of file" error, but still a neat hack.
    – Eric
    Commented Mar 18, 2015 at 17:01
  • 3
    Just prefixing with a gzip file header. Nice :) Commented Jan 14, 2016 at 20:32
  • 1
    That's where I also found it - added zlipd() (printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" |cat - $@ |gzip -dc) to my .bashrc now :) Commented May 4, 2016 at 6:05
  • Nice hack! @Eric add 2> /dev/null to send stderr to null.
    – poe84it
    Commented Feb 18, 2018 at 23:19
14

Here is a Ruby one-liner ( cd .git/ first and identify path to any object ):

ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < ./74/c757240ec596063af8cd273ebd9f67073e1208
1
  • to strip the [blob size] header ruby -rzlib -e 'print Zlib::Inflate.inflate($stdin.read).split("\x00")[1..-1].join' < .git/objects/abc
    – yachi
    Commented Apr 2, 2015 at 2:16
12

I got tired of not having a good solution for this, so I put something on NPM:

https://github.com/jezell/zlibber

Now you can just pipe to the inflate / deflate commands.

3
  • How do you use this package?
    – RHPT
    Commented Mar 25, 2016 at 15:05
  • 1
    @RHPT On Windows, do "type #### | inflate", where #### is the checksum of the object.
    – mhenry1384
    Commented Oct 3, 2016 at 1:22
  • Or inflate < filename Commented Mar 14, 2019 at 10:56
10

Here's an example of breaking open a commit object in Python:

$ git show
commit 0972d7651ff85bedf464fba868c2ef434543916a
# all the junk in my commit...
$ python
>>> import zlib
>>> file = open(".git/objects/09/72d7651ff85bedf464fba868c2ef434543916a", "rb")
>>> data = file.read()
>>> print(data)
# binary garbage
>>> unzipped_data = zlib.decompress(data)
>>> print(unzipped_data)
# all the junk in my commit!

What you will see there is almost identical to the output of 'git cat-file -p [hash]', except that command doesn't print the header ('commit' followed by the size of the content and a null byte).
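That header matters if you want to recompute the object name yourself. A minimal sketch, assuming a repository that uses SHA-1 object names (both function names are hypothetical):

```python
import hashlib
import zlib


def git_object_id(loose_object: bytes) -> str:
    """SHA-1 of the inflated object, header included."""
    return hashlib.sha1(zlib.decompress(loose_object)).hexdigest()


def make_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build the bytes git would store: '<type> <size>\\0' + content, deflated."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return zlib.compress(header + content)


if __name__ == "__main__":
    # The empty blob has a well-known object name.
    print(git_object_id(make_loose_object("blob", b"")))
```

For a real object, the result should match the directory name plus file name under .git/objects. Hashing only the output of git cat-file -p gives a different value, because the header is missing.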

2
  • 4
    Depending on the operating system you might want to add the "rb" switch for open like: file = open(".git/objects/09/72d7651ff85bedf464fba868c2ef434543916a", "rb")
    – Igor Popov
    Commented Nov 19, 2011 at 11:16
  • unknown compression method for mine.
    – cybernard
    Commented Jan 26, 2019 at 1:17
9

Git objects are compressed with zlib rather than gzip, so either use zlib to uncompress them, or use a git command such as git cat-file -p <SHA1> to print the content.

2
  • 3
    As Jack points out above, the output of git cat-file -p <SHA1> is not the complete contents of the zlib decompression of .git/objects/<SHA1>. The difference is key if you're trying to implement a Git commit hash calculator ...
    – ntc2
    Commented Jan 23, 2014 at 4:15
  • The -p pretty print option is an advantage though when you want to understand the contents of the object. Uncompressing a tree object with pigz will not give you a human-readable result.
    – kheyse
    Commented Mar 1, 2018 at 21:34
9

Looks like Mark Adler has us in mind and wrote an example of just how to do this with: http://www.zlib.net/zpipe.c

It compiles with nothing more than gcc -lz and the zlib headers installed. I copied the resulting binary to my /usr/local/bin/zpipe while working with git stuff.

7
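// save this as deflate.go

package main

import (
    "compress/zlib"
    "flag"
    "io"
    "os"
)

var infile = flag.String("f", "", "infile")

func main() {
    flag.Parse()
    file, err := os.Open(*infile)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    // zlib.NewReader expects a zlib stream, which is what git writes
    r, err := zlib.NewReader(file)
    if err != nil {
        panic(err)
    }
    defer r.Close()

    if _, err := io.Copy(os.Stdout, r); err != nil {
        panic(err)
    }
}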

$ go build deflate.go
$ ./deflate -f .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7
1
  • Works beautifully on macOS 10.11, thanks! I had to install Go (which I'd meant to do anyway) from the official website, then it worked perfectly. Did you write this yourself? It's not very nice about unexpected arguments. :)
    – Wildcard
    Commented Oct 2, 2016 at 2:55
4

pigz can do it:

apt-get install pigz
unpigz -c .git/objects/c0/fb67ab3fda7909000da003f4b2ce50a53f43e7
4

git objects are zlib streams (not raw deflate). pigz will decompress those with the -dz option.

2

Python3 oneliner:

python3 -c "import zlib,sys; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < infile > outfile

This way the contents are handled as binary data, avoiding conversion to/from unicode.

2

I have repeatedly come across this problem, and it seems almost all of the answers on the Internet are either wrong, require compiling some less than ideal code, or require downloading a whole slew of dependencies untracked by the system! But I found a real solution. It uses Perl, since Perl is readily available on most systems.

From a Bash-alike shell:

perl -mIO::Uncompress::RawInflate=rawinflate -erawinflate'"-","-"'

Or, if you're exec/fork-ing manually (without shell quotes, but line separated):

  • perl
  • -mIO::Uncompress::RawInflate=rawinflate
  • -erawinflate"-","-"

Big caveat: If the stream doesn't start off as a valid DEFLATE stream (such as say, uncompressed data), then this command will happily pipe all the data through untouched. Only if the stream begins as a valid DEFLATE stream (with a valid dictionary I suppose? I'm not too sure...), then this command will error somehow. In some situations this may be desirable however.

References:

PERL IO::Uncompress::RawInflate::rawinflate

2
  • Git objects are not raw deflate. They are zlib streams.
    – Mark Adler
    Commented Nov 5, 2021 at 22:12
  • Hello Adler! Yes, this answer doesn't actually answer the question as written. Perhaps it should be moved to its own question and answer, and perhaps the question's title should be changed also. Commented Nov 6, 2021 at 0:18
1

See http://en.wikipedia.org/wiki/DEFLATE#Encoder_implementations

It lists a number of software implementations, including gzip, so that should work. Did you try just running gzip on the file? Does it not recognize the format automatically?

How do you know it is compressed using DEFLATE? What tool was used to compress the file?

3
  • See the bottom of this page: progit.org/book/ch9-2.html Gzip does implement DEFLATE, but it doesn't seem like you can directly apply the algorithm. Gzip expects the data to be in gzip format (which adds a bunch of headers & stuff around the DEFLATE'ed data). (I just edited my post to include the output from gunzip) Commented Jul 5, 2010 at 10:07
  • 2
    Ah ok, so the data is compressed using the zlib library, then it stands to reason you can uncompress using zlib too! You could try a ruby, perl or other binding to whip up a simple deflate script. Or if you're not afraid to try your hand at compiling a C program, try this: zlib.net/zlib_how.html Commented Jul 5, 2010 at 10:20
  • NB I just tried it and zpipe.c works on git objects, compile with 'gcc -o zpipe zpipe.c -I/path/to/zlib.h -L/path/to/zlib -lz' use: ./zpipe -d < .git/objects/83/535d1693580f04824a2ddd22bd241fd00533d8 (use -d for decompression) Commented Jul 5, 2010 at 12:09
1

Why don't you just use git's tools to access the data? This should be able to read any git object:

git show --pretty=raw <object SHA-1>
4
  • 4
    I'm preparing for a little git-workshop I'm going to give soon. One of the examples involves showing what 'git add' does by hand. De-compressing the blob using git itself doesn't make sense since I want to show the underlying functionality. I will probably end up using ruby or perl, but I was hoping I could stick with a simple bash oneliner. Commented Jul 5, 2010 at 10:58
  • 4
    Or git cat-file -p c0fb67ab3fda7909000da003f4b2ce50a53f43e7 Commented Jul 5, 2010 at 12:51
  • @igorw: only as long as the object is in the tree. knowledge about finding some git-objects in 'lost+found' (after fsck.ext4 put them there) comes in quite handy ...
    – akira
    Commented Nov 30, 2011 at 12:27
  • 2
    As others have pointed out, this does not give you the complete contents of a git object. Important if you are trying to programmatically work on git objects. Commented Feb 3, 2015 at 8:14
1

This is how I do it with Powershell.

$fs = New-Object IO.FileStream((Resolve-Path $Path), [IO.FileMode]::Open, [IO.FileAccess]::Read)
$fs.Position = 2
$cs = New-Object IO.Compression.DeflateStream($fs, [IO.Compression.CompressionMode]::Decompress)
$sr = New-Object IO.StreamReader($cs)
$sr.ReadToEnd()

You can then create an alias like:

function func_deflate{
    param(
        [Parameter(Mandatory=$true, ValueFromPipeline = $true)]
        [ValidateScript({Test-Path $_ -PathType leaf})]
        [string]$Path
    )
    $ErrorActionPreference = 'Stop'    
    $fs = New-Object IO.FileStream((Resolve-Path $Path), [IO.FileMode]::Open, [IO.FileAccess]::Read)
    $fs.Position = 2
    $cs = New-Object IO.Compression.DeflateStream($fs, [IO.Compression.CompressionMode]::Decompress)
    $sr = New-Object IO.StreamReader($cs)
    return $sr.ReadToEnd()
}

Set-Alias -Name deflate -Value func_deflate
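The $fs.Position = 2 line skips the two-byte zlib header so that DeflateStream sees raw DEFLATE data. Here is the same idea as a rough Python sketch, with a check of the Adler-32 trailer added on top. It assumes the stream has no preset dictionary (otherwise the header is longer than two bytes), and the function name is made up:

```python
import struct
import zlib


def inflate_skipping_header(stream: bytes) -> bytes:
    """Skip the 2-byte zlib header and inflate the rest as raw DEFLATE."""
    d = zlib.decompressobj(wbits=-zlib.MAX_WBITS)  # negative wbits = raw DEFLATE
    out = d.decompress(stream[2:]) + d.flush()
    # Whatever follows the DEFLATE data ends up in unused_data; for a zlib
    # stream that is the big-endian Adler-32 of the uncompressed bytes.
    trailer = d.unused_data[:4]
    if len(trailer) == 4:
        (expected,) = struct.unpack(">I", trailer)
        if zlib.adler32(out) != expected:
            raise ValueError("Adler-32 mismatch")
    return out


if __name__ == "__main__":
    print(inflate_skipping_header(zlib.compress(b"blob 14\x00file foo.txt!\n")))
```

The PowerShell version above simply ignores the trailer, which works but skips the integrity check.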


1

I found this question while looking for a workaround for a bug in the -text utility in the new version of the hadoop dfs client I just installed. The -text utility works like cat, except if the file being read is compressed, it transparently decompresses and outputs the plain-text (hence the name).

The answers already posted were definitely helpful, but some of them have one problem when dealing with Hadoop-sized amounts of data - they read everything into memory before decompressing.

So, here are my variations on the Perl and Python answers above that do not have that limitation:

Python:

hadoop fs -cat /path/to/example.deflate |
  python3 -c 'import zlib,sys;d=zlib.decompressobj();w=sys.stdout.buffer.write;[w(d.decompress(b)) for b in iter(lambda:sys.stdin.buffer.read(4096),b"")];w(d.flush())'

Perl:

hadoop fs -cat /path/to/example.deflate |
  perl -MCompress::Zlib -e '$i = inflateInit(); while (sysread(STDIN, $buf, 4096)) { ($out, $status) = $i->inflate($buf); print $out if defined $out }'

Note the use of the -cat sub-command, instead of -text. This is so that my work-around does not break after they've fixed the bug. Apologies for the readability of the python version.
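For readability, here is the streaming idea written out as a small Python 3 function. This is only a sketch: the function name and the chunk size are arbitrary choices, and it assumes the input is a single zlib stream. The key point is that one zlib.decompressobj() keeps its state across chunks, so only one chunk is held in memory at a time:

```python
import zlib


def inflate_stream(src, dst, chunk_size=64 * 1024):
    """Copy a zlib stream from src to dst, inflating chunk by chunk."""
    d = zlib.decompressobj()  # keeps decompression state between chunks
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(d.decompress(chunk))
    dst.write(d.flush())
    if not d.eof:
        raise ValueError("input ended before the zlib stream was complete")
```

To use it in a pipeline, call inflate_stream(sys.stdin.buffer, sys.stdout.buffer) from a script after importing sys.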

1

To add to the collection, here are perl one-liners for deflate/inflate/raw deflate/raw inflate.

Deflate

perl -MIO::Compress::Deflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Compress::Deflate::deflate(\$in, \$out); print $out;'

Inflate

perl -MIO::Uncompress::Inflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Uncompress::Inflate::inflate(\$in, \$out); print $out;'

Raw deflate

perl -MIO::Compress::RawDeflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Compress::RawDeflate::rawdeflate(\$in, \$out); print $out;'

Raw inflate

perl -MIO::Uncompress::RawInflate -e 'undef $/; my ($in, $out) = (<>, undef); IO::Uncompress::RawInflate::rawinflate(\$in, \$out); print $out;'
0
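// Node.js: build a zlib stream by hand (header + raw DEFLATE + Adler-32)
const zlib = require("zlib");
const adler32 = require("adler32"); // third-party npm package
const data = "hello world~!";
// Adler-32 of the uncompressed data, padded to 8 hex digits
const chksum = adler32.sum(Buffer.from(data)).toString(16).padStart(8, "0");
// "789c" is the two-byte zlib header for default compression
console.log("789c", zlib.deflateRawSync(data).toString("hex"), chksum);
// or
console.log(zlib.deflateSync(data).toString("hex"));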
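As far as I can tell, the snippet above shows that a zlib stream is just raw DEFLATE data wrapped in a two-byte header and a four-byte Adler-32 checksum. The same construction in Python, as a hedged sketch (the function name is made up, and the 78 9c header is hard-coded on the assumption of default compression with a 32K window):

```python
import struct
import zlib


def zlib_stream_by_hand(data: bytes) -> bytes:
    """Wrap raw DEFLATE output in a zlib header and Adler-32 trailer."""
    c = zlib.compressobj(wbits=-zlib.MAX_WBITS)  # negative wbits = raw DEFLATE
    raw = c.compress(data) + c.flush()
    header = b"\x78\x9c"  # CMF/FLG for deflate, 32K window, default level
    trailer = struct.pack(">I", zlib.adler32(data))  # big-endian checksum
    return header + raw + trailer


if __name__ == "__main__":
    stream = zlib_stream_by_hand(b"hello world~!")
    print(stream.hex())
    print(zlib.decompress(stream))
```

The fact that zlib.decompress accepts the hand-built bytes is the check that the framing is right.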
2
  • 1
    Can you explain what this code does and how to run it? That might help people in deciding if your answer is useful for them.
    – mwfearnley
    Commented Apr 11, 2022 at 10:38
  • 1
    This is Javascript, right?
    – JakeRobb
    Commented Jun 30, 2022 at 14:01
