I'm writing an importer for the GIFTI file format. The details of the format are not particularly important, but the basic idea is that it is a relatively simple XML file which includes binary arrays of 32-bit floating-point numbers that are represented as "consecutive ASCII characters that are a Base64 text representation of the gzipped binary data". Accordingly, I thought that, given a string (variable name: data) containing the Base64 ASCII characters, the proper way to extract these would be:
(* 1 *) extractedData = ImportString[data, {"Base64", "GZIP", "Real32"}];
I know that the data string is correctly encoded because I can read it in correctly using other programs, including Matlab and other widely-used GIFTI readers. The code in (1), however, produces incorrect numbers, including Indeterminate values.
On closer inspection, I discovered that the base64 import works fine:
(* 2 *) decodedData64 = ImportString[data, {"Base64", "Binary"}];
The code in (2) produces the same sequence of bytes as many other programs (including the Mac OS X command 'base64 --decode ...'), but the GZIP import seems to do nothing:
(* 3 *) decodedData64 == ImportString[FromCharacterCode[decodedData64], {"GZIP","Binary"}]
(* Out[]= True *)
What really has me confused is that the following code produces the correct results (albeit very slowly):
(* 4 *) <<JLink`
InstallJava[];
JavaDecode[data_] := With[
  {iis = JavaNew[
     "java.util.zip.InflaterInputStream",
     JavaNew[
       "java.io.ByteArrayInputStream",
       data]]},
  (* read one byte at a time until read[] returns -1 (end of stream), then drop that -1 *)
  Most @ NestWhileList[(iis@read[]) &, iis@read[], # != -1 &]];
decodedData = JavaDecode[decodedData64];
reals = ImportString[FromCharacterCode[decodedData], "Real32"];
Execution of the code block (4) puts the correct list of real numbers in the reals variable. However, writing out the decodedData64 variable's contents as a binary file and attempting to gunzip it on the terminal fails (not gzip format). Note, also, that I've tried many combinations of writing the data to a file and importing it directly (rather than importing from a string), so I do not believe this is an ImportString issue.
It seems likely to me that either (A) the GIFTI file spec incorrectly names the gzip compression algorithm as that used in the format, or (B) Mathematica is not correctly unzipping the data. B seems pretty unlikely considering that gunzip itself fails.
According to the documentation for java.util.zip.InflaterInputStream, the compression algorithm used is the "deflate" algorithm, and the class is the basis for the GZIPInputStream class. My questions are these:
Does anyone know what the InflaterInputStream is doing, since it is apparently not gunzipping the data?
Does anyone know the (most elegant) correct way to unpack this data in Mathematica from a string?
As a test case, the following data has been gzipped and base64 encoded using an external GIFTI-compatible program; it should correctly decode into the list Range[0.0, 1.0, 0.05]:
data = "eJwNylEVgDAMQ9EIwAAGMNDvNQiYAQzUAAZmY7MxG60NdJC/vHsCAJW9VWZb83RtB4avOd1sq9MjPhlYeVAfRlw0MwK3rMseWche2eAPY5cekQ==";
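(* JavaDecode expects the Base64-decoded bytes, not the Base64 string itself *)
decoded = JavaDecode[ImportString[data, {"Base64", "Binary"}]];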
ImportString[FromCharacterCode[decoded], "Real32"]
(* Out[]= {0., 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.} *)
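A quick sanity check on the test string, sketched here under the assumption that only the first two decoded bytes need inspecting: a gzip stream (RFC 1952) always begins with the magic bytes 0x1F 0x8B, whereas a zlib stream (RFC 1950) written with default settings typically begins with 0x78 0x9C. The variable name header is introduced only for this illustration.

```mathematica
(* the test string from above, repeated so the snippet stands alone *)
data = "eJwNylEVgDAMQ9EIwAAGMNDvNQiYAQzUAAZmY7MxG60NdJC/vHsCAJW9VWZb83RtB4avOd1sq9MjPhlYeVAfRlw0MwK3rMseWche2eAPY5cekQ==";

(* Base64-decode and look at the first two bytes *)
header = Take[ImportString[data, {"Base64", "Binary"}], 2]
(* {120, 156}, i.e. 0x78 0x9C *)

(* gzip magic bytes are 0x1F 0x8B = {31, 139} *)
header === {31, 139}
(* False *)
```

The leading 0x78 0x9C is consistent with gunzip rejecting the data ("not gzip format") while the default java.util.zip.InflaterInputStream, which expects a zlib wrapper around the deflate data, reads it without complaint.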
Edit:
Please see Mark Adler's answer below for an explanation of why decoding these data fails. Given the nature of the failure, I've decided to fix up the JLink code that performs the zlib decoding; this function runs reasonably quickly and relies only on JLink and the Java core classes:
Base64ZLibDecode[string_String] := With[
  {x = Apply[
     Join,
     First @ Last @ Reap @ JavaBlock[
       With[
         {inflater = JavaNew[
            "java.util.zip.InflaterInputStream",
            JavaNew[
              "java.io.ByteArrayInputStream",
              ImportString[string, {"Base64", "Binary"}]]],
          ar = JavaNew["[B", 1024]},
         (* inflate in chunks of up to 1024 bytes, collecting each chunk with Sow *)
         While[
           inflater@available[] != 0,
           With[
             {k = inflater@read[ar, 0, 1024]},
             If[k > 0, Sow[JavaObjectToExpression[ar][[1 ;; k]]]]]]]]]},
  (* Java bytes are signed (-128..127), but FromCharacterCode needs 0..255 *)
  Mod[x, 256]];
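The signed-to-unsigned step at the end can be checked in isolation. The following is a minimal sketch using Mod on a few hand-picked values; the names signed and unsigned and the sample list are made up for illustration. The point to verify is that negative bytes are shifted up by 256 while 0 and positive bytes are left alone.

```mathematica
(* sample signed Java byte values, including both extremes and zero *)
signed = {-128, -1, 0, 1, 127};

(* Mod is Listable, so it maps the whole list into the range 0..255 *)
unsigned = Mod[signed, 256]
(* {128, 255, 0, 1, 127} *)
```

Zero is the case worth watching in any such conversion, since Real32 data such as 0.0 or 0.5 contains zero bytes that must survive unchanged.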