16

I'm looking for a way to add embedded resources to my solution. These resources will be folders with a lot of files in them. On user demand they need to be decompressed.

I'm looking for a way to store such folders in the executable without involving third-party libraries (it looks rather stupid, but this is the task).

I have found that I can GZip and UnGZip them using standard libraries. But GZip handles a single file only. In such cases TAR should come onto the scene. But I haven't found a TAR implementation among the standard classes.

Is it possible to decompress TAR with bare C#?

8 Answers

17

While looking for a quick answer to the same question, I came across this thread, and was not entirely satisfied with the current answers, as they all point to third-party dependencies on much larger libraries, just to achieve simple extraction of a tar.gz file to disk.

While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it takes a bunch of files, prepends a header to each one describing the file (500 bytes of fields, padded to 512), and writes them all to a single archive aligned on 512-byte boundaries. Tar itself does no compression; that is typically handled by compressing the resulting file into a gz archive, which .NET conveniently supports out of the box and which takes care of the hard part.

Having looked at the spec for the tar format, there are only really 2 values (especially on Windows) we need to pick out from the header in order to extract the file from a stream. The first is the name, and the second is size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file.
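To make those two fields concrete, here is a minimal sketch of pulling the name and size out of a single 512-byte header block. The `TarHeader` class and its member names are made up for illustration; the offsets (name at byte 0, size at byte 124, stored as octal text) follow the USTAR header layout.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class TarHeader
{
    public const int BlockSize = 512;

    // Reads an ASCII field and cuts it at the first NUL terminator.
    private static string ReadField(byte[] block, int offset, int length)
    {
        var text = Encoding.ASCII.GetString(block, offset, length);
        var end = text.IndexOf('\0');
        return (end >= 0 ? text.Substring(0, end) : text).Trim();
    }

    // Returns false for the all-zero block that terminates an archive.
    public static bool TryParse(byte[] block, out string name, out long size)
    {
        name = ReadField(block, 0, 100);        // bytes 0-99: entry name
        size = 0;
        if (name.Length == 0)
            return false;
        var octal = ReadField(block, 124, 12);  // bytes 124-135: size as octal text
        size = octal.Length == 0 ? 0 : Convert.ToInt64(octal, 8);
        return true;
    }

    // Number of padding bytes that follow `size` bytes of file data.
    public static int Padding(long size) =>
        (int)((BlockSize - size % BlockSize) % BlockSize);
}
```

For a header whose size field reads `00000000012`, `TryParse` yields a size of 10, and `Padding(10)` is 502, which is how far to skip after the data to reach the next header.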

I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening from a stream or filename, and decompressing the gz file first using built-in functions.

The primary method is this:

public static void ExtractTar(Stream stream, string outputDir)
{
    var buffer = new byte[100];
    while (true)
    {
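        // A short read means the stream ended without the usual zero-filled trailer.
        if (stream.Read(buffer, 0, 100) < 100)
            break;
        var name = Encoding.ASCII.GetString(buffer).Trim('\0');
        // An empty name marks the zero-filled block that ends the archive.
        if (String.IsNullOrWhiteSpace(name))
            break;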
        stream.Seek(24, SeekOrigin.Current);
        stream.Read(buffer, 0, 12);
        // The size is octal text padded with NULs and/or spaces; strip both before parsing.
        var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim('\0', ' '), 8);

        stream.Seek(376L, SeekOrigin.Current);

        var output = Path.Combine(outputDir, name);
        if (name.EndsWith("/"))
        {
            // Directory entries carry no data, so create the folder and move on.
            Directory.CreateDirectory(output);
        }
        else
        {
            Directory.CreateDirectory(Path.GetDirectoryName(output));
            // FileMode.Create truncates any existing file instead of leaving stale bytes.
            using (var str = File.Open(output, FileMode.Create, FileAccess.Write))
            {
                var buf = new byte[size];
                stream.Read(buf, 0, buf.Length);
                str.Write(buf, 0, buf.Length);
            }
        }

        var pos = stream.Position;

        var offset = 512 - (pos  % 512);
        if (offset == 512)
            offset = 0;

        stream.Seek(offset, SeekOrigin.Current);
    }
}

And here are a few helper functions for opening from a file, and for decompressing a tar.gz file/stream before extracting.

public static void ExtractTarGz(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTarGz(stream, outputDir);
}

public static void ExtractTarGz(Stream stream, string outputDir)
{
    // A GZipStream is not seekable, so copy it first to a MemoryStream
    using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
    {
        const int chunk = 4096;
        using (var memStr = new MemoryStream())
        {
            int read;
            var buffer = new byte[chunk];
            // A short read does not mean end of stream; only a return value of 0 does.
            while ((read = gzip.Read(buffer, 0, chunk)) > 0)
            {
                memStr.Write(buffer, 0, read);
            }

            memStr.Seek(0, SeekOrigin.Begin);
            ExtractTar(memStr, outputDir);
        }
    }
}

public static void ExtractTar(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTar(stream, outputDir);
}

Here is a gist of the full file with some comments.

7
  • 2
    FYI. Your Path.Join call is only valid in .NET Core 2.1. To make it more universal use Path.Combine.
    – Doug S
    Commented Aug 24, 2018 at 1:06
  • @DougS Good call, didn't even notice I did that. Have been messing around with Ruby lately, had "join" on the mind, lol. Corrected. Commented Aug 24, 2018 at 1:09
  • And unfortunately this code error'ed out on the sample file I tested it with. The error occurred at var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim(), 8); with System.FormatException: Additional non-parsable characters are at the end of the string. at System.ParseNumbers.StringToLong(String s, Int32 radix, Int32 flags, Int32* currPos).
    – Doug S
    Commented Aug 24, 2018 at 2:07
  • 1
    I also got that same error (on the same file too) @DougS. It appears it has trailing white space when trying to determine the size for a directory. After that, it also complained about creating 0 sized files for directory entries. Have forked @ForeverZer0's gist with rudimentary fixes for my use-case. Commented Jan 31, 2019 at 22:45
  • 1
    I did not try your solution, but thanks for the effort of making something like that before it was available in .NET itself.
    – Andreas
    Commented Jan 9 at 7:40
11

.NET 7 added several classes to work with TAR files:

Extract to a directory:

await TarFile.ExtractToDirectoryAsync(tarFilePath, outputDir);

Enumerate a TAR file and manually extract its entries:

await using var tarStream = new FileStream(tarFilePath, new FileStreamOptions { Mode = FileMode.Open, Access = FileAccess.Read, Options = FileOptions.Asynchronous });
await using var tarReader = new TarReader(tarStream);
TarEntry entry;
while ((entry = await tarReader.GetNextEntryAsync()) != null)
{
  if (entry.EntryType is TarEntryType.SymbolicLink or TarEntryType.HardLink or TarEntryType.GlobalExtendedAttributes)
  {
     continue;
  }

  Console.WriteLine($"Extracting {entry.Name}");
  // ExtractToFileAsync requires an explicit overwrite flag.
  await entry.ExtractToFileAsync(Path.Join(outputDir, entry.Name), overwrite: true);
}
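For tar.gz specifically, a minimal sketch is to wrap the file stream in a `GZipStream` and hand that to `TarFile`. The `TarGz` class and its method names are made up for illustration; the calls themselves come from `System.Formats.Tar` and `System.IO.Compression` and assume .NET 7 or later.

```csharp
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;

// Hypothetical wrapper; only the TarFile and GZipStream calls come from the framework.
public static class TarGz
{
    // Packs the contents of sourceDir into a gzip-compressed tar archive.
    // Assumes tarGzPath lies outside sourceDir.
    public static void Create(string sourceDir, string tarGzPath)
    {
        using var file = File.Create(tarGzPath);
        using var gzip = new GZipStream(file, CompressionLevel.Optimal);
        TarFile.CreateFromDirectory(sourceDir, gzip, includeBaseDirectory: false);
    }

    // Unpacks a gzip-compressed tar archive into outputDir.
    public static void Extract(string tarGzPath, string outputDir)
    {
        // The destination directory has to exist before extraction.
        Directory.CreateDirectory(outputDir);
        using var file = File.OpenRead(tarGzPath);
        using var gzip = new GZipStream(file, CompressionMode.Decompress);
        TarFile.ExtractToDirectory(gzip, outputDir, overwriteFiles: true);
    }
}
```

The tar reader accepts a non-seekable stream, so the decompressed data does not need to be buffered in memory first.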
1
  • Microsoft should make this namespaces more Example docs. Can you explain how to extract or create the tar.gz files? Commented Aug 30, 2023 at 0:15
9

Tar-cs will do the job, but it is quite slow. I would recommend using SharpCompress which is significantly quicker. It also supports other compression types and it has been updated recently.

using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Reader;

private static String directoryPath = @"C:\Temp";

public static void unTAR(String tarFilePath)
{
    using (Stream stream = File.OpenRead(tarFilePath))
    {
        var reader = ReaderFactory.Open(stream);
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory)
            {
                ExtractionOptions opt = new ExtractionOptions {
                    ExtractFullPath = true,
                    Overwrite = true
                };
                reader.WriteEntryToDirectory(directoryPath, opt);
            }
        }
    }
}
2
  • 2
    thanks for the answer! By way of an update in 2020, the ExtractOptions are now done via instantiation. For example new ExtractionOptions() { ExtractFullPath = true, Overwrite = true} in the constructor of WriteEntryToDirectory. See this link
    – joshmcode
    Commented Mar 18, 2020 at 13:04
  • Well.. the question was about NOT using 3rd party libraries.
    – Andreas
    Commented Jan 9 at 7:41
3

See tar-cs

using (FileStream unarchFile = File.OpenRead(tarfile))
{
    TarReader reader = new TarReader(unarchFile);
    reader.ReadToEnd("out_dir");
}
1
  • 1
    tar-cs failed on some tar files. So I used the Nuget package SharpZipLib instead Commented Jan 22, 2019 at 22:19
2

Since you are not allowed to use outside libraries, you are not restricted to a specific tar file format either. In fact, the data doesn't even need to be all in the same file.

You can write your own tar-like utility in C# that walks a directory tree and produces two files: a "header" file that consists of a serialized dictionary mapping relative file paths to offset/length pairs, and a big file containing the contents of the individual files concatenated into one giant blob. This is not a trivial task, but it's not overly complicated either.
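A rough sketch of that idea, assuming a plain-text index with one tab-separated `path`, `offset`, `length` line per file instead of a serialized dictionary. The `BlobPacker` class and the index format are invented for illustration, paths are assumed not to contain tabs, and `Path.GetRelativePath` needs .NET Core 2.0 or later.

```csharp
using System;
using System.IO;

// Hypothetical packer; the index format is an assumption, not a standard.
public static class BlobPacker
{
    // Concatenates every file under sourceDir into blobPath and records
    // "relativePath<TAB>offset<TAB>length" lines in indexPath.
    // Both output files are assumed to live outside sourceDir.
    public static void Pack(string sourceDir, string indexPath, string blobPath)
    {
        using var blob = File.Create(blobPath);
        using var index = new StreamWriter(indexPath);
        foreach (var file in Directory.EnumerateFiles(sourceDir, "*", SearchOption.AllDirectories))
        {
            var relative = Path.GetRelativePath(sourceDir, file);
            long offset = blob.Position;
            using (var input = File.OpenRead(file))
                input.CopyTo(blob);
            index.WriteLine($"{relative}\t{offset}\t{blob.Position - offset}");
        }
    }

    // Recreates the files listed in indexPath under outputDir.
    public static void Unpack(string indexPath, string blobPath, string outputDir)
    {
        using var blob = File.OpenRead(blobPath);
        var buffer = new byte[81920];
        foreach (var line in File.ReadLines(indexPath))
        {
            var parts = line.Split('\t');
            long remaining = long.Parse(parts[2]);
            blob.Seek(long.Parse(parts[1]), SeekOrigin.Begin);
            var target = Path.Combine(outputDir, parts[0]);
            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            using var output = File.Create(target);
            while (remaining > 0)
            {
                int n = blob.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (n == 0)
                    throw new EndOfStreamException();
                output.Write(buffer, 0, n);
                remaining -= n;
            }
        }
    }
}
```

The blob can then be passed through `GZipStream` before embedding, since it is a single file.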

1

Because of the updates from dotnet 7.0, we can do it all fairly simply with standard dotnet libraries. Here is the solution:

public async Task UnzipToDirectory(
    Stream compressedSource,
    string destinationDirectory,
    CancellationToken cancellationToken = default
)
{
    if (!Directory.Exists(destinationDirectory))
        Directory.CreateDirectory(destinationDirectory);

    await using MemoryStream memoryStream = new();
    await using (GZipStream gzipStream = 
        new(compressedSource, CompressionMode.Decompress))
    {
        await gzipStream.CopyToAsync(memoryStream, cancellationToken);
    }
    memoryStream.Seek(0, SeekOrigin.Begin);
    await TarFile.ExtractToDirectoryAsync(
        memoryStream,
        destinationDirectory,
        overwriteFiles: true,
        cancellationToken: cancellationToken
    );
}
0

Based on ForeverZer0's answer, with some issues fixed. It uses significantly less memory by avoiding stream copies, and handles larger archives and longer filenames (via the prefix field). This still doesn't handle 100% of the USTAR tar specification.

public static void ExtractTarGz(string filename, string outputDir)
{
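    void ReadExactly(Stream stream, byte[] buffer, int count)
    {
        var total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            // Read returns 0 at end of stream; without this check a truncated archive would loop forever.
            if (n == 0)
                throw new EndOfStreamException();
            total += n;
        }
    }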

    void SeekExactly(Stream stream, byte[] buffer, int count)
    {
        ReadExactly(stream, buffer, count);
    }

    using (var fs = File.OpenRead(filename))
    {
        using (var stream = new GZipStream(fs, CompressionMode.Decompress))
        {
            var buffer = new byte[1024];
            while (true)
            {
                ReadExactly(stream, buffer, 100);
                var name = Encoding.ASCII.GetString(buffer, 0, 100).Split('\0')[0];
                if (String.IsNullOrWhiteSpace(name))
                    break;

                SeekExactly(stream, buffer, 24);

                ReadExactly(stream, buffer, 12);
                // Some archives pad the octal size with spaces, so trim before parsing.
                var sizeString = Encoding.ASCII.GetString(buffer, 0, 12).Split('\0')[0].Trim();
                var size = Convert.ToInt64(sizeString, 8);

                SeekExactly(stream, buffer, 209);

                ReadExactly(stream, buffer, 155);
                var prefix = Encoding.ASCII.GetString(buffer, 0, 155).Split('\0')[0];
                if (!String.IsNullOrWhiteSpace(prefix))
                {
                    // USTAR joins the prefix and the name with a slash.
                    name = prefix + "/" + name;
                }

                SeekExactly(stream, buffer, 12);

                var output = Path.GetFullPath(Path.Combine(outputDir, name));
                if (!Directory.Exists(Path.GetDirectoryName(output)))
                {
                    Directory.CreateDirectory(Path.GetDirectoryName(output));
                }
                // Directory entries end in '/' and carry no data; do not open them as files.
                if (!name.EndsWith("/"))
                {
                    // FileMode.Create truncates a stale file left by an earlier extraction.
                    using (var outfs = File.Open(output, FileMode.Create, FileAccess.Write))
                    {
                        // Track progress as a long so entries over 2 GB do not overflow.
                        long total = 0;
                        while (total < size)
                        {
                            var next = (int)Math.Min(buffer.Length, size - total);
                            ReadExactly(stream, buffer, next);
                            outfs.Write(buffer, 0, next);
                            total += next;
                        }
                    }
                }

                // Data is padded to the next 512-byte boundary; long arithmetic keeps sizes over 2 GB correct.
                var offset = (int)((512 - size % 512) % 512);

                SeekExactly(stream, buffer, offset);
            }
        }
    }
}
-1

There are two ways to compress and decompress in .NET. The first is the GZipStream and DeflateStream classes. Both use the same Deflate algorithm, but only GZipStream writes the .gz container, so a file compressed with GZipStream can be opened by popular compression tools such as WinZip, WinRAR, or 7-Zip, while the raw output of DeflateStream cannot. Both classes have been available since .NET 2.0.
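The difference between the two classes can be seen in the first bytes of their output. Below is a small sketch; the `CompressionDemo` class and its method names are invented for illustration. A gzip stream starts with the magic bytes 0x1F 0x8B, which is what archive tools look for, while raw Deflate output has no such header.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical demo class; only GZipStream and DeflateStream come from the framework.
public static class CompressionDemo
{
    // Compresses data into the gzip container format.
    public static byte[] Gzip(byte[] data)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
            gzip.Write(data, 0, data.Length);
        // ToArray still works after the compressor has closed the stream.
        return output.ToArray();
    }

    // Compresses data into a raw Deflate stream without any container.
    public static byte[] Deflate(byte[] data)
    {
        using var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionMode.Compress))
            deflate.Write(data, 0, data.Length);
        return output.ToArray();
    }

    // Checks for the two-byte signature that opens every gzip file.
    public static bool HasGzipMagic(byte[] bytes) =>
        bytes.Length >= 2 && bytes[0] == 0x1F && bytes[1] == 0x8B;
}
```

`HasGzipMagic(Gzip(data))` is true, while `HasGzipMagic(Deflate(data))` is false for the same input.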

The other way is the Package class. It is essentially the same as GZipStream and DeflateStream; the only difference is that it can store multiple files in one archive, which WinZip, WinRAR, or 7-Zip can then open. That is all .NET has. It is not a generic .zip file, though: it is the format Microsoft uses for its Office files with extensions ending in x. If you open any .docx file with the Package class, you can see everything stored in it. So don't rely on the .NET libraries for compressing or decompressing, because you can't create or read a generic zip file with them. Consider a third-party library such as http://www.icsharpcode.net/OpenSource/SharpZipLib/

Or implement everything from the ground up.
