
I am using GNU tar to process a tar file (a layer of a Docker image) in order to modify some jars inside it. These are my steps:

  • save the image to disk as a tar
  • extract it, so that each layer is in its own directory
  • enter each layer directory, which contains a layer.tar, a json file and a VERSION file
  • iterate over all */*.jar files in layer.tar, looking for certain class files
  • if I find them, extract the jar with its directory structure, remove the class files from it, and put it back into layer.tar, overwriting the original jar
  • package each layer back into a new tar, then use docker to load it and push it later (not done yet)

I wrote a script for this. It almost works, but layer.tar ends up with two jars side by side: the original one that still contains the classes to remove, and the modified one without them.

#!/bin/bash

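# tar would otherwise store a leading "." entry; feed it relative names from find.
# $3 is the tar operation: r to append, c to create
function pack_all_without_period() {
    # group the -type tests and skip the start directory itself (it would print an empty name)
    find "$1" -mindepth 1 \( -type f -o -type l -o -type d \) -printf "%P\n" | sudo tar -"$3"vf "$2" --no-recursion -C "$1" -T -
}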

if [ -z $1 ]; then
    printf "Save the image as tar, extract, and enter each layer to remove the vulnerable classes(JMSAppender/SocketServer/SimpleSocketServer)\nPlease provide the image name. \n"
    exit 1
fi
dir="log4j-1.x-fix"
image_tar=amq-image-to-fix.tar
if [ ! -d $dir ]; then 
    mkdir $dir
fi
# save image to tar
docker save $1 -o $image_tar
# extract tar
tar xf $image_tar -C $dir
# each layer is extracted to a folder, each folder has a "layer.tar".
# Go into each folder, extract the jars from `layer.tar`, and use `jar`/`zip` to remove the classes
# and package them back to `layer.tar` (-r to append), and delete the extracted folders.
# at last, package all layers + manifest.json and so on back into another tar, WITHOUT COMPRESSION
cd $dir
# enter layer and exit
for layer in */; do
    echo Processing layer $layer
    cd $layer
    # tar does not support overwrite, as tape cannot be overwritten; so I wanted to remove the original jar from tar, 
    # then append it back with tar -u/-A/-r; but then I found tar --delete is extremely slow(by design)
    # so at last I have to extract all files and package them back
    mkdir temp
    sudo tar --extract --directory=temp --file layer.tar --wildcards "*.jar"   # file tree is preserved, so package them back is easy
    if [[ $? -eq 0 ]]; then 
        for f in $(find . -mindepth 2 -name "*.jar" -not -type l -printf "%P\n"); do # exclude jolokia.jar(link)
            # grep -E takes regular expressions, not shell globs
            sudo jar -tvf "$f" | grep -E '(JMSAppender.*\.class|SocketServer\.class|log4j.*\.class)'
            if [[ $? -eq 0 ]]; then
                echo Found classes in $f
                read -p "Do you want to remove these classes? (Y/N) " option
                if [[ $option == 'Y' || $option == 'y' ]]; then
                    echo Removing class file from $f
                    sudo zip -d $f "*JMSAppender.class" "*SocketServer.class" "*SimpleSocketServer.class"
                    ######### here I need to delete the original jar with the classes I just deleted, but I don't know how ############
                else continue
                fi
            else
                continue
            fi

        done
        # append folders to tar, without leading "."
        echo Appending modified folders to layer.tar anew
        pack_all_without_period temp layer.tar r
    fi
    sudo rm -r $(find . -maxdepth 1 -mindepth 1 -type d -print)
    cd .. # back to $dir
done
cd ..

# tar will always include a folder "." as root. This function get rid of it, so the archive
# only contains the content of the folder
# compress will preserve ownership and group by default; and to extract while preserving the same info,
# we use '--same-owner', which is used by default when using sudo. 
# again, append all layers and files to new tar, without leading "."
echo after processing all layers, we are at $(pwd)
pack_all_without_period $dir amq-image-fixed.tar c
sudo rm -Irv $dir $image_tar




but I found that:

  1. tar can only append; it will not overwrite. So I changed my approach to first delete the original jar from layer.tar and then append the modified one.
  2. Then I found that deleting with tar --file layer.tar --delete path-to-jar does not work for me. The GNU tar documentation says that --delete can also be used in a pipeline, reading the archive from stdin and writing to stdout (https://www.gnu.org/software/tar/manual/html_node/delete.html). But what is the correct syntax? I tried these, but neither works:
    sudo tar tf layer.tar $f | sudo tar --delete #not deleting
    sudo tar xf layer.tar --exclude $f | sudo tar cf layer.tar -T -  # create tar of size 0

Some more considerations:

  • I don't want to extract all files, as each layer contains /usr or /boot, which I don't want to deal with. My jars are basically under /opt or similar (not 100% sure).
  • I need to preserve ownership, timestamps and so on. That's why I use sudo (but I am not sure whether that achieves my purpose).

I use the script like this:

./remove-log4j-1.x-classes.sh registry.access.redhat.com/jboss-amq-6/amq63-openshift:1.4-44.1638430186

Please help, thanks!

EDIT: I have now tried:

tar tf layer.tar -O | tar f - --delete $f > layer-new.tar

or

zcat -f layer.tar | tar f - --delete $f > layer-new.tar

But this fails with the error:

tar: opt/amq/lib/optional/log4j-1.2.17.redhat-1.jar: Not found in archive
tar: Exiting with failure status due to previous errors

1 Answer

I first checked the version of tar I had installed:

$ tar --version
tar (GNU tar) 1.29
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

I then went to the GNU tar page and downloaded the latest version, which is currently 1.34:

https://ftp.gnu.org/gnu/tar/tar-latest.tar.gz

The source tree is very well organized and also contains the tests under /tests. There I found several test cases whose names start with delete, and delete02.at shows the proper syntax. Not surprisingly, its comment says that deleting a member with the archive coming from stdin was not working correctly. In practice the syntax works with both tar 1.29 and 1.34, so you can skip installing 1.34:


# Deleting a member with the archive from stdin was not working correctly.

AT_SETUP([deleting a member from stdin archive])
AT_KEYWORDS([delete delete02])

AT_TAR_CHECK([
genfile -l 3073 -p zeros --file 1
cp 1 2
cp 2 3
tar cf archive 1 2 3
tar tf archive
cat archive | tar f - --delete 2 > archive2
echo separator
tar tf archive2],
[0],
[1
2
3
separator
1
3
])

AT_CLEANUP

So, the syntax now is:

cat tar_archive | tar f - --delete <filename_to_delete> > another_archive

You use cat to output the content of the tarball and pipe (|) it to tar, which reads the archive from stdin (-, here the output of cat), deletes the member, and writes the resulting archive to stdout, which you redirect (>) to another file. Afterwards you can rename the new file to the original archive name to replace it. You CANNOT edit in place with this form, however.
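As a rough sketch, the delete-then-append sequence can be wrapped in a small helper. The function name replace_member, the demo file names and the temporary file handling are my own assumptions for illustration; only the tar f - --delete form comes from the test case above. It assumes GNU tar and that the updated file sits at the same relative path as the member name.

```shell
#!/bin/bash
# Hypothetical helper: replace one member of a tar archive with an updated file.
# Usage: replace_member ARCHIVE MEMBER
# MEMBER is both the name inside the archive and the path of the new file,
# relative to the current directory.
replace_member() {
    local archive=$1 member=$2
    local tmp
    tmp=$(mktemp "${archive}.XXXXXX") || return 1
    # Read the archive from stdin, drop the member, write the result to stdout.
    if tar -f - --delete "$member" < "$archive" > "$tmp"; then
        mv "$tmp" "$archive"
        # Append the updated file under the same member name.
        tar -rf "$archive" "$member"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo in a scratch directory (run in a subshell so the caller's cwd is untouched).
(
    work=$(mktemp -d) && cd "$work" || exit 1
    echo one > a.txt; echo two > b.txt; echo three > c.txt
    tar -cf demo.tar a.txt b.txt c.txt
    echo changed > b.txt
    replace_member demo.tar b.txt
    tar -tf demo.tar          # lists a.txt, c.txt, b.txt (b.txt is now last)
    tar -xOf demo.tar b.txt   # prints "changed"
)
```

The replaced member moves to the end of the archive, because tar can only append. If the member is not found, the helper leaves the original archive untouched.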

If you want to install it anyway, use ./configure && make && sudo make install (only the install step needs root). Note that this does not replace tar 1.29 under /bin: by default it installs to /usr/local/bin/tar.

So the complete script now is:

#!/bin/bash

tar=/usr/local/bin/tar # or tar=/bin/tar, the syntax is the same

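# tar would otherwise store a leading "." entry; feed it relative names from find.
# $3 is the tar operation: r to append, c to create
function pack_all_without_period() {
    # group the -type tests and skip the start directory itself (it would print an empty name)
    find "$1" -mindepth 1 \( -type f -o -type l -o -type d \) -printf "%P\n" | sudo $tar -"$3"f "$2" --no-recursion -C "$1" -T -
}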

if [ -z $1 ]; then
    printf "Save the image as tar, extract, and enter each layer to remove the vulnerable classes(JMSAppender/SocketServer/SimpleSocketServer)\nPlease provide the image name. \n"
    exit 1
fi
dir="fix"
image_tar=amq-image-to-fix.tar
if [ ! -d $dir ]; then 
    mkdir $dir
fi
# save image to tar
docker save $1 -o $image_tar
# extract tar
$tar xf $image_tar -C $dir
# each layer is extracted to a folder, each folder has a "layer.tar".
# Go into each folder, extract the jars from `layer.tar`, and use `jar`/`zip` to remove the classes
# and package them back to `layer.tar` (-r to append), and delete the extracted folders.
# at last, package all layers + manifest.json and so on back into another tar, WITHOUT COMPRESSION
cd $dir
# enter layer and exit
for layer in */; do
    echo Processing layer $layer
    cd $layer
    # tar does not support overwrite, as tape cannot be overwritten; so the original jar is
    # deleted from layer.tar (archive read from stdin, new archive written to stdout)
    # and the modified jar is appended back with tar -r
    sudo $tar --extract --directory=. --file layer.tar --wildcards "*.jar"   # file tree is preserved, so package them back is easy
    if [[ $? -eq 0 ]]; then 
        for f in $(find . -mindepth 1 -name "*.jar" -not -type l -printf "%P\n"); do # exclude jolokia.jar(link)
            # grep -E takes regular expressions, not shell globs
            sudo jar -tvf "$f" | grep -E '(JMSAppender.*\.class|SocketServer\.class|log4j.*\.class)'
            if [[ $? -eq 0 ]]; then
                echo Found classes in $f
                read -p "Do you want to remove these classes? (Y/N) " option
                if [[ $option == 'Y' || $option == 'y' ]]; then
                    echo Removing class file from $f
                    sudo zip -d $f "*JMSAppender.class" "*SocketServer.class" "*SimpleSocketServer.class"
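                    ######### here the correct syntax, finally #########
                    # delete the original jar: archive in on stdin, new archive out on stdout
                    cat layer.tar | $tar f - --delete "$f" > layer-new.tar
                    sudo mv layer-new.tar layer.tar
                    # append the modified jar under its original member name
                    $tar -rf layer.tar "$f"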
                else continue
                fi
            else
                continue
            fi

        done
        
        sudo rm -r $(find . -maxdepth 1 -mindepth 1 -type d -print)
    fi
    cd .. # back to $dir
done

cd ..

# tar will always include a folder "." as root. This function get rid of it, so the archive
# only contains the content of the folder
# compress will preserve ownership and group by default; and to extract while preserving the same info,
# we use '--same-owner', which is used by default when using sudo. 
# again, append all layers and files to new tar, without leading "."
echo after processing all layers, we are at $(pwd)
pack_all_without_period $dir amq-image-fixed.tar c
sudo rm -Irv $dir $image_tar
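
The script above rewrites the whole layer.tar once for every modified jar. GNU tar's --delete accepts several member names, so one possible refinement is to collect the modified jars and replace them in a single pass. This is only a sketch under that assumption: replace_members and the changed array are hypothetical names, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: drop several members in one pass, then re-append them.
# Usage: replace_members ARCHIVE MEMBER...
# Each MEMBER is both the name inside the archive and the path of the updated
# file, relative to the current directory.
replace_members() {
    local archive=$1
    shift
    [ "$#" -gt 0 ] || return 0        # nothing to replace
    local tmp
    tmp=$(mktemp "${archive}.XXXXXX") || return 1
    # One read of the archive removes every listed member.
    if tar -f - --delete "$@" < "$archive" > "$tmp"; then
        mv "$tmp" "$archive" && tar -rf "$archive" "$@"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Inside the per-layer loop, one could record each modified jar instead of
# rewriting the archive immediately:
#   changed+=("$f")
# and after the inner loop finishes:
#   replace_members layer.tar "${changed[@]}"
```

With many jars in a large layer this reads and writes the archive once per layer rather than once per jar.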

  • I'm pretty sure this cat tar_archive | tar f - --delete filename_to_delete >another_archive can be simplified to <tar_archive tar f - --delete filename_to_delete >another_archive and probably even to tar f tar_archive --delete filename_to_delete >another_archive.
    Commented Feb 11, 2022 at 15:57
  • I understand the first simplified version, and for the 2nd, you mean the deletion initially not working/taking toooo long, because I didn't redirect it to another archive right? Let me test again
    – WesternGun
    Commented Feb 11, 2022 at 16:00
  • With your 2nd form, I have error: tar: opt/amq/activemq-all-5.11.0.redhat-630495.jar: Not found in archive, and the resulting tar is much smaller than the original one. It stops at lib/ and did not process dirs afterwards, as tar errors out. The first form works the same. Thanks for helping. I see this tar is so widely used but still with much myth around it.
    – WesternGun
    Commented Feb 11, 2022 at 16:04
  • Oh well, I wrote "probably", not "certainly". Anyway, it's nice you have solved your problem. Upvoted. Commented Feb 11, 2022 at 16:06
