I've just been handed a pile of LTO-5s and a tape library, and the exciting task of extracting a subset of files.

The order of magnitude is 60 tapes from said pile, and on each tape I need to pull around 10,000 of its 70,000 files (spread over maybe half a dozen to a dozen assorted directories). The tape library is running tar 1.23.

I do have pre-existing dumps of each tape's contents as produced by tar --list, so I'm happy to reconcile these with my list of files to pull and feed in a proper stream of records to grab, but

tar -xvf /dev/nst0 -b $file_to_pull

doesn't halt until I get to the end of the tape. Which makes sense: tar might be fed a wildcard pattern, so it just keeps going to the end of the archive.

I guess I could solve this by just feeding in the directory glob and purging unwanted files once they're off tape, but I can't help thinking there's got to be a way to halt tar at the end of the wanted file and skip the purge step. Right? In all my hunting, though, I've not found anything of that nature.
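The reconcile-then-purge fallback could be sketched roughly as below. This is only an illustration under assumptions: listing.txt, wanted.txt, pull.txt and staging/ are made-up names, the sample paths stand in for a real tar --list dump, and file names are assumed to contain no newlines (xargs -d is a GNU extension).

```shell
#!/bin/bash
# Sketch: reconcile a tape listing with a wanted list, then prune a staging dir.
# listing.txt, wanted.txt, pull.txt and staging/ are hypothetical names.
work=$(mktemp -d)
cd "$work" || exit

# Stand-ins for a `tar --list` dump and for the list of files to pull.
printf '%s\n' dirA/f1 dirA/f2 dirB/f3 dirB/f4 >listing.txt
printf '%s\n' dirA/f2 dirB/f3 dirC/missing    >wanted.txt

# Wanted members that really are on this tape (fixed-string, whole-line matches).
grep -Fxf wanted.txt listing.txt >pull.txt

# Pretend whole directories were extracted from tape into staging/.
mkdir -p staging/dirA staging/dirB
touch staging/dirA/f1 staging/dirA/f2 staging/dirB/f3 staging/dirB/f4

# Purge step: delete every staged file that is not listed in pull.txt.
(cd staging && find . -type f | sed 's|^\./||' |
    grep -Fxvf ../pull.txt | xargs -r -d '\n' rm --)

# What survives the purge.
kept=$(cd staging && find . -type f | sed 's|^\./||' | sort)
echo "$kept"
```

A list like pull.txt could also be handed to tar directly with -T (--files-from) instead of naming files on the command line, though that does not by itself change when tar stops reading.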

So, questions:

  • Is there a way to get tar (or heck, anything else) to pull just one file from wherever the tape head is, and then stop? Or to signal tar once it's gotten an EOF and stop?

  • Alternatively, am I mentally attacking this in an odd way? Happy to take alternative suggestions if anyone's got them.

Note that this isn't an unsolvable problem right now; my current options just seem really awkward. This is the first time I've had to deal with tape at this magnitude (and our other tapes are all LTFS).

1 Answer

You can ask tar to run an action every few records, at what it calls checkpoints. This action can test whether the wanted file has been extracted and, if so, kill tar. I tried it using a tar file and it seems to work OK.
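As a minimal sketch of the checkpoint mechanism on its own, assuming GNU tar and using a throwaway archive (the scratch paths and the 1 MB dummy file are made up for illustration), the following runs an echo every 10 records during extraction. tar exports TAR_CHECKPOINT to the command it runs.

```shell
#!/bin/bash
# Minimal checkpoint demo; everything under $work is hypothetical scratch data.
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
head -c 1000000 /dev/zero >"$work/src/big"   # about 1 MB, so several checkpoints fire
tar -cf "$work/demo.tar" -C "$work" src

# Run a command every 10 records while extracting. The exec action is run
# through the shell, and tar sets TAR_CHECKPOINT to the checkpoint number.
checkpoint_log=$(tar -xf "$work/demo.tar" -C "$work/out" --checkpoint=10 \
    --checkpoint-action=exec='echo "checkpoint $TAR_CHECKPOINT"')
echo "$checkpoint_log"

rm -rf "$work"
```

Anything can be substituted for the echo, including a script that inspects the extracted files and signals tar.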

Here's the example script I used for my test. It tars /usr/bin and extracts usr/bin/bash into /tmp/usr/bin/bash. With no argument, --checkpoint defaults to every 10th record.

#!/bin/bash

cat <<\! >/tmp/checkdone
#!/bin/bash
# env has TAR_CHECKPOINT TAR_ARCHIVE TAR_VERSION TAR_BLOCKING_FACTOR
# tar -C directory is NOT used for checkpoint action!
want=$1

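# stat succeeds only once tar has created the wanted file
if size=$(stat --printf='%s\n' "$want" 2>/dev/null)
then if [ "$(</tmp/lastsize)" = "$size" ] && [ -s /tmp/pid ]
    then  # size unchanged since the previous checkpoint: the file is complete
          echo "same size $size. time to stop"
          ls -l "$want"
          >/tmp/lastsize
          kill -HUP "$(</tmp/pid)"
    else  echo "partial size $size"
          echo "$size" >/tmp/lastsize
    fi
else echo -n "."
fi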
!
chmod +x /tmp/checkdone
>/tmp/lastsize
>/tmp/pid

tar -cf /tmp/tar /usr/bin/ # create example tar file
# wanted file; the path is relative to the current directory
want=usr/bin/bash
cd /tmp || exit # don't use tar -C dir

tar -xvf /tmp/tar "$want" --checkpoint=10 --checkpoint-action=exec="/tmp/checkdone $want" &
echo $! >/tmp/pid
wait

rm /tmp/tar /tmp/pid /tmp/lastsize /tmp/checkdone
rm -fr /tmp/usr
  • Hm, cheers, I think this might be the closest I'll get (doing some tests now) without having to rewrite something low-level, but in the process of building the tape catalog (which I might do anyway for future reference) to get filesizes to know when to halt, I'm already having to do a single pass of the tape at which point I may as well just have done a full dump. Dammit, tar! waves fist
    – tanantish
    Commented Jun 21, 2015 at 11:28
  • From what I remember of tape drives (10+ years ago), you might find the drive won't like stopping partway through a file, and will wind on to an inter-file gap anyway! Good luck, and my sympathies.
    – meuh
    Commented Jun 21, 2015 at 15:39
  • Ah well, it's tape, so the people who are asking me to treat it like a random access filesystem are getting an education in the process. I've just accepted that I might as well burn a couple of terabytes of disk so yeah, the drives are chattering away, and I've got a secondary process to prune unwanted files when the move from staging to live occurs. The things you do :P
    – tanantish
    Commented Jun 22, 2015 at 5:54
