0

I need to load multiple files in my shell script. The files share the same base name, prefixed with a date in YYMMDD form: YYMMDDPERSONNEL. Examples: 231102PERSONNEL and 230103PERSONNEL. There are many other files in the same directory, so matching on the file name matters. I need the files to be loaded in order from the oldest to the newest.

Currently what I have is only designed to handle one file at a time.

7
  • 2
    Please edit your question and add some example file names and then the order you want those file names to be read in. Make sure to include all relevant edge cases (are all files from this century? What are the ****?)
    – terdon
    Commented Nov 7, 2023 at 17:08
  • Also, do you need to rely on the name or can you use the file's modification date instead?
    – terdon
    Commented Nov 7, 2023 at 17:13
  • Please put that extra information in your question, not in the comments, @teejay
    Commented Nov 7, 2023 at 19:13
  • Sorting in descending order would give you the youngest, not the oldest, first.
    Commented Nov 7, 2023 at 19:28
  • In an earlier version of your question you mentioned the need to uncompress files. Is that no longer needed?
    Commented Nov 7, 2023 at 19:30

2 Answers

1

This is a simple task, with pipelines.

find . -maxdepth 1 -type f -name '*PERSONNEL.gz' -print |
    sed -e 's%^\./%%' |
    sort |
    xargs -r /bin/echo

Read the man pages.

Here's a quick overview:

  • find prints a list of filenames to STDOUT
  • sed makes it look nicer (changes ./231102PERSONNEL.gz to 231102PERSONNEL.gz)
  • sort sorts the list.
  • xargs puts as many file names as will fit on one command line (see xargs --show-limits </dev/null for your limits) and repeats the command for the rest of the list; -r skips the command entirely when the list is empty
  • /bin/echo is a placeholder.
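
To go from the placeholder to an actual load, the same pipeline can feed a loop that handles one file at a time. This is only a minimal sketch, assuming file names contain no newline characters; list_personnel_files, load_one and load_all are hypothetical names, and load_one merely echoes where the real loader would run.

```shell
#!/bin/sh
# Print the matching file names in directory $1, lowest YYMMDD prefix first.
# Assumes file names contain no newline characters.
list_personnel_files() {
    find "$1" -maxdepth 1 -type f -name '*PERSONNEL.gz' -print |
        sed -e 's%^.*/%%' |
        LC_ALL=C sort
}

# Hypothetical stand-in for the real loader.
load_one() {
    echo "loading $1"
}

# Process every matching file in directory $1, one at a time, in sorted order.
load_all() {
    list_personnel_files "$1" | while IFS= read -r name; do
        load_one "$1/$name"
    done
}
```

With these definitions, load_all . would walk the current directory; replacing the body of load_one is where the real load command would go.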
1

Shell globs are expanded in lexical order by default, and in your case lexical order matches the chronological order of your files as long as they're all from the same century (from 1900 to 1999 for instance, ignoring the fact that the 20th century actually ran from 1901 to 2000).
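
In a plain POSIX shell, that ordering can be relied on directly in a for loop. A minimal sketch, assuming the uncompressed naming from the question (six digits followed by PERSONNEL); process_file and process_in_order are hypothetical names, and process_file is only a placeholder.

```shell
#!/bin/sh
# Hypothetical placeholder for the real per-file work.
process_file() {
    echo "processing $1"
}

# Loop over the six-digit-prefixed files in directory $1.
# The glob expands in sorted order, so the lowest YYMMDD comes first.
process_in_order() {
    for f in "$1"/[0-9][0-9][0-9][0-9][0-9][0-9]PERSONNEL; do
        # An unmatched glob is left as the literal pattern in POSIX sh; skip it.
        [ -e "$f" ] || continue
        process_file "$f"
    done
}
```

Other files in the directory are ignored because the pattern requires exactly six digits before PERSONNEL.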

With zsh:

set -o extendedglob
for f ( [0-9](#c6)PERSONNEL.sql.gz(N) ) gunzip < $f | your-sql-loader

(Or [0-9](#c6)PERSONNEL.sql.gz(NOn) if you need to Order the files by name in reverse (capital O), as you initially requested.)

That assumes your-sql-loader can take its input from stdin. If it needs it to be passed as a filename argument:

for f ( [0-9](#c6)PERSONNEL.sql.gz(N) ) your-sql-loader <(gunzip < $f)

Or if that filename argument has to be a regular file:

for f ( [0-9](#c6)PERSONNEL.sql.gz(N) ) your-sql-loader =(gunzip < $f)

Chances are you may just be able to do:

gunzip -dc -- [0-9](#c6)PERSONNEL.sql.gz | your-sql-loader

That is, feed the concatenation of the uncompressed contents of all the files, in the order the glob expands them (by name, so oldest first), to your-sql-loader.

More generally, it's rare to ever have to store an uncompressed version of a file on disk. It's much better to uncompress it on the fly and feed it concurrently to whatever application consumes it.

You'd only need to uncompress it on disk (as is done with the =(...) approach above which gets the output of gunzip in a temp file) if the consumer application doesn't read the data sequentially (which would be surprising here if that's meant to be SQL).
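
The streaming approach does not depend on zsh globbing. A sketch in POSIX sh, assuming the .sql.gz naming used above and a consumer that reads standard input; stream_all is a hypothetical name.

```shell
#!/bin/sh
# Decompress every matching file in directory $1 to stdout, lowest YYMMDD
# prefix first, without writing any uncompressed copy to disk.
stream_all() {
    for f in "$1"/[0-9][0-9][0-9][0-9][0-9][0-9]PERSONNEL.sql.gz; do
        # An unmatched glob is left as the literal pattern; skip it.
        [ -e "$f" ] || continue
        # Stop at the first file that fails to decompress.
        gzip -dc < "$f" || return 1
    done
}
```

It would be used as stream_all /some/dir | your-sql-loader, so the loader starts consuming data while later files are still being decompressed.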

If you really had to use (t)csh as the tag suggests, which would be very surprising in this century, the equivalent, assuming file names don't contain newline characters, would be:

gunzip -dc -- "`ls -rd -- [0-9][0-9][01][0-9][0-3][0-9]PERSONNEL.sql.gz`" | your-sql-loader

Here ls -r sorts the files by name in reverse (newest first, as originally requested; drop the -r for oldest first); with "`...`", its output is captured and split into its non-empty lines.

Since csh has no equivalent of zsh's [0-9](#c6) or ksh's {6}([0123456789]) to match 6 digits, we match each digit specifically and take the opportunity to be stricter in matching month and day numbers.

If you have files from the past century, you can define a custom sort order in zsh with either the oe[code] or o+function glob qualifiers.

[0-9](#c6)PERSONNEL.sql.gz(Noe['REPLY[1,0]=$(( 19 + ($REPLY[1,2] < 70) ))'])

That is, instruct zsh to order the glob based on the file name with either 20 or 19 prepended, depending on whether the first 2 digits make up a number that is below 70 or not (you'll want to adjust the cut-off year based on your actual dataset; 70 is chosen here because 1970 is the start of Unix epoch time).
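
Outside zsh, the same idea can be approximated with a decorate, sort, undecorate pipeline. A minimal sketch, assuming bare file names (no directory part) with no newlines arrive on standard input; sort_by_century is a hypothetical name and its optional argument is the cut-off year, defaulting to 70 as above.

```shell
#!/bin/sh
# Read YYMMDD-prefixed names on stdin and print them in chronological order.
# Years below the cut-off ($1, default 70) are treated as 20YY, others as 19YY.
sort_by_century() {
    awk -v cutoff="${1:-70}" '{
        yy = substr($0, 1, 2) + 0          # two-digit year as a number
        century = (yy < cutoff) ? 20 : 19  # pick the century prefix
        print century $0                   # e.g. 20230103PERSONNEL
    }' | LC_ALL=C sort | cut -c3-          # sort, then drop the prefix again
}
```

For example, printf '%s\n' [0-9][0-9][0-9][0-9][0-9][0-9]PERSONNEL | sort_by_century would list the files with any 19YY names ahead of the 20YY ones.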

2
  • since they mention "oldest to newest", shouldn't the default sort order already be ok? (On) would give newest to oldest.
    – ilkkachu
    Commented Nov 7, 2023 at 20:12
  • Yes, but I initially incorrectly said descending order. The answer was correct to the original question.
    – teejay
    Commented Nov 7, 2023 at 20:17
