Is there a way to ignore header lines in a UNIX sort?

Question

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.

The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).

Is there a way to tell sort either "pass the first two lines across unsorted" or to specify an ordering which sorts the colon lines to the top? The remaining lines always start with a 6-digit number (which is actually the key I'm sorting on), if that helps.

Example:

:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00

should sort to:

:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00

For the record: the command line I'm using so far is "sort -t\\ -k1.1,1.6 <file>" [the data can contain spaces, but will never contain a backslash] — Rob Gilliam, Commented Jan 28, 2013 at 12:51

BobS · Accepted Answer · 2013-01-28 15:35:02Z

166

(head -n 2 <file> && tail -n +3 <file> | sort) > newfile

The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
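As a rough sketch of the same idea with the header length held in a variable, run against the sample data from the question. The scratch files and the names header_lines, tmp, out and sorted are made up for illustration, and LC_ALL=C is an added assumption to keep the byte order predictable:

```shell
# Hypothetical scratch files holding the question's sample data and the result.
tmp=$(mktemp)
out=$(mktemp)
printf '%s\n' ':0:12345' ':1:6:2:3:3:8:4:2' \
    '010005TSTDOG_FOOD01' '500123TSTMY_RADAR00' '222334NOTALINEOUT01' \
    '477821USASHUTTLES21' '325611LVEANOTHERS00' > "$tmp"

header_lines=2   # number of lines to pass through unsorted

# The subshell merges the output of head and of the tail|sort pipeline
# into one stream, which is then redirected as a whole.
(head -n "$header_lines" "$tmp" &&
    tail -n +"$((header_lines + 1))" "$tmp" | LC_ALL=C sort) > "$out"

sorted=$(cat "$out")
printf '%s\n' "$sorted"
rm -f "$tmp" "$out"
```

Because the file is opened twice (once by head, once by tail), this form needs a regular file rather than a pipe.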

edited Jan 28, 2013 at 15:35

answered Jan 28, 2013 at 13:03

BobS


2

Thanks; I'm accepting this answer as it seems most complete and concise (and I understand what it's doing!) - it should be "head -n 2", though :-)
– Rob Gilliam
Commented Jan 28, 2013 at 14:18
11

Is there a way to have this version work on piped-in data? I tried with tee >(head -n $header_size) | tail -n +$header_size | sort, but head seems to run after the tail|sort pipe, so the header ends up printed in the end. Is this deterministic or a race condition?
– Damien Pollet
Commented Nov 17, 2014 at 17:34
You could probably piece together something where you use cat to redirect the stdin to a temporary file, then run the above command on that new file, but it's starting to get ugly enough that it's probably better to use one of the awk-based solutions given in the other responses.
– BobS
Commented Nov 23, 2014 at 0:04
@DamienPollet: See Dave's answer.
– Jonathan Leffler
Commented Feb 1, 2015 at 22:38
1

@DamienPollet: See freeseek's answer
– fess .
Commented May 4, 2015 at 3:55


wjandrea · Accepted Answer · 2023-04-03 18:51:42Z

124

If you don't mind using awk, you can take advantage of awk's built-in pipe abilities, e.g.

extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'

This prints the first two lines verbatim and pipes the rest through sort -r (a reverse sort; drop the -r for ascending order).

Note that this has the very specific advantage of being able to selectively sort parts of piped input. Methods that read the input twice (such as head followed by tail) only work on plain files, which can be read multiple times. This works on anything.
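A minimal sketch of the awk approach wrapped in a function, with the header length passed as a parameter. The function name sort_after_header and the variable n are hypothetical, and the sort command inside awk is fixed to a plain ascending sort under LC_ALL=C as an assumption:

```shell
# sort_after_header N: copy the first N lines of stdin unchanged,
# then send every later line to a sort process started by awk.
# (Hypothetical helper; the name and the fixed sort command are assumptions.)
sort_after_header() {
    awk -v n="$1" 'NR <= n { print; next } { print | "LC_ALL=C sort" }'
}

result=$(printf '%s\n' ':0:12345' ':1:6:2:3:3:8:4:2' \
    '010005TSTDOG_FOOD01' '500123TSTMY_RADAR00' '222334NOTALINEOUT01' \
    '477821USASHUTTLES21' '325611LVEANOTHERS00' | sort_after_header 2)
printf '%s\n' "$result"
```

awk reads the stream exactly once, so nothing needs to be re-read; the sort process emits its output when awk closes the pipe at exit.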

edited Apr 3, 2023 at 18:51

wjandrea


answered Mar 9, 2014 at 11:54

Dave


5

Very nice, and it works with arbitrary pipes, not only files!
– lapo
Commented Nov 24, 2014 at 16:16
10

Beautiful, awk never stops surprising me. Also, you don't need the $0, print is enough.
– nachocab
Commented Jan 28, 2015 at 20:50
2

@SamWatkins freeseek's answer is less ugly.
– fess .
Commented May 4, 2015 at 3:56
What's the -r option doing to sort? Is this supposed to be reverse sort?
– W7GVR
Commented May 14, 2015 at 20:55
I prefer this awk approach by @Dave as it works with arbitrary pipes as opposed to @BobS 's subshell approach which works with files only.
– porg
Commented Oct 14, 2022 at 13:10


ryenus · Accepted Answer · 2023-10-13 12:35:23Z

105

In simple cases, sed can do the job elegantly:

your_script | (sed -u 1q; sort)

or equivalently,

cat your_data | (sed -u 1q; sort)
cat your_data | { sed -u 1q; sort; }  # to avoid the subshell

The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).

For the example given, 2q will do the trick.

The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.

edited Oct 13, 2023 at 12:35

ryenus


answered May 15, 2019 at 14:31

Andrea


3

IMO this is the simplest solution here and easiest to remember. It works with piped data with no special considerations or awkward quoting and escaping, and does not need to be used multiple times if you are sorting on multiple columns by a chain of piped sort commands with the -s flag. eg. bgzip -dc somefile.tsv.gz | (sed -u 2q; sort -k 3,3 -n | sort -k 2,2 -n -s | sort -k 1,1 -s) | bgzip -c > my_sorted_file.tsv.gz. Key though is the edit adding the -u flag which ought to have solved @RobGilliam's problem above.
– slowkoni
Commented Dec 14, 2020 at 20:53
2

Can you explain a bit how the pipe and the parentheses work?
– user746461
Commented Mar 17, 2021 at 11:49
I would use head -n 1 instead of sed -u 1q. This head command is POSIX and much more portable than dealing with sed's -u flag.
– dan
Commented Jun 3, 2022 at 1:11
1

@dan On my system, head seems to be buffered like sed, so that doesn't work. (Ubuntu 20.04, head (GNU coreutils) 8.30)
– wjandrea
Commented Apr 3, 2023 at 18:45
1

@Gqqnbig The parentheses create a subshell. Speaking of that, you could actually use braces instead, which don't create a subshell, just a group of commands, though you'd need to add spaces plus a semicolon at the end: ... { sed -u 1q; sort; }. The pipe works exactly like a regular pipe, but the input goes into the group of commands instead of one command.
– wjandrea
Commented Apr 3, 2023 at 18:49


wjandrea · Accepted Answer · 2023-04-03 18:52:26Z

51

Here is a version that works on piped data:

(read -r; printf "%s\n" "$REPLY"; sort)

If your header has multiple lines:

(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)

This solution is from here
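The multi-line variant can be sketched as a small function that also forwards options to sort. The name header_sort and the variables n, i and line are hypothetical; it uses "IFS= read -r line" so that leading and trailing whitespace in header lines is kept, and LC_ALL=C is an added assumption:

```shell
# header_sort N [sort options...]: echo the first N lines of stdin as-is,
# then let sort consume whatever is left on the same stream.
# (Hypothetical helper written for illustration.)
header_sort() {
    n=$1
    shift
    i=0
    # The shell reads only as far as each newline, so the body lines
    # are still unread when sort starts.
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    LC_ALL=C sort "$@"
}

result=$(printf '%s\n' ':0:12345' ':1:6:2:3:3:8:4:2' \
    '010005TSTDOG_FOOD01' '500123TSTMY_RADAR00' '222334NOTALINEOUT01' \
    '477821USASHUTTLES21' '325611LVEANOTHERS00' | header_sort 2 -k1.1,1.6)
printf '%s\n' "$result"
```

The extra arguments after the header count go straight to sort, so the key specification from the question can be supplied unchanged.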

edited Apr 3, 2023 at 18:52

wjandrea


answered Dec 8, 2014 at 23:11

Giulio Genovese


16

nice. for the single header case I use extract_data | (read h; echo "$h"; sort) it's short enough to remember. your example covers more edge cases. :) This is the best answer. works on pipes. no awk.
– fess .
Commented May 4, 2015 at 3:51
2

Ok, I straced this and it seems that bash goes to special lengths to make this work. In general, if you coded this in C or another language it would not work because stdio would read more than just the first header line. If you run it on a seekable file, bash reads a larger chunk (128 bytes in my test), then lseeks back to after the end of the first line. If you run it on a pipe, bash reads one char at a time until it passes the end of the line.
– Sam Watkins
Commented May 5, 2015 at 9:01
Nice! If you just want to eat the header, it's even easier to remember: extract_data | (read; sort)
– Jason Suárez
Commented Jan 27, 2017 at 18:53
1

This one is almost perfect but you need to use "IFS= read" instead of "read" to keep leading and trailing spaces.
– Stanislav German-Evtushenko
Commented Jun 23, 2017 at 11:27
11

This should be the accepted answer in my opinion. Simple, concise and more flexible in that it also works on piped data.
– Paul I
Commented Nov 21, 2017 at 23:06


wjandrea · Accepted Answer · 2023-04-03 18:55:36Z

5

head -2 <your_file> && nawk 'NR>2' <your_file> | sort

example:

> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1

edited Apr 3, 2023 at 18:55

wjandrea


answered Jan 28, 2013 at 13:13

Vijay


wjandrea · Accepted Answer · 2023-04-03 18:56:45Z

5

You can use

tail -n +3 <file> | sort ...

tail will output the file contents from the 3rd line.

edited Apr 3, 2023 at 18:56

wjandrea


answered Jan 28, 2013 at 12:56

Anton Kovalenko


4

But this loses the header, which is not desired.
– Serge Stroobandt
Commented Apr 10, 2022 at 20:40
Losing the header, but otherwise it’s the simplest and I’m most likely to remember. But I really want to print the first line (from a command’s output) then sort the rest of the output by column 6, and print that to the screen. I’ll ask that as a new question.
– Old Uncle Ho
Commented Mar 20 at 15:50
@OldUncleHo regarding "Losing the header, but otherwise..." - retaining the header is the only interesting thing about this question so any answer that doesn't retain the header isn't answering the question and that's why it appears to be simple, because it doesn't do the only interesting thing that the question requires of an answer.
– Ed Morton
Commented Mar 20 at 18:32


wjandrea · Accepted Answer · 2023-04-03 18:55:02Z

Here's a bash function that takes the same arguments as sort and supports both files and pipes.

function skip_header_sort() {
    if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
        local file=${@: -1}
        set -- "${@:1:$(($#-1))}"
    fi
    awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}

How it works. This line checks if there is at least one argument and if the last argument is a file.

    if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then

This saves the file name in a separate variable, since we're about to remove the last argument.

        local file=${@: -1}

Here we remove the last argument, since we don't want to pass it to sort as an option.

        set -- "${@:1:$(($#-1))}"

Finally, we do the awk part, passing the arguments (minus the last argument if it was the file) to sort in awk. This was originally suggested by Dave, and modified to take sort arguments. We rely on the fact that $file will be empty if we're piping, and thus ignored.

    awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file

Example usage with a comma separated file.

$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1

# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0

# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0

wjandrea · Accepted Answer · 2023-04-03 18:59:16Z

3

It only takes 2 lines of code...

head -1 test.txt > a.tmp
tail -n+2 test.txt | sort -n >> a.tmp

For numeric data, -n is required. For an alphabetical sort, -n is not required.

Example file:

$ cat test.txt
header
8
5
100
1
-1

Result:

$ cat a.tmp
header
-1
1
5
8
100

edited Apr 3, 2023 at 18:59

wjandrea


answered Feb 1, 2015 at 21:05

Ian Sherbin


3

Isn't this basically the same answer as the accepted answer? (Except BobS's approach puts the result on stdout, allowing you to send the result through other filters before being written to file, if necessary)
– Rob Gilliam
Commented Feb 2, 2015 at 9:57


Ed Morton · Accepted Answer · 2024-03-20 18:43:28Z

Applying the Decorate-Sort-Undecorate idiom using any version of the mandatory POSIX tools awk, sort, and cut:

$ awk -v OFS='\t' '{print (NR>2), $0}' file | sort -k1 -k2 | cut -f2-
:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00

Works just fine on incoming piped input too:

$ cat file | awk -v OFS='\t' '{print (NR>2), $0}' | sort -k1 -k2 | cut -f2-
:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
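To make the intermediate "decorated" form visible, here is a sketch that captures each stage in a variable. The names tab, decorated and result are made up for illustration; the explicit tab delimiter passed to sort (-t) and LC_ALL=C are added assumptions, not part of the answer's command:

```shell
tab=$(printf '\t')

# Decorate: prefix each line with 0 (header) or 1 (body) and a tab.
decorated=$(printf '%s\n' ':0:12345' ':1:6:2:3:3:8:4:2' \
    '010005TSTDOG_FOOD01' '500123TSTMY_RADAR00' '222334NOTALINEOUT01' \
    '477821USASHUTTLES21' '325611LVEANOTHERS00' |
    awk -v OFS='\t' '{ print (NR > 2), $0 }')

# Sort on the flag first, then on the original text;
# undecorate by cutting the flag column away again.
result=$(printf '%s\n' "$decorated" |
    LC_ALL=C sort -t "$tab" -k1,1 -k2 |
    cut -f2-)

printf '%s\n' "$decorated"
printf '%s\n' "$result"
```

Note that the header lines are themselves sorted relative to each other under this scheme; for the sample data they happen to already be in order.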

Darren Bishop · Accepted Answer · 2022-10-26 15:11:34Z

0

Another simple variation on all the others, reading a file once

HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat

answered Oct 26, 2022 at 15:11

Darren Bishop


1

Doesn't work in a pipe. Seems like head is buffered, so it reads in a block just to discard most of it, and sort never receives that data. Use Andrea's answer instead.
– wjandrea
Commented Apr 3, 2023 at 19:07
OP doesn’t mention anything about requiring pipe; use process substitution, if available < <(your-script)
– Darren Bishop
Commented Apr 4, 2023 at 22:55
Process substitution does the same thing for me (running Ubuntu 20.04). I know OP didn't mention it, so I didn't downvote, just wanted to comment for the sake of anyone else using a pipe.
– wjandrea
Commented Apr 4, 2023 at 23:00


wjandrea · Accepted Answer · 2023-04-03 19:01:27Z

Here's a bash shell function derived from the other answers. It handles both files and pipes. First argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple examples:

$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r

The shell function:

hsort ()
{
   if [ "$1" == "-h" ]; then
       echo "Sort a file or standard input, treating the first line as a header.";
       echo "The first argument is the file or '-' for standard input. Additional";
       echo "arguments to sort follow the first argument, including other files.";
       echo "File syntax : $ hsort file [sort-options] [file...]";
       echo "STDIN syntax: $ hsort - [sort-options] [file...]";
       return 0;
   elif [ -f "$1" ]; then
       local file=$1;
       shift;
       (head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
   elif [ "$1" == "-" ]; then
       shift;
       (read -r; printf "%s\n" "$REPLY"; sort "$@");
   else
       >&2 echo "Error. File not found: $1";
       >&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
       return 1 ;
   fi
}

wjandrea · Accepted Answer · 2023-04-03 19:05:55Z

0

This is the same as Ian Sherbin's answer, but my implementation is:

cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;

edited Apr 3, 2023 at 19:05

wjandrea


answered Mar 5, 2016 at 7:56

Bik


Mike Louis Griebel · Accepted Answer · 2023-12-22 23:56:17Z

Using grep

(plus cat, sort, rm)

I tested this on a MacBook Air (Darwin Kernel Version 20.6.0, etc.). The script uses two working files: f1, f2. The entire input file $1 is copied to f1. From f1 the header and body lines are drawn.

Header lines contain the character ":". We grep them, and store them in f2.

The body lines do not contain the character ":", as far as we know, so we grep the remaining lines (grep -v), sort them, and append the result to f2.

Finally we cat f2 to produce a result which can be used in a pipeline, and then we remove f1 and f2.

##                             ## Executable file "script"
cat $1 > f1                    ## Or cp $1 f1
grep ":" f1 > f2               ## Header lines contain ":"
grep -v ":" f1 | sort >> f2    ## Body lines do not contain ":"
cat f2                         ## List result
rm f1 f2                       ## Clean up

testfile:

:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00

Invoke the script as "./script testfile" (or "cat testfile | ./script > result" etc.).

:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00

The working files are handy for debugging, but force a clean up. I thought one would do, maybe zero, and yes, that works if you invoke it as "./script testfile". It doesn't work in a pipe, e.g. "cat testfile | ./script". You cannot rewind the standard input. But the version with two working files functions in a pipeline.
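Since standard input cannot be rewound, one working file is enough if stdin is buffered into it first. The following is a sketch of that variation, not the script above: the function name sort_colon_headers and the variable work are hypothetical, mktemp replaces the fixed names f1 and f2, and the pattern is anchored as '^:' on the assumption (from the question) that header lines begin with a colon.

```shell
# sort_colon_headers: read stdin, print the ":" header lines first,
# then the remaining lines sorted. (Hypothetical helper for illustration.)
sort_colon_headers() {
    work=$(mktemp) || return 1
    cat > "$work"                            # buffer stdin so it can be read twice
    grep '^:' "$work"                        # header lines, in original order
    grep -v '^:' "$work" | LC_ALL=C sort     # body lines, sorted
    rm -f "$work"                            # clean up the working file
}

result=$(printf '%s\n' ':0:12345' ':1:6:2:3:3:8:4:2' \
    '010005TSTDOG_FOOD01' '500123TSTMY_RADAR00' '222334NOTALINEOUT01' \
    '477821USASHUTTLES21' '325611LVEANOTHERS00' | sort_colon_headers)
printf '%s\n' "$result"
```

A file can be fed in the same way with "sort_colon_headers < testfile".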


RARE Kpop Manifesto · Accepted Answer · 2024-03-20 23:17:27Z

If you don't mind a non-portable solution, then

echo '
:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00' |

gnu-sort -g   # -g := --general-numeric-sort   

bsd-sort -g   # -g := --general-numeric-sort
              #       --sort=general-numeric (either)

:0:12345
:1:6:2:3:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00

This assumes the header is in this exact format, and the sorting order might be different for headers containing arbitrary non-ASCII Unicode characters.

The -g flag on bsd-sort also worked for this input, but I don't have much insight regarding whether they're fully interchangeable for this particular flag.

wjandrea · Accepted Answer · 2023-04-03 19:06:54Z

-1

With Python:

import sys
HEADER_ROWS=2

for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))
for row in sorted(sys.stdin):
    sys.stdout.write(row)

edited Apr 3, 2023 at 19:06

wjandrea


answered Oct 21, 2014 at 12:28

crusaderky


pre-supposes the system has Python installed (mine doesn't)
– Rob Gilliam
Commented Oct 21, 2014 at 18:56


Sathish G · Accepted Answer · 2016-03-09 12:22:07Z

-7

cat file_name.txt | sed 1d | sort

This will do what you want.

answered Mar 9, 2016 at 12:22

Sathish G


2

1) This only removes the header line and sorts the rest, it doesn't sort everything below the header line leaving the header intact. 2) it removes the first line only, when the header is actually two lines (read the question). 3) Why do you use "cat file_name.txt | sed 1d" when "sed 1d < file_name.txt" or even just "sed 1d file_name.txt" has the same effect?
– Rob Gilliam
Commented Mar 9, 2016 at 13:46

