How to do custom sorting using unix sort?

Question

I'm using unix sort to sort a comma delimited file with multiple columns. Thus far, this has worked perfectly for sorting the data either numerically or in alphabetical order:

Example file before any sorting:

C,United States,WA,Tacoma,f,1
A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,Canada,QC,Montreal,f,2
A,Bahamas,Bahamas,Nassau,f,2
A,United States,NY,New York,f,1

Sort the file:

$ sort -t ',' -k 2,2 -k 3,3 -k 4,4 -k 5,5r -k 6,6nr tmp.csv

Sorted result:

A,Bahamas,Bahamas,Nassau,f,2
A,Canada,QC,Montreal,f,2
A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,United States,NY,New York,f,1
C,United States,WA,Tacoma,f,1

Here is the issue: I want to sort column 2 based on a custom sort, meaning I want United States first, then Canada, then Bahamas:

Desired sort:

A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,United States,NY,New York,f,1
C,United States,WA,Tacoma,f,1
A,Canada,QC,Montreal,f,2
A,Bahamas,Bahamas,Nassau,f,2

Is there some way to pass unix sort a custom sort order that it can then apply? Something like:

$ sort -t ',' -k 2,2:'United States, Canada, Bahamas' -k 3,3 -k 4,4 -k 5,5r -k 6,6nr tmp.csv

Thanks!

For these three values, you want reverse alphabetic order. For the general case, you'll need to map the names to a sort order number, and then do the sorting using the sort order number. Or go for a scripting language... One possibility is the join command, but you could end up with a lot of sorting — the input files for join must be sorted in one order, and then you'd be using sort again to put the data into a different order (and losing the sort order column as a post-sort step). — Jonathan Leffler, Commented Oct 17, 2012 at 17:37
In your example input, shouldn't there be t instead of f in the last line? — Lev Levitsky, Commented Oct 17, 2012 at 17:56
Lev: yes, good catch. My bad; too much cutting and pasting (my actual data set is much larger and I accidentally grabbed the wrong rows). — jewelia, Commented Oct 17, 2012 at 19:03
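Jonathan Leffler's suggestion of mapping each name to a sort-order number can be done in one pass with awk. This avoids join's requirement that its inputs be presorted. Below is a minimal sketch, assuming the six-column layout of tmp.csv. The function name rank_sort and the ranks (1 to 3 for the listed countries, 99 for anything else) are illustrative choices and not part of any tool:

```shell
#!/bin/sh
# rank_sort (hypothetical helper): custom order for column 2, then the
# question's remaining keys. Steps:
#   1. awk prepends a rank looked up from column 2 (unlisted names get 99,
#      so they sort after the listed ones);
#   2. sort keys on that rank, then on the original columns, each shifted
#      right by one because of the extra leading field;
#   3. cut drops the rank again.
rank_sort() {
  awk -F, -v OFS=, '
    BEGIN { rank["United States"] = 1; rank["Canada"] = 2; rank["Bahamas"] = 3 }
    { r = ($2 in rank) ? rank[$2] : 99; print r, $0 }
  ' "$@" |
    sort -t, -k1,1n -k4,4 -k5,5 -k6,6r -k7,7nr |
    cut -d, -f2-
}

# Usage with the question's sample rows (on the real file: rank_sort tmp.csv).
printf '%s\n' \
  'C,United States,WA,Tacoma,f,1' \
  'A,United States,MA,Boston,f,0' \
  'B,United States,NY,New York,f,5' \
  'A,Canada,QC,Montreal,f,2' \
  'A,Bahamas,Bahamas,Nassau,f,2' \
  'A,United States,NY,New York,f,1' |
  rank_sort
```

With the sample rows, this prints them in the desired order shown in the question. To change the order, you only need to edit the BEGIN block.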

Lev Levitsky · Accepted Answer · 2012-10-17 18:06:53Z

11

The other answer and the comment answer the question in general; here's what an implementation can look like:

$ cat order
Bahamas,3
Canada,2
United States,1

$ cat data
C,United States,WA,Tacoma,f,1
A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,Canada,QC,Montreal,f,2
A,Bahamas,Bahamas,Nassau,f,2
A,United States,NY,New York,f,1

$ sort -t, -k2,2 data | join -t, -11 -22 order - | \
  sort -t, -k2,2n -k4,4 -k5,5 -k6,6r -k7,7nr | \
  awk -F, -v OFS=, '{print $3, $1, $4, $5, $6, $7}'
A,United States,MA,Boston,f,0
B,United States,NY,New York,f,5
A,United States,NY,New York,f,1
C,United States,WA,Tacoma,f,1
A,Canada,QC,Montreal,f,2
A,Bahamas,Bahamas,Nassau,f,2
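To see why the second sort keys on field 2 and why the original first column has to be moved back to the front, it helps to look at what join emits. Each output line contains the join field, then the remaining fields of order, then the remaining fields of data. The sketch below recreates the two files in a scratch directory (the use of mktemp and the variable names are assumptions of this sketch) and prints only the join step:

```shell
#!/bin/sh
# Recreate the answer's two input files in a scratch directory.
tmp=$(mktemp -d) || exit 1
printf '%s\n' 'Bahamas,3' 'Canada,2' 'United States,1' > "$tmp/order"
printf '%s\n' \
  'C,United States,WA,Tacoma,f,1' \
  'A,United States,MA,Boston,f,0' \
  'B,United States,NY,New York,f,5' \
  'A,Canada,QC,Montreal,f,2' \
  'A,Bahamas,Bahamas,Nassau,f,2' \
  'A,United States,NY,New York,f,1' > "$tmp/data"

# join needs each input sorted on its join field: order on field 1 (it
# already is) and data on field 2. Every output line is the join field,
# then the rest of order (the rank), then the rest of data.
joined=$(sort -t, -k2,2 "$tmp/data" | join -t, -1 1 -2 2 "$tmp/order" -)
printf '%s\n' "$joined"

# Remove the scratch directory.
rm -rf "$tmp"
```

For example, the Bahamas row comes out as Bahamas,3,A,Bahamas,Nassau,f,2. That is country, rank, and then the original columns 1, 3, 4, 5 and 6.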


Awesome, thanks for your help. This worked perfectly! – jewelia, Commented Oct 17, 2012 at 20:52
@jewelia Improved once more; sed was not really needed here. – Lev Levitsky, Commented Oct 17, 2012 at 21:01


itsbruce · Accepted Answer · 2012-10-17 17:34:50Z

3

You can't do that with sort alone. At this point, you really should be reaching for awk, Perl, or your language of choice. You can fudge it, though. For example, you could use sed to change "United States" to 0, "Canada" to 1 and "Bahamas" to 2, do a numeric sort on that column, and then sed it back. Or you could change "United States" to "United States,0", etc., sort on the extra column, and then discard it.
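The second variant (insert an extra rank column with sed, sort on it, then discard it) can be sketched as follows. The helper name sed_rank_sort and the catch-all rank 9 for unlisted countries are assumptions of this sketch. It also assumes that no field contains a comma:

```shell
#!/bin/sh
# sed_rank_sort (hypothetical helper): the "extra column" trick.
# The first sed inserts a rank right after column 2 (0, 1, 2 for the three
# known countries). "t" skips the remaining rules once one has matched, and
# the last rule gives any other country rank 9. sort keys on that rank
# (field 3) and on the original columns shifted by one. The second sed
# deletes the rank column again.
sed_rank_sort() {
  sed -e 's/^\([^,]*\),United States,/\1,United States,0,/;t' \
      -e 's/^\([^,]*\),Canada,/\1,Canada,1,/;t' \
      -e 's/^\([^,]*\),Bahamas,/\1,Bahamas,2,/;t' \
      -e 's/^\([^,]*,[^,]*\),/\1,9,/' "$@" |
    sort -t, -k3,3n -k4,4 -k5,5 -k6,6r -k7,7nr |
    sed 's/^\([^,]*,[^,]*\),[^,]*,/\1,/'
}

# Usage with the question's sample rows (on the real file: sed_rank_sort tmp.csv).
printf '%s\n' \
  'C,United States,WA,Tacoma,f,1' \
  'A,United States,MA,Boston,f,0' \
  'B,United States,NY,New York,f,5' \
  'A,Canada,QC,Montreal,f,2' \
  'A,Bahamas,Bahamas,Nassau,f,2' \
  'A,United States,NY,New York,f,1' |
  sed_rank_sort
```

Anchoring each pattern with ^[^,]*, ensures that only column 2 is matched. Without it, a state or city named "Bahamas" could be picked up by mistake.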


Adam Spiers · Accepted Answer · 2013-06-02 22:21:01Z

I just wrote a helper called csort to make it easy to do this. It prefixes each line with a value of your choosing based on substring or regular expression matches within the line:

$ csort -t, '2=United States' X 2=Canada Y 2=Bahamas Z < tmp.csv | \
sort -t, -k1,1 -k3,3 -k4,4 -k5,5 -k6,6r -k7,7nr
X,A,United States,MA,Boston,f,0
X,B,United States,NY,New York,f,5
X,A,United States,NY,New York,f,1
X,C,United States,WA,Tacoma,f,1
Y,A,Canada,QC,Montreal,f,2
Z,A,Bahamas,Bahamas,Nassau,f,2

The 2=STR notation means "match if the second field equals STR".

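You can then optionally pipe the output through cut -c3- to remove the prefix. This works because the prefixes here (X, Y, Z) are single characters, so the first two characters are always the prefix and its comma.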
