1

I have a sample CSV file with the following contents:

$ cat SAMPLE.CSV 
compid,active,tagno
-2147483646,1,"1"
-2147483645,0,"10000"
-2147483644,0,"1002"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"
-2147468087,1,"76245"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"

I want to sort by the third column called tagno but I want it to respect the alphanumeric values in that column.

The desired result should look like this:

compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466196,1,"1361B"
-2147466195,1,"1361B.1"
-2147466194,1,"1361B.2"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"

I've tried the following:

$ sort -t'"' -k2n SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466194,1,"1361B.2"
-2147466195,1,"1361B.1"
-2147466196,1,"1361B"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"

But you can see that 1361B, 1361B.1 and 1361B.2 are almost reverse ordered.

1 Answer 1

3

Use --version-sort option in sort.

If you look at the manual (man sort), sort has an option for sorting by version numbers. Here is the entry:

-V, --version-sort
             Sort version numbers.  The input lines are treated as file
             names in form PREFIX VERSION SUFFIX, where SUFFIX matches
             the regular expression "(.([A-Za-z~][A-Za-z0-9~]*)?)*".  The
             files are compared by their prefixes and versions (leading
             zeros are ignored in version numbers, see example below).
             If an input string does not match the pattern, then it is
             compared using the byte compare function.  All string com-
             parisons are performed in C locale, the locale environment
             setting is ignored.

This seems to respect alphanumeric values much better than either -n or -g sorting.

Using the -V flag on the 3rd column you get the desired result:

$ sort -t'"' -k2V SAMPLE.CSV
compid,active,tagno
-2147483646,1,"1"
-2147483644,0,"1002"
-2147466196,1,"1361B"
-2147466195,1,"1361B.1"
-2147466194,1,"1361B.2"
-2147483645,0,"10000"
-2147468087,1,"76245"
-2147483127,1,"76245.1"
-2147483126,1,"76245.2"

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .