-1

I have the following dataset:

071,36,035,08422,46217,00000,1,A,Broadalbin village (pt.),New York,18,46,46,46,45,45,44,46,44,46,43,43
071,36,045,29322,56209,00000,1,A,Glen Park village (pt.),New York,42,42,42,42,43,42,44,43,42,42,41,40
162,36,000,64881,00000,00000,0,A,Saltaire village,New York,37,41,41,41,41,41,41,41,41,41,41,41
157,36,103,64881,00000,00000,0,A,Saltaire village,New York,37,41,41,41,41,41,41,41,41,41,41,41
071,36,103,64881,38000,00000,1,A,Saltaire village,New York,37,41,41,41,41,41,41,41,41,41,41,41
061,36,013,00000,13035,00000,1,F,Cattaraugus Reservation,New York,38,38,38,38,37,37,39,39,38,38,38,40
061,36,029,00000,75011,00000,1,F,Tonawanda Reservation,New York,34,34,34,34,34,34,34,37,37,37,37,36
061,36,009,00000,60950,00000,1,A,Red House town,New York,38,33,33,33,34,34,35,35,34,34,35,35
071,36,065,61115,75280,00000,1,A,Remsen village (pt.),New York,28,28,28,28,28,30,28,27,27,29,29,29
071,36,023,35276,18421,00000,1,A,Homer village (pt.),New York,31,31,31,31,31,30,30,30,30,29,29,29
071,36,099,55816,63440,00000,1,A,Ovid village (pt.),New York,19,27,27,27,27,29,29,26,26,26,26,25
071,36,059,22260,56000,00000,1,A,East Hills village (pt.),New York,23,23,23,23,23,23,23,23,23,23,23,23
071,36,059,47636,34000,00000,1,A,Mineola village (pt.),New York,16,22,22,22,22,22,22,22,22,22,22,22
071,36,043,37275,27199,00000,1,A,Ilion village (pt.),New York,22,23,23,23,23,23,22,22,22,22,22,22
071,36,029,82084,15011,00000,1,A,Williamsville village (pt.),New York,18,13,13,13,13,13,13,13,13,13,13,13
162,36,000,20379,00000,00000,0,A,Dering Harbor village,New York,11,11,11,11,11,11,11,11,11,11,11,11
157,36,103,20379,00000,00000,0,A,Dering Harbor village,New York,11,11,11,11,11,11,11,11,11,11,11,11
071,36,103,20379,66839,00000,1,A,Dering Harbor village,New York,11,11,11,11,11,11,11,11,11,11,11,11
071,36,071,26319,29553,00000,1,A,Florida village (pt.),New York,4,5,5,5,5,5,5,5,5,5,5,5
071,36,099,78553,66333,00000,1,A,Waterloo village (pt.),New York,4,4,4,4,4,4,4,4,4,4,4,4
071,36,057,26924,12122,00000,1,A,Fort Plain village (pt.),New York,3,4,4,4,4,4,4,4,4,4,4,4
071,36,053,31709,23305,00000,1,A,Hamilton village (pt.),New York,5,3,3,3,3,3,3,3,3,3,3,3
061,36,003,00000,54523,00000,1,F,Oil Springs Reservation,New York,1,1,1,1,1,1,1,1,1,1,1,1
157,36,099,28640,00000,00000,0,A,Geneva city (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
071,36,099,28640,28640,00000,1,A,Geneva city (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
071,36,089,46019,51330,00000,1,A,Massena village (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
071,36,071,15297,29553,00000,1,A,Chester village (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
071,36,059,28178,53000,00000,1,A,Garden City village (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
071,36,053,31709,44435,00000,1,A,Hamilton village (pt.),New York,2,0,0,0,0,0,0,0,0,0,0,0
071,36,051,19664,70101,00000,1,A,Dansville village (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
071,36,039,36167,38638,00000,1,A,Hunter village (pt.),New York,A,0,0,0,0,0,0,0,0,0,0,0
071,36,037,18201,19719,00000,1,A,Corfu village (pt.),New York,0,0,0,0,0,0,0,0,0,0,0,0
061,36,099,00000,28640,00000,0,F,Geneva city,New York,0,0,0,0,0,0,0,0,0,0,0,0
061,36,063,00000,75011,00000,1,F,Tonawanda Reservation,New York,0,0,0,0,0,0,0,0,0,0,0,0
061,36,013,00000,14072,00000,1,S,Chautauqua Lake UT,New York,0,0,0,0,0,0,0,0,0,0,0,0
061,36,009,00000,54523,00000,1,F,Oil Springs Reservation,New York,0,0,0,0,0,0,0,0,0,0,0,0

And I want to print out the 10 lines with the smallest number in the last column that isn't zero, so this:

071,36,043,37275,27199,00000,1,A,Ilion village (pt.),New York,22,23,23,23,23,23,22,22,22,22,22,22
071,36,029,82084,15011,00000,1,A,Williamsville village (pt.),New York,18,13,13,13,13,13,13,13,13,13,13,13
162,36,000,20379,00000,00000,0,A,Dering Harbor village,New York,11,11,11,11,11,11,11,11,11,11,11,11
157,36,103,20379,00000,00000,0,A,Dering Harbor village,New York,11,11,11,11,11,11,11,11,11,11,11,11
071,36,103,20379,66839,00000,1,A,Dering Harbor village,New York,11,11,11,11,11,11,11,11,11,11,11,11
071,36,071,26319,29553,00000,1,A,Florida village (pt.),New York,4,5,5,5,5,5,5,5,5,5,5,5
071,36,099,78553,66333,00000,1,A,Waterloo village (pt.),New York,4,4,4,4,4,4,4,4,4,4,4,4
071,36,057,26924,12122,00000,1,A,Fort Plain village (pt.),New York,3,4,4,4,4,4,4,4,4,4,4,4
071,36,053,31709,23305,00000,1,A,Hamilton village (pt.),New York,5,3,3,3,3,3,3,3,3,3,3,3
061,36,003,00000,54523,00000,1,F,Oil Springs Reservation,New York,1,1,1,1,1,1,1,1,1,1,1,1

The command I used to get this far was:

cat [filename] | grep 'New York' | sort -n -r -t, -k20 

But I'm not sure how to do the rest, which I'm sure I'll need grep for

2
  • You want to filter out lines where the 10th field is new york correct?
    – jesse_b
    Commented Oct 6, 2023 at 23:46
  • yeah, what I have already just missing the last command. I don't know how to filter out lines that contain 0 in the last column
    – Mathew
    Commented Oct 6, 2023 at 23:50

2 Answers 2

2
awk -F, '$10 == "New York" && $NF > 0' input | sort -t, -rnk22 | tail -10

The awk will print only lines where the 10th column equals New York and the last column is greater than 0.

sort will reverse sort on column 22 numerically.

tail -10 will print only the last 10 lines.

12
  • This output prints the ones ending in zero on my screen
    – Mathew
    Commented Oct 7, 2023 at 0:13
  • @Mathew what if you change $NF to $22? And/or are you sure your input is exactly what you have pasted here?
    – jesse_b
    Commented Oct 7, 2023 at 0:16
  • Yeah still wouldn't work, found the issue. There was a carriage return in there I didn't account for, the correct commands would be: cat [filename] | grep 'New York' | sort -n -r -t, -k20 | grep -P ',0^M$' -v | tail -n 10. Note ^M here is 'ctrl v ctrl m'
    – Mathew
    Commented Oct 7, 2023 at 0:19
  • I mean that may work but being the "correct" command is debatable. More correct would be mac2unix input; awk -F, '$10 == "New York" && $NF > 0' input | sort -t, -rnk22 | tail -10
    – jesse_b
    Commented Oct 7, 2023 at 0:22
  • 1
    @jesse_b if you add {sub(/\r$/,"")} as the first thing in your awk command then you don't need to call mac2unix input (which would make your answer non-portable) before calling awk.
    – Ed Morton
    Commented Oct 7, 2023 at 8:57
1

Using Miller:

$ mlr --csv -N filter '$10 == "New York" && $22 > 0' then sort -nr 22 then tail file

Using gawk:

$ awk -F, -v var=10 
'$10 == "New York" && $NF > 0 {ar[$0]=$NF}
END{asorti(ar, b, "@val_num_asc"); 
for(i=var;i>=1;i--)print b[i]}' file

$ csvsql -I -H  --query 'SELECT * FROM file WHERE j == "New York" AND v > 0 ORDER BY CAST(v AS INTEGER) DESC' file | tail

-H is used because of header less file. This creates default headers (a,b,c..).

$ csvsql -I -H --query 'SELECT * FROM file WHERE j == "New York" AND v > 0 ORDER BY CAST(v AS INTEGER) DESC LIMIT 10 OFFSET ( SELECT count(*) FROM file WHERE j == "New York" AND v > 0 ) - 10' file | csvformat -K1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .