0

I have a file containing names and dates. For each name, I want to keep only the entry with the latest date. How do I do it?

For example:

>cat user.txt
"a","03-May-13
"b","13-May-13
"a","13-Aug-13
"a","13-May-13

I am using the command sort -u user.txt, which gives the following output:

"a","03-May-13
"a","13-Aug-13
"a","13-May-13
"b","13-May-13

But I want the following output:

"a","13-Aug-13
"b","13-May-13

Can someone help?

Thanks.

3
  • Did you mean "a","13-May-13", or did you intend to leave off the trailing '"'?
    – Arafangion
    Commented May 24, 2013 at 6:54
  • Additionally, do you want the first field, or only the date?
    – Arafangion
    Commented May 24, 2013 at 6:55
  • @Arafangion I intentionally left off the trailing quote, and I want both fields. Commented May 24, 2013 at 7:09

4 Answers

3

Try this:

sort -t, -k2 user.txt | awk -F, '{a[$1]=$2}END{for(e in a){print e, a[e]}}' OFS=","

Explanation:

Sort the entries by the date field in ascending order and pipe the result to awk, which uses the first field as an array key. Each later line overwrites the value stored for its key, so only the last entry for each name is kept and printed at the end.

EDIT

Okay, so the entries can't simply be sorted lexicographically. The dates need to be converted to timestamps so they can be compared numerically. Use the following (it relies on GNU date):

awk -F",\"" '{ cmd=" date --date " $2 " +%s "; cmd | getline ts; close(cmd); print ts, $0, $2}' user.txt | sort -n -k1,1 | awk -F"[, ]" '{a[$2]=$3}END{for(e in a){print e, a[e]}}' OFS=","

If you are using macOS, use gdate (GNU date from coreutils) instead:

awk -F",\"" '{ cmd=" gdate --date " $2 " +%s "; cmd | getline ts; close(cmd); print ts, $0, $2}' user.txt | sort -n -k1,1 | awk -F"[, ]" '{a[$2]=$3}END{for(e in a){print e, a[e]}}' OFS=","
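
As a variation on the approach above, here is a minimal sketch that avoids running date once per line by building a numeric YYYYMMDD key inside awk. It assumes every line has the form "name","DD-Mon-YY with English month abbreviations, and that two-digit years mean 20YY. The function name latest_per_name is only illustrative. The trailing sort is there just to make the output order predictable, since the order of awk's for-in loop is unspecified.

```shell
# Keep, for each name, the line with the latest date.
# Assumption: lines look like "name","DD-Mon-YY and YY means 20YY.
latest_per_name() {
  awk -F',' '
    BEGIN {
      # Map month abbreviations to the numbers 1..12.
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
      for (i = 1; i <= n; i++) mon[m[i]] = i
    }
    {
      d = $2
      gsub(/"/, "", d)            # strip the quote in front of the date
      split(d, p, "-")            # p[1]=day, p[2]=month name, p[3]=year
      key = (2000 + p[3]) * 10000 + mon[p[2]] * 100 + p[1]
      # Remember the line with the largest key seen so far for this name.
      if (!($1 in best) || key > best[$1]) {
        best[$1] = key
        line[$1] = $0
      }
    }
    END { for (k in line) print line[k] }
  ' "$@" | sort
}
```

For example, latest_per_name user.txt on the sample file from the question should print "a","13-Aug-13 followed by "b","13-May-13. Because the day is compared as a number, 3-May-13 is also handled correctly against 11-May-13.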
2
  • Missed that '-k2' option there, that does simplify the answer somewhat!
    – Arafangion
    Commented May 24, 2013 at 6:56
  • Thanks @neevek. Your solution is good, but it has an issue: it seems to compare the dates lexicographically, so it fails for the following test case:
    "a","3-May-13
    "b","13-May-13
    "a","11-May-13
    "a","13-May-13
    Commented May 24, 2013 at 7:07
1

I think you need to sort by year, then month, then day.

Can you try this:

awk -F"\"" '{print $2"-"$4}' data.txt | sort -t- -k4,4n -k3,3M -k2,2n | awk -F- '{kv[$1]=$2"-"$3"-"$4}END{for(k in kv){print k,kv[k]}}'
0

This does the job for me. I sort on the month and then apply the logic that @neevek used. So far I have been unable to find a case where it fails, but I am not sure whether it is a foolproof solution.

sort -t- -k2 -M user1.txt | awk -F, '{a[$1]=$2}END{for(e in a){print e, a[e]}}' OFS=","

Can someone tell me if this solution has any issues?

0

How about this?

grep `cut -d'"' -f4 user.txt | sort -t- -k 3 -k 2M -k 1n | tail -1` user.txt

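Explanation: cut extracts the date (the fourth field when splitting on the double quote), sort orders the dates by year, month, and day, and tail -1 picks the latest one. grep then prints the lines of user.txt that contain that date. Note that this prints only the entries carrying the single latest date in the file.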

Edit: fixed to sort by month.

1
  • @RachitAgrawal I modified the sort to work with months. Does this work for you now? Commented May 24, 2013 at 8:26
