A great tool for text processing is awk. The following example is using plain standard awk on FreeBSD 11.1. @RomanPerekhrest has an elegant solution in another answer if you prefer GNU awk.
Your input is comma-separated. Because of this we invoke awk
with the -F,
parameter.
We can then print out columns using the print
statement. $1
is the first column. $2
is the second column.
$ awk -F, '{ print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17
This gives us the 8th column for each row.
This is then the date field you want to manipulate. Rather than setting the delimiter using the command-line parameter we can do it as part of the script. FS for the input delimiter and OFS for the output delimiter.
$ awk 'BEGIN { FS = "," } ; { print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17
When working with dates I often prefer to use the date
util to make sure I handle them correctly. And I do not need to worry if I am using regular or GNU awk. Furthermore I get a big fat failure if the date does not parse correctly.
The interesting parameter are:
-j Specify we do not want to set the date at all
-f The format string we use for input
+ The format string we use for output
So if we run this for one date:
$ date -j -f "%Y-%m-%d %H:%M:%S" +"%Y-%m-%d %H:%M:%S" "2017-1-5 1:07:09"
2017-01-05 01:07:09
We can then combine this with awk. Notice how the quotes are escaped. This is probably the biggest stumbling block for a beginner.
$ awk -F, '{ system("date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"")}' inputfile.csv
2017-01-05 01:07:09
2017-11-25 19:57:17
The system call seems correct - but unfortunately it only allows us to capture the returncode and it prints directly to the output. To avoid this we use the cmd | getline
pattern. The following simple example will read the current date into mydate:
$ awk 'BEGIN { cmd = "date"; cmd | getline mydate; close(cmd); print mydate }'
Thu Mar 1 16:26:15 CET 2018
We use the BEGIN
keyword as we have no input to this simple example.
So let us expand this:
awk 'BEGIN { FS=","; OFS=FS };
{
cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";
cmd | getline firstdate;
close(cmd);
cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";
cmd | getline seconddate;
close(cmd);
print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate
}' inputfile.csv
And we can collapse it to a one-liner:
awk 'BEGIN {FS=",";OFS=FS};{cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";cmd | getline firstdate;close(cmd);cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";cmd | getline seconddate;close(cmd);print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate}' inputfile.csv
Which gives me the output:
1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54
Addendum
As the purpose here is to learn good habit I better update this answer. It is a bad habit to repeat code. When you start doing that you should split things into a function. As you will notice the code below immediately becomes more readable.
awk 'function convertdate(the_date) {
cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""the_date"\"";
cmd | getline formatted_date;
close(cmd);
return formatted_date
}
BEGIN { FS=","; OFS=FS };
{
print $1,$2,$3,$4,$5,$6,$7,convertdate($8),convertdate($9)
}' inputfile.csv
Make a habit of this and you will notice how much easier it will become to introduce error handling later on.