3

I have a csv containing the following data structure:

1111,2222,3333,4444,5555,6666,7777,2017-1-5 1:07:09,2017-1-5 1:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

I want to display the dates month and day as always being 2 digits long. I also want the times Hour field to always be 2 digits.

Essentially adding leading zeros if the month/day/hour fields are only a single digit as in the example line above.

Using awk, how would I go about achieving the following result:

1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

5 Answers 5

8

A great tool for text processing is awk. The following example is using plain standard awk on FreeBSD 11.1. @RomanPerekhrest has an elegant solution in another answer if you prefer GNU awk.

Your input is comma-separated. Because of this we invoke awk with the -F, parameter.

We can then print out columns using the print statement. $1 is the first column. $2 is the second column.

$ awk -F, '{ print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17

This gives us the 8th column for each row.

This is then the date field you want to manipulate. Rather than setting the delimiter using the command-line parameter we can do it as part of the script. FS for the input delimiter and OFS for the output delimiter.

$ awk 'BEGIN { FS = "," } ; { print $8 }' inputfile.csv
2017-1-5 1:07:09
2017-11-25 19:57:17

When working with dates I often prefer to use the date util to make sure I handle them correctly. And I do not need to worry if I am using regular or GNU awk. Furthermore I get a big fat failure if the date does not parse correctly.

The interesting parameter are:

-j     Specify we do not want to set the date at all
-f     The format string we use for input
+      The format string we use for output

So if we run this for one date:

$ date -j -f "%Y-%m-%d %H:%M:%S" +"%Y-%m-%d %H:%M:%S" "2017-1-5 1:07:09"
2017-01-05 01:07:09

We can then combine this with awk. Notice how the quotes are escaped. This is probably the biggest stumbling block for a beginner.

$ awk -F, '{ system("date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"")}' inputfile.csv
2017-01-05 01:07:09
2017-11-25 19:57:17

The system call seems correct - but unfortunately it only allows us to capture the returncode and it prints directly to the output. To avoid this we use the cmd | getline pattern. The following simple example will read the current date into mydate:

$ awk 'BEGIN { cmd = "date"; cmd | getline mydate; close(cmd); print mydate }'
Thu Mar  1 16:26:15 CET 2018

We use the BEGIN keyword as we have no input to this simple example.

So let us expand this:

awk 'BEGIN { FS=","; OFS=FS };
     { 
         cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";
         cmd | getline firstdate;
         close(cmd);
         cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";
         cmd | getline seconddate;
         close(cmd);
         print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate
     }' inputfile.csv

And we can collapse it to a one-liner:

awk 'BEGIN {FS=",";OFS=FS};{cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$8"\"";cmd | getline firstdate;close(cmd);cmd="date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""$9"\"";cmd | getline seconddate;close(cmd);print $1,$2,$3,$4,$5,$6,$7,firstdate,seconddate}' inputfile.csv

Which gives me the output:

1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

Addendum

As the purpose here is to learn good habit I better update this answer. It is a bad habit to repeat code. When you start doing that you should split things into a function. As you will notice the code below immediately becomes more readable.

awk 'function convertdate(the_date) {
         cmd = "date -j -f \"%Y-%m-%d %H:%M:%S\" +\"%Y-%m-%d %H:%M:%S\" \""the_date"\"";
         cmd | getline formatted_date;
         close(cmd);
         return formatted_date
     }
     BEGIN { FS=","; OFS=FS };
     { 
         print $1,$2,$3,$4,$5,$6,$7,convertdate($8),convertdate($9)
     }' inputfile.csv

Make a habit of this and you will notice how much easier it will become to introduce error handling later on.

1
  • 1
    Amazingly clear and easy to understand answer! Thank you Claus. Commented Mar 1, 2018 at 17:44
5

If you have GNU awk you could convert the final field into a whitespace-separated datespec string and then re-format it as desired using strftime:

awk 'BEGIN{OFS=FS=","} {gsub(/[-:]/," ",$NF); $NF = strftime("%Y-%m-%d %H:%M:%S", mktime($NF))} 1' file
1111,2222,3333,4444,5555,6666,7777,2017-1-5 1:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

See The GNU awk User's Guide: Time Functions

2
  • Thanks for you helpful reply. It is working for the final column (field 9) but it doesn't convert the 8th column. Commented Mar 1, 2018 at 14:14
  • @GustavMahler sorry I didn't notice that you have multiple date columns - it should be obvious how to duplicate the recipe for other fields though Commented Mar 1, 2018 at 14:19
5

Straightforward GNU awk solution:

awk 'BEGIN{ FS=OFS="," }{ gsub(/\<[0-9]\>/, "0&", $8); gsub(/\<[0-9]\>/, "0&", $9) }1' file
  • gsub(/\<[0-9]\>/, "0&", <field>) - replace/complement only standalone single digits within datetime string:
    • \< and \> - are word boundaries
    • & - stands for the precise substring that was matched by the regexp pattern

The output:

1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54
0
1
sed -r 's/([-: ])([0-9]\b)/\10\2/g' input.txt

It replaces all alone digits, which preceded by [-: ] characters and followed by any non-word characters.

This solution short and simple, but errors prone, because it doesn't check the date pattern and just adds leading zero to all alone digits, which conforms to the [-: ][0-9]\b pattern (\b - matches a word boundary). But as variant.

Input

1111,2222,3333,4444,5555,6666,7777,2017-1-5 1:07:09,2017-1-5 1:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

Output

1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54
0

The dateutils package has codes to deal with the details of time/date formatted data.

# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }

pl " Input data file $FILE:"
head $FILE

pl " Expected output:"
cat $E

pl " Results, to standard format:"
dateutils.dconv -S <$FILE

pl " Results, to standard format, omitting the \"T\":"
dateutils.dconv -S -f '%F %T' <$FILE

producing:

-----
 Input data file data1:
1111,2222,3333,4444,5555,6666,7777,2017-1-5 1:07:09,2017-1-5 1:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

-----
 Expected output:
1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

-----
 Results, to standard format:
1111,2222,3333,4444,5555,6666,7777,2017-01-05T01:07:09,2017-01-05T01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25T19:57:17,2017-11-25T19:58:54

-----
 Results, to standard format, omitting the "T":
1111,2222,3333,4444,5555,6666,7777,2017-01-05 01:07:09,2017-01-05 01:11:53
1111,2222,3333,4444,5555,6666,7777,2017-11-25 19:57:17,2017-11-25 19:58:54

On a system like:

OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.9 (jessie) 
bash GNU bash 4.3.30
dateutils.dconv dconv 0.3.1

Some details for dconv:

dateutils.dconv Convert DATE/TIMEs between calendrical systems. (man)
Path    : /usr/bin/dateutils.dconv
Package : dateutils
Home    : http://www.fresse.org/dateutils
Version : 0.3.1
Type    : ELF 64-bit LSB shared object, x86-64, version 1 ( ...)
Help    : probably available with -h,--help
Home    : https://github.com/hroptatyr/dateutils (doc)

Best wishes ... cheers, drl

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .