Suppose we have a log file like marks.log and the content looks something like this:

Fname   Lname   Net Algo    
Jack    Miller  15  20  
John    Compton 12  20  
Susan   Wilson  13  19  

I want to add a new column containing the average for each person, and a new row containing the average for each course. The result should look like this:

Fname   Lname   Net  Algo  Avg
Jack    Miller  15   20    17.5
John    Compton 12   20    16
Susan   Wilson  13   19    16
Average         13.3 19.6  -
  • Re: "I'm new to bash can i can't figure the syntax for loops and awk etc." First, awk is not bash; they are two completely different languages. Second, if you don't know the syntax, you should go read a tutorial instead of delegating it to other people.
    – 4ae1e1
    Commented Oct 3, 2015 at 8:53
  • Honestly, if this is to run on Linux or any modern UNIX, I would suggest scripting it in another language better suited to complex scripting. Perl and Python are almost always available, and in many cases Ruby and PHP are present on a default Linux installation as well. Although it is doable in bash, it will be much easier in one of these languages (I suspect no more than a few lines of code in any of them).
    – shevron
    Commented Oct 3, 2015 at 8:57
  • Please edit your question to include your expected output, given this sample input. Good luck.
    – shellter
    Commented Oct 3, 2015 at 9:11
  • I would suggest using something like "Overall Average" as the final row, just so it has the same number of fields as the previous rows. It will make life easier for any formatting you do afterwards, using space as the field delimiter.
    – seumasmac
    Commented Oct 3, 2015 at 9:37

3 Answers

If your data is in datafile.txt, the syntax for awk could be something like:

awk '
  {
    # If it is the first (header) row, append the new column name
    if (NR == 1)
      print $0, "Avg"
    else
      # Print all fields, then the average of fields 3 and 4
      print $0, ($3 + $4) / 2
    # Running totals for fields 3 and 4 (the header words count as 0)
    t3 += $3; t4 += $4
  }
  # Once every line has been read...
  END {
    # Print the final line: column averages over the NR - 1 data rows
    # (NR is the number of records, including the header)
    printf "Overall Average %.1f %.1f -\n",
      t3 / (NR - 1),  # average of field 3
      t4 / (NR - 1)   # average of field 4
  }' datafile.txt

That's the long version with comments. The one-liner looks like:

awk '{if (NR==1) print $0, "Avg"; else print $0,($3+$4)/2; t3+=$3; t4+=$4}END{printf "Overall Average %.1f %.1f -\n",t3/(NR-1),t4/(NR-1);}' datafile.txt

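This should match the desired output, apart from two details: the last row is labelled "Overall Average" (so it has as many fields as the other rows), and %.1f rounds 19.67 to 19.7 rather than truncating it to 19.6.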
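
The desired output also has its columns lined up. One way to get that inside awk itself is to give every field a fixed width with printf. The sketch below is a hypothetical variant (the function name averages_aligned is made up for illustration); it assumes exactly two name fields and two scores per row, names of at most 7 characters, and at least one data row, and it labels the last row "Average" as in the question.

```sh
#!/bin/sh
# Hypothetical helper: same arithmetic as the one-liner above, but with
# fixed-width printf fields so the columns line up like the desired output.
# Assumes two name fields and two scores per row, names of at most
# 7 characters, and at least one data row (so NR - 1 > 0).
averages_aligned() {
  awk '
    # Header: widths 8, 8, 5, 6 match the spacing in the question
    NR == 1 { printf "%-8s%-8s%-5s%-6s%s\n", $1, $2, $3, $4, "Avg"; next }
    {
      printf "%-8s%-8s%-5s%-6s%s\n", $1, $2, $3, $4, ($3 + $4) / 2
      t3 += $3; t4 += $4    # running column totals
    }
    # "Average" spans the two name columns (8 + 8 = 16 characters)
    END { printf "%-16s%-5.1f%-6.1f%s\n", "Average", t3 / (NR - 1), t4 / (NR - 1), "-" }' "$@"
}
```

Usage: averages_aligned marks.log, or pipe the data in on standard input. As with the original, %.1f prints 19.7 for the Algo column.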

  • How can you do this if rows are not defined to have 2 integer entries and are not restricted to fields 3 or 4? Let's ignore the column names and say row 1 has 2 integers, row 2 has 4, row 3 has none, and row 5 has 1. How can you work out the average of each row in that case, including the Overall Average, with the methods above?
    – Data
    Commented Feb 9, 2021 at 13:58
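
For rows with different numbers of scores, as that comment describes, the fixed $3 and $4 can be replaced by a loop over fields 3 to NF. This is a minimal sketch, assuming the first two fields are always names, the first line is a header, and a course average should be taken only over the rows that actually have a score in that column; rows with no scores get "-". The function name row_and_column_averages is hypothetical, and the "-" under Lname in the last row is a placeholder so the field count stays consistent.

```sh
#!/bin/sh
# Hypothetical helper: row and column averages when rows may carry any
# number of scores after the two name fields. Assumes a header on line 1
# and whitespace-separated fields.
row_and_column_averages() {
  awk '
    NR == 1 { print $0, "Avg"; next }    # pass the header through
    {
      n = NF - 2                         # scores on this row
      sum = 0
      for (i = 3; i <= NF; i++) {
        sum += $i                        # row total
        colsum[i] += $i                  # column total
        colcnt[i]++                      # rows that have this column
      }
      if (NF > maxnf) maxnf = NF         # widest row seen so far
      # Rows with no scores get "-" instead of a division by zero.
      if (n > 0) print $0, sum / n
      else print $0, "-"
    }
    END {
      line = "Average -"
      for (i = 3; i <= maxnf; i++)
        line = line sprintf(" %.1f", colsum[i] / colcnt[i])
      print line, "-"
    }' "$@"
}
```

On the question's sample data this prints the same per-row averages as the answer above and a last row of "Average - 13.3 19.7 -".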

How about:

gawk '{if (NR==1) { print $0, "Avg"; tn = 0; ta = 0; c = 0; } else { print $0,($3+$4)/2; tn = tn + $3; ta = ta + $4; c = c + 1; } } END {printf "Average %.1f %.1f -\n", tn/c, ta/c; }' <filename>
  • Thanks for making me feel lazy :) Updated my answer doing the first line properly.
    – seumasmac
    Commented Oct 3, 2015 at 10:15

A lengthy solution that does not use awk could be:

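#!/bin/bash
# Running totals for the third (Net) and fourth (Algo) fields
A=0
B=0

# Add this row's scores (word-split into $1..$4) to the totals
process(){
  A=$(( A + $3 ))
  B=$(( B + $4 ))
}
# Print the average of the two scores to one decimal place;
# bc -l is needed because $(( )) only does integer arithmetic
get_mean(){
  val=$(echo "($3 + $4)/2" | bc -l)
  printf "%.1f" "$val"
}

line_id=0
# -r keeps backslashes literal; the || test also handles a last line
# that has no trailing newline
while read -r line || [ -n "$line" ]
do
  line_id=$(( line_id + 1 ))
  # The first line is the header: print it with the new column name
  if [ "$line_id" -eq 1 ]; then
    echo "Fname   Lname   Net  Algo  Avg"
    continue
  fi

  process $line            # unquoted on purpose: split into fields
  mean=$(get_mean $line)

  echo $line $mean
done
# Column averages over the line_id - 1 data rows
A=$(echo "$A/($line_id-1)" | bc -l)
B=$(echo "$B/($line_id-1)" | bc -l)
printf "Average\t\t%.1f %.1f -\n" "$A" "$B"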

Then one can invoke this script as ./test.sh < input.
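
If bc is not installed, the divisions can also be done with bash integer arithmetic by working in tenths. The helper below, mean_1dp, is a hypothetical sketch (not part of the answer above) that assumes non-negative integer totals and a positive count; it rounds to the nearest tenth, with halves rounded up.

```bash
#!/bin/bash
# Hypothetical helper: prints sum/count rounded to one decimal place using
# only bash integer arithmetic (no bc). Assumes non-negative integer inputs
# and count > 0.
mean_1dp() {
  local sum=$1 count=$2
  # Work in tenths and add half the divisor so integer division rounds
  # to the nearest tenth instead of truncating.
  local tenths=$(( (sum * 10 + count / 2) / count ))
  printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

mean_1dp 35 2   # Jack's Net + Algo over 2 courses: prints 17.5
mean_1dp 40 3   # Net column total over 3 students: prints 13.3
```

In the script above, mean_1dp $(( $3 + $4 )) 2 could stand in for get_mean, and mean_1dp "$A" $(( line_id - 1 )) for the final bc calls, provided all scores are non-negative integers.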
