How do I get a list of unique names and the sum of numbers from each row containing this name, with 1 command?

Question

The problem

I have a log file with the following format:

2018/12/05 22:43:14 [ChestShop] User bought 1 Boat for 8.00 from Admin Shop at [...] -246, 65, 61
2019/01/02 10:09:38 [ChestShop] User sold 64 Sea Lantern for 27840.00 to Admin Shop at [...] -234, 61, 45
2019/01/02 10:09:38 [ChestShop] User sold 48 Sea Lantern for 20880.00 to Admin Shop at [...] -234, 61, 45
2019/01/02 10:09:42 [ChestShop] User sold 2 Prismarine Bricks for 248.00 to Admin Shop at [...] -233, 62, 45

from which I want to extract certain pieces of information and display them in a summarized list.

The information I want to summarize is names, quantities and sell values. Each sell value is the total for the listed quantity. Names (Sea Lantern, Prismarine Bricks etc.) can appear more than once in this log file, along with quantities (the number to the left of the name) and sell values (the number to the right of "for"). Names may contain multiple spaces (never more than 4) or none at all.

... ... [...] ... ... 2 Prismarine Bricks ... 248.00 ... ... ... ... [...] ..., ..., ...

Preferably, I would like the summary to look something like:

totalQuantity1 uniqueName1 totalSellValue1
totalQuantity2 uniqueName2 totalSellValue2

sorted by totalQuantity OR totalSellValue, depending on a small change to the command.

My attempts at solving the problem

I have found that I can use the following command to get a list of the most frequently occurring items and the number of times they occur in the log file, sorted by the number of times they occur (which is not what I want):

cat ChestShop.log | grep -w sold | cut -d ' ' -f 7,8,9,10,11 | awk -F 'for' '{print $1}' | sort | uniq -c | sort -rn

The grep -w sold command is just used to differentiate buying from selling, and as you can see from the log examples above only two words differ when comparing buying to selling.

I have also used this command to summarize the quantity of one particular item from a list containing only quantities for that item:

cat ChestShop.log | grep -w sold | grep -w 'Magma Block' | cut -d ' ' -f 6 | paste -s -d+ - | bc

I have tried countless other modifications to the above commands but have not come any closer to getting what I would like; the above commands are the closest I have gotten. Preferably the command should also be as short as possible or, if that is difficult, come with an explanation for each part of the command so that I can understand what is going on (especially if awk is used in any other way than I have used it). Thanks.

Any help is very much appreciated.

steeldriver · Accepted Answer · 2019-01-06 18:36:37Z

With plain Awk, you could do something like this:

$ awk '$5 == "sold" {
    # $6 = quantity, $7 $8 = item name, $10 = total value for that quantity
    q[$7 FS $8] += $6; v[$7 FS $8] += $10
  }
  END {
    for (item in q) print q[item], item, v[item]
  }' ChestShop.log
2 Prismarine Bricks 248
112 Sea Lantern 48720
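With plain awk the order in which for (item in q) visits the array is unspecified, so one way to get a sorted summary without gawk is to pipe the output through sort. The following is only a sketch: the function name summarize_sold and its optional column argument are made up for illustration, it still assumes two-word names in fields 7 and 8, and it adds $10 directly because the question says each logged value is already the total for the listed quantity.

```shell
# Hypothetical helper: summarize_sold FILE [COLUMN]
# COLUMN 1 sorts by total quantity (the default), COLUMN 3 by total value.
# Assumes two-word item names in fields 7 and 8, as in the sample log.
summarize_sold() {
  col=${2:-1}
  awk '$5 == "sold" {
      q[$7 FS $8] += $6; v[$7 FS $8] += $10
    }
    END {
      # Tab-separated columns, so a name containing spaces stays one column
      for (item in q) printf "%d\t%s\t%.2f\n", q[item], item, v[item]
    }' "$1" | sort -t "$(printf '\t')" -k "${col},${col}nr"
}
```

Calling summarize_sold ChestShop.log sorts by quantity, and summarize_sold ChestShop.log 3 sorts by value, both in descending order.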

With GNU Awk (gawk) version 4.0+, you can control the sort order as follows:

gawk '$5 == "sold" {
    q[$7 FS $8] += $6; v[$7 FS $8] += $10
  }
  END {
    # Traverse q in descending numeric order of its values (quantities)
    PROCINFO["sorted_in"] = "@val_num_desc";
    for (item in q) print q[item], item, v[item]
  }' ChestShop.log

(sorted in descending order of quantity) or

gawk '$5 == "sold" {
    q[$7 FS $8] += $6; v[$7 FS $8] += $10
  }
  END {
    # Traverse v in ascending numeric order of its values (sell values)
    PROCINFO["sorted_in"] = "@val_num_asc";
    for (item in v) print q[item], item, v[item]
  }' ChestShop.log

(sorted in ascending order of value). Note that all of these assume the format of your file is as originally shown, with each item's name consisting of the 7th and 8th whitespace-separated fields. If it isn't, then you will likely need to parse each line with a regular expression and capture the elements - for example, using GNU Awk:

gawk 'match($0, /sold ([0-9]+) (.*) for ([0-9.]+)/, m) {
    q[m[2]] += m[1]; v[m[2]] += m[3]
  } 
  END {
    PROCINFO["sorted_in"] = "@val_num_asc";
    for (item in v) print q[item], item, v[item]
  }' ChestShop.log

Note that this assumes that the keyword for can't appear elsewhere in the line.

If you don't have access to GNU Awk, then it may be simpler to pre-process the file with another regex tool in order to insert appropriate delimiters so that you can then use POSIX awk with that delimiter.
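As a rough sketch of that pre-processing idea, sed can rewrite each "sold" line as quantity|name|value and POSIX awk can then split on the | character. The function name summarize_sold_posix and the choice of | as delimiter are assumptions made for illustration; this only works if | never appears in an item name and if the value is always followed by " to ", as in the sample log.

```shell
# Hypothetical helper: summarize_sold_posix FILE
# Assumes "|" never occurs in item names and that each sold line reads
# "... sold QTY NAME for VALUE to ...", as in the sample log.
summarize_sold_posix() {
  # sed -n ... p prints only the lines where the substitution matched,
  # so "bought" lines are dropped; the output is QTY|NAME|VALUE
  sed -n 's/^.* sold \([0-9][0-9]*\) \(.*\) for \([0-9.][0-9.]*\) to .*$/\1|\2|\3/p' "$1" |
    awk -F '|' '{
        q[$2] += $1; v[$2] += $3
      }
      END {
        # Traversal order is unspecified in POSIX awk; pipe to sort if needed
        for (item in q) printf "%d %s %.2f\n", q[item], item, v[item]
      }'
}
```

Because the name is captured as a whole, this handles names with no spaces as well as names with several.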

edited Jan 6, 2019 at 18:36

answered Jan 3, 2019 at 17:56

steeldriver


I did not make it clear so that's my fault, but instead of q[$7 FS $8] += $6; v[$7 FS $8] += $6 * $10 it should say q[$7 FS $8] += $6; v[$7 FS $8] += $10. The values listed are the total value of the listed quantity for that item. I am wondering though, is it possible to make this work for names with a varying amount of spacing? Some items have no spaces in their name, which I forgot to mention here (added now).
– brr3
Commented Jan 3, 2019 at 18:42
@brr3 if the structure of the file is not readily divided into fields, then you're going to have to do a bit more work e.g. using a regular expression to parse the lines. While that's certainly possible in Awk, you may find that it's easier to pre-process the file in order to insert appropriate delimiters
– steeldriver
Commented Jan 3, 2019 at 20:12
The match command you added returns a syntax error when used (on $0, the first part of the command). I am guessing it has to be used in conjunction with the awk or gawk command, but I don't know where to put it
– brr3
Commented Jan 6, 2019 at 17:59
@brr3 please see revised answer
– steeldriver
Commented Jan 6, 2019 at 18:05
That worked! Thank you so much. The line q[m[2]] = m[1]; v[m[2]] = m[3] should be q[m[2]] += m[1]; v[m[2]] += m[3], but that was easy to fix. Again, thank you, and have a nice day.
– brr3
Commented Jan 6, 2019 at 18:21
