
I want to filter all lines of a file that contain mySearchString, then group identical lines together and count them.

Example: find all lines that contain 9791

AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
AB-0001___Foo

Running $ grep "9791" myFile.txt gives this result:

AB-9791___Foo
AB-9791___Foo
DE-9791___Bar 
// 0001 was filtered out

This result should be grouped and counted (like SQL GROUP BY with COUNT), like this:

AB-9791___Foo     2
DE-9791___Bar     1
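Since Perl is not available, one option is to do the filtering, grouping and counting in a single awk pass. This is a minimal sketch, assuming awk is installed (it is not part of coreutils, but it usually ships with Bash environments). The sample lines are piped in with printf so nothing on disk is touched; /9791/ plays the role of grep, and the associative array count, keyed by the whole line, does the GROUP BY:

```shell
# Sample input from the question (piped in instead of reading myFile.txt).
printf '%s\n' 'AB-9791___Foo' 'AB-9791___Foo' 'DE-9791___Bar' 'AB-0001___Foo' |
  # /9791/ keeps matching lines; count[$0]++ groups by the whole line.
  # "for (line in count)" visits keys in an unspecified order, hence the sort.
  awk '/9791/ { count[$0]++ } END { for (line in count) print line, count[line] }' |
  sort
```

Unlike uniq -c, this counts correctly even when identical lines are not adjacent, because the counting does not depend on input order. For a real file, pass the file name to awk instead of piping: awk '/9791/ { count[$0]++ } END { for (line in count) print line, count[line] }' myFile.txt | sort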

This answer uses Perl, but Perl is not installed on our machines.

Which tool ( , , , or other) is useful for the second part, grouping and counting?

Update with test records

My test file Test_2.txt contains these lines:

AB-9791___Foo
DE-9791___Bar
AB-0001___Foo
AB-9791___Foo
AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
DE-9791___Bar
DE-9791___Bar

I copied and pasted each AB-9791___Foo line, so they should be identical. Running $ grep '9791' Test_grep_uniq_sort.txt | uniq -c gave this result:

  1     AB-9791___Foo
  1     DE-9791___Bar // expected: 4 actual: 1, 2, 1
  3     AB-9791___Foo // expected: 4 actual: 1, 3
  2     DE-9791___Bar
  1     DE-9791___Bar
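The split counts above follow from how uniq works: it only merges identical lines that are adjacent, so a line that appears in several separate runs gets several counts. A minimal sketch reproducing this with the matching lines of Test_2.txt in file order (the helper function sample is hypothetical, used only to avoid repeating the data):

```shell
# The lines of Test_2.txt that contain 9791, in file order.
sample() {
  printf '%s\n' AB-9791___Foo DE-9791___Bar AB-9791___Foo AB-9791___Foo \
                AB-9791___Foo DE-9791___Bar DE-9791___Bar DE-9791___Bar
}

# Unsorted: one count per run of adjacent identical lines (1, 1, 3, 3).
sample | uniq -c

# Sorted: identical lines become adjacent, one count per distinct line (4, 4).
sample | sort | uniq -c
```

This clean sample yields 1, 1, 3, 3 rather than the 1, 1, 3, 2, 1 seen above; the extra split of DE-9791___Bar comes from the line-break issue described in the update.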

Running $ sort Test_2.txt > Test_2_sort_0.txt and then grep | uniq on Test_2_sort_0.txt almost returned the expected output:

  $ grep '9791' Test_2_sort_0.txt | uniq -c
  4     AB-9791___Foo
  1     DE-9791___Bar // this is due to a missing line break / line feed
  3     DE-9791___Bar

After adding a line break (line feed) by hand, everything worked.
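Since the file was edited on Windows, one plausible explanation (an assumption, not confirmed in the question) is that every line ends in CRLF except the last, which has no line ending at all. grep and sort then write that last line with a plain newline, so it no longer matches the other DE-9791___Bar lines that still end in a carriage return. A minimal sketch that makes the carriage returns visible and strips them before grouping; the helper sample and its content are hypothetical:

```shell
# Hypothetical file content: CRLF after every line except the last.
sample() { printf 'DE-9791___Bar\r\nDE-9791___Bar\r\nDE-9791___Bar'; }

# od -c makes invisible characters visible: each carriage return shows as \r.
sample | od -c

# Without cleanup: the last line is written with a plain newline, so it differs
# from the two lines that still end in \r and is counted separately (2 groups).
sample | grep '9791' | sort | uniq -c

# With cleanup: tr deletes every \r, so all three lines form one group.
sample | tr -d '\r' | grep '9791' | sort | uniq -c
```

On the real file this would be tr -d '\r' < Test_2.txt | grep '9791' | sort | uniq -c, which avoids editing line breaks by hand.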

  • Just out of curiosity, what OS are you using?
    – Toto
    Commented May 24, 2022 at 10:10
  • I am on Windows 10 but have Bash on Windows available. AFAIK this includes the coreutils (pdf).
    – surfmuggle
    Commented May 24, 2022 at 10:38

2 Answers

You have to sort the lines before counting them, because uniq -c only merges identical lines that are adjacent.

You can use grep, sort and uniq like this:

 grep '9791' file1 | sort | uniq -c
      2 AB-9791___Foo
      1 DE-9791___Bar
  • Thanks, it worked after I figured out that the file has to be sorted first: $ sort Test.txt > file1 and then grep '9791' file1 | uniq -c
    – surfmuggle
    Commented May 24, 2022 at 11:09
    @surfmuggle: True, I have updated my answer.
    – Toto
    Commented May 24, 2022 at 11:40
  • Thanks. Do you know how to grep for either A or B (9791 or 9891)? For example, grep '9791|9891' file1 does not work.
    – surfmuggle
    Commented May 24, 2022 at 12:08
    @surfmuggle: Escape the pipe: 9791\|9891
    – Toto
    Commented May 24, 2022 at 12:13
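A sketch of three ways to match either 9791 or 9891. The sample line CD-9891___Baz and the helper function sample are hypothetical. The escaped \| form relies on a GNU grep extension to basic regular expressions, while -E (extended regex, where | needs no escaping) and repeated -e options are specified by POSIX:

```shell
# Hypothetical sample data: one line per number, plus one non-matching line.
sample() { printf '%s\n' 'AB-9791___Foo' 'CD-9891___Baz' 'AB-0001___Foo'; }

# 1. Basic regex with an escaped pipe (GNU grep extension).
sample | grep '9791\|9891'

# 2. Extended regex: | is alternation without escaping.
sample | grep -E '9791|9891'

# 3. Several -e patterns: a line is printed if any pattern matches.
sample | grep -e '9791' -e '9891'
```

Each command prints the first two sample lines; -E or multiple -e are the portable choices if the script may run outside GNU grep.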

Use uniq -c to count and awk to swap the columns:

$ uniq -c <<END | awk '{print $2 " " $1;}'
AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
END

AB-9791___Foo 2
DE-9791___Bar 1

A few more ideas are here: https://stackoverflow.com/questions/8627014/count-number-of-similar-lines-in-a-file

  • You need to grep first. They only want lines that contain a pattern.
    – Toto
    Commented May 24, 2022 at 11:41
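Combining the comment above with both answers gives one pipeline: grep filters, sort brings identical lines together, uniq -c counts them, and awk swaps the columns into the "line count" layout from the question. A minimal sketch using the question's sample lines piped in with printf:

```shell
# Sample input from the question.
printf '%s\n' 'AB-9791___Foo' 'AB-9791___Foo' 'DE-9791___Bar' 'AB-0001___Foo' |
  grep '9791' |             # keep only lines containing the pattern
  sort |                    # make identical lines adjacent for uniq
  uniq -c |                 # prefix each distinct line with its count
  awk '{ print $2, $1 }'    # swap columns: line first, then count
```

For a real file: grep '9791' myFile.txt | sort | uniq -c | awk '{ print $2, $1 }'. Note that awk's $2 is only the first word of the line, which is fine here because the sample lines contain no spaces.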
