How to group and count all lines in a file that contain a specific string

Question

I want to filter all lines from a file that contain mySearchString and after that group them together and count them.

Example: find all lines that contain 9791

AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
AB-0001___Foo

Using $ grep "9791" myFile.txt gives this result

AB-9791___Foo
AB-9791___Foo
DE-9791___Bar 
// 0001 was filtered out

This result should be grouped and counted (like SQL Group by Count) like this

AB-9791___Foo     2
DE-9791___Bar     1

This answer uses Perl, but Perl is not installed on our machines.

Which tool is useful (grep, awk, sed, or other) for the second part, grouping and counting?

Update with test records

In my test file Test_2.txt these lines are written

AB-9791___Foo
DE-9791___Bar
AB-0001___Foo
AB-9791___Foo
AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
DE-9791___Bar
DE-9791___Bar

I copied and pasted each AB-9791___Foo line, so they should be identical. Running $ grep '9791' Test_grep_uniq_sort.txt | uniq -c gave this result

  1     AB-9791___Foo
  1     DE-9791___Bar // expected: 4 actual: 1, 2, 1
  3     AB-9791___Foo // expected: 4 actual: 1, 3
  2     DE-9791___Bar
  1     DE-9791___Bar

Running $ sort Test_2.txt > Test_2_sort_0.txt and then using grep | uniq on Test_2_sort_0.txt almost returned the expected output.

  $ grep '9791' Test_2_sort_0.txt | uniq -c
  4     AB-9791___Foo
  1     DE-9791___Bar // this is due to a missing line break / line feed
  3     DE-9791___Bar

After adding a line break / line feed by hand, everything worked.
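One possible explanation for the separately counted last line, assuming the file was saved with Windows (CRLF) line endings: every line except the last would then end in an invisible carriage return, so uniq sees two different strings. This is only a guess about the file; the sample data below is made up for illustration. A minimal sketch that strips carriage returns before counting:

```shell
# Hypothetical sample: three lines end in CRLF, the last has no line ending.
sample=$(mktemp)
printf 'DE-9791___Bar\r\nDE-9791___Bar\r\nAB-9791___Foo\r\nDE-9791___Bar' > "$sample"

# tr -d '\r' deletes every carriage return, so otherwise identical lines
# compare equal; sort then makes them adjacent for uniq -c.
cleaned=$(tr -d '\r' < "$sample" | grep '9791' | sort | uniq -c)
printf '%s\n' "$cleaned"

rm -f "$sample"
```

With this sample, all three DE-9791___Bar lines end up in one group.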

I am on Windows 10 but have bash available under Windows. AFAIK this includes the coreutils pdf
– surfmuggle
Commented May 24, 2022 at 10:38

Toto · Accepted Answer · 2022-05-24 11:39:35Z

2

You have to sort the file first, because uniq only merges adjacent identical lines.

Once the file is sorted, you can use grep and uniq like this:

 grep '9791' file1 | uniq -c
      2 AB-9791___Foo
      1 DE-9791___Bar
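The sorting step can also go inside the pipeline, so no intermediate sorted file is needed. A minimal sketch, assuming a throwaway sample file (its name and contents are invented for illustration):

```shell
# Hypothetical, unsorted sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'AB-9791___Foo' 'DE-9791___Bar' 'AB-0001___Foo' \
  'AB-9791___Foo' 'DE-9791___Bar' 'AB-9791___Foo' > "$sample"

# Filter first, then sort so identical lines become adjacent, then count.
counts=$(grep '9791' "$sample" | sort | uniq -c)
printf '%s\n' "$counts"

rm -f "$sample"
```

With this sample, AB-9791___Foo is counted three times and DE-9791___Bar twice. Filtering before sorting keeps the sort input small.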

edited May 24, 2022 at 11:39

answered May 24, 2022 at 10:06

Toto


Thanks, it worked after I figured out that the file has to be sorted first: $ sort Test.txt > file1 and then grep '9791' file1 | uniq -c
– surfmuggle
Commented May 24, 2022 at 11:09
@surfmuggle: True, I have updated my answer.
– Toto
Commented May 24, 2022 at 11:40
Thanks. Do you know how to grep for either A or B (9791 or 9891)? For example, grep '9791|9891' file1 does not work.
– surfmuggle
Commented May 24, 2022 at 12:08
@surfmuggle: Escape the pipe: 9791\|9891
– Toto
Commented May 24, 2022 at 12:13
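The escaped form \| is a GNU extension to basic regular expressions. As a hedged sketch of two alternatives that do not need the backslash: -E switches grep to extended regular expressions, and repeating -e gives one pattern per option. The sample file and the XY-9891___Baz line are invented for illustration.

```shell
# Hypothetical sample data containing both search strings.
sample=$(mktemp)
printf '%s\n' 'AB-9791___Foo' 'XY-9891___Baz' 'AB-0001___Foo' 'AB-9791___Foo' > "$sample"

# -E: extended regular expressions, where an unescaped | means "or".
either=$(grep -E '9791|9891' "$sample" | sort | uniq -c)

# Same match set, written as two separate patterns.
same=$(grep -e '9791' -e '9891' "$sample" | sort | uniq -c)

printf '%s\n' "$either"

rm -f "$sample"
```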


MarrekNožka · Answer · 2022-05-24 10:36:47Z

1

Use uniq -c to count and awk to swap the columns:

$ uniq -c <<END | awk '{print $2 " " $1;}'
AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
END

AB-9791___Foo 2
DE-9791___Bar 1

A few more ideas are here: https://stackoverflow.com/questions/8627014/count-number-of-similar-lines-in-a-file
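The filter from the question can be folded into awk as well. The following is a sketch rather than the method of either answer: awk selects the matching lines, tallies them in an associative array, and prints the text before the count, so the input does not need to be sorted first. The sample file is made up for illustration.

```shell
# Hypothetical, unsorted sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'AB-9791___Foo' 'DE-9791___Bar' 'AB-0001___Foo' \
  'AB-9791___Foo' 'DE-9791___Bar' 'AB-9791___Foo' > "$sample"

# /9791/ selects matching lines; count[$0]++ tallies each distinct line.
# awk's for-in loop visits keys in an unspecified order, so the final
# sort is only there to make the output order stable.
grouped=$(awk '/9791/ { count[$0]++ }
               END { for (line in count) print line, count[line] }' "$sample" | sort)
printf '%s\n' "$grouped"

rm -f "$sample"
```

This prints each distinct matching line followed by its count, in the column order the question asked for.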

edited May 24, 2022 at 10:36

answered May 24, 2022 at 10:22

MarrekNožka


You need to grep first. They only want lines that contain a pattern.
– Toto
Commented May 24, 2022 at 11:41

