1

I have

[1] "000010004" "011120000" "002030000" "000300020" "003000020" "001000040" "000030020" "000010112"

[9] "050000000" "000000041" "001020020" "001030001" "001000130" "000000050" "000120020" "000500000"

Using grep, I want to find all groups (each string inside "" is a group) that contain

a. subgroup 2, 3 (meaning: ...2...3... and ...3...2...)

b. subgroup 1, 1, 1, 2 (meaning: ...1...1...1...2..., 1...1...2...1..., etc.)

Order doesn't matter, but frequency does. For a), that means 2 and 3 should each appear exactly once.

Thanks for any help.

2
  • Can you elaborate on what a subgroup is? Do you want to find the pattern 1,2 in the input string?
    Commented Mar 30, 2011 at 10:50
  • You mean the position within a string, or the position of a string containing the numbers within a table?
    – MPękalski
    Commented Mar 30, 2011 at 12:25

3 Answers

3

This can be done with a regular expression using lookaheads, but it isn't really pretty.

For example, to match a quoted number that contains exactly one 2 and exactly one 3, you could do this (verbose regex used for readability):

"         # quote
(?=       # Assert that the following can be matched:
 [^\D2]*  # zero or more digits except 2
 2        # 2
 [^\D2]*  # zero or more digits except 2
 "        # quote
)         # End of lookahead
(?=[^\D3]*3[^\D3]*") # same for the number 3
(\d+)     # one or more digits, capture the result
"         # quote

To match exactly three 1s and one 2:

"         # quote
(?=       # Assert that the following can be matched:
 (?:      # Match the following group:
  [^\D1]* # zero or more digits except 1
  1       # 1
 ){3}     # exactly three times.
 [^\D1]*  # Match zero or more digits except 1
 "        # quote
)         # End of lookahead
(?=[^\D2]*2[^\D2]*") # as above
(\d+)     # one or more digits, capture the result
"         # quote

I don't know if this will work with standard grep.
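To try the idea outside grep, here is a minimal Python sketch that builds the same kind of lookahead for any digit and count and runs it over the sample data from the question. The helper names (`exactly`, `build_pattern`, `find_groups`) are made up for illustration, and I'm assuming, as in the regexes above, that digits other than the requested ones are unconstrained.

```python
import re

# Sample data copied from the question.
groups = [
    "000010004", "011120000", "002030000", "000300020",
    "003000020", "001000040", "000030020", "000010112",
    "050000000", "000000041", "001020020", "001030001",
    "001000130", "000000050", "000120020", "000500000",
]

def exactly(digit, count):
    """Lookahead: `digit` occurs exactly `count` times before the closing quote."""
    other = rf"[^\D{digit}]*"  # zero or more digits except `digit`
    return rf'(?={other}(?:{digit}{other}){{{count}}}")'

def build_pattern(wanted):
    """`wanted` maps a digit character to its required count (hypothetical helper)."""
    lookaheads = "".join(exactly(d, n) for d, n in wanted.items())
    return re.compile(rf'"{lookaheads}(\d+)"')

def find_groups(wanted, items):
    pattern = build_pattern(wanted)
    # Re-add the quotes so the pattern sees the text as printed in the question.
    return [m.group(1) for g in items if (m := pattern.fullmatch(f'"{g}"'))]

case_a = find_groups({"2": 1, "3": 1}, groups)  # exactly one 2 and one 3
case_b = find_groups({"1": 3, "2": 1}, groups)  # exactly three 1s and one 2
print(case_a)
print(case_b)
```

Each requested digit gets its own lookahead anchored right after the opening quote, so the order of the digits inside the group doesn't matter.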

1
  • it works with grep -P, although of course you’d have to embed a (?x) for expanded/expressive/excellent:) mode.
    – tchrist
    Commented Mar 30, 2011 at 14:28
0

I assume that with a. (numbers 2, 3) you want to match the following entries of the input array:

[3] "002030000"
[4] "000300020"
[5] "003000020"
[7] "000030020"

and that with b. (numbers 1, 1, 1, 2) you want to match the following entries:

[2] "011120000"
[8] "000010112"

To check for frequency, you'd probably need a regex with lookaround. This is rather complicated, if possible at all.

0
0

First of all, it's not possible with grep alone.

But you may do the following:

  1. Find all groups (quoted things)
  2. Make sets out of them
  3. Compare your input against these sets.

It's trivial in awk.
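The three steps can be sketched roughly like this. Note that this is Python rather than awk, and that I'm using a multiset (`collections.Counter`) instead of a plain set, since frequency matters. The function name `matching_groups` is made up, and I'm assuming that digits not listed in `wanted` may appear any number of times.

```python
import re
from collections import Counter

# Text as printed in the question.
text = '''
[1] "000010004" "011120000" "002030000" "000300020" "003000020" "001000040" "000030020" "000010112"
[9] "050000000" "000000041" "001020020" "001030001" "001000130" "000000050" "000120020" "000500000"
'''

def matching_groups(text, wanted):
    """Return the quoted groups whose digit counts match `wanted` (digit -> count)."""
    # Step 1: find all groups (the quoted digit strings).
    groups = re.findall(r'"(\d+)"', text)
    result = []
    for group in groups:
        # Step 2: turn the group into a multiset of its digits.
        counts = Counter(group)
        # Step 3: compare the requested digits against that multiset.
        if all(counts[digit] == n for digit, n in wanted.items()):
            result.append(group)
    return result

picks_a = matching_groups(text, {"2": 1, "3": 1})
picks_b = matching_groups(text, {"1": 3, "2": 1})
print(picks_a)
print(picks_b)
```

A `Counter` returns 0 for digits that are missing, so a group without any 2 simply fails the `== 1` comparison instead of raising an error.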
