
How would you grep for a line containing only 5 or 6 numbers, like the following?

case 1 (has leading space)

           10      2       12      1       13

case 2 (no leading space)

   1       2       3       4       5        6

I thought something like this would work.

grep -E '[0-9]{5}'
  • Is this your homework? Commented Dec 4, 2014 at 6:58
  • @WarrenYoung no. I'm just trying to filter through massive debugging files.
    – cokedude
    Commented Dec 4, 2014 at 7:20
  • @cokedude Can you post some more example lines, preferably a representative sample as to what should and should not match?
    – muru
    Commented Dec 4, 2014 at 7:27
  • @muru yours grabs everything I want. It also creates a line separator from my original numbers which makes it even more readable than I expected.
    – cokedude
    Commented Dec 4, 2014 at 8:03
  • @cokedude What if a line contains more than six numbers, e.g. 1 2 3 4 5 6 7? Should that line appear in the output or not? Commented Dec 4, 2014 at 9:59

7 Answers

grep -E '[0-9]{5}' is looking for numbers with at least 5 digits. What you need is 5 numbers with at least one digit:

grep -E '[0-9]+([^0-9]+[0-9]+){4}'
  • [0-9]+ - a number of at least one digit
  • [^0-9]+[0-9]+ - a number with at least one digit, preceded by at least one non-digit character. We then repeat this 4 times to get 5 numbers separated by non-digits.
  • If the requirement is exactly 5, you might want to surround this regex with [^0-9]* and anchor it with ^ and $, so that the entire line has to match.
  • Depending on what you want here (does 1,2,3,4,6 qualify?), you might look at other separators. For example, a proper real number in scientific notation would look like [+-]?(([0-9]+(\.[0-9]+)?)|([0-9]?\.[0-9]+))([eE][+-]?[0-9]+)?, so the separators must not include ., e, etc. They may only be whitespace, as mikeserv notes. Or they may be commas, if it's a CSV record; or, depending on the locale, a comma may be the decimal separator. Vary [^0-9] as your needs dictate.
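A minimal sketch of the anchored variant from the third bullet, assuming the numbers are separated only by whitespace (spaces or tabs, as in the question's data) and that lines of exactly 5 or 6 numbers are wanted. It spells \s as the POSIX class [[:space:]], so it should not rely on GNU-only regex extensions. The function name match_5_or_6 is hypothetical.

```shell
# Hypothetical helper: print only the lines of stdin that consist of
# 5 or 6 whitespace-separated unsigned integers, allowing leading and
# trailing whitespace. [[:space:]] is the POSIX spelling of \s.
match_5_or_6() {
    grep -E '^[[:space:]]*[0-9]+([[:space:]]+[0-9]+){4,5}[[:space:]]*$'
}

# Usage: the first two lines match; 4 or 7 numbers, or a token
# such as A555, do not.
printf '%s\n' \
    '           10      2       12      1       13' \
    '1       2       3       4       5        6' \
    '1 2 3 4' \
    '1 2 3 4 5 6 7' \
    '111 222 333 444 A555' | match_5_or_6
```

Because [0-9]+ must be followed by at least one whitespace character before the next number, a run of digits can never be split into two "numbers", which is what keeps the {4,5} count honest.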
  • I don't have any commas between my numbers, just a lot of spaces. I did make a couple of trivial tweaks to also grep for fault and Fault: grep -E '[0-9]+([^0-9]+[0-9]+){4}|fault|Fault'
    – cokedude
    Commented Dec 4, 2014 at 8:07
  • yours works the way I want. I just added more description to clarify for cuonglm.
    – cokedude
    Commented Dec 4, 2014 at 8:29
  • @muru this also matches 1 2 3 4 5 6 7 and 111 222 333 444 A555, but those lines do not contain only 5 or 6 numbers. Commented Dec 4, 2014 at 16:32
  • @KasiyA, see the 3rd bullet-point in the answer. Commented Dec 4, 2014 at 16:45
  • @KasiyA if you insist: grep -P '^\s*\d+(\s+\d+){4,5}\s*$'
    – muru
    Commented Dec 4, 2014 at 16:51

I would go with something a little more powerful than grep. This can do it in perl:

perl -ne 'print if s/\d+/$&/g == 5' your_file

The regex substitution replaces every group of one or more digits (\d+) with itself ($&), so it changes nothing. It is used only for its return value: the s/// operator returns the number of substitutions it made. Thus, the line is printed only if s/// found exactly 5 groups of digits.

  • This will match line like 1a 1 1 1 1
    – cuonglm
    Commented Dec 4, 2014 at 7:07
  • @cuonglm Thankfully I don't have any numbers like that. All of my important numbers are on their own lines and separated by spaces.
    – cokedude
    Commented Dec 4, 2014 at 8:10
  • @josephr Can you please explain how yours works? It doesn't grab this massive list of numbers 1 2 3 4 5 6 7 8 9 10 11, but it does grab this 1 2 3 4 5 6. It does chop off the 1 though.
    – cokedude
    Commented Dec 4, 2014 at 8:13

Another perl:

$ perl -MList::Util=first -Tnle '
  s/^\s+|\s+$//g;
  @e = split /\s+/;
  print if @e == 5 || @e == 6 and !first {/\D/} @e;
' file
10      2       12      1       13

Explanation

  • s/^\s+|\s+$//g trims leading and trailing whitespace from the line.

  • @e = split /\s+/ splits the line on whitespace into the array @e.

  • We print the line if:

    • the array @e contains 5 or 6 elements,
    • and none of its elements contains a non-digit character (\D matches a non-digit character).
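For comparison, the same trim/split/check logic can be sketched in plain POSIX sh without perl: set -- splits the line into words on the default IFS, so leading and trailing whitespace disappear, and a case pattern rejects any word containing a non-digit. The function name fields_5_or_6 is hypothetical. A shell read loop is far slower than perl or awk, so this is only practical for modest inputs, not massive debug logs.

```shell
# Hypothetical helper: print lines of stdin made of 5 or 6
# whitespace-separated words that are all digits.
# Unlike the perl version above, the line is printed untrimmed.
fields_5_or_6() {
    while IFS= read -r line || [ -n "$line" ]; do
        set -f            # disable globbing so a word like * is not expanded
        set -- $line      # split on the default IFS (space, tab, newline)
        set +f
        [ "$#" -eq 5 ] || [ "$#" -eq 6 ] || continue
        ok=1
        for w in "$@"; do
            case $w in
                *[!0-9]*) ok=0; break ;;   # word contains a non-digit
            esac
        done
        [ "$ok" -eq 1 ] && printf '%s\n' "$line"
    done
}

# Usage: prints the first two lines.
printf '%s\n' '  10 2 12 1 13' '1 2 3 4 5 6' '1 2 3 4 5 6 7' '4a 1 2 3 4' | fields_5_or_6
```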
  • How does it work?
    – muru
    Commented Dec 4, 2014 at 7:54
  • @muru: Added explanation.
    – cuonglm
    Commented Dec 4, 2014 at 8:12
  • @cuonglm is \D wrong? I wanted it to match digits. Right now it doesn't print anything.
    – cokedude
    Commented Dec 4, 2014 at 8:18
  • @cokedude: Does your input contain leading spaces?
    – cuonglm
    Commented Dec 4, 2014 at 8:20
  • @cuonglm Sorry, I wasn't clear. The 5-word lines have several leading spaces; the 6-word lines have no leading space. And can you add one easy piece to also match fault or Fault?
    – cokedude
    Commented Dec 4, 2014 at 8:31
A grep-only version (GNU grep, since \s is a GNU extension to extended regular expressions):

grep -E '^(\s*[0-9]+\s+){4,5}[0-9]+\s*$'
An awk version that counts the runs of digits on each line:

awk '{l=$0; n = gsub(/[0-9]+/, "", l)}; n == 5 || n == 6' file

(same principle as in Joseph's answer)

  • I take it the principle is the same as in this answer?
    – muru
    Commented Dec 4, 2014 at 16:56
  • It matches 4a too. Commented Dec 10, 2014 at 7:29
  • @JigarGandhi, yes, nobody said it shouldn't or how the numbers may or may not be separated. Commented Dec 10, 2014 at 8:17

A way with awk that is customisable for different numbers of fields.
Whitespace does not matter either, since awk splits fields on runs of spaces and tabs by default.

awk 'NF~/^(5|6)$/{x=0;for(i=1;i<=NF;i++)x+=($i~/^[0-9]+$/); if(x==NF)print}' file

This checks that the number of fields is 5 or 6; other field counts can be added to the regex if your requirements ever change.

Then it sets a counter to 0.

Then it loops over the fields, adding 1 to the counter for each field that is a number.

If the counter equals the number of fields, it prints the line.

example

input

  1       2       3       4       5        6
        2       3       4       5        6
3       4       5        6
blah  1       2       3       4       5
      4       3324       4       53        6

output

  1       2       3       4       5        6
        2       3       4       5        6
      4       3324       4       53        6
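The answer mentions that more field counts could be added to the NF test. A sketch that takes the accepted range as parameters instead, passed with awk -v (the function name fields_in_range and the variables min and max are hypothetical), and that only decides whether to print inside the block that resets the counter, so a previous line's count cannot affect the result:

```shell
# Hypothetical helper: print lines whose field count lies in [min, max]
# and whose fields are all unsigned integers.
# Usage: fields_in_range MIN MAX < file
fields_in_range() {
    awk -v min="$1" -v max="$2" '
        NF >= min && NF <= max {
            x = 0
            for (i = 1; i <= NF; i++)
                x += ($i ~ /^[0-9]+$/)   # 1 if the field is all digits, else 0
            if (x == NF) print
        }'
}

# Usage: prints only the first line.
printf '%s\n' '  1 2 3 4 5 6' 'blah 1 2 3 4 5' '3 4 5 6' | fields_in_range 5 6
```

Values assigned with -v that look numeric are compared numerically, so fields_in_range 4 4 < file would print only lines of exactly four numbers.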
  • why does it print out a bunch of x's? The very last line is right but the rest is just x's.
    – cokedude
    Commented Dec 4, 2014 at 8:54
  • What do you mean? It shouldn't print any x's.
    – user78605
    Commented Dec 4, 2014 at 8:57
  • pixhost.org/show/1864/24202911_awk_test.jpg
    – cokedude
    Commented Dec 4, 2014 at 9:01
  • @cokedude, what's the input, and can you show me how you ran the command? It shouldn't make any difference, but also check which version of awk you are using with awk --version.
    – user78605
    Commented Dec 4, 2014 at 9:03
  • Is GNU awk 3.0.4 too old to do what I want? It's the awk that came with Git Bash.
    – cokedude
    Commented Dec 4, 2014 at 9:04

I know this doesn't use sed, awk, or any other shell command, but Tcl works well here.

I have kept the script simple and user-friendly. Elements like 4a or abc are handled: any line containing them is skipped.

command

script.tcl file

Script

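#!/usr/bin/tclsh

# Usage: script.tcl file
if {[llength $argv] < 1} {
        puts stderr "please enter the argument"
        exit 1
}

set tar_fl [lindex $argv 0]
if {![file exists $tar_fl]} {
        puts stderr "$tar_fl does not exist"
        exit 1
}

set flptr [open $tar_fl r]

while {[gets $flptr line] >= 0} {
        # Split on runs of whitespace; unlike calling llength on the raw
        # line, this does not fail on unbalanced braces or quotes
        set words [regexp -all -inline {\S+} $line]
        set n [llength $words]
        # Keep only lines with 5 or 6 words
        if {$n != 5 && $n != 6} {continue}
        # Skip the line if any word contains a non-digit (e.g. 4a, abc)
        if {[lsearch -regexp $words {[^0-9]}] > -1} {continue}
        puts $line
}

close $flptr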

Output

           10      2       12      1       13
1       2       3       4       5
