I have a file filled with numbers, one number per line. Each number consists of two or three digits.

I would like to filter out of this file any number that has two or more sequential digits. These sequential digits can be consecutive, i.e. next to each other (e.g. 127, 215, 781), or non-consecutive (e.g. 506). The order of the sequential digits is not important: it can be small to large (e.g. 127) or large to small (e.g. 215).

For example:

127
215
781
874
370
01
10
142
506
94

The expected output:

370
94

Because:

127 # Has two sequential and consecutive digits (1 and 2)
215 # Has two sequential and consecutive digits (1 and 2)
781 # Has two sequential and consecutive digits (7 and 8)
874 # Has two sequential and consecutive digits (7 and 8)
370 # Keep
01  # Has two sequential and consecutive digits (0 and 1)
10  # Has two sequential and consecutive digits (0 and 1)
142 # Has two sequential and non-consecutive digits (1 and 2)
506 # Has two sequential and non-consecutive digits (5 and 6)
94  # Keep
    You should show the script you are working on and state where the problem is. Otherwise you seem to be attempting to get someone to do your homework for you without making any effort yourself.
    – user56041
    Commented Sep 23, 2018 at 17:35

3 Answers

With awk, setting FS to the empty string (the effect of an empty FS is undefined behavior per POSIX, so the result can differ depending on which version of awk you are using). The following was tested with GNU awk:

awk -F '' '{
             is_sequential=0
             # with an empty FS, GNU awk makes each digit its own field $i
             for (i=2; i<=NF; i++)
                 is_sequential+=($0 ~ ($i-1) || $0 ~ ($i+1))
}!is_sequential' infile

For each digit $i, we test the whole line against the digit minus one ($i-1) and the digit plus one ($i+1). If either of them is found anywhere in the line, the line contains at least two sequential digits (the digit $i itself plus $i-1 or $i+1), so is_sequential is incremented; otherwise it stays 0.

With !is_sequential, we print only the lines where the value is still 0, that is, where no two sequential digits were seen; see also What is the meaning of '1' at the end of an awk script
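
To make the per-digit check easier to follow, here is a minimal sketch that reports why each line is kept or dropped. The function name explain_sequential is hypothetical, and the sketch swaps the regex match for a plain substring search with index(); it assumes every line contains only digits. Looking for digit+1 alone is enough, because any sequential pair has a smaller member whose successor is also on the line.

```shell
# Hypothetical helper: annotate each line with the verdict and the pair found.
# Assumes each input line consists of digits only.
explain_sequential() {
  awk '{
    reason = ""
    for (i = 1; i <= length($0); i++) {
      d = substr($0, i, 1)
      # a pair (d, d+1) exists if the successor of d occurs anywhere in the line
      if (d + 0 < 9 && index($0, d + 1)) {
        reason = d " and " (d + 1)
        break
      }
    }
    print $0 ": " (reason == "" ? "keep" : "drop (" reason ")")
  }' "$@"
}

printf '127\n215\n370\n506\n94\n' | explain_sequential
```

Piping the output through grep keep would then list only the lines that the filter itself prints.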


Or with any awk:

awk '{
       is_sequential=0;
       for (i=1; i<=length(); i++) {
           num=substr($0, i, 1)
           is_sequential+=($0 ~ num-1 || $0 ~ num+1)
       }
}!is_sequential' infile

You can try either

awk '
  {split ("", N)                    # delete array N
    L = 1                           # initialise boolean L to TRUE
    for (i=1; i<=length($1); i++){  # for each digit
      P = substr($1, i, 1)                   
      if (N[P-1] || N[P+1]){        # if contiguous digit exists,
        L = 0                       # set L to FALSE
        break                       # and quit the for loop
      }
      N[P] = 1
    } 
  }
  L
' file

Output:

370
94

or

awk '
  {split ("", N)
    L = 1
    for (i=1; i<=length; i++)
      N[substr($0,i,1)] = 1      # set all N elements for the digits in string

    for (i=0; i<9; i++)
      if (N[i] + N[i+1] == 2) {  # check for two adjacent elements to be TRUE
        L = 0          
        break
      }
  }
L
' file

Output:

370
94

Tested on Ubuntu 18.04

Here, as the list of combinations is relatively small, you might as well consider them all in an ERE alternation:

grep -vE '0.*1|1.*[02]|2.*[13]|3.*[24]|4.*[35]|5.*[46]|6.*[57]|7.*[68]|8.*[79]|9.*8'
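
To show where that alternation comes from, the following sketch generates it with a loop: each digit is paired with its predecessor and successor, and 0 and 9 have only one neighbour each. The function name build_pattern is hypothetical, not part of the answer.

```shell
# Hypothetical helper: print the ERE alternation for all sequential digit pairs.
build_pattern() {
  awk 'BEGIN {
    for (d = 0; d <= 9; d++) {
      if (d == 0)      alt = "0.*1"                           # 0 has no predecessor
      else if (d == 9) alt = "9.*8"                           # 9 has no successor
      else             alt = d ".*[" (d - 1) (d + 1) "]"      # e.g. 5.*[46]
      pat = pat sep alt
      sep = "|"
    }
    print pat
  }'
}

# Same filter as above, with the pattern built on the fly
printf '127\n370\n506\n94\n' | grep -vE "$(build_pattern)"
```

Writing the pattern out by hand, as in the grep command above, is just as valid; the loop only makes the pairing rule explicit.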

The same with perl but using perl code in (??{...}) inside the regexp to match the next or previous digit:

perl -ne 'print unless /([0-8]).*(??{$1+1})/ || /([1-9]).*(??{$1-1})/'

With sed, you could append the list of consecutive pairs to the pattern space, and use back references to find the matches:

sed -ne '1{x;s/$/0123456789876543210/;x;}' -e 'G;/\(.\).*\(.\).*\n.*\1\2/!P'
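
To see what the back references are matched against, this sketch prints the pattern space right after the G command, using sed's l command instead of the final test. The function name show_pattern_space is hypothetical.

```shell
# Hypothetical helper: show the pattern space after the pair list is appended.
# l prints the pattern space unambiguously: a newline appears as \n, the end as $.
show_pattern_space() {
  sed -ne '1{x;s/$/0123456789876543210/;x;}' -e 'G;l'
}

printf '506\n' | show_pattern_space
# prints: 506\n0123456789876543210$
```

For 506, \1 can capture 5 and \2 can capture 6, and 56 occurs in the appended list, so the line is not printed. The list runs up and then back down so that descending pairs such as 21 (for 215) are found as well.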
