
How to get a substring from

 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!         

so that it becomes

BEGIN!@#Ghjk,GhjkEND#@!

Note: there is whitespace at the end of the lines. I tried removing it, but I can't.

I tried

#!/bin/bash

s=$(awk '/BEGIN!@#/,/END#@!/' switch.log )


while IFS= read -r line 
do

  h=$(echo "$line" | awk '{$1=$1;print}')
  for i in {0..100}
  do

    zzz=$(echo "$h"  | awk '{print $(NF-$i)}')

    if [ ! -z "$zzz" -a "$zzz" != " " ]; then

      hh=$(echo "$h"  | awk  '{print $(NF-$i)}') 
      echo "$zzz"

      echo  -e  "$zzz" >> ggg.txt
      break
    fi

  done

done <<< "$s"

I got

BEGIN!@#Ghjk,Ghj
  • Please do put your samples in code tags; as of now they are not clear. Thank you. Commented Nov 15, 2022 at 7:08
  • Also note: since your formatting was unclear, if you intended the input to be multi-line, please re-edit the question and format the input properly (in that case, simple awk string concatenation of $NF will do). Commented Nov 15, 2022 at 7:45
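The multi-line suggestion in that comment can be sketched as follows. This assumes (hypothetically) that the real file is a hex-dump-style listing in which the last field of each line holds the printable text; `join_last_fields` is an illustrative name, not something from the question.

```shell
# join_last_fields (hypothetical name): ASSUMES each line's last field is one
# piece of the wanted string, as in a hex dump whose last column is the text.
join_last_fields() {
  # Default field splitting ignores leading/trailing whitespace, so $NF is the
  # last real token; lines with no fields (NF == 0) are skipped.
  awk 'NF { s = s $NF } END { print s }' "$@"
}

printf '%s\n' \
  ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj   ' \
  '6b 45 4e 44 23 40 21 kEND#@!        ' |
  join_last_fields
# prints: BEGIN!@#Ghjk,GhjkEND#@!
```

Because awk's default splitting already discards trailing whitespace, the trailing spaces mentioned in the question do not need to be removed first.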

8 Answers


Another option is to use sed with a normal substitution, capturing the text you want to keep in two groups and replacing the whole match with the first two backreferences (\1\2). For example:

sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+).*/\1\2/' <<< 'your string'

Example Use/Output

(note: updated to handle whitespace at the end: the [^[:space:]]+ classes keep it out of the captured groups, and the final .* drops it from the output)

$ sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+).*/\1\2/' <<< '42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!'
BEGIN!@#Ghjk,GhjkEND#@!

(note: single-quoting the string is required because of the '!', which interactive bash history expansion may otherwise act on)
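To run essentially the same substitution over a whole log instead of a single here-string, a sketch (assuming a sed that supports -E, as GNU and BSD sed do; `extract_begin_end` is a hypothetical helper name) can combine -n with the p flag so that only lines where the substitution succeeded are printed:

```shell
# extract_begin_end (hypothetical helper): print BEGIN... + kEND... for every
# matching line; -n suppresses sed's automatic printing and the p flag prints
# a line only when the substitution was made. With no file, reads stdin.
extract_begin_end() {
  sed -nE 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+).*/\1\2/p' "$@"
}

log=$(mktemp)   # stand-in for a real file such as switch.log
printf '%s\n' \
  ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        ' \
  'a line without the markers' > "$log"
extract_begin_end "$log"
# prints: BEGIN!@#Ghjk,GhjkEND#@!
rm -f "$log"
```

Lines without both markers produce no output at all, which avoids having to pre-filter the log with a separate awk range.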

  • Indeed sed is better suited here
    – anubhava
    Commented Nov 15, 2022 at 8:13

Using sed

$ sed -E 's/[0-9]+[a-z]? +| +//g' input_file
BEGIN!@#Ghjk,GhjkEND#@!
  • Simple and effective sed
    – anubhava
    Commented Nov 15, 2022 at 8:14
  • Yep, nuking the unwanted as an approach is worth the nod as well. Commented Nov 15, 2022 at 8:38
  • Thanks a lot, it's working... but how? :-) Commented Nov 16, 2022 at 22:35
  • @ahmedahmed It matches either one or more digits, optionally followed by a single lowercase letter, followed by one or more spaces; or just one or more spaces. All of those matches are removed.
    – sseLtaH
    Commented Nov 16, 2022 at 22:39
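One way to see what that regex deletes is to have sed wrap each match in brackets instead of removing it (a small illustrative sketch; `show_deleted_parts` is a hypothetical name). It also makes a limitation visible: the approach relies on every hex pair starting with a digit, so a pair such as `ab` would survive.

```shell
# show_deleted_parts (hypothetical name): same regex as the answer, but each
# match is wrapped in [...] (& is the matched text) instead of being deleted.
show_deleted_parts() {
  sed -E 's/[0-9]+[a-z]? +| +/[&]/g'
}

echo '42 4e BEGIN!@# 6b kEND#@!  ' | show_deleted_parts
# prints: [42 ][4e ]BEGIN!@#[ ][6b ]kEND#@![  ]
echo 'ab 4e x' | show_deleted_parts
# prints: ab[ ][4e ]x   (the letters-only pair "ab" is not matched)
```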

UPDATED to fix an error: Your question does not define precisely what the string to be extracted looks like in general, but based on your example, this would do:

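if [[ $line =~ (BEGIN[^ ]+)\ .*([^ ]+END[^ ]+) ]]
then
  # Group 1: BEGIN plus the non-space characters after it; group 2: a later
  # space-free token with at least one character on each side of END.
  substring=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
else
  echo "Pattern not found in line" >&2
fi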
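The snippet above assumes `$line` is already set. A minimal sketch of applying the same regex to every line read from standard input (the function name `extract_substrings` is hypothetical) could look like this:

```shell
# extract_substrings (hypothetical name): apply the answer's regex to each
# input line and print BEGIN... + ...END... for the lines that match.
extract_substrings() {
  # Keeping the regex in a variable avoids quoting surprises inside [[ ]];
  # the unquoted $re below is treated as an extended regular expression.
  local re='(BEGIN[^ ]+) .*([^ ]+END[^ ]+)'
  local line
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    else
      echo "Pattern not found in line" >&2
    fi
  done
}

printf '%s\n' ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        ' |
  extract_substrings
# prints: BEGIN!@#Ghjk,GhjkEND#@!
```

To process the question's file, redirect it into the function, e.g. `extract_substrings < switch.log`.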
  • Good solution, but you are missing the wanted 'k' in "kEND...". Commented Nov 15, 2022 at 7:17
  • This would only be an issue if the text in between could also contain an END. But actually, my solution is incorrect for a different reason, and I will update my answer. Commented Nov 15, 2022 at 7:20
  • @DavidC.Rankin : My updated solution now does not incorrectly catch the 6b 45 4e 44 23 40 21 anymore. I missed this point in my first attempt. Commented Nov 15, 2022 at 7:25
  • Cleaner than what I was toying with [[ $line =~ ^.*(BEGIN[^ ]+).*(kEND[^ ].*)$ ]] Commented Nov 15, 2022 at 7:29
  • @DavidC.Rankin: In your approach, maybe a .*? between the groups would be better. The main problem, of course, is that the question is poorly asked and we really don't know what the delimiters look like; as long as we have to guess, any solution that works for the concrete example would do. Commented Nov 15, 2022 at 7:32

I would harness GNU AWK for this task in the following way. Let file.txt content be

 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        

then

awk 'BEGIN{FPAT="[^[:space:]]*(BEGIN|END)[^[:space:]]*";OFS=""}{$1=$1;print}' file.txt

gives output

BEGIN!@#Ghjk,GhjkEND#@!

Explanation: I inform GNU AWK, using the field pattern (FPAT), that a field is BEGIN or (|) END, prefixed and suffixed by zero or more (*) non-whitespace ([^[:space:]]) characters, and that the output field separator (OFS) is the empty string. Then, for each line, I do $1=$1 to trigger a rebuild of the line and print it. If you are sure only space characters are used in the line, you might elect to replace [^[:space:]] with [^ ].

(tested in gawk 4.2.1)
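FPAT is a gawk extension, so mawk and BSD awk simply treat it as an ordinary variable. A portable alternative (a different technique from the FPAT answer, sketched under that assumption; `join_marker_fields` is a hypothetical name) keeps the default whitespace splitting and concatenates only the fields that contain BEGIN or END:

```shell
# join_marker_fields (hypothetical name): portable awk, no FPAT needed.
# Fields are split on whitespace as usual; only those containing BEGIN or END
# are glued together, so leading/trailing spaces never reach the output.
join_marker_fields() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /BEGIN|END/) out = out $i   # concatenate matching fields
    print out
  }' "$@"
}

echo ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        ' |
  join_marker_fields
# prints: BEGIN!@#Ghjk,GhjkEND#@!
```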


s=$(awk '/BEGIN!@#/,/END#@!/' switch.log)      # lines from BEGIN!@# to END#@!
echo "$s" > ggg.txt

ss=$(sed -E 's/[0-9]+[a-z]? +| +//g' ggg.txt)  # drop hex pairs and spaces
echo "$ss" > ddd.txt

sss=$(awk '{print $1}' ddd.txt)
echo "$sss" > hhhh.txt

ssss=$(awk '/BEGIN!@#/,/END#@!/' hhhh.txt)
echo "$ssss" > hhh.txt

aaa=$(tr -d '\n' < hhh.txt)                     # join into one string
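The same steps can be chained with pipes so that no intermediate files are needed. This is a sketch that keeps the answer's commands unchanged; `extract_from_log` is a hypothetical name, and a temporary file stands in for switch.log:

```shell
# extract_from_log (hypothetical name): same steps as the answer, piped.
extract_from_log() {
  awk '/BEGIN!@#/,/END#@!/' "$1" |      # keep lines from BEGIN!@# to END#@!
    sed -E 's/[0-9]+[a-z]? +| +//g' |   # drop hex pairs and all spaces
    awk '{print $1}' |                  # first field (each line is one token now)
    awk '/BEGIN!@#/,/END#@!/' |         # re-apply the range on the cleaned text
    tr -d '\n'                          # join everything into a single string
}

log=$(mktemp)   # stand-in for switch.log
printf '%s\n' ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        ' > "$log"
aaa=$(extract_from_log "$log")
printf '%s\n' "$aaa"
# prints: BEGIN!@#Ghjk,GhjkEND#@!
rm -f "$log"
```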


By setting the awk record separator RS to " ", awk processes one space-separated portion of your file at a time (for this single-line input, each record contains only one field). The two parts that are needed can then be extracted with the simple awk condition patterns /BEGIN/ and /END/. No record can contain a space, since the space is the record delimiter.

If printed, the pattern-filtered records would normally be separated by a newline (the default output record separator, ORS), but setting ORS to the empty string (ORS="") makes the output of the two print statements run together with no space.

Thus this simple awk command returns the required fields as a concatenated string with no whitespace:

awk ' BEGIN{RS=" ";ORS=""} /BEGIN/{print} /END/{print}' file.txt

output:

BEGIN!@#Ghjk,GhjkEND#@!
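A small variation on that command, sketched under the assumption that you want the result followed by a newline (so it does not run into the next shell prompt): the two patterns are merged into /BEGIN|END/, and awk's END block prints the final newline. Note that a record containing both words would now be printed once rather than twice. `begin_end_records` is a hypothetical name.

```shell
# begin_end_records (hypothetical name): RS=" " makes every space-separated
# token its own record; ORS="" prints the matching records back to back.
begin_end_records() {
  awk 'BEGIN { RS = " "; ORS = "" }
       /BEGIN|END/ { print }
       END { print "\n" }' "$@"   # END here is the end-of-input block
}

echo ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!        ' |
  begin_end_records
# prints: BEGIN!@#Ghjk,GhjkEND#@!   (followed by a newline)
```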
$ grep -oE '\S*(BEGIN|END)\S*' file | paste -sd'\0'

BEGIN!@#Ghjk,GhjkEND#@!
echo ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 '   \
     '6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!' | 
{m,g}awk NF=NF FS='[ \t]*([^ \t][^ \t][ \t]+)+[ \t]*' OFS=
BEGIN!@#Ghjk,GkEND#@!
