how to get substring from

Question

 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!

to be

BEGIN!@#Ghjk,GhjkEND#@!

Note: there is whitespace at the end of the lines. I tried to remove it but could not.

I tried

#!/bin/bash

s=$(awk '/BEGIN!@#/,/END#@!/' switch.log )


while IFS= read -r line 
do

  h=$(echo "$line" | awk '{$1=$1;print}')
  for i in {0..100}
  do

    zzz=$(echo "$h"  | awk '{print $(NF-$i)}')

    if [ ! -z "$zzz" -a "$zzz" != " " ]; then

      hh=$(echo "$h"  | awk  '{print $(NF-$i)}') 
      echo "$zzz"

      echo  -e  "$zzz" >> ggg.txt
      break
    fi

  done

done <<< "$s"

I got

BEGIN!@#Ghjk,Ghj

Please do fix your samples in code tags as its not clear as of now. Thank you. — RavinderSingh13, Commented Nov 15, 2022 at 7:08
Also note, since your formatting was unclear, If you intended the input to be a multi-line input, please re-edit the questions and format the input properly. (in that case simple awk string-concatenation of $NF will do) — David C. Rankin, Commented Nov 15, 2022 at 7:45

David C. Rankin · Accepted Answer · 2022-11-15 07:35:04Z

2

Another option is using sed with the normal substitute method storing the text you want to keep as the first two backreferences. For example:

sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)/\1\2/' <<< 'your string'

Example Use/Output

(note: updated to handle whitespace at the end)

$ sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)/\1\2/' <<< '42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!'
BEGIN!@#Ghjk,GhjkEND#@!

(note: single-quoting the string is required due to '!')
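One point that may be worth checking: the pattern stops matching right after the kEND token, so whitespace that follows it on the line is not part of the match and stays in the output. A minimal sketch of a variant that also consumes that tail, assuming the data sits on one line; the variable names `line` and `result` and the added `[[:space:]]*$` are illustrative and not part of the original answer:

```shell
# Sample line from the question, with trailing spaces added on purpose.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!   '

# Keep the two non-whitespace runs that start at BEGIN and kEND.
# The extra [[:space:]]*$ (an assumption, not in the original command)
# also matches, and therefore drops, whitespace at the end of the line.
result=$(printf '%s\n' "$line" |
  sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)[[:space:]]*$/\1\2/')

printf '%s\n' "$result"   # prints BEGIN!@#Ghjk,GhjkEND#@!
```

Capturing the result with `$(...)` gives a variable that can be compared or reused directly, which is usually the reason for extracting the substring in a script.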

edited Nov 15, 2022 at 7:35

answered Nov 15, 2022 at 7:19


Indeed sed is better suited here
– anubhava
Commented Nov 15, 2022 at 8:13


sseLtaH · Accepted Answer · 2022-11-15 08:03:31Z

2

Using sed

$ sed -E 's/[0-9]+[a-z]? +| +//g' input_file
BEGIN!@#Ghjk,GhjkEND#@!

answered Nov 15, 2022 at 8:03


1

Simple and effective sed
– anubhava
Commented Nov 15, 2022 at 8:14
1

Yep, nuking the unwanted as an approach is worth the nod as well.
– David C. Rankin
Commented Nov 15, 2022 at 8:38
Thanks a lot, it's working... but how? :-)
– ahmed ahmed
Commented Nov 16, 2022 at 22:35
@ahmedahmed It matches either one or more digits, optionally followed by a single lowercase letter (the ? makes the letter optional), followed by one or more spaces, or just a run of spaces, and removes every match.
– sseLtaH
Commented Nov 16, 2022 at 22:39
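The explanation above can be tried out on the sample line. This is a small sketch, assuming the input is a single line; the variable names `line`, `cleaned` and `leftover` are illustrative. The second command shows a caveat of the "remove what is unwanted" approach: the pattern only recognises tokens that start with a digit, so hex bytes such as `ff` or `a4` would leave characters behind.

```shell
# Sample line from the question, with trailing spaces added on purpose.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!  '

# First alternative,  [0-9]+[a-z]? + , removes tokens such as "42 " or "6a ".
# Second alternative, " +", removes any remaining runs of spaces.
cleaned=$(printf '%s\n' "$line" | sed -E 's/[0-9]+[a-z]? +| +//g')
printf '%s\n' "$cleaned"    # prints BEGIN!@#Ghjk,GhjkEND#@!

# Caveat: tokens that do not start with a digit are not removed cleanly.
leftover=$(printf '%s\n' 'ff a4 BEGIN' | sed -E 's/[0-9]+[a-z]? +| +//g')
printf '%s\n' "$leftover"   # prints ffaBEGIN
```

For the sample in the question every hex byte starts with a digit, so the command gives the wanted result; for arbitrary hex dumps a pattern that selects the BEGIN and END tokens may be the safer choice.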


user1934428 · Accepted Answer · 2022-11-15 07:47:56Z

1

UPDATED, to fix an error: You have not defined precisely in your question what the string to be extracted looks like in general, but based on your example, this would do:

if [[ $line =~ (BEGIN[^ ]+)\ .*([^ ]+END[^ ]+) ]]
then
  substring=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
else
  echo "Pattern not found in line" >&2
fi
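For completeness, here is a sketch of how the test above might be wrapped so that it can be run against the sample line. The function name `extract_substring` is hypothetical, and the regex is the one from the answer, stored in a variable so that the space does not need escaping. It requires bash, since `[[ ... =~ ... ]]` and `BASH_REMATCH` are not available in plain sh.

```shell
#!/bin/bash
# extract_substring is a hypothetical helper name: it prints the joined
# BEGIN... and ...END... tokens of one line, or returns 1 if they are missing.
extract_substring() {
  local line=$1
  # Keeping the regex in a variable avoids quoting problems inside [[ ]].
  local re='(BEGIN[^ ]+) .*([^ ]+END[^ ]+)'
  if [[ $line =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    echo "Pattern not found in line" >&2
    return 1
  fi
}

# Sample line from the question, with trailing spaces added on purpose.
substring=$(extract_substring ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!  ')
printf '%s\n' "$substring"   # prints BEGIN!@#Ghjk,GhjkEND#@!
```

Because both groups use `[^ ]+`, trailing spaces on the line never end up in the captured text.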

edited Nov 15, 2022 at 7:47

answered Nov 15, 2022 at 7:14


Good solution, but you are missing the wanted 'k' in "kEND...".
– David C. Rankin
Commented Nov 15, 2022 at 7:17
This would only be an issue if the text in between could also have an END in it. But actually, my solution is incorrect for a different reason, and I will update my answer.
– user1934428
Commented Nov 15, 2022 at 7:20
@DavidC.Rankin : My updated solution now does not incorrectly catch the 6b 45 4e 44 23 40 21 anymore. I missed this point in my first attempt.
– user1934428
Commented Nov 15, 2022 at 7:25
Cleaner than what I was toying with [[ $line =~ ^.*(BEGIN[^ ]+).*(kEND[^ ].*)$ ]]
– David C. Rankin
Commented Nov 15, 2022 at 7:29
@DavidC.Rankin: In your approach, maybe a .*? between the groups would be better. The main problem of course is that the question is poorly asked and we really don't know what the delimiters look like, and as long as we have to guess, any solution would do which works for the concrete example.
– user1934428
Commented Nov 15, 2022 at 7:32


Daweo · Accepted Answer · 2022-11-15 10:46:48Z

I would use GNU AWK for this task in the following way. Let the content of file.txt be

 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!

then

awk 'BEGIN{FPAT="[^[:space:]]*(BEGIN|END)[^[:space:]]*";OFS=""}{$1=$1;print}' file.txt

gives output

BEGIN!@#Ghjk,GhjkEND#@!

Explanation: Using the field pattern (FPAT), I tell GNU AWK that a field is BEGIN or (|) END, prefixed and suffixed by zero or more (*) non-whitespace ([^[:space:]]) characters, and that the output field separator (OFS) is the empty string. Then, for each line, I do $1=$1 to trigger a rebuild of the line and print it. If you are sure that only space characters are used in the line, you might elect to replace [^[:space:]] with [^ ].

(tested in gawk 4.2.1)
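FPAT is a GNU AWK extension, so the command above will not select the fields in other awk implementations. As a rough portable approximation (a different technique, not the author's method), one can rely on default whitespace field splitting and concatenate every field that contains BEGIN or END. The variable names `line`, `result` and `out` are illustrative:

```shell
# Sample line from the question, with trailing spaces added on purpose.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!  '

# Default splitting drops leading and trailing whitespace, so each field
# is one token. Fields containing BEGIN or END are appended to "out".
result=$(printf '%s\n' "$line" |
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /BEGIN|END/) out = out $i
    print out
  }')

printf '%s\n' "$result"   # prints BEGIN!@#Ghjk,GhjkEND#@!
```

Like the FPAT version, this keeps whole tokens, so the leading k of kEND is preserved.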

ahmed ahmed · Accepted Answer · 2022-11-16 22:40:21Z

0

s=$(awk '/BEGIN!@#/,/END#@!/' switch.log)
echo "$s" > ggg.txt

ss=$(sed -E 's/[0-9]+[a-z]? +| +//g' ggg.txt)
echo "$ss" > ddd.txt

sss=$(awk '{print $1}' ddd.txt)
echo "$sss" > hhhh.txt

ssss=$(awk '/BEGIN!@#/,/END#@!/' hhhh.txt)
echo "$ssss" > hhh.txt

aaa=$(<hhh.txt)
aaa=$(cat hhh.txt | tr -d '\n')

answered Nov 16, 2022 at 22:40


Welcome to SO! Please refer to the How do I write a good answer guideline to improve your contribution.
– Jonathan
Commented Nov 23, 2022 at 10:06


Dave Pritlove · Accepted Answer · 2022-11-17 00:10:50Z

By setting the awk record separator RS to " ", awk processes one space-separated portion of your file at a time (with each record containing only one field). So the two parts that are needed can be extracted with the simple awk condition patterns /BEGIN/ and /END/. There can be no space in any record, since the space was the delimiter.

If printed, the pattern-filtered records would normally be separated by a newline (the default output record separator ORS), but this can be changed to an empty string with ORS="" to make the two print statements run into one another with no space.

Thus this simple awk command will return the required fields as a concatenated string with no white space:

awk ' BEGIN{RS=" ";ORS=""} /BEGIN/{print} /END/{print}' file.txt

output:

BEGIN!@#Ghjk,GhjkEND#@!
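A small sketch of the same idea with the result captured in a shell variable, assuming the input is one line. The names `line` and `result` are illustrative, and the two patterns are merged into `/BEGIN|END/`, which is a variation on the command above rather than the author's exact form:

```shell
# Sample line from the question, with trailing spaces added on purpose.
line=' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!  '

# RS=" " makes each space-separated token a record; ORS="" joins the
# printed records without a separator. The single pattern /BEGIN|END/
# prints a matching record once, even if it contains both words.
result=$(printf '%s\n' "$line" |
  awk 'BEGIN { RS = " "; ORS = "" } /BEGIN|END/ { print }')

printf '%s\n' "$result"   # prints BEGIN!@#Ghjk,GhjkEND#@!
```

With two separate rules, `/BEGIN/{print} /END/{print}`, a token that contains both BEGIN and END would be printed twice; whether that matters depends on the real data.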

karakfa · Accepted Answer · 2022-11-19 17:28:11Z

0

$ grep -oE '(BEGIN|END)\S*' file | paste -sd'\0'

BEGIN!@#Ghjk,GhjEND#@!

answered Nov 19, 2022 at 17:28



RARE Kpop Manifesto · Accepted Answer · 2022-11-22 09:12:39Z

0

echo ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 '   \
     '6a BEGIN!@#Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND#@!' |

{m,g}awk NF=NF FS='[ \t]*([^ \t][^ \t][ \t]+)+[ \t]*' OFS=

BEGIN!@#Ghjk,GkEND#@!

answered Nov 22, 2022 at 9:12
