Trying to get generic sed multi-line pattern match and substitution script to work

Question

There is a generic approach to the problem presented by this poster, described on linuxtopia here, in section 4.23.3. It appears to offer a method for matching any complex multi-line block and then replacing it with any other multi-line block. The technique is referred to as the "sliding-window" technique.

I believe the script below faithfully recreates the scenario described and incorporates the sed script to demonstrate that the approach is workable.

#!/bin/bash

DBG=1

###
### Code segment to be replaced
###
file1="File1.cpp"
rm -f "${file1}"
cat >"${file1}" <<"EnDoFiNpUt"
void Component::initialize()
{
    my_component = new ComponentClass();
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 1"

###
### Code segment to be used as replacement
###
file2="File2.cpp"
rm -f "${file2}"
cat >"${file2}" <<"EnDoFiNpUt"
void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 2"

###
### Create demo input file
###
testfile="Test_INPUT.cpp"
rm -f "${testfile}"
{
    echo "
other code1()
{
    doing other things
    doing more things
    doing extra things
} 
"
    cat "${file1}"

echo "
other code2()
{
    creating other things
    creating more things
    creating extra things
} 
"
} >>"${testfile}"

test ${DBG} -eq 1 && echo "fence 3"

###
### Create editing specification file
###
{
    cat "${file1}"
    echo "###REPLACE_BY###"
    cat "${file2}"
} >findrep.txt

test ${DBG} -eq 1 && echo "fence 4"


###
### sed script to create editing instructions to apply above editing specification file
###
cat >"blockrep.sed" <<"EnDoFiNpUt"
#SOURCE:    https://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq4_013.html
#
# filename: blockrep.sed
#   author: Paolo Bonzini
# Requires:
#    (1) blocks to find and replace, e.g., findrep.txt
#    (2) an input file to be changed, input.file
#
# blockrep.sed creates a second sed script, custom.sed,
# to find the lines above the row of 4 hyphens, globally
# replacing them with the lower block of text. GNU sed
# is recommended but not required for this script.
#
# Loop on the first part, accumulating the `from' text
# into the hold space.
:a
/^###REPLACE_BY###$/! {
   # Escape slashes, backslashes, the final newline and
   # regular expression metacharacters.
   s,[/\[.*],\\&,g
   s/$/\\/
   H
   #
   # Append N cmds needed to maintain the sliding window.
   x
   1 s,^.,s/,
   1! s/^/N\
/
   x
   n
   ba
}
#
# Change the final backslash to a slash to separate the
# two sides of the s command.
x
s,\\$,/,
x
#
# Until EOF, gather the substitution into hold space.
:b
n
s,[/\],\\&,g
$! s/$/\\/
H
$! bb
#
# Start the RHS of the s command without a leading
# newline, add the P/D pair for the sliding window, and
# print the script.
g
s,/\n,/,
s,$,/\
P\
D,p
#---end of script---
EnDoFiNpUt

test ${DBG} -eq 1 && echo "fence 5"


sed --debug -nf blockrep.sed findrep.txt >custom.sed
test ${DBG} -eq 1 && echo "fence 6"

if [ -s custom.sed ]
then
    more custom.sed
    echo -e "\t Hit return to continue ..." >&2
    read k <&2
else
    echo -e "\t Failed to create 'custom.sed'.  Unable to proceed!\n" >&2
    exit 1
fi

testout="Test_OUTPUT.cpp"

sed -f custom.sed "${testfile}" >"${testout}"
test ${DBG} -eq 1 && echo "fence 7"

if [ -s "${testout}" ]
then
    more "${testout}"
else
    echo -e "\t Failed to create '${testout}'.\n" >&2
    exit 1
fi

Unfortunately, what they presented doesn't seem to work. I wish there were something like bash's "set -x" that would report sed's command expansion and execution to stderr, but I haven't found anything like that.

The execution log for the above is as follows:

fence 1
fence 2
fence 3
fence 4
fence 5
sed: file blockrep.sed line 19: unterminated `s' command
fence 6
     Failed to create 'custom.sed'.  Unable to proceed!

Maybe an expert out there can resolve the logic error in the imported blockrep.sed script, because I can't wrap my head around it well enough to fix it, even with all the comments provided.

I openly admit that my knowledge and usage of sed are very limited. I can't follow how the "blockrep.sed" script does what it claims; I only understand that everything in findrep.txt before the separator string "###REPLACE_BY###" is to be replaced by everything after that separator.

In my view, the approach identified by the linuxtopia guide would have broad application and be beneficial for many, including the OP and myself.

So you're married to sed to solve this problem? It would be much easier to solve with awk or perl or many others. Good luck. - shellter, Commented Feb 1, 2023 at 5:02
@shellter, I rarely use sed for complex problems; usually only for short string strip/mapping. AWK is my preferred tool, as evidenced by my responses to other postings. However, I love the simplicity of the approach proposed by the linuxtopia solution, and would like to get my hands on a working version of that. - Eric Marceau, Commented Feb 1, 2023 at 15:56
I tried implementing the code from linuxtopia but got "sed: file blockrep.sed line 18: unterminated `s' command". When I (thought I) fixed that, the script did not perform as expected. Did it work for you? Don't have much time to spend on this. Good luck in your endeavors! - shellter, Commented Feb 1, 2023 at 16:54
@shellter, as in my OP, I got the same error message from sed. :-( - Eric Marceau, Commented Feb 1, 2023 at 19:13

Eric Marceau · Accepted Answer · 2023-02-04 23:04:39Z

I resorted to discretizing portions of the blockrep.sed script to see if I could identify a source of breakdown. While that made no logical difference on the surface, it did produce a functional and well-formed structure, which in turn created a usable custom.sed, but only after I removed the --debug option from the execution of blockrep.sed. Removing it is required because the debug info is not sent to stderr but is written inline with stdout! I don't know enough to classify that as a bug.
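To see why capturing stdout under --debug spoils the generated script, the following minimal sketch may help. It assumes GNU sed recent enough to support --debug (4.6 or later, as far as I know); the variable names are only for illustration and are not part of the script in this answer.

```shell
#!/bin/bash
# Sketch: check which stream GNU sed's --debug report is written to.
# Assumption: GNU sed with --debug support is installed as "sed".

# Capture stdout only (stderr discarded).
debug_stdout=$(printf 'hello\n' | sed --debug -n 'p' 2>/dev/null)

# Capture stderr only (stdout discarded).
debug_stderr=$(printf 'hello\n' | sed --debug -n 'p' 2>&1 >/dev/null)

# One input line was printed, so anything beyond one line on stdout
# is debug annotation mixed in with the real output.
echo "stdout lines: $(printf '%s\n' "${debug_stdout}" | wc -l)"
echo "stderr bytes: ${#debug_stderr}"
```

If the stdout line count is larger than one, the extra lines are debug annotations, and those are what end up inside custom.sed when stdout is redirected into it.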

NOTE: I've also added command-line options to specify names of files for input, output, old_pattern, new_pattern, divider, among others.

The now modified and working version of the script is as follows:

#!/bin/bash

reportFiles ()
{
    ls -l "${fileSrchPat}" "${fileReplPat}" "findrep.txt" 2>&1
    ls -l "blockrep.sed" "custom.sed" "custom.err" 2>&1
    ls -l "${fileBefore}" 2>&1
    ls -l "${fileOutput}" 2>&1
}


DBG=0
DBGs=0
dumA=1 ;
dumB=1 ;
fileSrchPat=""
fileReplPat=""
divider="----REPLACE_BY----"
doReview=0
fileBefore=""
fileOutput=""

while [ $# -gt 0 ]
do
    case $1 in
        --debug )
            DBG=1 ;
            shift ;;
        --debug_sed )
            DBGs=1 ;
            shift ;;
        --verbose )
            set -x
            shift ;;
        --old_pattern )
            dumA=0 ;
            fileSrchPat="$2" ;
            if [ ! -s "${fileSrchPat}" ]
            then
                echo -e "\n File '${fileSrchPat}' not found.\n Bye!\n"
                exit 1
            fi ;
            shift ; shift ;;
        --new_pattern )
            dumA=0 ;
            fileReplPat="$2" ;
            if [ ! -s "${fileReplPat}" ]
            then
                echo -e "\n File '${fileReplPat}' not found.\n Bye!\n"
                exit 1
            fi ;
            shift ; shift ;; 
        --pattern_sep )
            divider="$2" ;
            shift ; shift ;;
        --input )
            dumB=0 ;
            fileBefore="$2" ;
            if [ ! -s "${fileBefore}" ]
            then
                echo -e "\n File '${fileBefore}' not found.\n Bye!\n"
                exit 1
            fi ;
            shift ; shift ;;
        --output )
            fileOutput="$2" ;
            # Ask before overwriting an output file that already exists
            if [ -e "${fileOutput}" ]
            then
                echo -e "\n File '${fileOutput}' already exists.  Overwrite ? [y|N] => \c"
                read goAhead
                if [ -z "${goAhead}" ] ; then  goAhead="N" ; fi
                case ${goAhead} in
                    y* | Y* ) rm -vf "${fileOutput}" ;;
                    * ) echo -e "\n\t Process abandoned.\n Bye!\n" ; exit 1 ;;
                esac
            fi ;
            shift ; shift ;;
        --review ) doReview=1 ; shift ;;
        * ) echo -e "\n Invalid option used on command line.  Only valid options: [ --old_pattern {textfile1} | --new_pattern {textfile2} ] \n Bye!\n" ; exit 1 ;;
    esac
done

###
### Code segment to be replaced
###
if [ -z "${fileSrchPat}" ]
then
    fileSrchPat="File1.cpp"
    rm -f "${fileSrchPat}"
    cat >"${fileSrchPat}" <<"EnDoFiNpUt"
void Component::initialize()
{
    my_component = new ComponentClass();
}
EnDoFiNpUt

fi
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 1"


###
### Code segment to be used as replacement
###
if [ -z "${fileReplPat}" ]
then
    fileReplPat="File2.cpp"
    rm -f "${fileReplPat}"
    cat >"${fileReplPat}" <<"EnDoFiNpUt"
void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}
EnDoFiNpUt

fi
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 2"


if [ -z "${fileBefore}" ]
then
###
### Create demo input file
###
    fileBefore="Test_INPUT.cpp"
    rm -f "${fileBefore}"
    {
    echo "
other code1()
{
    doing other things
    doing more things
    doing extra things
} 
"
    cat "${fileSrchPat}"

    echo "
other code2()
{
    creating other things
    creating more things
    creating extra things
} 
"
    } >>"${fileBefore}"

fi
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 3"

###
### Create editing specification file
###
{
    cat "${fileSrchPat}"
    echo "${divider}"
    cat "${fileReplPat}"
} >findrep.txt

test ${DBG} -eq 1 && echo -e "\n\t ======== fence 4"


###
### sed script to create editing instructions to apply above editing specification file
###
test ${DBG} -eq 1 && rm -fv "blockrep.sed" || rm -f "blockrep.sed"

cat >"blockrep.sed" <<"EnDoFiNpUt"
#SOURCE:    https://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq4_013.html
#
# filename: blockrep.sed
#   author: Paolo Bonzini
# 
# Modified by: Eric Marceau, Feb 2023
#
# Requires:
#    (1) blocks to find and replace, e.g., findrep.txt
#    (2) an input file to be changed, input.file
#
# blockrep.sed creates a second sed script, custom.sed,
# to find the lines above the row of 4 hyphens, globally
# replacing them with the lower block of text. GNU sed
# is recommended but not required for this script.
#
# Loop on the first part, accumulating the `from' text
# into the hold space.
#
##############################################################################
### Reworked Discretized version of coding
##############################################################################
#
##############################################################################
### Begin of capture - SEARCH pattern
##############################################################################
:markerA
EnDoFiNpUt

#echo "/^----REPLACE_BY----\$/! {" >> "blockrep.sed"
echo "/^${divider}\$/! {" >> "blockrep.sed"

cat >>"blockrep.sed" <<"EnDoFiNpUt"
#
# Escape slashes
    s,[/],\\&,g
#
# Escape backslashes
    s,[\],\\&,g
#
# Escape regular expression metacharacters
    s,[[],\\&,g
    s,[.],\\&,g
    s,[*],\\&,g
#
# Escape the final newline
#  add backslash to end of line (to avoid having sed 
#  think of as end of command input)
    s,$,\\,
#
# APPEND  -  PATTERN space to HOLD space
    H
#
# Sequence to APPEND "N" cmds needed to maintain the sliding window.
# \\ swap contents - HOLD and PATTERN space
    x
#
# If first line, begin constructing sed command for pattern match and replace
    1 s,^.,s/,
#
# If not first line, add line with "N" 
#  i.e. give instruction to "APPEND the next line of input into the pattern space"
    1! s,^,N\
,
# // swap contents again - HOLD and PATTERN space
    x
#
# COPY  -  next line of input into PATTERN space
    n
#
# branch/jump to label markerA
    b markerA
}
#
##############################################################################
### End of capture - SEARCH pattern
##############################################################################
#
#
##############################################################################
### Begin of capture - REPLACEMENT pattern
##############################################################################
#
# \\ swap contents - HOLD and PATTERN space
    x
#
# Change the final backslash to a slash to separate the
# two sides of the s command.
    s,\\$,/,
#
# // swap contents again - HOLD and PATTERN space
    x
#
# Until EOF, gather the REPLACEMENT TEXT into the hold space.
:markerB
    n
#
# Escape slashes
    s,[/],\\&,g
#
# Escape backslashes
    s,[\],\\&,g
#
# If not last line, add backslash to end of line to escape the newline.
    $! s,$,\\,
#
# APPEND  -  PATTERN space to HOLD space
    H
#
# If not last line, branch/jump to markerB
    $! b markerB
#
##############################################################################
### End of capture - REPLACEMENT pattern
##############################################################################
#
#
# Start the Right-Hand Side (RHS) of the "s" command without a leading newline,
# add the P/D pair for the sliding window, and
# print the script.
#
# COPY  -  HOLD space to PATTERN space
    g
    s,/\n,/,
#
# (P) Print up to the first embedded newline of the current pattern space.
#  then
# (D) If  pattern  space  contains no newline, start a normal new cycle as if 
#   the d command was issued.  Otherwise, delete text in the pattern space 
#   up to the first newline, and restart cycle with the resultant pattern space,
#   without reading a new line of input.
#  then
# (p) Print the current pattern space.
    s,$,/\
    P\
    D,p
#---end of script---
EnDoFiNpUt

test ${DBG} -eq 1 && echo -e "\n\t ======== fence 5"

test ${DBG} -eq 1 && rm -fv custom.sed custom.err || rm -f custom.sed custom.err

if [ ${DBGs} -eq 1 ]
then
    echo -e "\n\t NOTE:  debug mode active for 'sed' command ..."
    sed --debug -f blockrep.sed findrep.txt >custom.sed 2>custom.err
else
    sed -nf blockrep.sed findrep.txt >custom.sed 2>custom.err
fi
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 6"


if [ -s custom.err ]
then
    if [ ${doReview} -eq 1 ]
    then
        cat custom.err
    fi
fi
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 7"


if [ -s custom.sed ]
then
    if [ ${doReview} -eq 1 ]
    then
        more custom.sed
        echo -e "\n\t Hit return to continue ..." >&2
        read k <&2
    fi
    if [ ${DBGs} -eq 1 ]
    then
        more custom.sed
        echo -e "\n =============  End of Review - 'custom.sed' containing execution debug reporting  =============" >&2
        echo -e "\n\t 'custom.sed' is not in usable form due to '--debug' messaging." >&2
        echo -e   "\t Abandoning before attempting final transformation.\n Bye!\n" >&2
        exit 2
    fi
else
    echo -e "\t Failed to create 'custom.sed'.  Unable to proceed!\n" >&2
    exit 1
fi
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 8"


if [ -z "${fileOutput}" ]
then
    fileOutput="Test_OUTPUT.cpp"
fi
rm -f "${fileOutput}"
sed -f custom.sed "${fileBefore}" >"${fileOutput}"
test ${DBG} -eq 1 && echo -e "\n\t ======== fence 9"


if [ -s "${fileOutput}" ]
then
    test ${DBG} -eq 1 && echo -e "\n\t ======== fence 10"
    if [ ${doReview} -eq 1 ]
    then
        more "${fileOutput}"
    fi
    if [ ${DBG} -eq 1 ]
    then
        reportFiles
        if [ ${dumA} -eq 1 ]
        then
            rm -fv "${fileSrchPat}" "${fileReplPat}" 2>&1
        fi
        if [ ${dumB} -eq 1 ]
        then
            rm -fv "${fileBefore}" 2>&1
        fi
        rm -fv "findrep.txt" "custom.sed" "custom.err" "blockrep.sed" 2>&1
    else
        if [ ${dumA} -eq 1 ]
        then
            rm -f "${fileSrchPat}" "${fileReplPat}" 2>&1
        fi
        if [ ${dumB} -eq 1 ]
        then
            rm -f "${fileBefore}" 2>&1
        fi
        rm -f "findrep.txt" "custom.sed" "custom.err" "blockrep.sed" 2>&1
    fi | awk '{ printf("\t %s\n", $0 ) ; }' >&2
else
    echo -e "\t Failed to create '${fileOutput}'.\n" >&2
    reportFiles | awk '{ printf("\t %s\n", $0 ) ; }' >&2
    exit 1
fi


exit

Executing with the following command (using the default built-in demo case):

./script.sh --debug --review

The resulting session output is:

     ======== fence 1
     ======== fence 2
     ======== fence 3
     ======== fence 4
     ======== fence 5
     ======== fence 6
     ======== fence 7
N
N
N
s/void Component::initialize()\
{\
    my_component = new ComponentClass();\
}/void Component::initialize()\
{\
    if (doInit)\
    {\
        my_component = new ComponentClass();\
    }\
    else\
    {\
        my_component.ptr = null;\
    }\
}/
    P
    D

     Hit return to continue ...

     ======== fence 8
     ======== fence 9
     ======== fence 10

other code1()
{
    doing other things
    doing more things
    doing extra things
} 

void Component::initialize()
{
    if (doInit)
    {
        my_component = new ComponentClass();
    }
    else
    {
        my_component.ptr = null;
    }
}

other code2()
{
    creating other things
    creating more things
    creating extra things
} 

 -rw-rw-r-- 1 ericthered ericthered  71 Feb  4 16:05 File1.cpp
 -rw-rw-r-- 1 ericthered ericthered 130 Feb  4 16:05 File2.cpp
 -rw-rw-r-- 1 ericthered ericthered 220 Feb  4 16:05 findrep.txt
 -rw-rw-r-- 1 ericthered ericthered 3604 Feb  4 16:05 blockrep.sed
 -rw-rw-r-- 1 ericthered ericthered    0 Feb  4 16:05 custom.err
 -rw-rw-r-- 1 ericthered ericthered  229 Feb  4 16:05 custom.sed
 -rw-rw-r-- 1 ericthered ericthered 240 Feb  4 16:05 Test_INPUT.cpp
 -rw-rw-r-- 1 ericthered ericthered 299 Feb  4 16:06 Test_OUTPUT.cpp
 removed 'File1.cpp'
 removed 'File2.cpp'
 removed 'Test_INPUT.cpp'
 removed 'findrep.txt'
 removed 'custom.sed'
 removed 'custom.err'
 removed 'blockrep.sed'

This is the result that was initially intended. Success!
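For anyone trying to follow what the generated custom.sed does, here is a reduced, hand-written sketch of the same N / s / P / D shape for a two-line block. The function name slide_replace and the sample words are made up for illustration. It uses $!N instead of plain N so that the last line is handled the same way by non-GNU seds; that is my assumption and not something the generated script does.

```shell
#!/bin/bash
# Sketch of the sliding-window idea with a two-line search block.
# A search block of k lines needs k-1 "N" commands; here k=2, so one N.
# slide_replace is a hypothetical helper, not part of the script above.

slide_replace() {
    # $!N : append the next input line to the window (skipped on last line)
    # s   : try to match the whole two-line block inside the window
    # P   : print the first line of the window
    # D   : drop that first line and restart without reading new input
    sed -e '$!N' \
        -e 's/beta\ngamma/BETA+GAMMA/' \
        -e 'P' \
        -e 'D'
}

# prints three lines: alpha, BETA+GAMMA, delta
printf '%s\n' alpha beta gamma delta | slide_replace
```

The window never grows beyond two lines, which is why the approach can work through a large file without loading it whole.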

As long as all source code files are standardized, and elements were copy+pasted, the above solution will cover everything. ... Now, all I need to do is add logic to strip leading whitespace, strip trailing whitespace, and map runs of whitespace to a single space, for both the match pattern and the edited file "evaluation buffer", to allow for random formatting of manually entered copies and ensure no cases are overlooked. That will be a biggie! Not sure that stripping/mapping can happen without affecting the format of the final output. :-( - Eric Marceau, Commented Feb 3, 2023 at 17:42
It's good to have a challenging project to help improve your understanding. Good luck! - shellter, Commented Feb 3, 2023 at 17:52
@shellter, do you know what "RHS" stands for? I've searched online and looked at many of the sed books and none mention what that acronym means. Does it mean "Replace/Hold Space" or "Run/Hold Space"? Is the RHS acronym documented anywhere that is officially related to the sed project? - Eric Marceau, Commented Feb 3, 2023 at 23:03
Yes, that is an obscure one. And good guess, but it is almost always RightHandSide vs LeftHandSide (LHS), as in if (LHS < RHS); .... It probably also applies to s/LHS/RHS/ (I think!). Does that make sense for your context? Good luck. - shellter, Commented Feb 3, 2023 at 23:09
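On the whitespace normalization raised in the comments above, the stripping and mapping step on its own could be sketched as below. The function name normalize_ws is hypothetical. This only covers normalizing a stream of lines; matching a normalized pattern against a file while preserving that file's original layout is a separate problem that this sketch does not attempt.

```shell
#!/bin/bash
# Sketch: normalize whitespace on each line of standard input.
# normalize_ws is a made-up name for illustration only.

normalize_ws() {
    # 1st expression: strip leading whitespace
    # 2nd expression: strip trailing whitespace
    # 3rd expression: map each run of whitespace to a single space
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/[[:space:]]\{1,\}/ /g'
}

# prints: my_component = new ComponentClass();
printf '    my_component   =  new\tComponentClass();  \n' | normalize_ws
```

Applied to the search block alone, this would leave the input file untouched, but the file's lines would then no longer match the normalized pattern, which is the difficulty the comment anticipates.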
