
I have a string like this: <a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -

stored in a variable a

I tried to extract the sequence 2021_03_19 from it (the second occurrence, the one right after the /"> sequence) with the following script:

a=${a##'/">'}
a=${a%%'/</a'}

But the final result is the same string as the input.

1
  • 1
    That looks like HTML, but the /" is invalid for HTML. Is this only part of an HTML file? If so, use xmllint, xmlstarlet, or another XML-aware tool.
    – KamilCuk
    Commented Mar 23, 2021 at 8:30

3 Answers

2

The pattern in the parameter expansion needs to match the entire string you want to remove. You are trying to trim the literal prefix /"> but of course the string does not begin with this string, so the parameter expansion does nothing.

Try

a=${a##*'/">'}
a=${a%%'/</a'*}

The single quotes are kind of unusual; I would perhaps instead backslash-escape each metacharacter which should be matched literally.

a=${a##*/\"\>}
a=${a%%/\</a*}
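
As a quick check of the explanation above, here is a minimal sketch that runs both the original pattern and the corrected ones on the sample string from the question. The variable names unchanged, step1 and result are only illustrative and do not appear in the question.

```shell
a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'

# Without a wildcard, the pattern must match at the very start of the string.
# The string starts with '<a href', not with '/">', so nothing is removed.
unchanged=${a##'/">'}

# With a leading *, the longest prefix ending in /"> is removed.
step1=${a##*'/">'}

# With a trailing *, the longest suffix starting with /</a is removed.
result=${step1%%'/</a'*}

printf '%s\n' "$result"
```

This should print 2021_03_19, while unchanged still holds the full input string.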
0
1

The pattern also has to match everything before (or after) the delimiter, so add a * wildcard on the outer side of each pattern:

a=${a##*'/">'}
a=${a%%'/</a'*}
0
1

You could use:

a='<a href="2021_03_19/">2021_03_19/</a> 19-Mar-2021 11:55 -'
b=${a#*>}
c=${b%%/<*}

Based on Extract substring in Bash

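Note that the number of # characters has nothing to do with the length of the pattern: # removes the shortest matching prefix and ## the longest, and there is no ### operator. Your patterns fail because they have no * wildcard, so the above is an alternative solution.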
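
To illustrate what the #, ##, % and %% operators actually do, here is a small sketch on a made-up string (s and the other variable names are hypothetical, chosen only for this example):

```shell
s='a/b/c'

short_prefix=${s#*/}    # shortest prefix matching */ is "a/"   -> b/c
long_prefix=${s##*/}    # longest prefix matching */ is "a/b/"  -> c
short_suffix=${s%/*}    # shortest suffix matching /* is "/c"   -> a/b
long_suffix=${s%%/*}    # longest suffix matching /* is "/b/c"  -> a

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
```

Doubling the operator character only switches from the shortest to the longest match; it does not change how many characters the pattern itself may contain.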
