Convert to uppercase, except for escaped characters

Question

The methods I found break things further down the line by also affecting linebreaks.
For example...

$ message="First Line\nSecond Line"; 
$ echo "${message^^}"
FIRST LINE\NSECOND LINE

Is there an elegant way to convert a string to uppercase, but leaving escaped characters alone, to get the following output instead?

FIRST LINE\nSECOND LINE

I could just do something convoluted like changing "\n" to 0001 or something along those lines, apply the conversion and then return 0001 to "\n". But maybe there is a better way.

Is this for later inclusion as part of some other data, possibly in XML or JSON format? If so, a parser of that format may possibly have routines for turning strings into uppercase in the way you describe, as, for example, ascii_upcase in the JSON parser jq, or the XPath function upper-case() for XML. — Kusalananda, Commented Jul 24, 2022 at 11:50
@Kusalananda For me this is only about text processing, but someone else stumbling across this question might have such a use case. — Ocean, Commented Jul 25, 2022 at 9:52

Stéphane Chazelas · Accepted Answer · 2022-04-26 07:48:31Z

With zsh instead of bash:

$ message="First Line\nSecond Line"
$ set -o extendedglob
$ print -r -- ${message//(#b)((\\?)|(?))/$match[2]$match[3]:u}
FIRST LINE\nSECOND LINE

In bash (or any shell) and with the GNU implementation of sed, you can do the same with:

$ printf '%s\n' "$message" | sed -E 's/(\\.)|(.)/\1\u\2/g'
FIRST LINE\nSECOND LINE

Some potentially more efficient variants as they minimise the number of substitutions:

zsh

print -r -- ${message//(#b)((\\?)|([^\\]##))/$match[2]$match[3]:u}

or

print -r -- ${message//(#b)((\\?)#)([^\\]##)/$match[1]$match[3]:u}

their GNU sed translations:

printf '%s\n' "$message" | sed -E 's/(\\.)|([^\\]+)/\1\U\2/g'

or

printf '%s\n' "$message" | sed -E 's/((\\.)*)([^\\]+)/\1\U\3/g'

Beware they convert \Mx (Meta-x, an escape sequence supported by zsh's print for instance and that expands to the 0xf8 byte ('x' + 0x80)) to \MX (0xd8). They also convert \x7a to \x7A or \u007a to \u007A or \Cx to \CX but that shouldn't be a problem as those expand to the same.
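
The \u and \U replacement escapes are GNU sed extensions. Where GNU sed is unavailable, the same left-to-right scan can be sketched in POSIX awk. This is only an illustration, not part of the answer above: the function name upcase_keep_escapes is made up for the example, and it treats every backslash plus the single character after it as one escape, so it shares the \Mx and \x7a behaviour just described. toupper() follows the awk implementation's locale handling, so non-ASCII letters may be left unchanged.

```shell
# upcase_keep_escapes: hypothetical helper name, not from the thread.
# Uppercases every character except a backslash and the character after it.
upcase_keep_escapes() {
  printf '%s\n' "$1" | awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {
        # copy the backslash and the escaped character unchanged
        out = out c substr($0, i + 1, 1)
        i++
      } else {
        out = out toupper(c)
      }
    }
    print out
  }'
}

upcase_keep_escapes 'First Line\nSecond Line'   # FIRST LINE\nSECOND LINE
```

Because the backslash and the following character are consumed as a pair, an escaped backslash such as foo\\bar comes out as FOO\\BAR.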

glenn jackman · Accepted Answer · 2022-04-25 19:05:08Z

I'd be tempted to interpret the escape sequences into literal characters:

message="First Line\nSecond Line"
declare -u Message                       # uppercase on assignment
printf -v Message -- "${message//%/%%}"  # assign
declare -p Message                       # inspect

result

declare -u Message="FIRST LINE
SECOND LINE"

answered Apr 25, 2022 at 19:03 by glenn jackman; edited Apr 25, 2022 at 19:05 by Stéphane Chazelas

Beware that with message='\141' for instance, you'd get declare -u Message="A" instead of declare -u Message="a". – Stéphane Chazelas, Commented Apr 25, 2022 at 19:07
Note that any \ will be doubled \\. – user232326, Commented Apr 25, 2022 at 22:45
Not giving printf a format causes the change of % that you want to avoid by duplicating every %. However, a printf -v Message -- '%b' "${message}" will interpret back-slashed characters exactly as echo -e without altering any % characters. – user232326, Commented Apr 25, 2022 at 22:57
Please read: unix.stackexchange.com/q/700508/232326 – user232326, Commented Apr 27, 2022 at 19:26

G-Man Says 'Reinstate Monica' · Accepted Answer · 2022-05-09 23:15:35Z

echo "$message"  |  sed -e 's/^[[:lower:]]/\u&/' -e 's/\([^\]\)\([[:lower:]]\)/\1\u\2/g' \
                                                 -e 's/\([^\]\)\([[:lower:]]\)/\1\u\2/g'

-e 's/^[[:lower:]]/\u&/' If the first character in the string (or, more generally, the first character on a line) is a lower-case letter, capitalize it. Because the first character on a line can’t be escaped. Duh. That’s a no-brainer.
-e 's/\([^\]\)\([[:lower:]]\)/\1\u\2/g' Look at the line two characters at a time. If a lower-case letter is preceded by something other than a backslash, leave the preceding character alone, and capitalize the lower-case letter.

You might think that this would be enough to process the entire line. Unfortunately, since it processes the line two characters at a time, it gets only every other letter:
```
$ echo "first line\nsecond line" | sed -e 's/\([^\]\)\([[:lower:]]\)/\1\u\2/g'
fIrSt LiNe\nSeCoNd LiNe
```
so,
-e 's/\([^\]\)\([[:lower:]]\)/\1\u\2/g' Do the exact same thing a second time. This will pick up the letters that were skipped on the first pass.

Alternative version:

echo "$message" | sed -e 's/^[[:lower:]]/\u&/' \
                                  -e ': loop; s/\([^\]\)\([[:lower:]]\)/\1\u\2/g; t loop'

Basically the same as the first version, but, instead of repeating the second s command, it iterates it with a loop.

Unfortunately, this will not work correctly for double backslashes:  foo\\bar will become FOO\\bAR, even though the b should be capitalized, since the \\ is an escaped backslash, and so should not cause the b to be escaped.
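
A possible way around this limitation, assuming GNU sed, is to consume each backslash together with the character it escapes instead of testing only the preceding character. The expression in this sketch is one of the GNU sed variants from Stéphane Chazelas's answer in this thread; the wrapper name upcase_pairs is hypothetical and exists only to make the example easy to call.

```shell
# Assumes GNU sed: -E and the \U replacement escape are used.
# upcase_pairs is a hypothetical wrapper name.
upcase_pairs() {
  # (\\.)    a backslash plus the character it escapes: copied unchanged via \1
  # ([^\\]+) a run of non-backslash characters: uppercased via \U\2
  printf '%s\n' "$1" | sed -E 's/(\\.)|([^\\]+)/\1\U\2/g'
}

upcase_pairs 'foo\\bar'   # FOO\\BAR
upcase_pairs '\tfoo'      # \tFOO
```

Since \\ is matched as one escaped pair, the b that follows it is no longer mistaken for an escaped character, and a leading \t is handled without a separate rule for the first character.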

No, the first character could be escaped, like when you want to insert a tab at the beginning, which would be "\t". — Ocean, Commented May 9, 2022 at 15:19
One of us is not understanding the other. If the line begins with \t, then the first character is \. t is the second character. If I’m misunderstanding you, please explain more clearly. — G-Man Says 'Reinstate Monica', Commented May 9, 2022 at 22:36
Semantics. If a line begins with "\t", then the first character is an escaped "t". But one can also say that "\" is the first character. Depends on how you look at it, I guess. It could also be an escaped "\" by having "\\t", so one gets "\t" instead of the tab character. Since these constructs are supposed to represent a single character (\t is tab), I treat them as single entities, which was the origin of the misunderstanding. — Ocean, Commented May 10, 2022 at 12:08

Chris Davies · Accepted Answer · 2022-07-24 11:42:49Z

I'd consider evaluating the \n and other escape sequences at the point that the variable was defined. Here $message actually contains a newline.

message=$(printf '%b' 'First Line\nSecond Line')
echo "${message^^}"

Output

FIRST LINE
SECOND LINE
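
${message^^} needs bash 4.0 or later. In other POSIX shells the same idea (expand the escapes first, then uppercase) might look like the following sketch, which swaps in tr for the case conversion. The variable name upper is only for illustration.

```shell
# Expand the escape sequences once, when the variable is defined.
message=$(printf '%b' 'First Line\nSecond Line')

# Uppercase with tr instead of the bash-only ${message^^}.
upper=$(printf '%s\n' "$message" | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$upper"
```

Command substitution strips trailing newlines, so a message that must end in a newline would need that newline added back when it is printed.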

Kadir · Accepted Answer · 2022-04-26 08:31:51Z

The variable can be iterated line by line. Then concatenate the output again.

bash:

$ message="First Line\nSecond Line";
$ message=$(echo -e ${message} |while read -r line; do echo -n "${line^^}\n" ; done) && message=${message%??}
$ echo ${message} 
FIRST LINE\nSECOND LINE

answered Apr 26, 2022 at 7:09; edited Apr 26, 2022 at 8:31

See Understanding "IFS= read -r line", When is double-quoting necessary? and Why is printf better than echo? – Stéphane Chazelas, Commented Apr 26, 2022 at 7:34
That will likely leave linefeeds alone, but the OP asked for all escaped characters to be left alone. – Henrik supports the community, Commented Apr 26, 2022 at 8:04
Backslash processing should be removed from the while read loop for sure. Just edited the answer. – Kadir, Commented Apr 26, 2022 at 8:35
(1) For starters, ${message} should be "$message". See ${variable_name} doesn’t mean what you think it does …. (2) You should explain your answer better, in particular (IMO) the %?? part. (You don’t need to explain it to me; I figured it out.) Please do not respond in comments; edit your answer to make it clearer and more complete. … (Cont’d) – G-Man Says 'Reinstate Monica', Commented May 7, 2022 at 19:04
(Cont’d) … (3) This is a classic example of providing a solution for the example while ignoring the larger question. foo\012bar will turn into FOO\nBAR, \g\h\i\j\k\l\m\n\o\p\q will turn into \G\H\I\J\K\L\M\n\O\P\Q, and any of \a, \b, \c, \e, \f, \r, \t, \v, and \\ will cause problems. Also, leading and trailing spaces, and multiple spaces. (4) Strictly speaking, the question didn’t say that you should clobber the original variable. If you need a multi-step process, you should assign the intermediate value to a temp variable. – G-Man Says 'Reinstate Monica', Commented May 7, 2022 at 19:04

jubilatious1 · Accepted Answer · 2022-07-24 11:33:13Z

Using Raku (formerly known as Perl_6)

~$ echo 'a\nb'
a\nb
~$ echo 'a\nb' | raku -pe 's:g/ <!after "\\"> (.) /{$0.uc}/;'
A\nB
~$ echo "a\\nb"
a\nb
~$ echo "a\\nb" | raku -pe 's:g/ <!after "\\"> (.) /{$0.uc}/;'
A\nB

Above uses a negative look-behind assertion, <!after "\\">, to select out all characters except those immediately after a \ backslash. Selected characters are then uppercased with Raku's .uc routine.

Certainly it's safer to provide the regex with a custom <-[ … ]> negative character class, sparing backslashed characters like \n and \t from being uppercased. (FYI, custom positive character classes are written <+[ … ]> or more simply <[ … ]> in Raku).

Below, using Raku's "Q-lang" (quoting language) to feed the substitution operator a string. In all four examples below \n is returned (not uppercase \N). Note in the third example how \n is operationally-interpreted as a newline character, and this remains unchanged in the fourth example, telling us that \n still exists in that string (i.e. it has NOT been uppercased to \N):

~$ raku -e 'put Q<a\nb>'
a\nb
~$ raku -e 'put Q<a\nb>' | raku -pe 's:g/ <!after "\\"> (<-[nt]>) /{$0.uc}/;'
A\nB
~$ raku -e 'put Q:b<a\nb>'
a
b
~$ raku -e 'put Q:b<a\nb>' | raku -pe 's:g/ <!after "\\"> (<-[nt]>) /{$0.uc}/;'
A
B

NOTE, see: "Place an escape sign before every non-alphanumeric characters" for Raku answers to a related question on StackOverflow.

References:
https://docs.raku.org/language/quoting
https://docs.raku.org/language/regexes#Literals_and_metacharacters
https://raku.org
