Revisions to How to convert a string to lower case in Bash

Added note about a performance metric

edited Jul 28, 2018 at 23:45

Dejay Clayton

3.8k
2
30
20

2s for my approach to lowercase; 12s for uppercase
4s for tr to lowercase; 4s for uppercase
20s for Orwellophile's approach to lowercase; 29s for uppercase
75s for ghostdog74's approach to lowercase; 669s for uppercase
467s for technosaurus' approach to lowercase; 449s for uppercase
660s for JaredTS486's approach to lowercase; 660s for uppercase. It's interesting to note that this approach generated continuous page faults (memory swapping) in Bash

2s for my approach to lowercase; 12s for uppercase
4s for tr to lowercase; 4s for uppercase
20s for Orwellophile's approach to lowercase; 29s for uppercase
75s for ghostdog74's approach to lowercase; 669s for uppercase. It's interesting to note how dramatic the performance difference is between a test in which most characters match and one in which most characters miss
467s for technosaurus' approach to lowercase; 449s for uppercase
660s for JaredTS486's approach to lowercase; 660s for uppercase. It's interesting to note that this approach generated continuous page faults (memory swapping) in Bash

Augmented description with timing information

edited Jul 28, 2018 at 23:35

Dejay Clayton

This is a faster variation of JaredTS486's approach that uses native Bash capabilities, available even in Bash versions earlier than 4.0, to optimize it.

I've timed 1,000 iterations of each approach for a small string (25 characters) and a larger string (445 characters, consisting of the poem "The Robin" by Witter Bynner).

Timing results for 1,000 iterations of 25 characters:

6 seconds for my approach
8 seconds for tr '[:lower:]' '[:upper:]'
9 seconds for Orwellophile's approach
35 seconds for JaredTS486's approach

Timing results for 1,000 iterations of 445 characters:

9 seconds for tr '[:lower:]' '[:upper:]'
17 seconds for my approach
25 seconds for Orwellophile's approach
829 seconds for JaredTS486's approach

#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

function ucase()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

echo "OUTPUT: [$( ucase 'Change Me To All Capitals' )]"

The approach is simple: while the input string has any remaining lowercase letters present, find the first one, and replace all instances of that letter with its uppercase variant. Repeat until all lowercase letters are replaced.
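The least obvious part of the loop is the lookup that turns one lowercase letter into its uppercase counterpart. The sketch below isolates those two expansions in a hypothetical helper named map_upper, which is not part of the answer and exists only for illustration: the suffix trim leaves the part of LCS before the letter, and that prefix's length is the index used to read the matching character from UCS.

```bash
#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical helper (not part of the answer): maps one lowercase letter
# to uppercase using the same two expansions as the loop body of ucase().
function map_upper()
{
  local LCHAR="$1"
  # Remove the longest suffix matching "${LCHAR}*", leaving the part of
  # LCS before LCHAR (e.g. "abcdef" when LCHAR is "g").
  local LOFFSET="${LCS%%${LCHAR}*}"
  # The length of that prefix is LCHAR's index in LCS; read the character
  # at the same index in UCS.
  echo -n "${UCS:${#LOFFSET}:1}"
}

echo "g -> $( map_upper g )"   # g -> G
echo "z -> $( map_upper z )"   # z -> Z
```

Because the index comes from string lengths rather than character codes, the mapping is only as correct as the alignment of LCS and UCS.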

This is a far faster variation of JaredTS486's approach that uses native Bash capabilities, available even in Bash versions earlier than 4.0, to optimize it.

I've timed 1,000 iterations of this approach for a small string (25 characters) and a larger string (445 characters), for both lowercase and uppercase conversions. Because the test strings are predominantly lowercase, and fewer characters therefore need converting, conversions to lowercase are generally faster than conversions to uppercase.

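I've compared my approach with several other answers on this page that are compatible with Bash 3.2. My approach is far more performant than most approaches documented here, and is even faster than tr in three of the four measured cases (both conversions of the 25-character string, and the lowercase conversion of the 445-character string).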

Here are the timing results for 1,000 iterations of 25 characters:

0.46s for my approach to lowercase; 0.96s for uppercase
1.16s for Orwellophile's approach to lowercase; 1.59s for uppercase
3.67s for tr to lowercase; 3.81s for uppercase
11.12s for ghostdog74's approach to lowercase; 31.41s for uppercase
26.25s for technosaurus' approach to lowercase; 26.21s for uppercase
25.06s for JaredTS486's approach to lowercase; 27.04s for uppercase

Timing results for 1,000 iterations of 445 characters (consisting of the poem "The Robin" by Witter Bynner):

2s for my approach to lowercase; 12s for uppercase
4s for tr to lowercase; 4s for uppercase
20s for Orwellophile's approach to lowercase; 29s for uppercase
75s for ghostdog74's approach to lowercase; 669s for uppercase
467s for technosaurus' approach to lowercase; 449s for uppercase
660s for JaredTS486's approach to lowercase; 660s for uppercase. It's interesting to note that this approach generated continuous page faults (memory swapping) in Bash
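The harness behind these timings isn't shown. A minimal sketch of one way to reproduce the ucase-versus-tr comparison follows, assuming Bash's time keyword and the 25-character test string; the names INPUT, ITERATIONS, EXPECTED and RESULT are hypothetical, and absolute numbers will vary by machine.

```bash
#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# ucase() as written in the answer.
function ucase()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

# Assumed inputs (hypothetical names); the answer's exact harness isn't shown.
INPUT='Change Me To All Capitals'
ITERATIONS="${ITERATIONS:-1000}"

# Check that both approaches agree before comparing their speed.
EXPECTED="$( printf '%s' "${INPUT}" | LC_ALL=C tr '[:lower:]' '[:upper:]' )"
if [[ "$( ucase "${INPUT}" )" != "${EXPECTED}" ]]; then
  echo "outputs differ" >&2
  exit 1
fi

# Bash's "time" keyword reports real/user/sys time for the whole loop on stderr.
echo "ucase, ${ITERATIONS} iterations:" >&2
time for (( i = 0; i < ITERATIONS; i++ )); do
  RESULT="$( ucase "${INPUT}" )"
done

echo "tr, ${ITERATIONS} iterations:" >&2
time for (( i = 0; i < ITERATIONS; i++ )); do
  RESULT="$( printf '%s' "${INPUT}" | tr '[:lower:]' '[:upper:]' )"
done
```

Both loops pay for one $( ) subshell per call; the tr loop additionally starts an external process each time, which is where most of its overhead comes from.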

#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

function lcase()
{
  local TARGET="${1-}"
  local UCHAR=''
  local UOFFSET=''

  while [[ "${TARGET}" =~ ([A-Z]) ]]
  do
    UCHAR="${BASH_REMATCH[1]}"
    UOFFSET="${UCS%%${UCHAR}*}"
    TARGET="${TARGET//${UCHAR}/${LCS:${#UOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

function ucase()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

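The approach is simple: while the input string has any remaining uppercase letters present, find the next one, and replace all instances of that letter with its lowercase variant. Repeat until all uppercase letters are replaced. The ucase function works the same way in reverse, searching for lowercase letters and replacing them with their uppercase variants.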
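To see the "find the next one, replace all instances" loop in action, here is a sketch of an instrumented copy of lcase(), named lcase_trace for illustration only, that prints TARGET to stderr after each iteration.

```bash
#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical instrumented copy of lcase(): same loop, plus a trace line
# on stderr after every replacement.
function lcase_trace()
{
  local TARGET="${1-}"
  local UCHAR=''
  local UOFFSET=''

  while [[ "${TARGET}" =~ ([A-Z]) ]]
  do
    UCHAR="${BASH_REMATCH[1]}"      # leftmost remaining uppercase letter
    UOFFSET="${UCS%%${UCHAR}*}"     # part of UCS before that letter
    # ${var//pattern/string} replaces every occurrence, not just the first.
    TARGET="${TARGET//${UCHAR}/${LCS:${#UOFFSET}:1}}"
    echo "replaced ${UCHAR}: ${TARGET}" >&2
  done

  echo -n "${TARGET}"
}

echo "OUTPUT: [$( lcase_trace 'HELLO World' )]"   # OUTPUT: [hello world]
```

For 'HELLO World' the loop runs five times, printing "replaced H: hELLO World", "replaced E: heLLO World", "replaced L: hellO World", "replaced O: hello World" and "replaced W: hello world"; both L's are converted in a single iteration.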

Some performance characteristics of my solution:

Uses only shell builtins, which avoids the overhead of launching external binaries in new processes

Avoids subshells, which incur a performance penalty

Uses shell mechanisms that are compiled and optimized for performance, such as global string replacement within variables, variable suffix trimming, and regex searching and matching. These mechanisms are far faster than iterating manually through strings

Loops only the number of times required by the count of unique matching characters to be converted. For example, converting a string that has three different uppercase characters to lowercase requires only 3 loop iterations. For the preconfigured ASCII alphabet, the maximum number of loop iterations is 26

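UCS and LCS can be augmented with additional characters, as long as each character in LCS sits at the same position as its counterpart in UCS and the regex bracket expressions ([a-z] and [A-Z]) are extended to match the new characters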
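Because the loop condition hard-codes [a-z], extending LCS and UCS alone does not make the new characters match. One possible adjustment, sketched below as a hypothetical ucase_from_lcs (an assumed refinement, not part of the answer), builds the bracket expression from LCS itself, so the set of matched characters always equals LCS and does not rely on how a locale may interpret the range a-z. The sketch does not escape characters that are special in bracket expressions (], ^, -) or in glob patterns (*, ?, [), so those would still need extra handling.

```bash
#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical variant of ucase(): the bracket expression is generated
# from LCS, so it matches exactly the characters listed there.
function ucase_from_lcs()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  # The unquoted ${LCS} is expanded and treated as part of the regex,
  # giving ([abcdefghijklmnopqrstuvwxyz]) with the default LCS.
  while [[ "${TARGET}" =~ ([${LCS}]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

echo "OUTPUT: [$( ucase_from_lcs 'Change Me To All Capitals' )]"
```

With this form, appending a character to LCS and its counterpart to UCS at the same position is enough for the loop to pick it up.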

Appended more detailed timing information

edited Jul 28, 2018 at 18:34

Dejay Clayton

This is a faster variation of JaredTS486's answer that uses native Bash capabilities, available even in Bash versions earlier than 4.0, to optimize his approach. It even seems to perform faster than tr '[:lower:]' '[:upper:]' on my machine!

#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

function ucase()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

echo "OUTPUT: [$( ucase 'Change Me To All Capitals' )]"

The approach is simple: while the input string has any remaining lowercase letters present, find the first one, and replace all instances of that letter with its uppercase variant. Repeat until all lowercase letters are replaced.

On my machine, the test string Change Me To All Capitals requires 11 loop iterations and takes less than 6 seconds to execute 1,000 times, which is surprisingly faster than invoking tr '[:lower:]' '[:upper:]', which takes over 8 seconds to execute 1,000 times. JaredTS486's answer requires 650 loop iterations and over 35 seconds to execute 1,000 times.
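The 11-iteration figure follows from the loop running once per distinct lowercase letter. A sketch of an instrumented copy, named ucase_counted for illustration only, counts the iterations in a global LOOPS variable; it is called directly rather than through $( ), because a command substitution would increment a copy of LOOPS in a subshell.

```bash
#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical instrumented copy of ucase(): identical loop, plus a
# global iteration counter.
LOOPS=0
function ucase_counted()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  LOOPS=0
  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
    # Plain assignment rather than (( LOOPS++ )), which returns status 1
    # when LOOPS is 0 and would abort the script under set -e.
    LOOPS=$(( LOOPS + 1 ))
  done

  echo -n "${TARGET}"
}

# Called directly so LOOPS is updated in this shell.
ucase_counted 'Change Me To All Capitals'; echo
echo "loops: ${LOOPS}"   # loops: 11 (h a n g e o l p i t s)
```

A pangram such as "the quick brown fox jumps over the lazy dog" hits the 26-iteration ceiling for the default alphabet.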

Note that the execution time drops from less than 6 seconds to less than 5 seconds when the logic is inlined directly in the source code, instead of being embedded in a Bash function that is then invoked via command substitution, $( ), which runs it in a subshell.
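Another way to avoid the $( ) subshell without inlining the logic is to have the function assign its result to a variable named by the caller. The sketch below does this with printf -v (available since Bash 3.1) in a hypothetical ucase_to function; it is an alternative technique rather than the author's code, and it has not been timed against the numbers above.

```bash
#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical variant of ucase(): stores the result in the variable whose
# name is passed as $1 instead of echoing it for $( ) to capture.
# The caller's variable must not be named like one of the locals below,
# otherwise printf -v would assign the local instead.
function ucase_to()
{
  local __OUTVAR="$1"
  local TARGET="${2-}"
  local LCHAR=''
  local LOFFSET=''

  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  # Assign to the caller's variable in the current shell; no fork.
  printf -v "${__OUTVAR}" '%s' "${TARGET}"
}

ucase_to RESULT 'Change Me To All Capitals'
echo "OUTPUT: [${RESULT}]"   # OUTPUT: [CHANGE ME TO ALL CAPITALS]
```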

This is a faster variation of JaredTS486's approach that uses native Bash capabilities, available even in Bash versions earlier than 4.0, to optimize it.

I've timed 1,000 iterations of each approach for a small string (25 characters) and a larger string (445 characters, consisting of the poem "The Robin" by Witter Bynner).

Timing results for 1,000 iterations of 25 characters:

6 seconds for my approach

8 seconds for tr '[:lower:]' '[:upper:]'

9 seconds for Orwellophile's approach

35 seconds for JaredTS486's approach

Timing results for 1,000 iterations of 445 characters:

9 seconds for tr '[:lower:]' '[:upper:]'

17 seconds for my approach

25 seconds for Orwellophile's approach

829 seconds for JaredTS486's approach

Solution:

#!/bin/bash
set -e
set -u

declare LCS="abcdefghijklmnopqrstuvwxyz"
declare UCS="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

function ucase()
{
  local TARGET="${1-}"
  local LCHAR=''
  local LOFFSET=''

  while [[ "${TARGET}" =~ ([a-z]) ]]
  do
    LCHAR="${BASH_REMATCH[1]}"
    LOFFSET="${LCS%%${LCHAR}*}"
    TARGET="${TARGET//${LCHAR}/${UCS:${#LOFFSET}:1}}"
  done

  echo -n "${TARGET}"
}

echo "OUTPUT: [$( ucase 'Change Me To All Capitals' )]"

The approach is simple: while the input string has any remaining lowercase letters present, find the first one, and replace all instances of that letter with its uppercase variant. Repeat until all lowercase letters are replaced.

created Jul 28, 2018 at 17:37

Dejay Clayton
