24
\$\begingroup\$

What general tips do you have for golfing in sed? I'm looking for ideas which can be applied to code-golf problems and which are also at least somewhat specific to sed (e.g. "remove comments" is not an answer).

Please post one tip per answer.

\$\endgroup\$
7
  • 6
    \$\begingroup\$ Not really a golfing tip (but still a tip for golfing): linefeeds consume just as many bytes as semicolons, so you can keep your code short and readable. \$\endgroup\$
    – Dennis
    Commented Jul 22, 2015 at 17:48
  • \$\begingroup\$ Not a tip either, but a problem: I have GNU sed, yet the F command never worked. Does anyone know why? \$\endgroup\$
    – seshoumara
    Commented Aug 30, 2016 at 16:15
  • \$\begingroup\$ @seshoumara F works on my GNU sed (Debian testing). It just prints - if reading from stdin, of course, but that's expected. What do you get from sed -e 'F;Q' /etc/hostname? \$\endgroup\$ Commented Aug 30, 2016 at 16:28
  • \$\begingroup\$ @TobySpeight That gives this error: char 1: unknown command: F. I have to update sed maybe; what version do you have? The L command also doesn't work, but it's useless anyway since -l n exists. Everything else mentioned on GNU sed's site works. \$\endgroup\$
    – seshoumara
    Commented Aug 30, 2016 at 16:44
  • 1
    \$\begingroup\$ I opened the chat room bash, sed and dc for all who want to talk and ask about these languages. Let's make a community! \$\endgroup\$
    – seshoumara
    Commented Aug 30, 2016 at 17:10

27 Answers 27

10
\$\begingroup\$

If you need to use labels then for sure you'll want your label names to be as short as possible. In fact taken to the extreme, you may even use the empty string as a label name:

:    # define label ""
p    # print pattern space
b    # infinite loop! - branch to label ""
\$\endgroup\$
2
  • 5
    \$\begingroup\$ As of gnu sed 4.3, this behavior was removed. : now requires a label. \$\endgroup\$
    – Kevin
    Commented Feb 17, 2017 at 20:06
  • \$\begingroup\$ Indeed, here is also the actual git commit link. I guess for PPCG this won't change much, since we are allowed to post answers for GNU sed 4.2.x, but it's good to know, though regrettably, that this trick won't officially work anymore. \$\endgroup\$
    – seshoumara
    Commented Feb 17, 2017 at 20:40
8
\$\begingroup\$

The GNU sed documentation describes the s command as "sed's Swiss Army Knife". But if all you want to do is replace all instances of one character with another, then the y command is what you need:

y/a/b/

is one char shorter than:

s/a/b/g
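
A quick way to convince yourself the two are interchangeable for single characters (a minimal sketch, assuming any POSIX sed on PATH; the sample string and variable names are made up for illustration):

```shell
# Hypothetical sample input; any POSIX sed should behave the same here.
input='banana 12'

# s with the g flag: 7 bytes of sed
with_s=$(printf '%s\n' "$input" | sed 's/a/b/g')

# y does the same single-character replacement in 6 bytes
with_y=$(printf '%s\n' "$input" | sed 'y/a/b/')

# y maps characters pairwise, so it can swap two characters in one pass
swapped=$(printf '%s\n' "$input" | sed 'y/12/21/')

printf '%s\n' "$with_s" "$with_y" "$swapped"
```

Keep in mind that y only works on literal characters, so it is no help once you need a real regex.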
\$\endgroup\$
1
  • \$\begingroup\$ It's also way faster, and can swap chars in place: y/12/21/ \$\endgroup\$
    – mikeserv
    Commented Dec 23, 2015 at 18:12
7
\$\begingroup\$

When repeatedly replacing in a loop:

:loop
s/foo/bar/g
tloop

it's usually unnecessary to replace globally, as the loop will eventually replace all occurrences:

# GNU sed
:
s/foo/bar/
t

Note also the GNU extension above: a label can have an empty name, saving more precious bytes. In other implementations, a label cannot be empty, and a branch without a label jumps to the end of the script (i.e. it ends the current cycle).
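
Here is a small check that dropping the g changes nothing when the substitution sits in a loop anyway. This is only a sketch: it uses a named label (a) so that it also runs on GNU sed 4.3 and later, and the sample inputs are made up.

```shell
# Named label so this also runs where the empty label is not accepted.
input='foo foo foo'

# Loop with a global replace inside it
with_g=$(printf '%s\n' "$input" | sed -e ':a' -e 's/foo/bar/g' -e 'ta')

# Loop with a single replace per iteration: one byte shorter, same result
without_g=$(printf '%s\n' "$input" | sed -e ':a' -e 's/foo/bar/' -e 'ta')

# A case where the loop itself is needed: each pass exposes a new () pair
nested=$(printf '(()())\n' | sed -e ':a' -e 's/()//' -e 'ta')

printf '[%s]\n' "$with_g" "$without_g" "$nested"
```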

\$\endgroup\$
2
  • 2
    \$\begingroup\$ The empty label name is GNU-specific, POSIX requires branches with no argument to jump to the end of the script (seems to be the behavior in the BSDs and Busybox, also in GNU sed if you don't add an empty :) \$\endgroup\$
    – ninjalj
    Commented Nov 3, 2015 at 1:30
  • 3
    \$\begingroup\$ The nameless label was always a bug in GNU sed, not an extension, and in version 4.3 and higher this bug was, regrettably, fixed. See here. \$\endgroup\$
    – seshoumara
    Commented Feb 17, 2017 at 20:45
6
\$\begingroup\$

Consider using extended regex syntax (in GNU sed). The -r option costs one byte in scoring, but using it just once to eliminate the backslashes from a pair of \(...\) has already paid for itself.
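
To make the saving concrete, here is the same group swap in both syntaxes. It is a sketch assuming GNU sed; -E is used as the spelling of the flag, and the input is made up.

```shell
# Swap two captured groups, first in basic syntax, then in extended syntax.
input='ab'

# 18-byte script
basic=$(printf '%s\n' "$input" | sed 's/\(a\)\(b\)/\2\1/')

# 14-byte script, plus one byte for the flag under the scoring mentioned above
extended=$(printf '%s\n' "$input" | sed -E 's/(a)(b)/\2\1/')

printf '%s\n' "$basic" "$extended"
```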

\$\endgroup\$
6
  • 2
    \$\begingroup\$ With the additional note that -r seems to be GNU sed specific. \$\endgroup\$
    – manatwork
    Commented Jun 5, 2015 at 11:28
  • \$\begingroup\$ @manat - added (but it's a Community Wiki answer, so you could have edited yourself). \$\endgroup\$ Commented Jun 5, 2015 at 11:51
  • 1
    \$\begingroup\$ And it keeps paying for itself when using +, ?, {} and | in regex matches, since no backslashes are needed either. \$\endgroup\$
    – seshoumara
    Commented Aug 29, 2016 at 17:52
  • 1
    \$\begingroup\$ -E works as an alias to -r in many sed implementations if I remember correctly. \$\endgroup\$
    – phk
    Commented Feb 28, 2019 at 15:08
  • 1
    \$\begingroup\$ -E also works in older versions of GNU sed though it has been documented only since newer versions. For the new versions GNU wrongly claims that -E is a POSIX option. POSIX does not specify extended regex for sed. \$\endgroup\$ Commented Jul 29, 2020 at 9:50
6
\$\begingroup\$

As mentioned in man sed (GNU), you can use any character as a delimiter for regular expressions by using the syntax

\%regexp%

where % is a placeholder for any character.

This is useful for commands like

/^http:\/\//

which are shorter as

\%^http://%

What is mentioned in the GNU sed manual but not in man sed is that you can change the delimiters of s/// and y/// as well.

For example, the command

ss/ssg

removes all slashes from the pattern space.
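
Both forms can be tried from a shell like this (a sketch; the URLs and variable names are made up for illustration):

```shell
# Sample data made up for illustration.
urls=$(printf '%s\n' 'http://example.com' 'ftp://example.org')

# Address with % as the delimiter: no need to escape the slashes
http_only=$(printf '%s\n' "$urls" | sed -n '\%^http://%p')

# s with "s" itself as the delimiter: delete every slash
no_slashes=$(printf '%s\n' 'a/b/c' | sed 'ss/ssg')

printf '%s\n' "$http_only" "$no_slashes"
```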

\$\endgroup\$
6
\$\begingroup\$

Read the whole input at once with -z

Often you need to operate on the whole input at once instead of one line at a time. The N command is useful for that:

:
$!{N;b}

...but usually you can skip it and use the -z flag instead.

The -z flag makes sed use NUL (\0) as its input line separator instead of \n, so if you know your input won’t contain \0, it will read all of the input at once as a single “line”:

$ echo 'foo
> bar
> baz' | sed -z '1y/ao/eu/'
fuu
ber
bez

Try it online!
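
One difference worth knowing before swapping the loop for -z: with -z the trailing newline of the input is part of the pattern space. The sketch below assumes GNU sed, uses a named label for the loop, and pipes through tr only to be safe about any NUL in the output.

```shell
# Assumes GNU sed (-z is a GNU option). Sample input is made up.
input=$(printf 'foo\nbar\nbaz')

# Classic slurp loop: the final newline never enters the pattern space
with_loop=$(printf '%s\n' "$input" | sed ':a;N;$!ba;s/\n/,/g')

# -z: the whole input is one "line", including its trailing newline
with_z=$(printf '%s\n' "$input" | sed -z 's/\n/,/g' | tr -d '\0')

printf '%s\n' "$with_loop" "$with_z"
```

So a script that anchors on $ or counts newlines may need a small adjustment when moved to -z.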

\$\endgroup\$
5
\$\begingroup\$

There's no built-in arithmetic, but calculations can be done in unary or in unary-coded decimal. The following code converts decimal to UCD, with x as the unit and 0 as the digits separator:

s/[1-9]/0&/g
s/[5-9]/4&/g
y/8/4/
s/9/4&/g
s/4/22/g
s/[37]/2x/g
s/[26]/xx/g
s/[1-9]/x/g

and here's the conversion back to decimal:

s/0x/-x/g
s/xx/2/g
y/x/1/
s/22/4/g
s/44/8/g
s/81/9/g
s/42/6/g
s/21/3/g
s/61/7/g
s/41/5/g
s/-//g

These are both taken from an answer to "Multiply two numbers without using any numbers".

Plain old unary can be converted using this pair of loops from this answer to "{Curly Numbers};", where the unit is ;. I've used v and x to match Roman for 5 and 10; b comes from "bis".

# unary to decimal
:d
/;/{
s/;;;;;/v/g
s/vv/x/g
/[;v]/!s/x\+/&0/
s/;;/b/g
s/bb/4/
s/b;/3/
s/v;/6/
s/vb/7/
s/v3/8/
s/v4/9/
y/;bvx/125;/
td
}

# Decimal to unary
:u
s/\b9/;8/
s/\b8/;7/
s/\b7/;6/
s/\b6/;5/
s/\b5/;4/
s/\b4/;3/
s/\b3/;2/
s/\b2/;1/
s/\b1/;0/
s/\b0//
/[^;]/s/;/&&&&&&&&&&/g
tu
\$\endgroup\$
5
  • 1
    \$\begingroup\$ ...and if you have to use either of these, you've almost certainly already lost the code golf, though you might still be competitive with Java answers ;-) Still fun to use though. \$\endgroup\$ Commented Jun 5, 2015 at 17:36
  • \$\begingroup\$ The conversion from plain unary to decimal gives wrong answers for unary input equivalent of decimal form X0X, for example 108. The line responsible for this is /[;v]/!s/\b/0/2, which needs to be changed to /[;v]/!s:x\+:&0: for it to work. See here. \$\endgroup\$
    – seshoumara
    Commented Apr 6, 2017 at 14:20
  • \$\begingroup\$ @seshoumara, your link seems to be an empty page. But it's entirely plausible that I made an error when extracting that code from the referenced answer, so I'll just apply your fix. \$\endgroup\$ Commented Apr 6, 2017 at 14:48
  • \$\begingroup\$ The link loads correctly, but I was expecting something other than a grey page with "TIO" and something that looks like the Ubuntu logo - is that what's intended? And I was referring to the second of the answers I referenced (58007), as that's where the plain-unary sample originated. \$\endgroup\$ Commented Apr 6, 2017 at 15:30
  • \$\begingroup\$ The TIO link should have contained the corrected code, plus an example input, 108 in unary. On running the code you should have seen the correct result 108, and not 180, as previously generated by that now fixed line of code. Updating the referenced answer is entirely up to you. This is a community wiki. \$\endgroup\$
    – seshoumara
    Commented Apr 6, 2017 at 15:52
5
\$\begingroup\$

Append a newline in one byte

The G command appends a newline and the contents of the hold space to the pattern space, so if your hold space is empty, instead of this:

s/$/\n/

You can do this:

G

Prepend a newline in three bytes

The H command appends a newline and the contents of the pattern space to the hold space, and x swaps the two, so if your hold space is empty, instead of this:

s/^/\n/

You can do this:

H;x

This will pollute your hold space, so it only works once. For two more bytes, though, you could clear your pattern space before swapping, which is still a savings of two bytes:

H;z;x
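
The hold space pollution is easy to see with two input lines. This is a sketch assuming GNU sed (z is a GNU command); the trailing s/\n/|/ is only there to make the added newline visible.

```shell
# G on an empty hold space appends a newline
appended=$(printf 'abc\n' | sed 'G;s/\n/|/')

# H;x prepends a newline, but leaves the old line behind in the hold space,
# so the second input line picks it up
polluted=$(printf 'abc\ndef\n' | sed 'H;x;s/\n/|/')

# H;z;x empties the pattern space before the swap, keeping the hold space clean
clean=$(printf 'abc\ndef\n' | sed 'H;z;x;s/\n/|/')

printf '%s\n' "$appended" "$polluted" "$clean"
```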
\$\endgroup\$
5
\$\begingroup\$

Unary-UCD-Decimal conversions

The programs below are adapted from solutions in anarchy golf. The solutions below work on GNU sed 4.2.2, where empty labels are allowed. It is possible I've missed better solutions on anarchy golf.

Unary to Decimal, sed -r at 47 bytes

From solutions by %20 and tails. Works only for integers other than 0. The chosen unary digit (a in the below program) occurs 3 times in the program.

:
s/a/<<123456789a01>/
s/(.)<.*\1(a?.).*>/\2/
t

Try it online! and (bash wrapper for easier testing)

If you have a prefix (< here, occurs 4 times) before your number, you can make 0 show up with just 2 more bytes:

:
s/<|a/<<123456789&01>/
s/(.)<.*\1(a?.).*>/\2/
t

Try it online! (test cases show off how it functions)

Decimal to UCD, sed at 39 bytes

From a solution by %20. Works on all integers, including negative ones.

:
s/[1-9]/&;/g
y/123456789/012345678/
t

Try it online!

Decimal to Unary (via UCD), sed 55 bytes

From a solution by tails. Works on all integers, including negative ones.

:
s/[1-9]/&;/g
y/123456789/012345678/
s/;0/9;/
t
s/0//g

Try it online! and (bash wrapper)

\$\endgroup\$
4
\$\begingroup\$

Let's talk about the t and T commands: although they are explained in the man page, it's easy to forget the details and introduce bugs accidentally, especially when the code gets complicated.

Man page statement for t:

If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label.

Example showing what I mean: Let's say you have a list of numbers and you want to count how many negatives there are. Partial code below:

1{x;s/.*/0/;x}                   # initialize the counter to 0 in hold space
s/-/&/                           # check if number is negative
t increment_counter              # if so, jump to 'increment_counter' code block
b                                # else, do nothing (start a next cycle)

:increment_counter
#function code here

This looks OK, but it isn't. If the first number is positive, the code will still treat it as negative: the t on the first line of input jumps regardless, because the s that initialized the counter was a successful substitution. The correct version is /-/b increment_counter.
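
A runnable version of the trap, as a sketch assuming GNU sed. The label is shortened to n, and the " pos" / " neg" suffixes are made up so that the branch taken becomes visible in the output.

```shell
# Buggy: the t also sees the successful s from the counter initialization
buggy='1{x;s/.*/0/;x}
s/-/&/
tn
s/$/ pos/
b
:n
s/$/ neg/'

# Fixed: branch on an address instead of relying on the substitution flag
fixed='1{x;s/.*/0/;x}
/-/bn
s/$/ pos/
b
:n
s/$/ neg/'

buggy_out=$(printf '5\n-3\n7\n' | sed "$buggy")
fixed_out=$(printf '5\n-3\n7\n' | sed "$fixed")

printf '%s\n' "$buggy_out" "---" "$fixed_out"
```

Only the first line is misclassified by the buggy script, since the flag is reset whenever a new line of input is read.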

If this seemed easy, you could still be fooled when doing multiple jumps back and forth to simulate functions. In our example the increment_counter block of code for sure would use a lot of s commands. Returning back with b main might cause another check in "main" to fall in the same trap. That is why I usually return from code blocks with s/.*/&/;t label. It's ugly, but useful.

\$\endgroup\$
2
  • \$\begingroup\$ another thing to note is that the GNU info page says "since the last conditional branch was taken" rather than "since the last t or T command". from what I can see, the seds on TIO use the logic you are describing here. \$\endgroup\$
    – guest4308
    Commented Apr 27 at 16:38
  • \$\begingroup\$ this means if you know a swap has happened but need to reset to test for a specific swap you can use an empty T command just before it without needing to define a label; so you can fix the code with only two chars by replacing } with ;T} \$\endgroup\$
    – guest4308
    Commented Apr 27 at 16:43
4
\$\begingroup\$

If not explicitly banned by the question, the consensus for this meta question is that numerical input may be in unary. This saves you the 86 bytes of decimal to unary as per this answer.

\$\endgroup\$
3
  • \$\begingroup\$ Isn't that meta consensus for sed referring to plain old unary format? I have several answers where an input in UCD would help me, in case it's either way. \$\endgroup\$
    – seshoumara
    Commented Feb 15, 2017 at 8:32
  • \$\begingroup\$ @seshoumara I meant unary, not UCD \$\endgroup\$ Commented Feb 15, 2017 at 16:07
  • \$\begingroup\$ Then the conversion from decimal to plain old unary saves you 126 bytes as per that answer you linked. The 86 bytes is for the conversion to UCD. \$\endgroup\$
    – seshoumara
    Commented Feb 17, 2017 at 2:32
4
\$\begingroup\$

I know this is an old thread, but I just found those clumsy decimal to UCD converters, with almost a hundred bytes, some even messing the hold space or requiring special faulty sed versions.

For decimal to UCD I use (68 bytes; former best posted here 87 bytes)

s/$/\n9876543210/
:a
s/\([1-9]\)\(.*\n.*\)\1\(.\)/\3x\2\1\3/
ta
P;d

UCD to decimal is (also 66 bytes; former best posted here 96)

s/$/\n0123456789/
:a
s/\([0-8]\)x\(.*\n.*\)\1\(.\)/\3\2\1\3/
ta
P;d
  • \n in the replacement is not portable. You can use a different character instead and save two bytes, but you'll need more bytes to remove the appendix instead of P;d; see next remark. Or, if your hold space is empty, do G;s/$/9876543210/ without byte penalty.
  • If you need further processing, you'll need some more bytes for s/\n.*// instead of P;d.
  • You could save two bytes each for those buggy old GNU sed versions
  • No, you can't save those six backslashes as extended regular expressions don't do backreferences
\$\endgroup\$
4
  • \$\begingroup\$ There are no decimal to UCD and back converters posted in this thread that mess the hold space or require faulty sed versions. \$\endgroup\$
    – seshoumara
    Commented Nov 11, 2017 at 8:49
  • \$\begingroup\$ Your own answer from April 6th uses the hold space and will only run with old sed versions that violate the POSIX standard. \$\endgroup\$
    – Philippos
    Commented Nov 11, 2017 at 9:08
  • \$\begingroup\$ I'm not doing decimal to UCD conversions! Read the thread again carefully. UCD means that 12 is converted to 0x0xx (what your answer calculates), while plain unary (what my answer calculates) means that 12 is converted to xxxxxxxxxxxx. I chose @ as the symbol, but you get the idea. And further, on PPCG one doesn't need to adhere to the POSIX standard. \$\endgroup\$
    – seshoumara
    Commented Nov 11, 2017 at 9:15
  • \$\begingroup\$ If it pleases you, sheriff \$\endgroup\$
    – Philippos
    Commented Nov 11, 2017 at 9:55
4
\$\begingroup\$

Empty regexes are equivalent to the previously encountered regex

(thanks to Riley for discovering this from an anagol submission)

Here is an example where we are tasked with creating 100 @s in an empty buffer.

s/$/@@@@@@@@@@/;s/.*/&&&&&&&&&&/ # 32 bytes
s/.*/@@@@@@@@@@/;s//&&&&&&&&&&/  # 31 bytes

The second solution is 1 byte shorter and uses the fact that empty regexes are filled in with the last encountered regex. Here, for the second substitution, the last regex was .*, so the empty regex here will be filled with .*. This also works with regexes in /conditionals/.

Note that it is the previously encountered regex, so the following would also work.

s/.*/@@@@@@@@@@/;/@*/!s/$/@/;s//&&&&&&&&&&/

The empty regex gets filled with @* instead of $ because s/$/@/ is never reached.
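
Both uses can be tried out as below. This is a sketch that should work in any POSIX sed; the second sample input is made up.

```shell
# The second s reuses the regex .* from the first one (input is one empty line)
hundred=$(printf '\n' | sed 's/.*/@@@@@@@@@@/;s//&&&&&&&&&&/')

# An empty regex in s also picks up the regex of the address that just matched
reused=$(printf 'foo1 foo2\n' | sed '/foo/s//bar/')

printf '%s\n' "${#hundred}" "$reused"
```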

\$\endgroup\$
1
  • \$\begingroup\$ Yes, good answer. I've even made regexes longer so that they can be re-matched like this (thus making the program shorter). \$\endgroup\$ Commented Jun 22, 2018 at 11:53
3
\$\begingroup\$

Instead of clearing the pattern space with s/.*//, use the z command (lowercase) if you go with GNU sed. Besides the lower byte count, it has the advantage that it won't start the next cycle as the d command does, which can be useful in certain situations.
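
The difference from d shows up as soon as another command follows. A minimal sketch, assuming GNU sed, with made-up input:

```shell
# z empties the pattern space and carries on with the rest of the script
after_z=$(printf 'a\nb\n' | sed 'z;s/^/X/')

# d empties it too, but starts the next cycle, so the s is never reached
after_d=$(printf 'a\nb\n' | sed 'd;s/^/X/')

printf '[%s]\n' "$after_z" "$after_d"
```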

\$\endgroup\$
1
  • 2
    \$\begingroup\$ May also be of benefit if you have invalid multi-byte sequences (which aren't matched by .). \$\endgroup\$ Commented Aug 30, 2016 at 17:08
3
\$\begingroup\$

Expanding upon this tip answer, regarding the conversions between decimal and plain unary number formats, I present the following alternative methods, with their advantages and disadvantages.

Decimal to plain unary: 102 + 1(r flag) = 103 bytes. I counted \t as a literal tab, as 1 byte.

h
:
s:\w::2g
y:9876543210:87654321\t :
/ /!s:$:@:
/\s/!t
x;s:-?.::;x
G;s:\s::g
/\w/{s:@:&&&&&&&&&&:g;t}

Try it online!

Advantage: it is 22 bytes shorter and as extra, it works with negative integers as input

Disadvantage: it overwrites the hold space. However, since it's more likely that you'd need to convert the input integer right at the start of the program, this limitation is rarely felt.

Plain unary to decimal: 102 + 1(r flag) = 103 bytes

s:-?:&0:
/@/{:
s:\b9+:0&:
s:.9*@:/&:
h;s:.*/::
y:0123456789:1234567890:
x;s:/.*::
G;s:\n::
s:@::
/@/t}

Try it online!

Advantage: it is 14 bytes shorter. This time both tip versions work for negative integers as input.

Disadvantage: it overwrites the hold space

For a complicated challenge, you'll have to adapt these snippets to work with other information that may exist in the pattern space or hold space, besides the number to convert. The code can be golfed more, if you know you only work with positive numbers or that zero alone is not going to be a valid input / output.

An example of such challenge answer, where I created and used these snippets, is the Reciprocal of a number (1/x).

\$\endgroup\$
5
  • \$\begingroup\$ For unary-to-decimal you can save two bytes by combining the last two substitutions: s:\n|@$::g. tio.run/##K05N@f@/2ErX3krNwIpL30G/… \$\endgroup\$
    – Jordan
    Commented Jun 19, 2017 at 16:49
  • \$\begingroup\$ I had my own try at the decimal to unary converter. Here's 97 bytes :) Try it online! (also doesn't require -r, but with new consensus, flags do not count towards the bytecount anyways, and it doesn't mess up the hold space) \$\endgroup\$
    – user41805
    Commented May 21, 2018 at 12:25
  • \$\begingroup\$ Actually if you change the last line from /\n/ta to /\n/t, you save 1 byte to get 96 \$\endgroup\$
    – user41805
    Commented May 22, 2018 at 10:22
  • \$\begingroup\$ @Cowsquack Thanks, 96 is great! Don't have time now, will look on it this weekend. \$\endgroup\$
    – seshoumara
    Commented May 22, 2018 at 15:05
  • \$\begingroup\$ Sure, do send me a ping on chat then :) \$\endgroup\$
    – user41805
    Commented May 22, 2018 at 16:37
3
\$\begingroup\$

Make use of sed's line-handling ability

With flexible challenge I/O, it can pay to have input/output separated by newlines instead of any other character by taking advantage of sed's commands for handling lines (like D, N, n, G, H, P, s's m flag) instead of only being limited to s substitutions.

This can also open the opportunity for using D for looping instead of labels and goto, especially in sed versions that don't permit empty labels.
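
As an illustration of D standing in for a label and a branch, here is a sketch that splits a comma-separated line. It assumes GNU sed (\n in the replacement is not portable) and the input is made up.

```shell
# Each pass turns the first comma into a newline, P prints the first line,
# and D deletes it and restarts the script on what is left, with no label needed.
split=$(printf 'a,b,c\n' | sed 's/,/\n/;P;D')

printf '%s\n' "$split"
```

Once no newline is left in the pattern space, D behaves like d, so the last field is printed by P exactly once.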

\$\endgroup\$
3
\$\begingroup\$

use the e flag on s for simplifying things

your output may depend on which shell sed is using. I'll only include things that work on TIO at time of writing.

I know there are lots of decimal to unary and back solutions on here, but they are all unreasonably long. here's some that are much shorter:

decimal to unary:

s/.*/echo {0..&}/e
s/[0-9]//g

this sends echo {0..#} to the shell, which usually expands it into a list from 0 to #, with spaces in between each number. remove the numbers and you have the right number of spaces. works for any positive integer

unary to decimal:

s/.*/wc -L<<<&/e
s/.*/wc -L<<<'&'/e         #the above if you're using spaces as your unary char
s/.*/echo &|wc -L/e        #shells that don't support <<< redirects
s/.*/echo '&'|wc -L/e      # + using spaces

these send all the chars to wc, which returns the length of the longest line with -L (the <<< redirect and echo add a newline to the end). the unary must make up the entire line, as sed runs the text of the full line after the swap; if you leave something out you end up trying to run loremipsumwc -L<<<' '

basic math (-E)

s/(.*) (.*)/echo $[\1+\2]/e   #two numbers separated by a space
s/(.*) (.*)/echo $[\1*\2]/e   #there's at least + - * / %

Try it online!
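
if the shell behind e turns out to be a plain sh rather than bash, $[...] may not be understood. a sketch of the same idea with the POSIX $((...)) form, assuming GNU sed (the e flag is a GNU extension); it costs two more bytes and the inputs are made up:

```shell
# $((...)) is POSIX arithmetic expansion, so it does not depend on bash
sum=$(printf '3 4\n' | sed -E 's/(.*) (.*)/echo $((\1+\2))/e')
product=$(printf '3 4\n' | sed -E 's/(.*) (.*)/echo $((\1*\2))/e')

printf '%s\n' "$sum" "$product"
```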

idk, go have fun

s/.*/for i in {1..&};{ echo $[&%$i];}/e

basically if a shell solution is a lot shorter than a sed solution it doesn't need to be that way

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Brace-expansion isn't a standard shell feature, so you'll need $SHELL to be set appropriately for that first one, too. \$\endgroup\$ Commented Apr 9 at 13:46
  • 1
    \$\begingroup\$ @TobySpeight thanks! (it might not be $SHELL that sets the shell sed uses though. for me, echo $SHELL and sed 's/.*/echo $SHELL/e' both give /bin/bash but echo $0 gives -bash while sed s/.*/echo $0/e gives me sh) \$\endgroup\$
    – guest4308
    Commented Apr 10 at 17:05
2
\$\begingroup\$

The L command in old GNU sed versions

Used, for example, in the second solution in https://codegolf.stackexchange.com/a/220633/. Older versions of GNU sed like GNU sed 4.2.2 have the L command, which was later removed in newer versions. From the archived docs,

L n

This GNU sed extension fills and joins lines in pattern space to produce output lines of (at most) n characters, like fmt does; if n is omitted, the default as specified on the command line is used. This command is considered a failed experiment and unless there is enough request (which seems unlikely) will be removed in future versions.

\$\endgroup\$
2
\$\begingroup\$

#n at first to imply -n

The "#" and the remainder of the line are ignored (treated as a comment), with the single exception that if the first two characters in the file are #n, the default output is suppressed; this is the equivalent of specifying -n on the command line.

Source: sed (from SUSv2)

This is useful if you prefer NOT to output something by default.

...But is it really useful? -n adds either 1 or 2 bytes but #n and LF adds 3.
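
For reference, this is how the #n line behaves when the script lives in a file. A minimal sketch: the temporary file comes from mktemp and the input is made up.

```shell
# Write a two-line script whose first line is exactly #n
script=$(mktemp)
printf '#n\n/b/p\n' > "$script"

# No -n on the command line, yet only the explicitly printed line appears
printed=$(printf 'a\nb\nc\n' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$printed"
```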

\$\endgroup\$
1
  • \$\begingroup\$ This would be useful in anagol (golf.shinh.org), where you can't specify flags. \$\endgroup\$
    – user41805
    Commented Aug 30, 2021 at 12:08
2
\$\begingroup\$

i, a, and c

(The answer focuses on GNU sed because GNU sed's invocation of these commands is slightly shorter than that of POSIX, but otherwise I think the functionality should be the same).

These insert to, append to, and change the pattern space respectively. The lines they add are printed and so are not edited into the pattern space, meaning that the program will not be able to manipulate these lines. But they are shorter than s, if you want to insert some unchanging lines. Compare the following two lines:

s/^/text\n/ # 11 bytes
itext       #  5 bytes

Now I want to focus on c. c text replaces the pattern space with text. This text will be printed immediately, since the c makes sed move on to the next line. This means that the commands following a call to c are effectively ignored. This behaviour of c can be useful in challenges where there are only a few possible outputs, particularly when combined with the conditional /.../.

An example is the 'Hello, World!' challenge, where c is shorter than s:

s/^/Hello, World!/ # 18 bytes
cHello, World!     # 14 bytes

Another example is this challenge (sed answer) to swap the strings Good and Bad. The outputs are restricted to being Good (when the input is Bad) or Bad (when the input is Good).

s gives 21 bytes:

s/Goo/Ba/;t;s/Ba/Goo/

Using c (in combination with /.../) gives 13 instead:

/B/cGood
cBad

If the input matches /B/, i.e. if the input is Bad, Good is printed and the program skips the rest of the processing for this input line. So the program only ever reaches the second line if the input doesn't match B, i.e. if the input is Good, and in that case the output is set to Bad.
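
The 13-byte version can be checked against both inputs at once. This is a sketch assuming GNU sed, which accepts the one-line form of c:

```shell
# Feed both challenge inputs through the two-line script
swapped=$(printf 'Good\nBad\n' | sed '/B/cGood
cBad')

printf '%s\n' "$swapped"
```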

\$\endgroup\$
2
\$\begingroup\$

\L, \U, \E for fractals in GNU sed

The case switching special sequences help in "toggling" a line. An example is the following challenge to produce a fractal X on anagol, whose shortest sed solution by mitchs et al is reproduced here.

s/^/X\n/
:
s/^.\{,27\}\n/&\L&\U&/mg
//s/[ X]/& &/g
s/x/ X /g
t

Try it online!

If the fractal uses another character to fill it, then at the end you can add a transliterate.

\$\endgroup\$
1
\$\begingroup\$

In sed, the closest thing to a function that you can have is a label. A function is useful because you can execute its code multiple times, thus saving a lot of bytes. In sed however you would need to specify the return label and as such you can't simply call this "function" multiple times throughout your code the way you would do it in other languages.

The workaround I use is to add in one of the two memories a flag, which is used to select the return label. This works best when the function code only needs a single memory space (the other one).

Example showing what I mean: taken from a project of mine to write a small game in sed

# after applying the player's move, I overwrite the pattern space with the flag "P"
s/.*/P/
b check_game_status
:continue_turn_from_player
#code

b calculate_bot_move
:return_bot_move
# here I call the same function 'check_game_status', but with a different flag: "B"
s/.*/B/
b check_game_status
:continue_turn_from_bot
#code (like say 'b update_screen')

:check_game_status   # this needs just the hold space to run
#code
/^P$/b continue_turn_from_player
/^B$/b continue_turn_from_bot

The labels should be golfed of course to just one letter, I used full names for a better explanation.

\$\endgroup\$
1
\$\begingroup\$

Mostly useless step:

y|A-y|B-z|

This will only translate A to B and y to z (... and - to - ;), but nothing else, so

sed -e 'y|A-y|B-z|' <<<'Hello world!'

will just return:

Hello world!

You can make sure this step is useless, for example by using it on lower-case hexadecimal values (containing only 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e or f).

A little worse:

sed '; ;/s/b;y|A-y|B-z|;s ;s/ //; ; ;' <<<'Hello world'
Hello world

Why did this not suppress the space?

\$\endgroup\$
1
  • 4
    \$\begingroup\$ Is this something you found out the hard way?! ;-) \$\endgroup\$ Commented Sep 15, 2015 at 19:06
1
\$\begingroup\$

Combine s substitutions

The s command takes many bytes (4 + 1 for the statement separator), so combining them can save bytes.

An example: the following is at 17 bytes

s/\S+ //
s/\S+$//

while combining the two substitutions gives 15 bytes

s/^\S+ |\S+$//g
\$\endgroup\$
1
\$\begingroup\$

Use back references

Most people know that you can mark some parts of your regex with \(\) and later refer to it as \1 (or \2 for the second and so on) in your substitution.

But you hardly ever see back references, using \1 in the regex itself. For extended regular expressions (option -E to sed), this has been removed from the standard, but GNU sed supports it anyhow, so it works at tio.run, for example.

For example, the ERE (.)\1 matches a doubled character, and (.).*\1 matches a character that appears twice in the pattern space. See here for an example of how this makes things easier (and much shorter!).
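
The same two patterns in basic syntax, where back references in the regex are standard, can be tried like this (a sketch; the word list is made up for illustration):

```shell
# Sample words made up for illustration
words=$(printf '%s\n' hello world banana)

# Lines containing the same character twice in a row
doubled=$(printf '%s\n' "$words" | sed -n '/\(.\)\1/p')

# Lines where some character shows up again anywhere later on
repeated=$(printf '%s\n' "$words" | sed -n '/\(.\).*\1/p')

printf '%s\n' "$doubled" "---" "$repeated"
```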

Complicated tasks like this one (looking for matches in a file), this one or this one would be almost impossible without this feature.

Math with back references

And it's great for teaching sed to count. I know you can solve some of those problems using y, but as soon as you have more than one thing to increment or decrement, this is the way to go. See examples here or here or try the 151-byte decimal add online to examine how it works.

\$\endgroup\$
1
\$\begingroup\$

Okay, here's a weird functioning of (GNU) sed's regex. For example, if the pattern space is only qqq, then the extended regex s/^|q|$/<&>/g gives <q><q><q>. Try it online!

I'm not sure, but I think it is because if a character matches, then so do the empty strings surrounding it. Because the final q matches, that match also takes in the end of the pattern space, so $ by itself doesn't get matched, as it would otherwise overlap with the previous match (and likewise for ^, and I think even for word boundaries and the like).

An example where this is useful is in the following (rather contrived) task:

  • an input of a should give ab,
  • and inputs matching /a(c+)/ should give ab\1b, for example, an input of accc should give abcccb

Without the trick I can get 11 bytes (with the -E flag)

s/a|c+/&b/g

but using it gives 10 bytes

s/a|$/&b/g

Try it online!
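
For local testing, the cases above can be run as below. This is a sketch assuming GNU sed with -E; other implementations may treat the empty match differently.

```shell
# An empty match is skipped when it sits right after the previous match,
# which is why a lone "a" gets only one b.
single=$(printf 'a\n' | sed -E 's/a|$/&b/g')
longer=$(printf 'accc\n' | sed -E 's/a|$/&b/g')
wrapped=$(printf 'qqq\n' | sed -E 's/^|q|$/<&>/g')

printf '%s\n' "$single" "$longer" "$wrapped"
```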

(Do let me know if there is a better example that uses this behaviour).

\$\endgroup\$
1
\$\begingroup\$

Store more in your hold space

To perform a loop without losing the number you used as your counter, you can use h;H to add your number to the hold space twice, and then decrement only the copy after the newline with x;s/\n./\n/;x. This essentially gives you two hold spaces instead of just one. When the loop finishes you still have your number, and can even copy it again to use as many times as you need, e.g. with x;s/.*/&&/;x (note that this will copy the newline as well).

\$\endgroup\$
