The question explicitly states that titles will contain spaces.
For the sake of safety, I’m assuming that titles may contain dots (periods);
e.g., “The History of 3.14159” or “Dr. Doolittle’s Discovery”.
My answers assume that there is some character
that will never appear in the table of contents;
specifically, they assume it is @.
If you have @ in your table,
replace it with some character that never appears
(e.g., #, ^, _, |, etc.).
If you really use every ASCII character,
you may need to use a character sequence, like <@>.
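If you are not sure which placeholder is safe, a quick check along these lines may help. This is only a sketch: the function name pick_placeholder and the sample table are made up for illustration, and the candidate list is just the characters suggested above.

```shell
# Hypothetical sample table of contents; substitute the contents of your own file.
toc=$(printf 'foo        url1\nfoo bar    url3\n')

# Print the first candidate character that does not occur anywhere in the text
# passed as $1.  grep -F treats each candidate as a literal string, so ^ and |
# are not interpreted as regex operators.
pick_placeholder() {
    for c in '@' '#' '^' '_' '|'; do
        if ! printf '%s\n' "$1" | grep -qF -- "$c"; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1    # every candidate is already in use
}

pick_placeholder "$toc"
```

If the function returns non-zero, all of the candidates occur in the table, and a multi-character sequence is probably the way to go.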
Three ways to do it with sed:
Loop:
sed 's/\(.*\)\( \)/\1@\2/; :loop; s/  @/ @./; t loop; s/@//'
s/\(.*\)\( \)/\1@\2/ finds the last space on the line and inserts a @ before it.
:loop is a label, like a mile marker.
s/  @/ @./ (that’s s/␣␣@/␣@./, for non-ambiguity) says, if there are two spaces before the @, replace the second one with a dot and move the @ between the remaining space and the new dot.
t loop says, if the above substitution succeeded, jump back to the :loop marker and repeat.
Otherwise, continue to s/@//, which removes the @.
So the foo bar line in your table will be processed as follows:
Initial value:          foo bar    url3
s/\(.*\)\( \)/\1@\2/    foo bar   @ url3
s/  @/ @./              foo bar  @. url3
s/  @/ @./              foo bar @.. url3
s/  @/ @./              foo bar @.. url3    (Substitution fails, so don’t loop)
s/@//                   foo bar .. url3
Final output:           foo bar .. url3
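Here is a small, self-contained way to try the loop approach. It is a sketch rather than a drop-in script: the function name dot_leaders_loop is made up, and the sample line assumes four spaces between the title and the URL. I have split the commands into separate -e expressions because some sed implementations (BSD sed, for one) read everything after the colon up to the end of the line as the label name, so the semicolon-separated one-liner is safest with GNU sed.

```shell
# Sample line: title "foo bar", four spaces, then the URL.
line='foo bar    url3'

# Same five commands as the one-liner, one per -e expression so that the
# label and the branch each end at an expression boundary.
dot_leaders_loop() {
    sed -e 's/\(.*\)\( \)/\1@\2/' \
        -e ':loop' \
        -e 's/  @/ @./' \
        -e 't loop' \
        -e 's/@//'
}

printf '%s\n' "$line" | dot_leaders_loop
# prints: foo bar .. url3
```

Dots in the title are left alone, because only spaces directly in front of the @ marker are ever rewritten.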
Overwhelming numbers:
sed 's/\(.*\)\( \)/\1@@@@@@@@@@@@@@@@@@@@\2/; s/ [ @]\{20\}/ /; s/@/./g'
s/\(.*\)\( \)/\1@@@@@@@@@@@@@@@@@@@@\2/ is very similar to the first s subcommand in the first solution; it finds the last space on the line and inserts a string of 20 @ characters before it.
This should actually be a number that’s at least as large as the maximum number of dots you’ll ever need to insert on one line; e.g., 80.
Managing a string of 80 @ characters would be awkward; you might want to replace this with
s/\(.*\)\( \)/\1<@><@><@><@><@>\2/; s/<@>/@@@@@@@@@@@@@@@@/g
which inserts a string of five <@> sequences, and then replaces each one of them with a string of 16 @ characters, resulting in 5×16=80 @ characters.
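To convince yourself that the two-stage expansion really produces 80 placeholders, you can count them. This is just a sanity-check sketch; the function name insert_placeholders and the sample input are made up, and it assumes the input contains neither @ nor <@> to begin with.

```shell
# Insert five <@> markers before the last space, then expand each marker
# into 16 @ characters (5 x 16 = 80).
insert_placeholders() {
    sed 's/\(.*\)\( \)/\1<@><@><@><@><@>\2/; s/<@>/@@@@@@@@@@@@@@@@/g'
}

# Delete everything except @ and count what is left.
printf '%s\n' 'foo  url1' | insert_placeholders | tr -cd '@' | wc -c
```

The count should come out as 80 (some versions of wc pad the number with leading spaces). If you go this route, remember that the interval in the following subcommand has to use the same number.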
s/ [ @]\{20\}/ / finds a string of 20 consecutive characters that are either a space or an @, preceded by a space, and replaces it with just the preceding space.
Replace 20 with the number from the previous step.
s/@/./g replaces each remaining @ with a dot.
So the foo line in your table will be processed as follows:
Initial value:                  foo        url1
s/\(.*\)\( \)/\1@@@@...@@@@\2/  foo       @@@@@@@@@@@@@@@@@@@@ url1
s/ [ @]\{20\}/ /                   _[↑↑↑↑↑↑remove↑↑↑↑↑↑]
                                foo @@@@@@ url1
s/@/./g                         foo ...... url1
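The same trace can be reproduced from the command line. Again a sketch: dot_leaders_pad is a made-up name, the sample line assumes eight spaces between foo and url1, and the 20 is only big enough for short gaps like this one.

```shell
# Pad with 20 placeholders, strip 20 characters' worth of spaces and
# placeholders after the title, then turn the survivors into dots.
dot_leaders_pad() {
    sed 's/\(.*\)\( \)/\1@@@@@@@@@@@@@@@@@@@@\2/; s/ [ @]\{20\}/ /; s/@/./g'
}

printf '%s\n' 'foo        url1' | dot_leaders_pad
# prints: foo ...... url1
```

The output is exactly as long as the input, which is the point: whatever the gap width was, the URL stays in the same column.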
Use the “hold space”:
sed 's/.*[^ ] /&@/; h; s/ /./g; s/\(.*\)\./\1 /; x; G; s/@.*@//'
s/.*[^ ] /&@/ is similar to the previous commands; it finds the end of the title (to be precise, the last place where a non-blank character is followed by a space) and inserts an @ after it.
h copies the line to the hold space.
s/ /./g replaces all spaces in the line with dots.
s/\(.*\)\./\1 / replaces the last dot with a space. (This will need to change if the URL can contain dots, which, I guess, is likely.)
x exchanges the pattern space and the hold space.
G appends the hold space to the pattern space (after a newline). We now have, essentially, two copies of the line.
s/@.*@// keeps the first part of the first copy and the second part of the second copy, getting rid of the stuff in the middle.
Initial value:      foo bar    url3
                    Pattern space       Hold space
s/.*[^ ] /&@/       foo bar @   url3
h                   foo bar @   url3    foo bar @   url3
s/ /./g             foo.bar.@...url3    foo bar @   url3
s/\(.*\)\./\1 /     foo.bar.@.. url3    foo bar @   url3
x                   foo bar @   url3    foo.bar.@.. url3
G                   foo bar @   url3    foo.bar.@.. url3
                    foo.bar.@.. url3
s/@.*@//            foo bar .. url3     foo.bar.@.. url3
Final output:       foo bar .. url3
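A runnable version of the hold-space approach, for experimenting. It is a sketch under a few assumptions: dot_leaders_hold is a made-up name, the sample line has four spaces between title and URL, the URL contains no dots, and the fourth subcommand is written with \(.*\) so that the greedy match lands on the last dot in the line.

```shell
# Mark the end of the title, save a copy, build a dotted copy, then splice
# the front of the original onto the tail of the dotted copy.
dot_leaders_hold() {
    sed 's/.*[^ ] /&@/; h; s/ /./g; s/\(.*\)\./\1 /; x; G; s/@.*@//'
}

printf '%s\n' 'foo bar    url3' | dot_leaders_hold
# prints: foo bar .. url3
```

Because the front half comes from the untouched copy, spaces and dots inside the title survive as they were.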