Regex: Replace Arbitrary Number of Whitespace with Same Number of Another Character

Question

What I am trying to do is take a list that is formatted much like a table of contents & replace whitespace (single space characters, not tabs) between the left & right texts with dots, preserving only the two outermost whitespace characters.

So specifically, I want to take a list like this:

foo        url1
foobar     url2
foo bar    url3

And convert it to this:

foo ...... url1
foobar ... url2
foo bar .. url3

I am using the Eclipse IDE for editing my text. I'm not familiar with the different regex engines, but I am guessing that it uses either Jakarta Regexp or java.util.regex (which I looked up on Wikipedia).

I can capture the whitespace characters in the Find field using "( +)", but I don't know how to convert them to the same number of dots in the Replace with field.

I did some Googling & came across this question (which is where I learned the "( +)" syntax). It sounds like it may be the same, or a similar question to mine. But I either didn't find my answer or I just didn't understand the answers given.

Any white space or just spaces? Your expression seems to be only about spaces. Then why not just replacing space with whatever character you like? — sticky bit, Commented Apr 25, 2018 at 22:44
Because they don’t want to change spaces in the title; e.g., “foo bar” → “foo.bar”. Also, they don’t want to change “foo        url” to “foo........url”; they want “foo␣......␣url” (keeping the first and last space). — Scott - Слава Україні, Commented Apr 26, 2018 at 0:02
This sounds like a question that has come up before, and so quite possibly it has already been answered here or on Unix & Linux Stack Exchange. But I don’t remember the answer right now. I’ll try to get back to this later when I have more time, but, until then, I suggest you search our site a bit harder. Hint: Stack Exchange has its own search engine, but sometimes you get better results using Google and saying site:superuser.com or site:unix.stackexchange.com. — Scott - Слава Україні, Commented Apr 26, 2018 at 0:30
I did a brief search (about 15 minutes) and I didn’t find any exact matches, although Using sed to replace all occurrences at the beginning with a matching number of replacement strings and Replace characters in matched line are close. Since nobody has flagged your question as a duplicate, and you’ve gotten only one answer so far, I invented three answers myself (the first one is very similar to one of the ones in the questions I linked to). I hope you have access to sed. — Scott - Слава Україні, Commented Apr 26, 2018 at 21:14

Scott - Слава Україні · Accepted Answer · 2018-04-26 21:14:39Z

The question explicitly states that titles will contain spaces. For safety’s sake, I’m assuming that titles may also contain dots (periods); e.g., “The History of 3.14159” or “Dr. Doolittle’s Discovery”. My answers assume that there is some character that will never appear in the table of contents; specifically, they assume it is @. If you have @ in your table, replace it with some character that never appears (e.g., #, ^, _, or |). If you really use every ASCII character, you may need to use a character sequence, like <@>.
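Before picking a placeholder, it can help to check which candidates are actually unused in the file. The following is only a sketch: the function name `pick_sentinel` and the candidate list are made up for illustration, and `^` is deliberately left out of the candidates because it is a regex anchor and would need escaping in a sed pattern.

```shell
# Hypothetical helper: print the first candidate character that does not
# occur anywhere in the file named by $1; return non-zero if all are in use.
pick_sentinel() {
    for c in '@' '#' '_' '|'; do
        # -F: treat the pattern as a fixed string, -q: no output,
        # -e: the next argument is the pattern
        if ! grep -qF -e "$c" "$1"; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1
}
```

Called as `pick_sentinel toc.txt` (with `toc.txt` standing in for your own file), it prints the character to substitute for @ in the commands that follow.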

Three ways to do it with `sed`:

Loop:

sed 's/\(.*\)\( \)/\1@\2/; :loop; s/  @/ @./; t loop; s/@//'

s/\(.*\)\( \)/\1@\2/ finds the last space on the line and inserts an @ before it.
:loop is a label, like a mile marker.
s/  @/ @./ (that’s s/␣␣@/␣@./, to be unambiguous) says: if there are two spaces before the @, replace them with ␣. (space and dot), and move the @ between the space and the dot.
t loop says, if the above substitution succeeded, jump back to the :loop marker and repeat. Otherwise, continue to
s/@//, which removes the @.

So the foo bar line in your table will be processed as follows:

Initial value:          foo bar    url3
s/\(.*\)\( \)/\1@\2/    foo bar   @ url3
s/  @/ @./              foo bar  @. url3
s/  @/ @./              foo bar @.. url3
s/  @/ @./              foo bar @.. url3        (Substitution fails, so don’t loop)
s/@//                   foo bar .. url3
Final output:           foo bar .. url3
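To try the loop on the whole sample table, the command can be wrapped in a shell function. This is a sketch rather than a different method: it is the same script as above, split into one `-e` fragment per subcommand (which tends to be more portable across sed implementations than a label followed by a semicolon), and the function name `dotfill_loop` is made up.

```shell
# Same loop as above, one -e fragment per sed subcommand.
dotfill_loop() {
    sed -e 's/\(.*\)\( \)/\1@\2/' \
        -e ':loop' \
        -e 's/  @/ @./' \
        -e 't loop' \
        -e 's/@//'
}

# Feed the three sample lines from the question through the filter.
printf '%s\n' 'foo        url1' 'foobar     url2' 'foo bar    url3' | dotfill_loop
```

To convert a file instead, redirect it into the function: `dotfill_loop < toc.txt` (the file name is an assumption).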

Overwhelming numbers:

sed 's/\(.*\)\( \)/\1@@@@@@@@@@@@@@@@@@@@\2/; s/ [ @]\{20\}/ /; s/@/./g'

s/\(.*\)\( \)/\1@@@@@@@@@@@@@@@@@@@@\2/ is very similar to the first s subcommand in the first solution; it finds the last space on the line and inserts a string of 20 @ characters before it. The count should be at least as large as the maximum number of dots you’ll ever need to insert on one line; e.g., 80. Managing a string of 80 @ characters would be awkward; you might want to replace this with
- s/\(.*\)\( \)/\1<@><@><@><@><@>\2/; s/<@>/@@@@@@@@@@@@@@@@/g which inserts a string of five <@> sequences, and then replaces each one of them with a string of 16 @ characters, resulting in 5×16=80 @ characters.
s/ [ @]\{20\}/ / finds a string of 20 consecutive characters that are either a space or an @, preceded by a space, and replaces it with just the preceding space. Replace 20 with the number from the previous step.
s/@/./g replaces each remaining @ with a dot.

So the foo line in your table will be processed as follows:

Initial value:                  foo        url1
s/\(.*\)\( \)/\1@@@@...@@@@\2/  foo       @@@@@@@@@@@@@@@@@@@@ url1
s/ [ @]\{20\}/ /                   _[↑↑↑↑↑↑remove↑↑↑↑↑↑]
                                foo @@@@@@ url1
s/@/./g                         foo ...... url1
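Here is a sketch of the 80-column variant described above, combining the `<@>` expansion with a matching `\{80\}` count. The width of 80 and the function name `dotfill_pad` are assumptions; the only requirement is that the number of inserted @ characters and the repetition count agree.

```shell
# Padding variant: insert 5 x 16 = 80 '@' before the last space,
# strip one space plus 80 spaces/'@' characters, turn the rest into dots.
dotfill_pad() {
    sed -e 's/\(.*\)\( \)/\1<@><@><@><@><@>\2/' \
        -e 's/<@>/@@@@@@@@@@@@@@@@/g' \
        -e 's/ [ @]\{80\}/ /' \
        -e 's/@/./g'
}

printf '%s\n' 'foo        url1' 'foobar     url2' 'foo bar    url3' | dotfill_pad
```

Note that a line whose gap is wider than the chosen width would not match the third substitution, so the width needs to be picked with the longest gap in mind.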

Use the “hold space”:

sed 's/.*[^ ] /&@/; h; s/ /./g; s/\(.*\)\./\1 /; x; G; s/@.*@//'

s/.*[^ ] /&@/ is similar to the previous commands; it finds the end of the title (to be precise, the last place where a non-blank character is followed by a space) and inserts an @ after it.
h copies the line to the hold space.
s/ /./g replaces all spaces in the line with dots.
s/\(.*\)\./\1 / replaces the last dot on the line with a space. (This will need to change if the URL can contain dots, which, I guess, is likely.)
x exchanges the pattern space and the hold space.
G appends a newline and the hold space to the pattern space. We now have, essentially, two copies of the line.
s/@.*@// keeps the first part of the first copy and the second part of the second copy, getting rid of the stuff in the middle.

Initial value: foo bar    url3

                      Pattern space                            Hold space
s/.*[^ ] /&@/       foo bar @   url3
h                   foo bar @   url3                        foo bar @   url3
s/ /./g             foo.bar.@...url3                        foo bar @   url3
s/\(.*\)\./\1 /     foo.bar.@.. url3                        foo bar @   url3
x                   foo bar @   url3                        foo.bar.@.. url3
G                   foo bar @   url3 foo.bar.@.. url3       foo.bar.@.. url3
s/@.*@//            foo bar .. url3                         foo.bar.@.. url3

Final output:   foo bar .. url3
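A sketch of the hold-space approach as a shell function, for trying it on the sample table. It uses `s/\(.*\)\./\1 /` for the "replace the last dot" step, where the greedy `.*` pushes the match to the final dot on the line. It also relies on `.` matching the newline that G inserts, which GNU sed does. The function name `dotfill_hold` is made up, and the caveat about URLs containing dots still applies.

```shell
# Hold-space variant: build a dotted copy of the line, then splice the
# undotted title onto the dotted gap and URL.
dotfill_hold() {
    sed -e 's/.*[^ ] /&@/' \
        -e 'h' \
        -e 's/ /./g' \
        -e 's/\(.*\)\./\1 /' \
        -e 'x' \
        -e 'G' \
        -e 's/@.*@//'
}

printf '%s\n' 'foo        url1' 'foobar     url2' 'foo bar    url3' | dotfill_hold
```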

Thank you for the very detailed explanation. Very helpful. The first two solutions work perfectly for me. The third one (hold space) is omitting the final space. So the final output is foo bar ...url3. Could it be related to the version of sed I am using? $ sed --version --> sed (GNU sed) 4.8 — AntumDeluge, Commented Jun 19, 2022 at 0:02

Toto · Accepted Answer · 2018-04-26 07:58:15Z

You can do that with Notepad++

Ctrl+H
Find what: (?<!\S) (?= )
Replace with: .
check Wrap around
check Regular expression
Replace all

Explanation:

(?<!    : Start negative lookbehind, make sure we have not
  \S    : a non-space character
)       : end lookbehind
        : a space
(?=     : start lookahead, make sure we have
        : a space
)       : end lookahead

Replacement:

.       : a dot

Result for given example:

foo ...... url1
foobar ... url2
foo bar .. url3

Looks interesting. I don’t have Notepad++, so I can’t test this. Can you explain why this doesn’t replace the first space after the title, resulting in foo.......␣url1?
– Scott - Слава Україні
Commented Apr 26, 2018 at 21:14
@Scott: I'm pretty sure it also works with SublimeText. A space is replaced only when there is not a non-space before it and a space after.
– Toto
Commented Apr 27, 2018 at 10:24
Oh … when there is a space after, and not a non-space before. I missed the double negative. Couldn’t you just do a regular lookbehind for a space instead of a negative lookbehind for a non-space?
– Scott - Слава Україні
Commented Apr 27, 2018 at 15:48
@Scott: No, if I use a positive lookbehind (i.e. (?<=\s)), the space before is mandatory; by contrast, (?<!\S) makes the space optional, and that is the case after the first space has been replaced by a dot.
– Toto
Commented Apr 27, 2018 at 16:46
@AntumDeluge: True with new version of Notepad++, it worked before as you can see here
– Toto
Commented Jun 19, 2022 at 7:07
