I want to remove duplicated words in each line using Notepad++.

Example:

Flooring Services, Carpet, Flooring Services, Tile, Flooring Services

In the above, Flooring Services appears three times. I want to keep only one Flooring Services.

I looked at the following question, whose answers work fine for a single word but not for two-word phrases: How to remove all the duplicated words on every line using Notepad++?

  • Do the replace twice: the first pass will remove the first word and the second pass will remove the second word (using the second answer in the linked question).
    – DavidPostill
    Commented Oct 7, 2023 at 4:37
  • What exactly is "duplicated words"? Anything separated by comma/line? Just letters? So more like phrases in that case. You need to be as specific as possible here. And those answers don't quite work for this case indeed.
    – Destroy666
    Commented Oct 7, 2023 at 5:20
  • Hi, I'm sorry if I'm duplicating this comment, as I'm new to Super User. Thanks for the help. The expression (\b\w+(?:\h\w+)?),\h(?=.*\1) finds and replaces every instance of "Services", not only "Flooring Services". When I ran it, it replaced "Services" on this line: Home Services, Flooring Services, Carpet, Flooring Services, Flooring Services. Is it possible to match the entire phrase and not just the word? In this instance, it should not replace Home Services, only Flooring Services. Thanks.
    – Matt Lance
    Commented Oct 8, 2023 at 19:26
  • That didn't quite clear anything up.
    – Destroy666
    Commented Oct 8, 2023 at 19:55

1 Answer

  • Ctrl+H
  • Find what: (?:^|,)\h*\K([^,\s]+(?:\h[^,\s]+)?),\h(?=.*\1)
  • Replace with: LEAVE EMPTY
  • TICK Match case
  • TICK Wrap around
  • SELECT Regular expression
  • UNTICK . matches newline
  • Replace all

Explanation:

(?:^|,)         # non capture group, beginning of line OR comma
\h*             # 0 or more horizontal spaces
\K              # Reset operator, forget all we have seen until this position
(               # group 1
    [^,\s]+         # 1 or more any character that is not a comma or space
    (?:             # non capture group
        \h              # horizontal space
        [^,\s]+         # 1 or more any character that is not a comma or space
    )?              # end group, optional
)               # end group 1
,\h             # a comma followed by a space
(?=.*\1)        # positive lookahead, make sure we have the same word(s) somewhere after
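
To experiment with the pattern outside Notepad++, here is a minimal Python sketch. It is an approximation, not the exact regex above: Python's re module supports neither \K nor \h, so the prefix is captured in a group and put back by the replacement, and [ \t] stands in for \h. The helper name is made up for illustration.

```python
import re

# Approximation of the Notepad++ pattern: group 1 replaces \K, [ \t] replaces \h.
PATTERN = re.compile(
    r"((?:^|,)[ \t]*)"              # group 1: line start or comma, then spaces (kept)
    r"([^,\s]+(?:[ \t][^,\s]+)?)"   # group 2: one word, optionally a second word
    r",[ \t]"                       # comma and one space after the phrase
    r"(?=.*\2)",                    # the same text must occur later on the line
    re.MULTILINE,
)


def remove_earlier_duplicates(text: str) -> str:
    """Hypothetical helper: drop a phrase when the same text appears later on its line."""
    return PATTERN.sub(r"\1", text)


print(remove_earlier_duplicates(
    "Flooring Services, Carpet, Flooring Services, Tile, Flooring Services"
))
# prints: Carpet, Tile, Flooring Services
```

As in Notepad++, the occurrence that survives is the last one on the line, because a phrase is only removed when a later copy exists.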


  • Is it possible to keep the first occurrence but delete the second and third occurrences? I am asking because the regex keeps the last occurrence.
    Commented Oct 7, 2023 at 9:04
  • @ReddyLutonadio: Probably, but I think the regex will become much more complex. I'm going to run some tests.
    – Toto
    Commented Oct 7, 2023 at 9:07
  • Understood. Your regex is doing what the OP asked, so no need to go the complex route when the simplest is best :)
    Commented Oct 7, 2023 at 9:11
  • @ReddyLutonadio: I've got a solution, but it needs to be run as many times as there are duplicates. FYI: Find: (^.*?(\b\w+(?:\h\w+)?),\h.*?)\2(?:, )? Replace: $1
    – Toto
    Commented Oct 7, 2023 at 9:20
  • Hi, thanks for the help. The expression (\b\w+(?:\h\w+)?),\h(?=.*\1) finds and replaces every instance of "Services", not only "Flooring Services". When I ran it, it replaced "Services" on this line: Home Services, Flooring Services, Carpet, Flooring Services, Flooring Services. Is it possible to match the entire phrase and not just the word? In this instance, it should not replace Home Services, only Flooring Services. Thanks.
    – Matt Lance
    Commented Oct 8, 2023 at 19:20
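
For the two follow-ups in the comments (keeping the first occurrence, and comparing whole comma-separated phrases rather than single words), a script may be simpler than a regex. The sketch below is one possible approach outside Notepad++, assuming every item is separated by a comma; the function name and the default separator are illustrative choices, not something from the answer.

```python
def dedupe_phrases(line: str, sep: str = ",") -> str:
    """Keep the first occurrence of each comma-separated phrase on one line."""
    seen = set()
    kept = []
    for phrase in line.split(sep):
        phrase = phrase.strip()          # ignore spaces around each phrase
        if phrase and phrase not in seen:
            seen.add(phrase)             # exact, case-sensitive comparison
            kept.append(phrase)
    return f"{sep} ".join(kept)


if __name__ == "__main__":
    sample = "Home Services, Flooring Services, Carpet, Flooring Services, Flooring Services"
    print(dedupe_phrases(sample))
    # prints: Home Services, Flooring Services, Carpet
```

Because whole phrases are compared, "Home Services" is left alone even though it shares the word "Services" with "Flooring Services". To process a file, the function would be applied to each line in turn.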
