Whenever I copy text from a PDF that has a hard line break (carriage return) at the end of every line, I need a way to remove those line breaks without losing the paragraph structure.

To do this I need a regular expression (RegEx) that removes only the line breaks that aren't preceded by a period.

For example, a line break right after a period is almost always a legitimate break that starts a new paragraph. A line break mid-word, or after a word with no period, is simply part of the bad formatting I need to get rid of.

My problem is that I don't know how to write a RegEx that removes the ^p marks in Word (or CRLF, or line breaks in any other format) while skipping the ones that follow a period.

  • Please mention your operating system. On anything but Windows, this is trivial. I take it you are using Windows? What RegEx engine are you using? We need more details in order to provide you with a working RegEx.
    – terdon
    Commented Sep 2, 2012 at 11:57
  • Do you simply want to remove the line breaks? I suspect you really want to replace them with spaces. And what about line breaks after ? or !? Or .), ?), or !)?
    Commented Aug 1, 2013 at 0:00

4 Answers


Solution for MS Word:

  1. Open Find & Replace (Ctrl+H) and check the "Use wildcards" option. If you don't see the "Use wildcards" option, click "More".
  2. Copy the following into the "Find What" box: ([!.])^0013
  3. Copy the following into the "Replace with" box: \1
  4. Click "Replace All"


  • [!.] means "find every symbol except dot"
  • ^0013 is a paragraph mark, so in the "Find What" we will find every non-dot symbol followed by a paragraph mark
  • Parentheses mean that we will place that non-dot symbol in memory to use later
  • \1 replaces our memorized symbol at the location where we find it

Note that the ^0013 is not inside the parentheses, so the final text would be without paragraph marks.
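
For anyone working outside Word, the same idea can be sketched with Python's re module. This is only an illustration of the rule described above (drop a line break unless a period precedes it); the function name join_wrapped_lines is made up for this example, and it assumes the pasted text is already available as a string.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Remove line breaks that do not directly follow a period."""
    # Normalise CRLF and bare CR line endings to LF first, so a single
    # pattern covers text pasted on Windows, macOS, or Linux.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # ([^.]) captures one character that is not a period. The newline
    # after it sits outside the group, so the replacement \1 keeps the
    # character and drops the newline (same idea as ([!.])^0013 in Word).
    return re.sub(r"([^.])\n", r"\1", text)


if __name__ == "__main__":
    sample = "This sen\ntence was wrapped.\nNew paragraph here."
    print(join_wrapped_lines(sample))
```

Like the Word pattern, this joins the two halves with nothing in between, which is right for breaks that fall mid-word but glues whole words together when the break fell between words.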


In Word try to find and replace the manual line break ^l with the paragraph mark ^p.

  • It's from a PDF; all line breaks are ^p
    – Luke Allen
    Commented Sep 2, 2012 at 6:54
  • OK. Try replacing ^p with a <space>. This will fix the paragraph marks, but the only problem you will face is that all the paragraphs will become just one paragraph.
    – hsawires
    Commented Sep 2, 2012 at 7:15
  • Yeah, that is what the question I posted is trying to solve. I already knew to replace ^p with <space>; I need to replace only the ^p that don't have a <period> before them, so that the paragraphs are maintained but the formatting breaks are not.
    – Luke Allen
    Commented Sep 2, 2012 at 7:23
  • I tried saving the PDF from Acrobat as a Word document and it works fine, except that you may need to do extra work to clean unwanted text out of the doc file. Some other software may also help you convert PDF to DOC.
    – hsawires
    Commented Sep 2, 2012 at 7:32

Because sentences can end in more punctuation than a period I’ve updated hsawires’ answer to:

  1. Find every symbol except dot, question mark, exclamation point, close quote or colon.
  2. Additionally, in some cases you’ll want to add a space after \1 in the “Replace with” box to keep from combining the last word on one line with the first word on the next line.

Solution for MS Word:

  1. Open Find & Replace (Ctrl+H) and check the “Use wildcards” option.
  2. If you don’t see the “Use wildcards” option, click “More.”
  3. Copy the following into the “Find What” box: ([!.\?\!"':])^0013
  4. Copy the following into the “Replace with” box: \1
  5. Click “Replace All.”


  • [!.\?\!"':] means “find every symbol except dot, question mark, exclamation point, close quote or colon”
  • ^0013 is a paragraph mark, so in the “Find What” we will find every such symbol followed by a paragraph mark
  • Parentheses mean that we will place that symbol in memory to use later
  • \1 replaces our memorized symbol at the location where we find it

Note that the ^0013 is not inside the parentheses, so the final text would be without paragraph marks.
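
As a rough non-Word equivalent, here is a Python sketch of the extended rule. It assumes the same set of paragraph-ending characters as the Word pattern above (. ? ! " ' :) and, by default, joins lines with a space as suggested earlier in this answer. The name unwrap_paragraphs and the joiner parameter are invented for the example.

```python
import re

# One character that is NOT a period, question mark, exclamation point,
# double quote, single quote, or colon, followed by a newline.
# (Assumption: the same character set as the Word wildcard pattern.)
_WRAPPED_BREAK = re.compile(r"""([^.?!"':])\n""")


def unwrap_paragraphs(text: str, joiner: str = " ") -> str:
    """Join wrapped lines, keeping breaks that follow closing punctuation."""
    # Normalise CRLF and bare CR line endings to LF before matching.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Keep the captured character and put the joiner where the newline was.
    return _WRAPPED_BREAK.sub(lambda m: m.group(1) + joiner, text)


if __name__ == "__main__":
    sample = "Is this\nwrapped?\nYes it\nis."
    print(unwrap_paragraphs(sample))
```

Pass joiner="" when the source breaks lines mid-word, so the two halves are rejoined without a space.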


A much easier way to create/modify an address block before cutting and pasting it into an email or other document is to declare a 3/4 row table and type the address data into each row. Then get rid of the lines.
