I have a Word document of about 200 pages. The text I pasted from a PDF lost its spaces for some reason. How do I fix this?
-
When you pasted the text in from the PDF, did you right-click and use Paste Special, selecting a specific paste operation? – spikey_richie, commented Dec 13, 2017 at 8:35
-
I think you need a better way (than copy/paste) to extract text from PDF. Can you share the PDF document so we can take a closer look? – Edi, commented Dec 13, 2017 at 8:35
-
I feel trying to fix this in Word is the wrong approach. The solution is to use a different program to extract the text from the PDF. Maybe you need to use OCR or similar. – Dave, commented Dec 13, 2017 at 9:00
2 Answers
This is quite interesting, but also not easy.
My real answer is to fix the underlying issue, which is how the PDF was exported!
However, this VBA may get you going. There is no undo, so create a backup first.
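    Option Explicit

    Sub DoIt()
        Const maxChars As Long = 30 'longest word to look for, in characters

        'Whole document. To try it on a small piece first, use e.g.
        'Set rng = ActiveDocument.Paragraphs(1).Range
        Dim rng As Range
        Set rng = ActiveDocument.Content
        rng.MoveEnd Unit:=wdCharacter, Count:=-1 'leave the final paragraph mark alone

        Dim txt As String
        txt = rng.Text

        Dim result As String
        Dim s As String
        Dim wordToUse As String
        Dim i As Long
        Dim pos As Long 'Long, not Integer: 200 pages is far more than 32,767 characters
        pos = 1

        Do While pos <= Len(txt)
            s = ""
            wordToUse = ""

            'Grow the candidate one character at a time and remember the
            'longest one that passes the spell check
            For i = 0 To maxChars - 1
                If pos + i > Len(txt) Then Exit For
                s = s & Mid(txt, pos + i, 1)
                If SpellCheck(s) Then wordToUse = s
            Next i

            'Nothing recognised: copy one character so the loop always advances
            If Len(wordToUse) = 0 Then wordToUse = Mid(txt, pos, 1)

            result = result & wordToUse & " "
            pos = pos + Len(wordToUse)
        Loop

        'Collapse any doubled spaces, then write the text back in one go
        Do While InStr(result, Space(2)) > 0
            result = Replace(result, Space(2), " ")
        Loop
        rng.Text = Trim(result)
    End Sub

    Function SpellCheck(SomeWord As String) As Boolean
        'credit https://stackoverflow.com/a/10776225/1221410
        SpellCheck = Application.CheckSpelling(SomeWord)
    End Function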
The logic is simple: keep adding characters until you find a valid word, then keep going up to maxChars to make sure it is not just part of a longer word (e.g. "and" exists in "land"). The longest valid word wins, and a space is added after it.
-
Has this been tested/shown to work? I had to add/define a range first (I used Set myRange = ActiveDocument.Paragraphs(1).Range), after which it ran, but it seemed to insert spaces somewhat at random, rather than between known words :s Commented Nov 30, 2020 at 15:14
-
Update - the Application.Checkspelling( ) function is confused by CAPS - for reasons I won't go into, my block of un-spaced text was all caps, and converting it to all lower case made things work better! Commented Nov 30, 2020 at 16:03
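Building on the comment above about capitals, here is a minimal sketch of a variant of the SpellCheck helper that is not sensitive to case. The name SpellCheckLower is made up for illustration and is not part of the original answer. It lower-cases only the copy of the text that is handed to the spell checker, on the assumption that an all-lower-case candidate is checked more reliably, as the comment reports.

```vba
' Hypothetical variant of SpellCheck from the answer above.
' Only the copy passed to the spell checker is lower-cased,
' so the text in the document keeps its original capitals.
Function SpellCheckLower(SomeWord As String) As Boolean
    SpellCheckLower = Application.CheckSpelling(LCase(SomeWord))
End Function
```

To try it, call SpellCheckLower(s) instead of SpellCheck(s) inside DoIt. Unlike converting the whole block to lower case first, this leaves the document's capitals as they were. One side effect to expect: proper nouns such as "London" may no longer be recognised once lower-cased. Application.CheckSpelling also accepts an optional IgnoreUppercase argument, which may be another route; that is untested here.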