2

I have a PDF file in Persian script, which is written right-to-left. Since the Persian text is UTF-8 encoded, I can't seem to convert it to plain text in Microsoft Word, and copying and pasting the text produces unreadable characters. I have tried several programs, such as unipdf and e-Pdf Converter, but after conversion the characters are still not displayed properly. I even tried OCR, but the same problem appeared. The PDF doesn't have a password or any restrictions.

Does anyone have any other ideas?

Edit: I also tried creating a file in MS Word and converting it to a PDF, and I had the same problem with the resulting PDF file (even though the encoding was known).

10
  • 3
    Microsoft Word supports UTF-8 format. It also supports right to left languages. So why exactly can't you convert it to a Word document?
    – Ramhound
    Commented May 6, 2015 at 13:14
  • Hey, thanks for your consideration. The source of my file is a PDF, so I don't know exactly what happens when I copy and paste it into Microsoft Word, but it doesn't show the proper characters. The same thing happens when I try to convert it using third-party tools.
    – Mehdi
    Commented May 6, 2015 at 13:21
  • 1
    Possible duplicate of Cutting & Pasting Vietnamese characters from a PDF
    Commented May 6, 2015 at 14:27
  • @RedGrittyBrick I read your answer, but in my case I actually tried creating a file in MS Word and converting it to a PDF, and I had the same problem with the resulting PDF file (even though the encoding was known). Thanks
    – Mehdi
    Commented May 6, 2015 at 14:59
  • How was the PDF created? Electronically, or was it scanned and you are hoping OCR will take over?
    Commented May 6, 2015 at 15:39

4 Answers

1

I had the same problem when converting PDF files to Word. After copying and pasting into Word, the formatting changed and caused trouble. I tried several online converters, but they also failed.
The only method that worked was as follows:

  1. Open the PDF file with Adobe Acrobat Reader, then choose Print from the File menu. From the list of printers, choose Adobe Acrobat. Yes, you are about to create a PDF from a PDF!
  2. Open the new PDF file with Google Chrome (drag and drop the file onto Chrome).
  3. Now simply select all the text (Ctrl+A) and copy and paste it into a blank Word file.
1
  • Thank you. This worked for me, but I didn't open the file with Adobe Acrobat Reader. I opened it in IE, then used Print Screen and Paint to capture the pages from the PDF.
    – user5730
    Commented Nov 10, 2020 at 11:40
1

Very often, PDF files in non-Latin scripts (especially RTL scripts such as Arabic, Hebrew and Farsi) are generated by software that stores the text in left-to-right (visual) order at the word or sentence-fragment level, or that gets the right glyphs to display but leaves gibberish as the underlying 'logical' text. In these cases there is very little to be done except write a custom back-converter, which is usually not a practical option.

However, if you can figure out how the file was created (often indicated in the metadata, which common PDF readers can display), you may be able to open the file in the application that generated it, or at least make your question more specific.
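
For the narrowest version of the visual-order case described above, a back-converter can at least be sketched. The following is a minimal illustration, not a general solution. It assumes the pasted text consists of Arabic presentation-form glyphs (Unicode blocks Arabic Presentation Forms-A and -B) stored left to right, one line at a time. The function names `has_presentation_forms` and `visual_to_logical` are made up for this example. It will not help when the PDF's fonts map glyphs to arbitrary code points, and it ignores combining marks, brackets and mixed-direction punctuation.

```python
import re
import unicodedata


def has_presentation_forms(text: str) -> bool:
    """Return True if text contains Arabic presentation-form code points.

    Arabic Presentation Forms-A (U+FB50..U+FDFF) and Forms-B (U+FE70..U+FEFC)
    hold positional glyph variants and ligatures, not ordinary letters.
    """
    return any(
        "\ufb50" <= ch <= "\ufdff" or "\ufe70" <= ch <= "\ufefc" for ch in text
    )


def visual_to_logical(line: str) -> str:
    """Naively convert one visually ordered RTL line to logical order."""
    if not has_presentation_forms(line):
        return line  # nothing suggests visual-order glyphs; leave untouched

    # 1. Reverse first, while each ligature is still a single code point.
    reversed_line = line[::-1]

    # 2. Map glyph variants and ligatures back to their base letters.
    normalized = unicodedata.normalize("NFKC", reversed_line)

    # 3. ASCII digits and Latin letters were already left to right,
    #    so undo the reversal for those runs only.
    return re.sub(r"[0-9A-Za-z]+", lambda m: m.group(0)[::-1], normalized)
```

For example, `visual_to_logical("2015 \ufe8e\ufe91\ufe8e\ufe91")` returns the ordinary letters of "بابا" followed by " 2015". Reversing before normalizing matters: a lam-alef ligature such as U+FEFB expands to two letters under NFKC, and expanding it before the reversal would put them in the wrong order.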

0

I have recently worked on converting a PDF to editable Persian text. The best solution I have found is to use Google Docs, as follows.

  1. Convert the PDF pages to images. For this you can use Adobe Acrobat (not the free Adobe Reader); on Linux, I open the PDF in GIMP and choose to open each page as a separate image. The choice is yours.
  2. Upload the image files to Google Drive.
  3. In Google Drive, right-click each image and choose to open it with Google Docs.
  4. Wait until Google Docs opens an editable document containing the text from your image.
  5. Copy the text into Word.

I don't know whether there is an automated method. I hope to find time someday to write an application that does this automatically.

0

I know this answer is late, but for anyone with the same question, I can suggest Delix.ir, a Persian OCR and PDF-to-Word converter.

Disclaimer: I'm the founder of delix.ir, and I hope this won't be treated as an advertisement.
