I am trying to print the text of a PDF file with the PyPDF2 module, but only special characters are printed instead of the actual text.
I already tried this solution, but it does not seem to work.
Code:
import PyPDF2

with open('/home/sarthak/Documents/UNIT-4.pdf', 'rb') as obj:
    pdfReader = PyPDF2.PdfFileReader(obj)
    print(pdfReader.numPages)  # print the number of pages
    pageObj = pdfReader.getPage(0)
    # also tried 'utf-8' instead of 'ascii', but that doesn't work either
    print(pageObj.extractText().encode('ascii', 'ignore'))
Output:
17
b'\n\n\n\n!#$\n\n\n\n\n\n\n\n\n\n\n \n\n"%$\n\n\n"#\n\n\n $\n\n\n\'())(*+, -$&\n\n\n\n\n $&-\n $\n'
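A small standard-library check suggests the `encode` call in the `print` line is not what produces these symbols. The string below is a hypothetical excerpt copied from the output above, not real data read from UNIT-4.pdf. Every symbol in it is already plain ASCII, so encoding with `'ascii', 'ignore'` (or `'utf-8'`) has nothing to fix. It only turns the `str` into `bytes` (hence the `b'...'` with escaped `\n`) and would silently drop any genuinely non-ASCII characters. The garbage therefore seems to come from the `extractText()` step itself.

```python
# Hypothetical excerpt of what extractText() returned (copied from the output above)
extracted = '\n\n!#$\n\n"%$\n\'())(*+, -$&\n'

# encode() returns bytes, so print() shows a b'...' literal with escaped newlines
as_bytes = extracted.encode('ascii', 'ignore')
print(as_bytes)

# Every character is already ASCII (str.isascii needs Python 3.7+), so 'ignore'
# drops nothing: decoding gives back exactly the same symbols
print(extracted.isascii())                       # True
print(as_bytes.decode('ascii') == extracted)     # True

# For text that really contains non-ASCII letters, 'ignore' silently deletes them
print('café naïve'.encode('ascii', 'ignore'))    # b'caf nave'
```

Printing `pageObj.extractText()` directly, without `.encode(...)`, shows the same symbols as readable text rather than a bytes literal.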