Opening pdf urls with pyPdf

Question

How would I open a PDF from a URL instead of from the disk?

Something like:

input1 = PdfFileReader(file("http://example.com/a.pdf", "rb"))

I want to open several files from the web and download a single merged file containing all of them.
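A minimal sketch of that merge in Python 3, assuming the maintained `pypdf` package (the successor to pyPdf/PyPDF2). The helper names `fetch_pdf_bytes` and `merge_pdfs_from_urls` are hypothetical. Each file is downloaded into memory and wrapped in `io.BytesIO`, because the reader needs a seekable binary stream, and then its pages are appended to one `PdfWriter`:

```python
import io
from urllib.request import Request, urlopen


def fetch_pdf_bytes(url, timeout=60):
    """Download url and return its raw bytes, refusing obvious non-PDFs."""
    # Assumption: some servers reject urllib's default User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        data = response.read()
    # PDF files start with "%PDF-"; readers tolerate leading junk, so search the first 1 KiB.
    if b"%PDF-" not in data[:1024]:
        raise ValueError(f"not a PDF: {url}")
    return data


def merge_pdfs_from_urls(urls, out_path):
    """Append every page of each remote PDF, in order, into one file at out_path."""
    # Imported here so fetch_pdf_bytes can be used even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for url in urls:
        reader = PdfReader(io.BytesIO(fetch_pdf_bytes(url)))
        for page in reader.pages:
            writer.add_page(page)
    with open(out_path, "wb") as out:
        writer.write(out)
```

For example, `merge_pdfs_from_urls(["http://example.com/a.pdf", "http://example.com/b.pdf"], "merged.pdf")` would write the pages of `a.pdf` followed by those of `b.pdf`. The `%PDF-` check is a cheap guard against servers that answer with an HTML error page instead of the file.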

Here is the Python3 solution: stackoverflow.com/questions/47177060 — tommy.carstensen, Commented Sep 14, 2018 at 0:59

John · Accepted Answer · 2012-03-17 20:03:31Z

20

I think urllib2 will get you what you want.

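# Python 2 (urllib2, StringIO and xrange do not exist in Python 3)
from urllib2 import Request, urlopen
from pyPdf import PdfFileWriter, PdfFileReader
from StringIO import StringIO

url = "http://www.silicontao.com/ProgrammingGuide/other/beejnet.pdf"
writer = PdfFileWriter()

# Download the whole PDF into memory; PdfFileReader needs a seekable
# file-like object, so wrap the raw string in StringIO
remoteFile = urlopen(Request(url)).read()
memoryFile = StringIO(remoteFile)
pdfFile = PdfFileReader(memoryFile)

# Copy every page into the writer (uncomment the mergePage line to stamp
# each page with a watermark, if you have one loaded as `watermark`)
for pageNum in xrange(pdfFile.getNumPages()):
    currentPage = pdfFile.getPage(pageNum)
    #currentPage.mergePage(watermark.getPage(0))
    writer.addPage(currentPage)

outputStream = open("output.pdf", "wb")
writer.write(outputStream)
outputStream.close()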

edited Mar 17, 2012 at 20:03

answered Mar 17, 2012 at 16:05

John


I get AttributeError: 'str' object has no attribute 'seek'
– meadhikari
Commented Mar 17, 2012 at 16:38

@meadhikari, sorry about that, it's fixed now.
– John
Commented Mar 17, 2012 at 17:15

@meadhikari Your code is good, my fault again. outputStream = file("output.pdf","wb") needs to be outputStream = open("output.pdf","wb")
– John
Commented Mar 17, 2012 at 20:04

Use urllib.request instead of urllib2 in Python 3.
– Shriganesh Kolhe
Commented Apr 24, 2020 at 5:03

For "StringIO", use >> from io import BytesIO ## for Python 3 (the downloaded PDF is bytes, not str)
– Shriganesh Kolhe
Commented Apr 24, 2020 at 5:07
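Putting those two comments together gives a Python 3 version of the download step. In Python 3, `urlopen(...).read()` returns `bytes`, so the in-memory buffer has to be `io.BytesIO`; `io.StringIO` accepts only `str` and raises `TypeError` on bytes. A minimal sketch, with a hypothetical helper name `open_remote_pdf`:

```python
import io
from urllib.request import Request, urlopen


def open_remote_pdf(url, timeout=60):
    """Download url and return its bytes in a seekable in-memory buffer at position 0."""
    with urlopen(Request(url), timeout=timeout) as response:
        # read() returns bytes in Python 3, so BytesIO (not StringIO) is required
        return io.BytesIO(response.read())
```

The returned buffer can stand in for `memoryFile` in the answer, e.g. `PdfReader(open_remote_pdf(url))` with the maintained `pypdf` package, iterating over `reader.pages` instead of calling `getNumPages()`/`getPage()`.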


Chaudhry Ihsan · Accepted Answer · 2022-10-06 12:46:59Z

9

I think it could be simplified with Requests now.

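import io
import requests
from PyPDF2 import PdfReader

# Some servers refuse requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0 (X11; Windows; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36'}

url = 'https://www.url_of_pdf_file.com/sample.pdf'
response = requests.get(url=url, headers=headers, timeout=120)
response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
# response.content is bytes; BytesIO gives PdfReader a seekable stream
on_fly_mem_obj = io.BytesIO(response.content)
pdf_file = PdfReader(on_fly_mem_obj)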

edited Oct 6, 2022 at 12:46

answered Oct 6, 2022 at 12:46

Chaudhry Ihsan


This is the right answer now.
– rawkintrevo
Commented Mar 23, 2023 at 16:57


Switch · Accepted Answer · 2012-03-17 16:42:07Z

4

Well, you can first download the PDF separately and then use pyPdf to read it:

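# Python 2 (urllib.urlopen moved to urllib.request in Python 3)
import os
import urllib
from pyPdf import PdfFileReader

url = 'http://example.com/a.pdf'
# Local file name: the last path segment of the URL
pdfFile = url.split('/')[-1]

# Download the PDF; 'wb' (binary mode) keeps the bytes intact on every platform
webFile = urllib.urlopen(url)
localFile = open(pdfFile, 'wb')
localFile.write(webFile.read())
webFile.close()
localFile.close()

# Make sure the saved copy ends in .pdf
base, ext = os.path.splitext(pdfFile)
if ext.lower() != ".pdf":
    os.rename(pdfFile, base + ".pdf")
    pdfFile = base + ".pdf"

input1 = PdfFileReader(open(pdfFile, "rb"))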
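The same download-first approach in Python 3, as a sketch: `urllib.request.urlopen` streams the response into a local file opened in binary mode, and `shutil.copyfileobj` copies it in chunks so a large PDF is not held in memory all at once. The helper name `download_pdf`, its `directory` parameter, and the naming rule (last URL path segment, `.pdf` appended if missing, mirroring the rename step above) are assumptions:

```python
import os
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def download_pdf(url, directory=".", timeout=60):
    """Save url into directory, named after the last path segment, and return the path."""
    name = os.path.basename(unquote(urlsplit(url).path)) or "download.pdf"
    if not name.lower().endswith(".pdf"):
        name += ".pdf"  # make sure the saved copy ends in .pdf
    path = os.path.join(directory, name)
    with urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # chunked copy, binary mode
    return path
```

The returned path can then be opened with `open(path, "rb")` or passed straight to `pypdf.PdfReader`, which also accepts a file name.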

edited Mar 17, 2012 at 16:42

answered Mar 17, 2012 at 16:09

Switch


Hey, what is thisFile in the line base = os.path.splitext(thisFile)[0]?
– meadhikari
Commented Mar 17, 2012 at 16:33

Oh sorry it was a mistake, it should be pdfFile (the absolute path for the downloaded file)
– Switch
Commented Mar 17, 2012 at 16:41


Aseem · Accepted Answer · 2021-08-06 00:39:42Z

2

For Python 3.8:

import io
from urllib.request import Request, urlopen

from PyPDF2 import PdfFileReader


class GetPdfFromUrlMixin:
    def get_pdf_from_url(self, url):
        """
        :param url: url to get pdf file
        :return: PdfFileReader object
        """
        remote_file = urlopen(Request(url)).read()
        memory_file = io.BytesIO(remote_file)
        pdf_file = PdfFileReader(memory_file)
        return pdf_file

answered Aug 6, 2021 at 0:39

Aseem


You might want to use PdfReader instead of the deprecated PdfFileReader
– Martin Thoma
Commented Oct 15, 2022 at 12:13
Also it is completely unnecessary to put this function inside a class
– Martin Thoma
Commented Oct 15, 2022 at 12:20

Not the answer you're looking for? Browse other questions tagged
python
pdf
pypdf
or ask your own question.