51

I need to make an HTTP request and determine the response size in bytes. I have always used requests for simple HTTP requests, but I am wondering whether I can achieve this using raw.

>>> r = requests.get('https://github.com/', stream=True)
>>> r.raw

My only problem is that I don't understand what raw returns or how I could count this data type in bytes. Is using requests and raw the right approach?

  • Note that the only way to get the size before downloading the entire file is to read the content-length header, if it exists.
    – cowlinator
    Commented Feb 25, 2020 at 4:00

2 Answers

81

Just take the len() of the content of the response:

>>> response = requests.get('https://github.com/')
>>> len(response.content)
51671

If you want to keep streaming, for instance because the content is too large to hold in memory, you can iterate over chunks of the data and sum their sizes:

>>> with requests.get('https://github.com/', stream=True) as response:
...     size = sum(len(chunk) for chunk in response.iter_content(8196))
>>> size
51671
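
As a rough sketch of how the two ideas in this thread could be combined: use the Content-Length header when the server sends one, and fall back to counting streamed chunks otherwise. The helper names declared_size, counted_size and response_size are made up for illustration and are not part of requests. One caveat: Content-Length describes the body as sent (possibly compressed), while iter_content() yields decoded data, so the two numbers can differ for gzip responses.

```python
def declared_size(headers):
    """Return Content-Length as an int, or None if it is missing or invalid."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def counted_size(chunks):
    """Sum the lengths of an iterable of byte chunks."""
    return sum(len(chunk) for chunk in chunks)


def response_size(url, chunk_size=8192):
    """Hypothetical helper: header value if present, else count the stream."""
    import requests  # third-party; imported here so the helpers above work without it

    with requests.get(url, stream=True) as response:
        size = declared_size(response.headers)
        if size is None:
            # No usable header: the body has to be read to know its size.
            size = counted_size(response.iter_content(chunk_size))
        return size
```

Calling response_size('https://github.com/') would then return an int either way, but only the fallback path actually downloads the body.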
  • 3
    Does this just parse Content-length or does it actually measure the full content? Also, does response.content include HTTP headers?
    – ewhitt
    Commented Jul 11, 2014 at 17:05
  • 2
    That does determine the actual length of the content. At least the Github front page does not send a Content-length header.
    – BlackJack
    Commented Jul 11, 2014 at 18:12
  • 2
    @MarlonAbeykoon Yes the value is in bytes because response.content is bytes and not characters. If you want characters use the response.text attribute. Of course this only makes sense if the body actually is text. If it's an image for instance, you'll get garbage or a decoding error when accessing the text attribute.
    – BlackJack
    Commented Sep 22, 2016 at 10:57
  • 1
    @MarlonAbeykoon In Python 2 len() on a str value means number of bytes and len() on a unicode value means number of characters. (Actually number of code points, because not every code point is a character and there are characters consisting of more than one code point.) There is no such thing as "ascii-latin1". It's either ASCII or Latin1. Latin1 has ASCII as a subset though.
    – BlackJack
    Commented Sep 22, 2016 at 13:54
  • 3
    The OP is using a streaming response; accessing r.content loads all the data into memory first, which is usually not what you want when streaming the response.
    Commented Feb 15, 2018 at 22:02
7

r.raw is an instance of urllib3.response.HTTPResponse. You can get the length of the response either by reading its Content-Length header, if the server sends one, or by applying the built-in len() function to the response body (r.content).
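
If what you need is the size of the body as transferred rather than after decoding, one possible approach is to read r.raw in chunks and count the bytes. This is only a sketch: count_raw_bytes and wire_size are hypothetical names, and it assumes r.raw.read(n) returns undecoded data, which as far as I know is how requests configures the underlying urllib3 response. requests transparently decompresses gzip in r.content, so len(r.content) can be larger than the number counted here.

```python
import gzip
import io


def count_raw_bytes(raw, chunk_size=8192):
    """Read a file-like object to the end and return how many bytes it held."""
    total = 0
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += len(chunk)
    return total


def wire_size(url):
    """Hypothetical helper: count the undecoded body bytes of a response."""
    import requests  # third-party; imported lazily so the demo below runs offline

    with requests.get(url, stream=True) as response:
        return count_raw_bytes(response.raw)


# Offline stand-in for a gzip-encoded body: the compressed stream is what
# travels over the wire, the decoded body is what r.content would hold.
body = b"a" * 1000
compressed = gzip.compress(body)
print(count_raw_bytes(io.BytesIO(compressed)), len(body))
```

The memory use stays bounded by chunk_size, which is the point of keeping stream=True.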

  • 9
    Yes, but Content-length is not always provided.
    – ewhitt
    Commented Jul 11, 2014 at 17:04
  • 4
    @ewhitt: If there is no Content-length header then you can't know the full length until you have received all data. Accessing r.content forces the issue: it reads from the raw connection until all data has been read, building up the full document in memory. You may as well not use stream=True in that case.
    Commented Feb 15, 2018 at 22:03
  • 5
    @MartijnPieters what about gzip responses which it automatically decompresses, so len(r.content) does not show true response size ... ?
    – madzohan
    Commented Jul 17, 2018 at 19:58
  • 2
    @madzohan what about those? If you need to know the HTTP body size of the response and there is no content-length header then see stackoverflow.com/questions/50825528/…
    Commented Jul 17, 2018 at 20:44
  • @MartijnPieters thank you) I've finished yesterday with simple urllib's response.read() :)
    – madzohan
    Commented Jul 18, 2018 at 4:49
