h11 fails on multiple targets where other HTTP clients work #95

fbexiga · 2020-01-15T18:54:08Z

Consider, for example, this snippet of code:

import socket

import h11

def run(target):
    # Open a plain-text HTTP/1.1 client connection to the target on port 80.
    conn = h11.Connection(our_role=h11.CLIENT)
    sock = socket.create_connection((target, 80))
    request = h11.Request(method="GET", target="/", headers=[("Host", target)])
    data = conn.send(request)
    sock.sendall(data)
    # Read the first chunk of the response and ask h11 to parse it.
    data = sock.recv(2048)
    conn.receive_data(data)
    conn.next_event()

>>> run("100.33.56.173")
h11._util.RemoteProtocolError: multiple Content-Length headers

>>> run("220.181.136.243")
h11._util.RemoteProtocolError: Response status_code should be in range [200, 600), not 600

Other errors of the same kind that I've encountered include:

h11._util.RemoteProtocolError: malformed data
h11._util.RemoteProtocolError: Receive buffer too long

These are all basically misconfigured servers, some of which even violate the protocol specs, but they appear a lot in the wild. I think these responses should still be accepted, since most HTTP clients don't impose these restrictions and let users see the underlying data despite the misconfigurations.


njsmith · 2020-01-15T19:41:44Z

Can you file individual bugs for the different issues? "h11 should work" is too vague to figure out actual code changes :-). The problem is to figure out what exactly servers are doing that h11 needs to support.

Multiple content-lengths already has an issue here: #92
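As a rough illustration of what a lenient policy for this case could look like: RFC 7230 section 3.3.2 lets a recipient either reject duplicated Content-Length fields or, when every copy carries the same decimal value, replace them with a single field. The sketch below is not h11 code and is not taken from #92; the function name `collapse_content_length` and the list-of-string-pairs header format are assumptions made for the example.

```python
def collapse_content_length(headers):
    """Merge identical duplicate Content-Length headers into one.

    `headers` is a list of (name, value) string pairs. Conflicting or
    non-numeric values raise ValueError, since the body length would
    otherwise be ambiguous.
    """
    values = set()
    for name, value in headers:
        if name.lower() == "content-length":
            # One field may itself hold a comma-separated list, e.g. "42, 42".
            values.update(part.strip() for part in value.split(","))

    if not values:
        # No Content-Length at all: nothing to do.
        return list(headers)
    if len(values) > 1:
        raise ValueError(f"conflicting Content-Length values: {sorted(values)}")

    (length,) = values
    if not (length.isascii() and length.isdigit()):
        raise ValueError(f"invalid Content-Length value: {length!r}")

    # Drop every original copy and append a single normalized field.
    kept = [(n, v) for n, v in headers if n.lower() != "content-length"]
    kept.append(("Content-Length", length))
    return kept
```

With this policy, two `Content-Length: 42` headers collapse into one, while `42` alongside `43` is still rejected.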

What on earth is a "600" response? I've never heard of that.

"Malformed data" means that one of h11's parsing regexps failed. Need more details to figure out which one needs to be loosened and how.

"Receive buffer too long" probably means that the headers were >16384 bytes, which is the default max_incomplete_event_size: https://h11.readthedocs.io/en/latest/api.html#the-connection-object
This is already configurable, though we could potentially change the default if there's a good reason. The current value is pretty arbitrary; I based it on looking at some HTTP servers and picked something in the same ballpark:

h11/h11/_connection.py

Lines 23 to 32 in 68e32db

    
    # If we ever have this much buffered without it making a complete parseable
    # event, we error out. The only time we really buffer is when reading the
    # request/reponse line + headers together, so this is effectively the limit on
    # the size of that.
    #
    # Some precedents for defaults:
    # - node.js: 80 * 1024
    # - tomcat: 8 * 1024
    # - IIS: 16 * 1024
    # - Apache: <8 KiB per line>

Probably it would make sense to look at clients too, though. Apparently curl has a hardcoded limit of 102400? https://curl.haxx.se/mail/lib-2019-09/0023.html

fbexiga · 2020-01-15T19:50:03Z

The 600 response doesn't actually exist in the standard; it's something that whoever configured the server made up. Nonetheless, that shouldn't be a reason to reject the response.

These were found using the httpx library. More examples and discussion here: encode/httpx#767

njsmith · 2020-01-16T21:35:30Z

Here's an issue for one possible cause of the "malformed data" – not sure if it's the one you saw or not. (Or maybe you saw multiple, I dunno)

njsmith · 2020-01-17T06:02:14Z

Whoops, I meant this issue: #97

cancan101 · 2021-07-27T23:09:03Z

LinkedIn is an example of where the status code >= 600 comes up in the wild. They are (in)famous for returning 999 status codes. See: https://stackoverflow.com/questions/27231113/999-error-code-on-head-request-to-linkedin or just run curl -I --url https://www.linkedin.com/company/linkedin
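To make the status-code point concrete: the HTTP/1.1 grammar defines the status code as any three digits, so 600 and 999 are syntactically well formed even though no standard assigns them a meaning. Below is a minimal sketch of a status-line parser that accepts the whole 100 to 999 range. It is not h11's parser; the names `STATUS_LINE` and `parse_status_line` are hypothetical, and the input is assumed to be a single line with the trailing CRLF already stripped.

```python
import re

# Hypothetical lenient pattern: any three-digit code, optional reason phrase.
STATUS_LINE = re.compile(rb"HTTP/(\d\.\d) (\d{3})(?: ([^\r\n]*))?")

def parse_status_line(line):
    """Parse a status line (bytes, without CRLF) into (version, code, reason)."""
    match = STATUS_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed status line: {line!r}")
    version, code, reason = match.groups()
    # The reason phrase may be missing entirely; treat that as an empty string.
    return version.decode("ascii"), int(code), (reason or b"").decode("latin-1")

print(parse_status_line(b"HTTP/1.1 600 Custom"))   # ('1.1', 600, 'Custom')
print(parse_status_line(b"HTTP/1.1 999"))          # ('1.1', 999, '')
```

A caller would still need to decide how to treat such codes, for example by surfacing them unchanged or by handling them like a server error.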

tomchristie mentioned this issue Jan 16, 2020

On recoverable protocol errors from real-world usage. #96

Closed

tomchristie mentioned this issue Feb 18, 2021

Graceful handling of 204 responses that incorrectly a include non-zero Content-Length. encode/httpx#1474

Closed


tomchristie mentioned this issue Aug 6, 2021

Increase/Decrease status code ranges #134

Closed

tomchristie mentioned this issue Dec 29, 2022

Client should have more lenient behaviour wrt. spec violations. encode/httpx#767

Closed
