h11 fails on multiple targets where other HTTP clients work #95

Open
fbexiga opened this issue Jan 15, 2020 · 5 comments

@fbexiga

fbexiga commented Jan 15, 2020

Consider, for example, this snippet of code:

import socket

import h11

def run(target):
    conn = h11.Connection(our_role=h11.CLIENT)
    sock = socket.create_connection((target, 80))
    request = h11.Request(method="GET", target="/", headers=[("Host", target)])
    # Serialize the request and send it over the socket.
    data = conn.send(request)
    sock.sendall(data)
    # Read one chunk of the response and hand it to h11 for parsing.
    data = sock.recv(2048)
    conn.receive_data(data)
    return conn.next_event()

>>> run("100.33.56.173")
h11._util.RemoteProtocolError: multiple Content-Length headers
>>> run("220.181.136.243")
h11._util.RemoteProtocolError: Response status_code should be in range [200, 600), not 600

Other errors of the same kind that I've encountered include:

h11._util.RemoteProtocolError: malformed data
h11._util.RemoteProtocolError: Receive buffer too long

These are all basically misconfigured servers, some of which even violate the protocol specs, but they show up a lot in the wild. I think these responses should still be handled, since most HTTP clients don't impose these restrictions and let users see the underlying data despite the misconfiguration.

@njsmith
Member

njsmith commented Jan 15, 2020

Can you file individual bugs for the different issues? "h11 should work" is too vague to figure out actual code changes :-). The problem is to figure out what exactly servers are doing that h11 needs to support.

Multiple content-lengths already has an issue here: #92
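
For context, RFC 7230 section 3.3.2 lets a recipient either reject a message with repeated Content-Length values or collapse them into one when they are all identical. A minimal stdlib-only sketch of the lenient option follows. `collapse_content_length` is a hypothetical helper written for illustration, not part of h11's API and not necessarily how #92 would be resolved.

```python
from typing import Iterable, Optional


def collapse_content_length(values: Iterable[bytes]) -> Optional[int]:
    """Collapse repeated Content-Length field values into one integer.

    Hypothetical helper, not h11 code. Returns None if the header is absent
    and raises ValueError if the values are malformed or disagree.
    """
    lengths = set()
    for value in values:
        # A single field may itself carry a list, e.g. b"42, 42".
        for part in value.split(b","):
            part = part.strip()
            if not part.isdigit():
                raise ValueError(f"malformed Content-Length: {part!r}")
            lengths.add(int(part))
    if not lengths:
        return None
    if len(lengths) > 1:
        raise ValueError(f"conflicting Content-Length values: {sorted(lengths)}")
    return lengths.pop()
```

Identical duplicates such as `[b"42", b"42"]` or `[b"42, 42"]` collapse to `42`, while `[b"42", b"43"]` is still rejected, since the message framing would be ambiguous.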

What on earth is a "600" response? I've never heard of that.

"Malformed data" means that one of h11's parsing regexps failed. Need more details to figure out which one needs to be loosened and how.

"Receive buffer too long" probably means that the headers were >16384 bytes, which is the default max_incomplete_event_size: https://h11.readthedocs.io/en/latest/api.html#the-connection-object
This is already configurable, though we could potentially change the default if there's a good reason. The current value is pretty arbitrary; I based it on looking at some HTTP servers and picked something in the same ballpark:

h11/h11/_connection.py

Lines 23 to 32 in 68e32db

# If we ever have this much buffered without it making a complete parseable
# event, we error out. The only time we really buffer is when reading the
# request/response line + headers together, so this is effectively the limit on
# the size of that.
#
# Some precedents for defaults:
# - node.js: 80 * 1024
# - tomcat: 8 * 1024
# - IIS: 16 * 1024
# - Apache: <8 KiB per line>

Probably it would make sense to look at clients too, though. Apparently curl has a hardcoded limit of 102400? https://curl.haxx.se/mail/lib-2019-09/0023.html

@fbexiga
Author

fbexiga commented Jan 15, 2020

The 600 response doesn't actually exist in the standard; it's something that whoever configured the server made up. Nonetheless, that shouldn't be a reason to reject the response.
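
On the grammar side, RFC 7230 section 3.1.2 defines status-code as any three digits, so 600 or 999 still parses as a status line even though no such class is defined. Below is a stdlib-only sketch of a parser that accepts the full three-digit range. The regex and `parse_status_line` are assumptions made for illustration, not h11's implementation.

```python
import re
from typing import Tuple

# status-line = HTTP-version SP status-code SP reason-phrase (RFC 7230, 3.1.2)
_STATUS_LINE = re.compile(rb"HTTP/(\d\.\d) (\d{3}) ([^\r\n]*)")


def parse_status_line(line: bytes) -> Tuple[bytes, int, bytes]:
    """Return (http_version, status_code, reason) for any 3-digit code.

    Hypothetical helper: it checks syntax only and does not restrict the
    code to the 1xx-5xx classes.
    """
    match = _STATUS_LINE.fullmatch(line.rstrip(b"\r\n"))
    if match is None:
        raise ValueError(f"malformed status line: {line!r}")
    version, code, reason = match.groups()
    return version, int(code), reason
```

A caller that wants to be strict can still reject codes outside its preferred range after parsing, which keeps the raw status visible to the user either way.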

These were found using the httpx library. More examples and discussion here: encode/httpx#767

@njsmith
Member

njsmith commented Jan 16, 2020

Here's an issue for one possible cause of the "malformed data" – not sure if it's the one you saw or not. (Or maybe you saw multiple, I dunno)

@njsmith
Member

njsmith commented Jan 17, 2020

Whoops, I meant this issue: #97

@cancan101

LinkedIn is an example of where the status code >= 600 comes up in the wild. They are (in)famous for returning 999 status codes. See: https://stackoverflow.com/questions/27231113/999-error-code-on-head-request-to-linkedin or just run curl -I --url https://www.linkedin.com/company/linkedin
