Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(transport): Use a shorter ECN probe threshold initially #1964

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Jul 6, 2024

This commit adds a test where a client connects to a server over a connection dropping all ECN marked datagrams (ECN blackhole) in both directions, asserting 43 RTT to detect ECN blackhole, disable ECN and eventually establish connection. Test assumes 20 RTT for client to detect blackhole, 20 RTT for server to detect blackhole and 3 RTT for handshake to be confirmed.


Test for #1925.

How likely are ECN blackholes on today's Internet? Should this block ECN roll-out on Firefox? //CC @larseggert

Bauer et. alia. in "Measuring the State of ECN Readiness in Servers, Clients, and Routers" suggests that ECN blackholes are rare.

Test output
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Client at 0ms
0s  0ms DEBUG ECN probing: sent 1 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 50ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 100ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 150ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 200ms
0s  0ms DEBUG ECN probing: sent 2 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 250ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 300ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 350ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 400ms
0s  0ms DEBUG ECN probing: sent 3 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 450ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 500ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 550ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 600ms
0s  1ms DEBUG ECN probing: sent 4 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 650ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 700ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 750ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 800ms
0s  1ms DEBUG ECN probing: sent 5 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 850ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 900ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 950ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1000ms
0s  1ms DEBUG ECN probing: sent 6 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1050ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1100ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1150ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1200ms
0s  2ms DEBUG ECN probing: sent 7 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1250ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1300ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1350ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1400ms
0s  2ms DEBUG ECN probing: sent 8 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1450ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1500ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1550ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1600ms
0s  3ms DEBUG ECN probing: sent 9 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1650ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1700ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1750ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1800ms
0s  3ms DEBUG ECN probing: sent 10 probes
0s  3ms DEBUG ECN probing concluded with 10 probes sent
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1850ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 1900ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 1950ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2000ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Init
Processing Server at 2050ms
0s  5ms DEBUG ECN probing: sent 1 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2100ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2150ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2200ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2250ms
0s  6ms DEBUG ECN probing: sent 2 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2300ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2350ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2400ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2450ms
0s  8ms DEBUG ECN probing: sent 3 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2500ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2550ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2600ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2650ms
0s  9ms DEBUG ECN probing: sent 4 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2700ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2750ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2800ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2850ms
0s 10ms DEBUG ECN probing: sent 5 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 2900ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 2950ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3000ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3050ms
0s 12ms DEBUG ECN probing: sent 6 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3100ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3150ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3200ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3250ms
0s 13ms DEBUG ECN probing: sent 7 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3300ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3350ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3400ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3450ms
0s 14ms DEBUG ECN probing: sent 8 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3500ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3550ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3600ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3650ms
0s 15ms DEBUG ECN probing: sent 9 probes
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3700ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3750ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3800ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3850ms
0s 17ms DEBUG ECN probing: sent 10 probes
0s 17ms DEBUG ECN probing concluded with 10 probes sent
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 3900ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 3950ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 4000ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 4050ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = WaitInitial
Processing Client at 4100ms
0s 18ms TRACE RTT latest=100ms -> estimate=100ms~50ms
0s 18ms WARN ECN validation failed, no ECN counts in ACK frame
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 4150ms
0s 19ms TRACE RTT latest=500ms -> estimate=500ms~250ms
0s 19ms TRACE RTT latest=100ms -> estimate=450ms~287.5ms
0s 20ms WARN ECN validation failed, no ECN counts in ACK frame
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Client at 4200ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Handshaking
Processing Server at 4250ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Connected
Processing Client at 4300ms
0s 21ms TRACE RTT latest=100ms -> estimate=100ms~37.5ms
0s 21ms TRACE RTT latest=100ms -> estimate=100ms~28.125ms
[neqo-transport/src/connection/tests/mod.rs:190:13] c.state() = Confirmed
0s 22ms TRACE RTT latest=100ms -> estimate=406.25ms~303.125ms
test connection::tests::ecn::handshake_delay_with_ecn_blackhole ... ok

This commit adds a test where a client connects to a server over a connection
dropping all ECN marked datagrams (ECN blackhole) in both directions, asserting
43 RTT to detect ECN blackhole, disable ECN and eventually establish connection.
Copy link

github-actions bot commented Jul 6, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

Copy link

github-actions bot commented Jul 6, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Copy link

codecov bot commented Jul 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.97%. Comparing base (e05cc70) to head (0242b31).
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1964   +/-   ##
=======================================
  Coverage   94.96%   94.97%           
=======================================
  Files         112      112           
  Lines       36504    36525   +21     
=======================================
+ Hits        34667    34689   +22     
+ Misses       1837     1836    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Jul 8, 2024

Benchmark results

Performance differences relative to e05cc70.

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [190.00 ns 190.59 ns 191.19 ns]
       change: [-0.1604% +0.2995% +0.7300%] (p = 0.20 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [230.40 ns 231.00 ns 231.63 ns]
       change: [-0.6668% -0.1846% +0.2545%] (p = 0.44 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [230.59 ns 231.29 ns 232.14 ns]
       change: [-1.8009% -0.7863% -0.0234%] (p = 0.08 > 0.05)

Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
5 (5.00%) high severe

coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.
       time:   [212.48 ns 212.78 ns 213.10 ns]
       change: [+0.4856% +1.0617% +1.6352%] (p = 0.00 < 0.05)

Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [119.03 ms 119.10 ms 119.17 ms]
       change: [+0.3308% +0.4269% +0.5194%] (p = 0.00 < 0.05)

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

transfer/Run multiple transfers with varying seeds: No change in performance detected.
       time:   [53.900 ms 56.938 ms 60.007 ms]
       thrpt:  [66.659 MiB/s 70.252 MiB/s 74.212 MiB/s]
change:
       time:   [-6.1261% +1.5777% +9.3197%] (p = 0.69 > 0.05)
       thrpt:  [-8.5252% -1.5532% +6.5259%]
transfer/Run multiple transfers with the same seed: No change in performance detected.
       time:   [67.657 ms 73.881 ms 80.075 ms]
       thrpt:  [49.953 MiB/s 54.141 MiB/s 59.122 MiB/s]
change:
       time:   [-15.215% -3.5097% +8.3454%] (p = 0.56 > 0.05)
       thrpt:  [-7.7026% +3.6374% +17.946%]

Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) low mild

1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
       time:   [280.38 ms 285.37 ms 290.13 ms]
       thrpt:  [344.67 MiB/s 350.43 MiB/s 356.66 MiB/s]
change:
       time:   [-1.7285% +0.6221% +2.9358%] (p = 0.62 > 0.05)
       thrpt:  [-2.8520% -0.6183% +1.7589%]
1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [443.92 ms 447.68 ms 451.48 ms]
       thrpt:  [22.149 Kelem/s 22.337 Kelem/s 22.527 Kelem/s]
change:
       time:   [-1.7177% -0.6391% +0.4003%] (p = 0.26 > 0.05)
       thrpt:  [-0.3987% +0.6432% +1.7478%]
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [68.561 ms 68.850 ms 69.183 ms]
       thrpt:  [14.454  elem/s 14.524  elem/s 14.586  elem/s]
change:
       time:   [-0.9345% -0.1946% +0.5412%] (p = 0.61 > 0.05)
       thrpt:  [-0.5383% +0.1950% +0.9433%]

Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
5 (5.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 115.1 ± 18.8 101.5 177.7 1.00
neqo msquic reno on 275.3 ± 11.2 257.2 290.4 1.00
neqo msquic reno 265.7 ± 11.0 254.7 288.4 1.00
neqo msquic cubic on 293.8 ± 71.3 262.2 495.6 1.00
neqo msquic cubic 272.5 ± 7.8 257.4 281.9 1.00
msquic neqo reno on 161.4 ± 11.6 145.4 184.1 1.00
msquic neqo reno 145.4 ± 17.6 113.9 166.5 1.00
msquic neqo cubic on 152.4 ± 10.8 135.2 168.6 1.00
msquic neqo cubic 172.0 ± 26.4 112.1 206.8 1.00
neqo neqo reno on 182.9 ± 13.7 157.7 203.8 1.00
neqo neqo reno 174.3 ± 27.0 145.3 233.5 1.00
neqo neqo cubic on 183.0 ± 20.2 151.2 222.1 1.00
neqo neqo cubic 184.4 ± 13.6 163.8 212.0 1.00

⬇️ Download logs

@mxinden
Copy link
Collaborator Author

mxinden commented Jul 15, 2024

Friendly ping @larseggert.

@larseggert
Copy link
Collaborator

Thanks. I'll add a fix so that we stop trying ECN earlier in this case.

@larseggert larseggert marked this pull request as ready for review July 15, 2024 11:16
@larseggert larseggert changed the title test(transport): handshake delay with ecn blackhole Jul 15, 2024
@larseggert
Copy link
Collaborator

@mxinden WDYT?

Copy link
Collaborator Author

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good to me. Thank you!

Intuitively 6 RTT to detect an ECN black-hole on the client seems fine under the assumption that ECN black-holes are very rare.

I can clean-up the tests later today or tomorrow unless you want to.

neqo-transport/src/ecn.rs Outdated Show resolved Hide resolved
neqo-transport/src/ecn.rs Outdated Show resolved Hide resolved
neqo-transport/src/ecn.rs Outdated Show resolved Hide resolved
neqo-transport/src/ecn.rs Outdated Show resolved Hide resolved
neqo-transport/src/ecn.rs Outdated Show resolved Hide resolved
larseggert and others added 7 commits July 15, 2024 15:02
Co-authored-by: Max Inden <mail@max-inden.de>
Signed-off-by: Lars Eggert <lars@eggert.org>
Co-authored-by: Max Inden <mail@max-inden.de>
Signed-off-by: Lars Eggert <lars@eggert.org>
Co-authored-by: Max Inden <mail@max-inden.de>
Signed-off-by: Lars Eggert <lars@eggert.org>
Co-authored-by: Max Inden <mail@max-inden.de>
Signed-off-by: Lars Eggert <lars@eggert.org>
Co-authored-by: Max Inden <mail@max-inden.de>
Signed-off-by: Lars Eggert <lars@eggert.org>
Copy link
Collaborator Author

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larseggert I cleaned-up the tests. Mind taking another look?

neqo-transport/src/connection/tests/mod.rs Show resolved Hide resolved
neqo-transport/src/connection/tests/mod.rs Show resolved Hide resolved
@larseggert
Copy link
Collaborator

@KershawChang or @martinthomson, could you take a look? This would let us enable ECN in Gecko w/@mxinden's patch.

@mxinden mxinden mentioned this pull request Jul 17, 2024
@larseggert larseggert added the needs-review Needs a review by a CODEOWNER. label Jul 17, 2024
@larseggert larseggert linked an issue Jul 17, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-review Needs a review by a CODEOWNER.
2 participants