I only today became fully aware of a very annoying issue we have in former 1G racks in codfw. Specifically this came up while discussing T366205: codfw:(3) wikikube-ctrl NIC upgrade to 10G today.
That host is in codfw rack A3, which has a shiny new QFX5120 in it. The problem, however, is that every single port on the box has a 1G copper SFP in it, with all hosts connected at 1G:
cmooney@lsw1-a3-codfw> show interfaces descriptions
Interface       Admin Link Description
ge-0/0/0        up    up   mw2377
ge-0/0/1        up    up   mw2378
ge-0/0/2        up    up   mw2379
ge-0/0/3        up    up   mw2380
ge-0/0/4        up    up   mw2381
ge-0/0/5        up    up   mw2382
ge-0/0/6        up    up   mw2383
ge-0/0/7        up    up   mw2384
ge-0/0/8        up    up   mw2385
ge-0/0/9        up    up   mw2386
ge-0/0/10       up    up   mw2387
ge-0/0/11       up    up   mw2388
ge-0/0/12       up    up   mw2389
ge-0/0/13       up    up   mw2390
ge-0/0/14       down  down DISABLED
ge-0/0/15       up    up   mw2392
ge-0/0/16       up    up   db2142
ge-0/0/17       up    up   mw2393
ge-0/0/18       up    up   mw2394
ge-0/0/19       up    up   mw2395
ge-0/0/20       up    up   mw2396
ge-0/0/21       up    up   db2158
ge-0/0/26       up    up   mw2397
ge-0/0/27       up    up   mw2398
ge-0/0/28       up    up   mw2399
ge-0/0/29       up    up   mw2400
ge-0/0/30       up    up   netmon2002
ge-0/0/31       up    up   mw2291
ge-0/0/32       up    up   mw2292
ge-0/0/33       up    up   kafka-main2006
ge-0/0/34       up    up   es2020
ge-0/0/36       up    up   mw2293
ge-0/0/37       up    up   mw2294
ge-0/0/38       up    up   mw2295
ge-0/0/39       up    up   mw2296
ge-0/0/40       up    up   mw2297
ge-0/0/41       up    up   mw2298
ge-0/0/42       up    up   mw2299
ge-0/0/43       up    up   mw2300
et-0/0/54       up    up   Core: ssw1-a8-codfw:et-0/0/2 {#230403800021}
et-0/0/55       up    up   Core: ssw1-a1-codfw:et-0/0/2 {#230403800027}
As we know the QFX5120 is Trident 3 based, and due to how the ASIC works, each group of 4 SFP ports is actually a single 40/100G connection internally. The result is that adjacent ports, in blocks of 4, all have to run at the same speed.
cmooney@lsw1-a3-codfw> show configuration chassis | display set
set chassis fpc 0 pic 0 port 0 speed 1G
set chassis fpc 0 pic 0 port 4 speed 1G
set chassis fpc 0 pic 0 port 8 speed 1G
set chassis fpc 0 pic 0 port 12 speed 1G
set chassis fpc 0 pic 0 port 16 speed 1G
set chassis fpc 0 pic 0 port 20 speed 1G
set chassis fpc 0 pic 0 port 24 speed 1G
set chassis fpc 0 pic 0 port 28 speed 1G
set chassis fpc 0 pic 0 port 32 speed 1G
set chassis fpc 0 pic 0 port 36 speed 1G
set chassis fpc 0 pic 0 port 40 speed 1G
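To make the constraint concrete (a sketch only, using the same per-quad `speed` statement as above; port numbers are illustrative), upgrading just the host on ge-0/0/0 to 10G would mean changing the speed for the whole quad:

```
set chassis fpc 0 pic 0 port 0 speed 10G
```

There is no truly per-port speed here: the `port 0` statement governs ports 0-3 as a block, so the quad's other members would need to be empty, or already moved to the new speed, before we could commit that.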
So to complete the upgrade in the above task @Papaul is planning to move the server to another rack. Which is fine, but nobody can deny it's a lot of effort.
This is quite an annoying problem to be honest, and one I hadn't fully thought through. It strikes me that over time we are going to have to do an awful lot of server moves/shuffling as we upgrade hosts to 10G. In rows A and B of codfw alone we have 7 switches exactly like this one, with over 200 servers connected at 1G.
One option that occurs to me: we could use some of the QSFP28 ports on these switches, in channelized mode, to create 10/25G ports that server uplinks could be moved to when they go from 1G to 10G, avoiding having to move/re-rack the whole server. As we upgrade hosts and free up blocks of SFP ports, we can then reconfigure those for the higher speeds and move hosts back. There would still be more juggling of links than we'd like, but at least we wouldn't be moving servers the whole time. Anyway, just an idea.
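As a rough sketch of what that might look like (untested; the port number and the `channel-speed` knob are assumptions to verify against the QFX5120 docs for our exact model and Junos version):

```
set chassis fpc 0 pic 0 port 48 channel-speed 10G
```

That should break one QSFP28 port out into four xe-0/0/48:0 through xe-0/0/48:3 interfaces, which upgraded hosts in the rack could patch into via a breakout cable, with `channel-speed 25G` presumably giving et- channels for 25G hosts instead.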