User Details
- User Since: May 10 2021, 3:25 PM
- Availability: Available
- IRC Nick: topranks
- LDAP User: Cathal Mooney
- MediaWiki User: CMooney (WMF)
Today
The gNMI stats proved very helpful for keeping an eye on the bandwidth shifting around.
Work completed. Traffic is currently bridged through the two spine switches over the AEs from the row C/D virtual-chassis, and the CR interfaces connected to the spines are acting as the VRRP gateways.
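For reference, a minimal sketch of what a VRRP gateway on the CR side can look like in JunOS; the interface name, unit, addresses and priority below are hypothetical placeholders, not the production config:

    interfaces {
        ae1 {
            unit 0 {
                family inet {
                    address 10.64.32.2/24 {
                        /* 10.64.32.1 is the shared gateway IP the hosts point at */
                        vrrp-group 1 {
                            virtual-address 10.64.32.1;
                            /* higher priority wins mastership */
                            priority 150;
                            /* answer pings etc. sent to the virtual address */
                            accept-data;
                        }
                    }
                }
            }
        }
    }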
There is possibly a variant of option 1:
Yesterday
Though there didn't seem to be a problem afterwards, the timing makes me think of T365997: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f2-eqiad
Tue, Jul 16
Upgrade completed, all hosts back online and pinging ok. Thanks all for the assistance!
Thu, Jul 11
Switch upgrade complete; all looks good, hosts are online and responding to ping again. Thanks for the assistance!
Wed, Jul 10
Switch upgraded successfully and all hosts back online/pinging. Thanks everyone for the assistance!
Closing task - it's a duplicate; the work was completed under T365169.
I think the work on this can be done in tandem with the review of the setup in T367203: Sub-optimal cloud routing for WMCS in eqiad when link fails.
Tue, Jul 9
Switch upgrade completed without issue. All connected hosts are back online and responding to ping now, thanks all for the help.
Fri, Jul 5
Seems like a great tool, but we are going to move forward with pulling these stats using gnmic, having successfully tested it under T326322. If we find any blockers that gNMI can't cover we can revisit junos_exporter, but hopefully that won't be needed. Future gnmic pipeline development will be tracked in T369384: Productionize gnmic network telemetry pipeline
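For anyone curious, a quick sketch of the kind of gnmic subscription involved; the hostname, port, credentials and path here are illustrative assumptions rather than the exact production pipeline settings:

    # Stream interface counters from a switch every 30s over gNMI.
    # lsw1-f2-eqiad:57400 and the credential variables are placeholders.
    gnmic -a lsw1-f2-eqiad:57400 -u "$GNMI_USER" -p "$GNMI_PASS" --skip-verify \
      subscribe \
      --path "/interfaces/interface/state/counters" \
      --stream-mode sample \
      --sample-interval 30s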
I'm going to close this task now. The current gnmic collection is providing what we need in terms of the queue stats for observing how QoS operates, and it seems best to track future, general improvements to our network telemetry in a separate task (see below).
Bit of an update on this one. We recently had a related problem after lvs2011 was rebooted, which we need to address.
Thu, Jul 4
All seems good with the policy changes now, closing task.
All is working well on the test host. Well, Puppet was giving me a headache, but I just skipped all that :)
Ok, change merged; we are now announcing codfw ranges from eqord again:
cmooney@cr2-eqord> show route advertising-protocol bgp 192.80.17.197
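For context, announcing those ranges comes down to the BGP export policy on cr2-eqord. A minimal sketch of the shape of such a policy; the policy name, group name and prefix below are hypothetical placeholders (198.51.100.0/24 is a documentation range), not our real config:

    policy-options {
        policy-statement export-codfw {
            term codfw-ranges {
                from {
                    /* placeholder prefix; the real codfw ranges go here */
                    route-filter 198.51.100.0/24 orlonger;
                }
                then accept;
            }
        }
    }
    protocols {
        bgp {
            group transit {
                /* apply the export policy towards the peer */
                export export-codfw;
            }
        }
    }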
Wed, Jul 3
So one thing I noticed is that we are not getting stats for LAG/ae interfaces with the current setup, nor for routed sub-interfaces.
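If it helps, one likely fix is to widen the subscription paths. A sketch of what that could look like in a gnmic config file; the subscription name and intervals are assumptions, and whether JunOS actually streams counters for ae interfaces on these paths needs verifying:

    subscriptions:
      interface-counters:
        paths:
          # physical and LAG (ae) interface counters
          - /interfaces/interface/state/counters
          # routed sub-interface (unit) counters
          - /interfaces/interface/subinterfaces/subinterface/state/counters
        mode: stream
        stream-mode: sample
        sample-interval: 30s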
Switch is back up, all looks good at first glance from the network side.
Gonna close this one as the design is finalised; see the detail on Wikitech here:
Tue, Jul 2
Also @Jhancock.wm, when you're next on site can you check the mgmt/iDRAC connection for this one? It doesn't seem to be trying to get an IP via DHCP, and the old IP from when it was mw2289 isn't working either.
@Jhancock.wm can you confirm what position in the rack the server is in?
All seems ok following the increase:
So the change to the timeout has made a big difference, but there are still some small gaps:
Sun, Jun 30
Folks, just FYI, I've pushed the time here back an hour if that's ok; it seems to suit most people best.
Fri, Jun 28
I may have spoken too soon when I said things were working fine. It seems that in codfw, since the change, we are only getting stats some of the time:
@fgiunchedi I was perhaps a little cheeky and merged this, but it was clear the volume of new metrics was well within what you'd previously said was ok. Everything is working nicely, I'm glad to say.
Thanks all for the help with this one!