CFSSL-based PKI solution used for auto-provisioning TLS material. Docs: https://wikitech.wikimedia.org/wiki/PKI
(Requested in T281371)
Yep, it's all good! I manually added the host to gNMIc and metrics are properly being collected/exposed. Thanks!
elukey@cumin1002:~$ sudo cookbook sre.network.tls lsw1-d1-codfw
Acquired lock for key /spicerack/locks/cookbooks/sre.network.tls: {'concurrency': 20, 'created': '2024-07-17 07:36:21.455628', 'owner': 'elukey@cumin1002 [964742]', 'ttl': 1800}
START - Cookbook sre.network.tls for network device lsw1-d1-codfw
lsw1-d1-codfw: ❌ Can't connect to device, assuming initial bootstrap.
lsw1-d1-codfw: 🔏 cfssl called with operation: gencert.
lsw1-d1-codfw: ⚙️ Deploy needed.
lsw1-d1-codfw: 👍 All done.
Released lock for key /spicerack/locks/cookbooks/sre.network.tls: {'concurrency': 20, 'created': '2024-07-17 07:36:21.455628', 'owner': 'elukey@cumin1002 [964742]', 'ttl': 1800}
END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-codfw
Change #1054618 merged by Elukey:
[operations/cookbooks@master] sre.network.tls: use a different client certificate to authenticate
Change #1054618 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/cookbooks@master] sre.network.tls: use a different client certificate to authenticate
I managed to get the certificate via:
In T355750#9976497, @elukey wrote: Completely different use case: traffic-cache-upload-bullseye.traffic.eqiad1.wikimedia.cloud
There, purged needs to get a cfssl discovery cert, but I see the same error reported in the task's description. I suspect this could be related to mutual-tls-cert and mutual-tls-key, since when I try to run the corresponding cfssl gencert command I get the same error with or without those options.
Change #1053937 merged by Elukey:
[operations/puppet@production] pki: add the Traffic's project Puppet CA to client_auth_CA.pem in cloud
After some digging, it seems to me that the issue is httpd on pki1001: it rejects the client authentication from cumin1002. I added a bit more logging to the mod-ssl module, and this is what I see in the ssl error log:
Change #1054289 merged by Elukey:
[operations/puppet@production] profile::pki::multirootca: use info in the client auth vhost
Change #1054289 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/puppet@production] profile::pki::multirootca: use info in the client auth vhost
I tried to validate client_auth_CA.pem on pkiXXXX and it looks good (allowing Puppet5/7/PKI client certs), so it must be something client-cert related but I am still missing what.
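For anyone repeating that check, here is a minimal sketch of one way to inspect such a bundle: split the concatenated PEM file into individual certificates so each CA can be examined on its own. The function name `split_pem_bundle` is hypothetical and not part of any existing tooling; only the file name client_auth_CA.pem comes from the comment above, and the sketch does no cryptographic validation.

```python
import re
from typing import List

# Matches one PEM-encoded certificate, including its BEGIN/END armor lines.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(bundle_text: str) -> List[str]:
    """Return every certificate block found in a concatenated PEM bundle.

    Hypothetical helper: it only separates the blocks, it does not verify
    signatures, expiry dates, or trust chains.
    """
    return PEM_CERT_RE.findall(bundle_text)
```

Each returned block could then be piped to `openssl x509 -noout -subject -issuer` to confirm that the expected CAs are present in the bundle.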
Change #1053937 had a related patch set uploaded (by Elukey; author: Elukey):
[operations/puppet@production] pki: add the Traffic's project Puppet CA to client_auth_CA.pem in cloud
Completely different use case: traffic-cache-upload-bullseye.traffic.eqiad1.wikimedia.cloud
Actually, something tangentially related: lsw1-a1-codfw had a cert generated previously. This switch has now been moved and repurposed with a new name, lsw1-d1-codfw.
In T355750#9825850, @CDanis wrote: Hi Arzhel, for when I do have time to look at this, do you have a recommended way of reproducing without breaking anything or potentially actually affecting a network device?
sudo cookbook sre.network.tls --system lsw1-f8-eqiad
Hi Arzhel, for when I do have time to look at this, do you have a recommended way of reproducing without breaking anything or potentially actually affecting a network device?
As a data point, same error today with cumin1002:~$ sudo cookbook sre.network.tls lsw1-d1-codfw
Change 993099 merged by Muehlenhoff:
[operations/puppet@production] Remove obsolete Hiera entries for Ganeti PKI support
Change 993099 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Remove obsolete Hiera entries for Ganeti PKI support
This is complete.
Change 981301 merged by Muehlenhoff:
[labs/private@master] Remove obsolete dummy certs
Change 981301 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[labs/private@master] Remove obsolete dummy certs
Change 981285 merged by Muehlenhoff:
[operations/puppet@production] Remove now obsolete cergen Ganeti certs
Change 981285 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] Remove now obsolete cergen Ganeti certs
Change 979897 merged by Muehlenhoff:
[operations/puppet@production] ganeti: Remove non-PKI code for RAPI access
Change 979901 merged by Muehlenhoff:
[labs/private@master] Remove ganeti RAPI dummy certs
Change 979901 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[labs/private@master] Remove ganeti RAPI dummy certs
Change 979897 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] ganeti: Remove non-PKI code for RAPI access
Change 979890 merged by Muehlenhoff:
[operations/puppet@production] ganeti: Configure eqiad/test for PKI
Change 979890 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] ganeti: Configure eqiad/test for PKI
Change 979838 merged by Muehlenhoff:
[operations/puppet@production] ganeti: Switch eqiad to PKI