
We are using Talos Linux to set up Kubernetes clusters in VMware. It works fine on one cluster on one VMware host, but on another everything works except DNS inside pods/containers.

I've enabled DNS on all hosts, and talosctl "get resolvers" and "get dnsupstream" both return the correct DNS data:

NODE             NAMESPACE   TYPE             ID          VERSION   RESOLVERS
192.168.130.82   network     ResolverStatus   resolvers   2         ["10.203.32.2","10.203.32.3"]

talosctl -n 192.168.130.${ip} -e 192.168.130.${ip} get dnsupstream 
NODE             NAMESPACE   TYPE          ID            VERSION   HEALTHY   ADDRESS
192.168.130.80   network     DNSUpstream   10.203.32.2   1         true      10.203.32.2:53
192.168.130.80   network     DNSUpstream   10.203.32.3   1         true      10.203.32.3:53

Yet when I fire up a "curlimages/curl" pod and curl an internal server, it works by IP, but resolving the hostname does not:

~ $ curl http://10.203.32.90:32005
RABBITMQ API 1.0.15~ $
~ $ curl http://rabbit.domain.com:32005
curl: (6) Could not resolve host: rabbit.domain.com
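For reference, the throwaway curl pod can be started with something along these lines (the pod name curl-test is arbitrary):

kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- sh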

According to the docs, resolv.conf should contain 10.96.0.9 on all nodes, and it does:

talosctl read /system/resolved/resolv.conf
nameserver 10.96.0.9
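It is also worth comparing that with what the pod itself sees; from inside the test pod:

~ $ cat /etc/resolv.conf

which should list the cluster DNS service IP (10.96.0.10 in this cluster) as the nameserver, not the host's 10.96.0.9.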

This works on another Talos cluster (where the node IPs are in the same network as the DNS servers), so I have no idea how to debug this further or how to fix it.

As another test, I started a dnsutils pod with the image

image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3

and, seeing that it was scheduled on node 192.168.130.15, checked the DNS logs on that node.
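(Such a pod can be created with a one-liner along the lines of

kubectl run dnsutils --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 --restart=Never --command -- sleep 3600

which follows the usual pattern from the Kubernetes DNS debugging guide.)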

The logs there do show the lookup for gcr.io (the image registry):

192.168.130.15: 2024-07-04T07:22:00.210Z DEBUG dns response {"component": "dns-resolve-cache", "data": ";; opcode: QUERY, status: NOERROR, id: 7044\n;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1\n\n;; OPT PSEUDOSECTION:\n; EDNS: version 0; flags:; udp: 1232\n\n;; QUESTION SECTION:\n;gcr.io.\tIN\t A\n\n;; ANSWER SECTION:\ngcr.io.\t300\tIN\tA\t142.250.102.82\n"}.

So DNS works on the node itself and images can be pulled, yet dnsutils itself errors out:

kubectl exec dnsutils -it -- nslookup google.com
;; connection timed out; no servers could be reached

On the working cluster:

kubectl exec dnsutils -it -- nslookup google.com
;; connection timed out; no servers could be reached
kubectl exec dnsutils -it -- nslookup -query=any google.com
Server:         10.96.0.10
Address:        10.96.0.10#53

Non-authoritative answer:
google.com      nameserver = ns1.google.com.
google.com      nameserver = ns3.google.com.

On the failing cluster:

kubectl exec dnsutils -it -- nslookup -query=any google.com
;; Connection to 10.96.0.10#53(10.96.0.10) for google.com failed: timed out.
;; Connection to 10.96.0.10#53(10.96.0.10) for google.com failed: timed out.
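One thing that may help narrow this down is querying the upstream DNS servers directly from the pod, bypassing the cluster DNS service, for example:

kubectl exec dnsutils -it -- nslookup google.com 10.203.32.2

If that also times out, the issue is reachability from the pod network to the upstream servers rather than CoreDNS or the host DNS cache itself.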

How can I debug further to see why DNS inside pods is not working? Or does anyone know how to fix this?

1 Answer


Using traceroute, it turned out to be a firewall blocking port 53. On the working cluster, traceroute gets to 172.17.0.252 (a firewall VM) and beyond:

/ # traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 46 byte packets
 1  10.244.6.1 (10.244.6.1)  0.022 ms  0.031 ms  0.019 ms
 2  10.203.32.254 (10.203.32.254)  0.755 ms  1.011 ms  0.639 ms
 3  172.17.0.252 (172.17.0.252)  1.515 ms  0.854 ms  0.730 ms

while on the failing cluster it is blocked:

/ # traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 46 byte packets
 1  10.244.5.1 (10.244.5.1)  0.013 ms  0.010 ms  0.002 ms
 2  192.168.130.254 (192.168.130.254)  0.550 ms  0.706 ms  0.009 ms
 3  *  *  *
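To confirm that it is specifically port 53 being blocked (traceroute only shows the path, not a particular port), a direct query from the pod against an upstream server with a short timeout can be used, for example:

/ # dig @10.203.32.2 google.com +time=2 +tries=1

A timeout here, while the node itself resolves names fine, points at a firewall rule between the pod network and the DNS servers.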
