
I have a server that started out on Ubuntu 16.04. I'm trying to bring it current, so I did an upgrade from 16.04 to 18.04 (which was mostly trouble-free). I let that run for a day to make sure everything worked as expected, and then I upgraded to 20.04. That upgrade has not gone so smoothly.

I have resolved most of the issues with that upgrade (kernel panics, MySQL 5.7 to MariaDB incompatibilities, etc.), but the one thing I cannot figure out is why my UFW rules are behaving differently with my Kubernetes (microk8s) install.

I have a few pods running in microk8s. These pods ran fine under 16.04 and 18.04, but once I upgraded to 20.04 they lost the ability to reach my DNS server (which is dnsmasq running on the same host).

I have microk8s configured to forward DNS to 192.168.0.50 (my host's main local IP address). Again, this worked fine prior to the upgrade to 20.04. I need the pods to be able to access this local DNS server, as there are specific services on my network that only this DNS server knows about. I have disabled the built-in systemd-resolved in favor of dnsmasq.
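
For reference, I set the forwarder through the microk8s DNS addon, roughly like this (the exact addon syntax may differ between microk8s versions, and the forward line can also be edited directly in the coredns ConfigMap):

# Point the CoreDNS addon at the host's dnsmasq instance
microk8s enable dns:192.168.0.50

# Or edit the "forward . <resolver>" line in the Corefile directly
microk8s kubectl -n kube-system edit configmap/coredns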

Also note that 192.168.0.50 is exposed to the public internet (my router has its DMZ pointed at 192.168.0.50), as I run my own web server, mail server, etc. So I want DNS exposed to any internal interfaces and/or IPs, but I do not want it exposed to public IPs from the internet.

If I check netstat it looks like dnsmasq is listening on all of the correct interfaces:

# netstat -lepn | grep ":53 " | grep -v tcp6 | grep -v udp6
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      0          10627970   3213753/dnsmasq     
tcp        0      0 192.168.0.50:53         0.0.0.0:*               LISTEN      0          10627968   3213753/dnsmasq     
tcp        0      0 192.168.123.1:53        0.0.0.0:*               LISTEN      0          10627966   3213753/dnsmasq     
tcp        0      0 172.18.0.1:53           0.0.0.0:*               LISTEN      0          10627964   3213753/dnsmasq     
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      0          10627962   3213753/dnsmasq     
tcp        0      0 10.1.206.192:53         0.0.0.0:*               LISTEN      0          10627960   3213753/dnsmasq     
udp        0      0 127.0.0.1:53            0.0.0.0:*                           0          10627969   3213753/dnsmasq     
udp        0      0 192.168.0.50:53         0.0.0.0:*                           0          10627967   3213753/dnsmasq     
udp        0      0 192.168.123.1:53        0.0.0.0:*                           0          10627965   3213753/dnsmasq     
udp        0      0 172.18.0.1:53           0.0.0.0:*                           0          10627963   3213753/dnsmasq     
udp        0      0 172.17.0.1:53           0.0.0.0:*                           0          10627961   3213753/dnsmasq     
udp        0      0 10.1.206.192:53         0.0.0.0:*                           0          10627959   3213753/dnsmasq     

Now, after the upgrade, UFW appears to be blocking DNS requests from the k8s network interfaces:

Apr 28 12:20:37 server kernel: [   58.148877] [UFW BLOCK] IN=cali00ea8fe43e0 OUT=enp1s0f0 MAC=ee:ee:ee:ee:ee:ee:fa:cd:4a:b9:17:d8:08:00 SRC=10.1.206.231 DST=10.1.206.232 LEN=86 TOS=0x00 PREC=0x00 TTL=63 ID=20903 DF PROTO=UDP SPT=49282 DPT=53 LEN=66
Apr 28 12:20:37 server kernel: [   58.148898] [UFW BLOCK] IN=cali00ea8fe43e0 OUT=enp1s0f0 MAC=ee:ee:ee:ee:ee:ee:fa:cd:4a:b9:17:d8:08:00 SRC=10.1.206.231 DST=10.1.206.232 LEN=86 TOS=0x00 PREC=0x00 TTL=63 ID=53947 DF PROTO=UDP SPT=45603 DPT=53 LEN=66

Where cali00ea8fe43e0 is one of the network interfaces for my k8s cluster.
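
(These land in the kernel log, so I was tailing them with something like the following; a plain dmesg works too:)

# journalctl -k -f | grep "UFW BLOCK"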

If I look at the microk8s CoreDNS log, I see lines like this:

[ERROR] plugin/errors: 2 8297669929822044709.913121715873788735. HINFO: read udp 10.1.206.203:45629->192.168.0.50:53: i/o timeout

So it certainly seems like UFW is blocking this traffic. The only issue is that I cannot figure out exactly why it stopped working. I have a backup copy of my /etc/ufw/user.rules file, and the upgrade appears to have added these lines:

### tuple ### allow any 30080 0.0.0.0/0 any 0.0.0.0/0 in
-A ufw-user-input -p tcp --dport 30080 -j ACCEPT
-A ufw-user-input -p udp --dport 30080 -j ACCEPT

### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 in_vxlan.calico
-A ufw-user-input -i vxlan.calico -j ACCEPT

### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 out_vxlan.calico
-A ufw-user-output -o vxlan.calico -j ACCEPT

### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 in_cali+
-A ufw-user-input -i cali+ -j ACCEPT

### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 out_cali+
-A ufw-user-output -o cali+ -j ACCEPT

I did not add these manually; they were added automatically during the upgrade, perhaps by moving to a newer version of microk8s. These new rules are all tied to my k8s cluster in one way or another.

These lines show up in ufw status numbered like so:

# ufw status numbered
WARN: Duplicate profile 'Dovecot IMAP', using last found
WARN: Duplicate profile 'Dovecot Secure IMAP', using last found
WARN: Duplicate profile 'Dovecot POP3', using last found
WARN: Duplicate profile 'Dovecot Secure POP3', using last found
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] 22                         ALLOW IN    Anywhere                  
[ 2] 80                         ALLOW IN    Anywhere                  
[ 3] 6969                       ALLOW IN    Anywhere                  
[ 4] 9696                       ALLOW IN    Anywhere                  
[ 5] 110                        ALLOW IN    Anywhere                  
[ 6] 143                        ALLOW IN    Anywhere                  
[ 7] 25                         ALLOW IN    Anywhere                  
[ 8] 993                        ALLOW IN    Anywhere                  
[ 9] 995                        ALLOW IN    Anywhere                  
[10] 9300:9599/tcp              ALLOW IN    Anywhere                  
[11] Apache Full                ALLOW IN    Anywhere                  
[12] Anywhere                   ALLOW IN    192.168.0.0/16            
[13] 192.168.1.255 53           ALLOW IN    192.168.1.0 53            
[14] 127.0.1.1 53               ALLOW IN    127.0.0.1 53              
[15] 30080                      ALLOW IN    Anywhere                  
[16] Anywhere on vxlan.calico   ALLOW IN    Anywhere                  
[17] Anywhere                   ALLOW OUT   Anywhere on vxlan.calico   (out)
[18] Anywhere on cali+          ALLOW IN    Anywhere                  
[19] Anywhere                   ALLOW OUT   Anywhere on cali+          (out)

What I'm assuming has happened here is that microk8s networking changed in some fundamental way. I don't remember all of these "calico" and "cali+" interfaces from before, so microk8s must have added some firewall rules during the upgrade to give pods outbound internet access... but it looks like those rules may not allow the pods to access "Anywhere" on my host server.

I.e., if I'm reading this correctly: rule [16] ALLOWs IN any traffic destined for "Anywhere on vxlan.calico", and rule [17] ALLOWs OUT to "Anywhere" from "Anywhere on vxlan.calico".

But neither of these seems to ALLOW IN traffic coming FROM vxlan.calico (which, IMO, should be allowed).

I've been trying to figure out how to alter that line in user.rules to ALLOW IN any/all traffic from the vxlan.calico and cali+ interfaces, but I have not had any luck finding the right syntax to do that. I suppose I could just allow all traffic from 10.0.0.0/8, since that appears to be the CIDR used by all of those interfaces, but I'm not sure that's the "correct" way to do it.

So to summarize:

Main question #1: Is there a way to allow traffic IN to any port from a specific interface? I.e., I want to add a line that shows up in ufw status like:

     To                         Action      From
     --                         ------      ----
[20] Anywhere                   ALLOW **IN**   Anywhere on vxlan.calico

Basically identical to rule #17, except the "Action" is ALLOW IN instead of ALLOW OUT.
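
The closest syntax I've found is ufw's "allow in on <interface>" form. A sketch of what I've been trying (I haven't verified whether the cali+ wildcard is accepted by the CLI or whether it has to be hand-written as a tuple in user.rules; it's also possible the existing in_vxlan.calico rule already does this and ufw simply shows the matched interface in the "To" column):

# sudo ufw allow in on vxlan.calico
# sudo ufw allow in on cali+     # wildcard may need to be a hand-edited user.rules tuple
# sudo ufw reload                # needed after any hand edit of user.rules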

Main question #2: Am I missing some other fundamental issue here? It seems like upgrading from 18.04 to 20.04 should not have caused this much of a headache. Did I miss a fundamental step somewhere along the way?

EDIT #1: So maybe it's not UFW after all... I just completely disabled UFW and I'm still seeing this traffic blocked. Is this being blocked within microk8s itself?

I am at a total loss here. What am I missing?

I checked and I can connect to other ports on the host server (192.168.0.50) just fine from my k8s pods. It's only requests to port 53 that are getting hung up.

For example: running curl -v http://192.168.0.50 on one of the k8s pods returns content from my web server just fine.

So my next thought was that maybe it's a UDP-specific issue, but trying TCP for DNS doesn't work either:

$ kubectl exec -i -t dnsutils -- dig cnn.com @192.168.0.50 +tcp
;; communications error to 192.168.0.50#53: connection reset

But from the host the same dig command with +tcp returns exactly as expected.
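
(i.e., run directly on the server:)

# dig cnn.com @192.168.0.50 +tcp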

I also checked using tcpdump (tcpdump -i any -XX port 53), and when I call dig from one of the k8s pods I see the request in the tcpdump output but no response:

14:55:09.849245 IP 10.1.206.251.53235 > 192.168.0.50.domain: 27889+ [1au] A? cnn.com. (36)

When I execute the same dig command from another machine on my local network, I see the request and the response:

14:56:45.430750 IP 192.168.0.233.36836 > 192.168.0.50.domain: 43090+ [1au] A? cnn.com. (48)
14:56:45.447998 IP 192.168.0.50.domain > 192.168.0.233.36836: 43090 4/0/1 A 151.101.131.5, A 151.101.195.5, A 151.101.3.5, A 151.101.67.5 (100)

So I enabled logging in dnsmasq, and it logs entries when I make DNS requests from the server itself or from other machines on my network, but it does NOT log anything at all when I make requests from the k8s pods.
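
(I enabled logging with dnsmasq's log-queries option; a minimal sketch, assuming the config lives in /etc/dnsmasq.conf:)

# echo "log-queries" >> /etc/dnsmasq.conf
# systemctl restart dnsmasq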

So tcpdump sees the DNS request packet from the k8s pod, but dnsmasq doesn't seem to be receiving it? How is that possible?


Edited to add: Well, this might be the clue I needed:

# grep "ignoring query from non-local network" /var/log/syslog
Apr 29 07:59:25 server dnsmasq[1540970]: ignoring query from non-local network 10.1.206.240
Apr 29 07:59:30 server dnsmasq[1541198]: ignoring query from non-local network 10.1.206.240
Apr 29 12:43:20 server dnsmasq[2803264]: ignoring query from non-local network 10.1.206.203 (logged only once)
Apr 29 13:56:41 server dnsmasq[3122204]: ignoring query from non-local network 10.1.206.240
Apr 29 14:04:56 server dnsmasq[3157857]: ignoring query from non-local network 10.1.206.203 (logged only once)
Apr 29 14:40:45 server dnsmasq[3313551]: ignoring query from non-local network 10.1.206.203 (logged only once)
Apr 29 14:51:24 server dnsmasq[3360405]: ignoring query from non-local network 10.1.206.251
Apr 29 14:52:09 server dnsmasq[3363587]: ignoring query from non-local network 10.1.206.251
Apr 29 15:47:35 server dnsmasq[3603248]: ignoring query from non-local network 10.1.206.203 (logged only once)

Because those are "logged only once", I wasn't seeing them... sigh.

1 Answer

To answer my own question: it had nothing to do with UFW or microk8s, and everything to do with the 20.04 upgrade adding a new default flag in /etc/init.d/dnsmasq:

DNSMASQ_OPTS="$DNSMASQ_OPTS --local-service"

Removing --local-service from this file fixed the issue.
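
Per the dnsmasq man page, --local-service only has an effect when no explicit --interface, --except-interface, --listen-address or --auth-server options are configured. So an alternative fix that should survive package upgrades (a sketch I have not tested on this setup) is to list the listen addresses explicitly in /etc/dnsmasq.conf, which neutralizes the flag:

# /etc/dnsmasq.conf -- explicit listeners disable the --local-service default
listen-address=127.0.0.1,192.168.0.50
# (add the other bridge/tunnel addresses here as needed)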

I cannot believe how many hours I've wasted tracking this down. I wonder if this is a bug: the 10.0.0.0/8 CIDR is technically a private/local IP range, so I'm not sure how/why dnsmasq concluded that those particular IPs are "not local" when they are in fact quite local. (My best guess: --local-service cares about subnets directly attached to an interface, not RFC 1918 ranges, and Calico routes each pod address as a /32 through its cali* interface, so no attached subnet actually covers the pod IPs.)
