I have a server that started out on Ubuntu 16.04. I'm trying to get it current, so I did an upgrade from 16.04 to 18.04 (which was mostly trouble-free). I let that run for a day to make sure everything worked as expected and then upgraded to 20.04... and that upgrade has not gone so smoothly.
I have resolved most of the issues with that upgrade (kernel panics, MySQL 5.7 to MariaDB incompatibilities, etc.), but the one thing I cannot figure out is why my UFW rules are behaving differently with my Kubernetes (microk8s) install.
I have a few pods running in microk8s. These pods ran fine under 16.04 and 18.04, but once I upgraded to 20.04 they lost the ability to reach my DNS server (which is dnsmasq running on the same host).
I have microk8s configured to forward DNS to 192.168.0.50 (my host's main local IP address). Again, this worked fine prior to the upgrade to 20.04. I need the pods to be able to reach this local DNS server because there are services on my network that only this DNS server knows about. I have disabled the built-in systemd-resolved in favor of dnsmasq.
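For context, the forwarding was set up through the microk8s dns addon (a sketch from memory; the addon accepts an upstream server argument, and the configmap edit is the manual equivalent):

```shell
# Point CoreDNS at the host's dnsmasq instead of the default upstreams.
microk8s enable dns:192.168.0.50

# Equivalently, the upstream can be changed later by editing the
# "forward" line in the CoreDNS configmap:
microk8s kubectl -n kube-system edit configmap/coredns
```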
Also note that 192.168.0.50 is exposed to the public internet (my router has its DMZ pointed at 192.168.0.50), as I run my own web server, mail server, etc. So I want DNS exposed to any internal interfaces and/or IPs, but I do not want it exposed to public IPs from the internet.
If I check netstat it looks like dnsmasq is listening on all of the correct interfaces:
# netstat -lepn | grep ":53 " | grep -v tcp6 | grep -v udp6
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 0 10627970 3213753/dnsmasq
tcp 0 0 192.168.0.50:53 0.0.0.0:* LISTEN 0 10627968 3213753/dnsmasq
tcp 0 0 192.168.123.1:53 0.0.0.0:* LISTEN 0 10627966 3213753/dnsmasq
tcp 0 0 172.18.0.1:53 0.0.0.0:* LISTEN 0 10627964 3213753/dnsmasq
tcp 0 0 172.17.0.1:53 0.0.0.0:* LISTEN 0 10627962 3213753/dnsmasq
tcp 0 0 10.1.206.192:53 0.0.0.0:* LISTEN 0 10627960 3213753/dnsmasq
udp 0 0 127.0.0.1:53 0.0.0.0:* 0 10627969 3213753/dnsmasq
udp 0 0 192.168.0.50:53 0.0.0.0:* 0 10627967 3213753/dnsmasq
udp 0 0 192.168.123.1:53 0.0.0.0:* 0 10627965 3213753/dnsmasq
udp 0 0 172.18.0.1:53 0.0.0.0:* 0 10627963 3213753/dnsmasq
udp 0 0 172.17.0.1:53 0.0.0.0:* 0 10627961 3213753/dnsmasq
udp 0 0 10.1.206.192:53 0.0.0.0:* 0 10627959 3213753/dnsmasq
Now, after the upgrade, UFW appears to be blocking DNS requests from the k8s network interfaces:
Apr 28 12:20:37 server kernel: [ 58.148877] [UFW BLOCK] IN=cali00ea8fe43e0 OUT=enp1s0f0 MAC=ee:ee:ee:ee:ee:ee:fa:cd:4a:b9:17:d8:08:00 SRC=10.1.206.231 DST=10.1.206.232 LEN=86 TOS=0x00 PREC=0x00 TTL=63 ID=20903 DF PROTO=UDP SPT=49282 DPT=53 LEN=66
Apr 28 12:20:37 server kernel: [ 58.148898] [UFW BLOCK] IN=cali00ea8fe43e0 OUT=enp1s0f0 MAC=ee:ee:ee:ee:ee:ee:fa:cd:4a:b9:17:d8:08:00 SRC=10.1.206.231 DST=10.1.206.232 LEN=86 TOS=0x00 PREC=0x00 TTL=63 ID=53947 DF PROTO=UDP SPT=45603 DPT=53 LEN=66
where cali00ea8fe43e0 is one of the network interfaces for my k8s cluster.
If I look at the microk8s CoreDNS log I see lines like this:
[ERROR] plugin/errors: 2 8297669929822044709.913121715873788735. HINFO: read udp 10.1.206.203:45629->192.168.0.50:53: i/o timeout
So it certainly seems like UFW is blocking this traffic; what I cannot figure out is why it stopped working. I have a backup copy of my /etc/ufw/user.rules file, and the upgrade appears to have added these lines:
### tuple ### allow any 30080 0.0.0.0/0 any 0.0.0.0/0 in
-A ufw-user-input -p tcp --dport 30080 -j ACCEPT
-A ufw-user-input -p udp --dport 30080 -j ACCEPT
### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 in_vxlan.calico
-A ufw-user-input -i vxlan.calico -j ACCEPT
### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 out_vxlan.calico
-A ufw-user-output -o vxlan.calico -j ACCEPT
### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 in_cali+
-A ufw-user-input -i cali+ -j ACCEPT
### tuple ### allow any any 0.0.0.0/0 any 0.0.0.0/0 out_cali+
-A ufw-user-output -o cali+ -j ACCEPT
I did not add these manually; they appeared automatically during the upgrade, perhaps from moving to a newer version of microk8s. These new rules are all tied to my k8s cluster in one way or another.
These lines show up in ufw status numbered like so:
# ufw status numbered
WARN: Duplicate profile 'Dovecot IMAP', using last found
WARN: Duplicate profile 'Dovecot Secure IMAP', using last found
WARN: Duplicate profile 'Dovecot POP3', using last found
WARN: Duplicate profile 'Dovecot Secure POP3', using last found
Status: active
To Action From
-- ------ ----
[ 1] 22 ALLOW IN Anywhere
[ 2] 80 ALLOW IN Anywhere
[ 3] 6969 ALLOW IN Anywhere
[ 4] 9696 ALLOW IN Anywhere
[ 5] 110 ALLOW IN Anywhere
[ 6] 143 ALLOW IN Anywhere
[ 7] 25 ALLOW IN Anywhere
[ 8] 993 ALLOW IN Anywhere
[ 9] 995 ALLOW IN Anywhere
[10] 9300:9599/tcp ALLOW IN Anywhere
[11] Apache Full ALLOW IN Anywhere
[12] Anywhere ALLOW IN 192.168.0.0/16
[13] 192.168.1.255 53 ALLOW IN 192.168.1.0 53
[14] 127.0.1.1 53 ALLOW IN 127.0.0.1 53
[15] 30080 ALLOW IN Anywhere
[16] Anywhere on vxlan.calico ALLOW IN Anywhere
[17] Anywhere ALLOW OUT Anywhere on vxlan.calico (out)
[18] Anywhere on cali+ ALLOW IN Anywhere
[19] Anywhere ALLOW OUT Anywhere on cali+ (out)
What I'm assuming has happened is that microk8s networking changed in some fundamental way; I don't remember all of these "calico" and "cali+" interfaces before. So microk8s must have added firewall rules during the upgrade to allow the pods outbound internet access... but it looks like those rules may not allow the pods to reach "Anywhere" on my host server.
I.e., if I'm reading this correctly: rule [16] ALLOWs IN any traffic destined for "Anywhere on vxlan.calico", and rule [17] ALLOWs OUT to "Anywhere" from anywhere on vxlan.calico. But neither of these seems to ALLOW IN traffic FROM vxlan.calico (which, in my opinion, should be allowed).
I've been trying to figure out how to alter that line in user.rules to ALLOW IN any/all traffic from the vxlan.calico and cali+ interfaces, but I have not found the right syntax. I suppose I could just allow all traffic from 10.0.0.0/8, since that appears to be the CIDR used by all of those interfaces, but I'm not sure that's the "correct" way to do it.
So to summarize:
Main question #1: Is there a way to allow traffic IN to any port from a specific interface? I.e., I want to add a line that shows up in ufw status like:
To Action From
-- ------ ----
[20] Anywhere ALLOW **IN** Anywhere on vxlan.calico
Basically identical to rule [17], except the Action is ALLOW IN instead of ALLOW OUT.
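For the record, the closest syntax I've found in the ufw docs looks like this (a sketch, untested against my exact setup; interface names are mine):

```shell
# Allow all inbound traffic arriving on the vxlan.calico interface:
sudo ufw allow in on vxlan.calico from any to any

# The ufw CLI may not accept the 'cali+' wildcard that microk8s wrote
# into user.rules; if it doesn't, the equivalent raw rule could be added
# to /etc/ufw/user.rules by hand:
#   -A ufw-user-input -i cali+ -j ACCEPT
sudo ufw reload
```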
Main question #2: Am I missing some other fundamental issue here? It seems like upgrading from 18.04 to 20.04 should not have caused this much of a headache. Did I miss a fundamental step somewhere along the way?
EDIT #1: So maybe it's not UFW after all... I just disabled UFW completely and I'm still seeing this traffic blocked. Is this being blocked within microk8s itself?
I am at a total loss here. What am I missing?
I checked and I can connect to other ports on the host server (192.168.0.50) just fine from my k8s pods. It's only requests to port 53 that are getting hung up.
For example, running curl -v http://192.168.0.50 from one of the k8s pods returns content from my web server just fine.
So my next thought was maybe it's a UDP specific issue but trying TCP for DNS doesn't work either:
$ kubectl exec -i -t dnsutils -- dig cnn.com @192.168.0.50 +tcp
;; communications error to 192.168.0.50#53: connection reset
But from the host the same dig command with +tcp returns exactly as expected.
I also checked using tcpdump: tcpdump -i any -XX port 53
And when I call dig from one of the k8s pods I see the request in tcpdump output, but no response:
14:55:09.849245 IP 10.1.206.251.53235 > 192.168.0.50.domain: 27889+ [1au] A? cnn.com. (36)
When I execute the same dig command from another machine on my local network, I see the request and the response:
14:56:45.430750 IP 192.168.0.233.36836 > 192.168.0.50.domain: 43090+ [1au] A? cnn.com. (48)
14:56:45.447998 IP 192.168.0.50.domain > 192.168.0.233.36836: 43090 4/0/1 A 151.101.131.5, A 151.101.195.5, A 151.101.3.5, A 151.101.67.5 (100)
So I enabled logging in dnsmasq and it logs entries when I make dns requests from the server itself or from other machines on my network but it does NOT log anything at all when I make requests from the k8s pods.
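(For reference, enabling that logging is just a couple of config lines; this sketch assumes the stock Debian/Ubuntu file locations, followed by a dnsmasq restart:)

```
# /etc/dnsmasq.conf (or a drop-in under /etc/dnsmasq.d/)
log-queries                         # log each DNS query handled
log-facility=/var/log/dnsmasq.log   # optional; default is syslog
```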
So tcpdump sees the DNS request packet from the k8s pod, but dnsmasq doesn't seem to be receiving it? How is that possible?
Edited to add: well, this might be the clue I needed:
# grep "ignoring query from non-local network" /var/log/syslog
Apr 29 07:59:25 server dnsmasq[1540970]: ignoring query from non-local network 10.1.206.240
Apr 29 07:59:30 server dnsmasq[1541198]: ignoring query from non-local network 10.1.206.240
Apr 29 12:43:20 server dnsmasq[2803264]: ignoring query from non-local network 10.1.206.203 (logged only once)
Apr 29 13:56:41 server dnsmasq[3122204]: ignoring query from non-local network 10.1.206.240
Apr 29 14:04:56 server dnsmasq[3157857]: ignoring query from non-local network 10.1.206.203 (logged only once)
Apr 29 14:40:45 server dnsmasq[3313551]: ignoring query from non-local network 10.1.206.203 (logged only once)
Apr 29 14:51:24 server dnsmasq[3360405]: ignoring query from non-local network 10.1.206.251
Apr 29 14:52:09 server dnsmasq[3363587]: ignoring query from non-local network 10.1.206.251
Apr 29 15:47:35 server dnsmasq[3603248]: ignoring query from non-local network 10.1.206.203 (logged only once)
Because those are "logged only once" I wasn't seeing them... sigh
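If I'm reading the dnsmasq man page correctly, that message comes from the local-service option (which Debian/Ubuntu builds tend to enable by default): dnsmasq then only answers queries from subnets directly attached to a local interface, and calico's vxlan interface appears to carry a /32 address, so the 10.1.206.x pod addresses never count as local. A possible fix, sketched against my setup (the addresses are mine, taken from the netstat output above; adjust to taste):

```
# /etc/dnsmasq.conf
# Per the man page, local-service "only has effect if there are no
# --interface, --except-interface, --listen-address ... options", so
# listing addresses explicitly should disable the non-local check:
listen-address=127.0.0.1
listen-address=192.168.0.50
listen-address=192.168.123.1
listen-address=10.1.206.192
```

Alternatively, commenting out local-service (wherever the distro sets it) should have the same effect, at the cost of answering any source address the firewall lets through.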