I have the following mangle table rule on a lab server, which marks UDP traffic with 1 if the destination address is 6.6.6.6:

$ sudo iptables -t mangle -L PREROUTING 2 -v -n --line-numbers
2       17   884 MARK       udp  --  ge-0.0.0-Iosv6 *       0.0.0.0/0            6.6.6.6              MARK set 0x1
$

6.6.6.6/32 is configured on lo on that server. Each time I run traceroute towards 6.6.6.6, the rule counter above increases, as expected. In other words, the packets do seem to get marked. My routing policy database looks like this:

$ ip rule show
0:      from all lookup local
32764:  from all fwmark 0x2 lookup twohundred
32765:  from all fwmark 0x1 lookup threehundred
32766:  from all lookup main
32767:  from all lookup default
$

...and the table threehundred looks like this:

$ ip r sh table threehundred
default via 192.168.100.2 dev ge-0.0.0-Iosv6
$

However, the marked packets are not routed according to the entry in table threehundred, but according to the entry in table main. I can confirm with tcpdump that the UDP packets enter the server via ge-0.0.0-Iosv6, but the ICMP port unreachable reply is sent out via eth0, which is the interface of the default route in the main table. As mentioned earlier, the counter of mangle table PREROUTING rule #2 is incremented while this happens.

What might cause such behavior? I'm running Ubuntu 16.04.6 LTS.

1 Answer

Here's a schematic of the packet flow in Netfilter and General Networking:

[Figure: Packet flow in Netfilter and General Networking]

While the ingress packet was marked in PREROUTING, the locally generated reply never traverses PREROUTING: it leaves through OUTPUT, where no rule marks it. It therefore carries no mark, the fwmark 0x1 rule doesn't match, and the reply is routed by the main table.
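
To make the difference concrete, here is a minimal sketch of how the rule list from the question selects tables for a marked versus an unmarked packet. `tables_for_mark` is a hypothetical helper, not an iproute2 command, and it models only `from all` rules with an optional unmasked `fwmark` selector. The kernel walks the matching rules in priority order and stops at the first table that yields a route.

```shell
# Hypothetical helper: print, in priority order, the tables that the rules
# read on stdin would consult for a packet carrying the given fwmark.
# Simplified model: only "from all" rules, fwmark compared as a plain string.
tables_for_mark() {
    awk -v mark="$1" '
        {
            fw = ""; table = ""
            for (i = 1; i < NF; i++) {
                if ($i == "fwmark") fw = $(i + 1)
                if ($i == "lookup") table = $(i + 1)
            }
            # No fwmark selector: the rule matches every packet.
            # With a selector: it matches only packets carrying that mark.
            if (table != "" && (fw == "" || (fw "") == (mark ""))) print table
        }'
}

# Rule list copied from the question.
rules='0:      from all lookup local
32764:  from all fwmark 0x2 lookup twohundred
32765:  from all fwmark 0x1 lookup threehundred
32766:  from all lookup main
32767:  from all lookup default'

printf '%s\n' "$rules" | tables_for_mark 0x1   # marked request
printf '%s\n' "$rules" | tables_for_mark 0x0   # unmarked reply
```

For mark 0x1 this lists local, threehundred, main, default; for the unmarked reply it lists only local, main, default, so the default route in main (via eth0) is the first one that can match. On a live system the same helper can be fed with `ip rule show | tables_for_mark 0x1`.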

Altering the packet in mangle/OUTPUT, including changing meta-information such as the mark, triggers a reroute check. This reroute should switch the route from eth0 to ge-0.0.0-Iosv6 (note: with nftables instead of iptables, the dedicated route chain type is required to have this effect). This rule will do that:

iptables -t mangle -A OUTPUT -s 6.6.6.6 -j MARK --set-mark 1

Instead of marking individual packets with specific rules in both directions, it's possible to automatically mark the whole flow (as tracked by conntrack). The connmark match and its CONNMARK target counterpart can be used for this. This blog gives usage examples: Netfilter Connmark.

For this case, instead of the iptables rule above:

  • save the packet mark into the conntrack entry; this should be the last rule in mangle/PREROUTING:

    iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j CONNMARK --save-mark
    
  • restore the mark from the conntrack entry onto the reply; this should be the first rule in mangle/OUTPUT so the mark can still be altered by later rules if needed. It will trigger a reroute check:

    iptables -t mangle -I OUTPUT -m connmark ! --mark 0 -j CONNMARK --restore-mark
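
One way to check the ordering described above is to inspect the rule listing. The following is a minimal sketch: `save_mark_is_last` and `restore_mark_is_first` are hypothetical helper names, and they assume the listing comes from `iptables -t mangle -S <chain>`, which prints one rule per line in chain order.

```shell
# Hypothetical helpers; each reads `iptables -t mangle -S <chain>` output on stdin.

# Succeeds if the last rule of the chain is the CONNMARK --save-mark rule.
save_mark_is_last() {
    grep -e '^-A ' | tail -n 1 | grep -q -e '-j CONNMARK --save-mark'
}

# Succeeds if the first rule of the chain is the CONNMARK --restore-mark rule.
restore_mark_is_first() {
    grep -e '^-A ' | head -n 1 | grep -q -e '-j CONNMARK --restore-mark'
}

# Usage on the server (as root):
#   iptables -t mangle -S PREROUTING | save_mark_is_last     && echo "save rule is last"
#   iptables -t mangle -S OUTPUT     | restore_mark_is_first && echo "restore rule is first"
```

The `-P` policy line at the top of the listing is skipped, so only `-A` rule lines are considered.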
    

There are also a few things to know, and some caveats whose effects are difficult to predict reliably without testing:

  • Toggling fwmark_reflect (e.g. sysctl -w net.ipv4.fwmark_reflect=1) might have been enough for this specific case, in place of the rules above, but it wouldn't help in a more general case. Likewise, tcp_fwmark_accept eases the TCP case. There's no equivalent for other protocols such as UDP.

  • Sometimes the route lookup fails before the reroute check because of Strict Reverse Path Forwarding, and the packet is dropped early, before it gets a chance to be marked and rerouted. That's evidently not the case here (SRPF might not even be enabled), but should it happen, relax the check to Loose Mode on one of the involved interfaces (testing is required to figure out which one) by changing the rp_filter setting (e.g. sysctl -w net.ipv4.conf.eth0.rp_filter=2).

  • Sometimes additional routes from the main table must be duplicated in the additional table, since that table is read first, before falling back to the main table, and might not match. It's difficult to figure out when this is required, especially when marks are involved. E.g.:

    ip route add table threehundred 192.168.100.2/32 dev ge-0.0.0-Iosv6
    
  • The command ip route get ..., even when supplied with the appropriate mark, does not always appear to predict accurately what actually happens when marks and iptables are involved.

  • the behaviour related to interactions between mark, route and maybe ip route get prediction can be altered with the undocumented src_valid_mark toggle (also via sysctl). Use it only if it appears to fix things.

  • UDP server behaviour under policy routing can differ from TCP server behaviour, for reasons that are complex to explain. Using marks only adds to the complexity.
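
Before changing any of the toggles mentioned above, it can help to see their current values. The following is a minimal sketch that only reads them: `show_policy_routing_toggles` is a hypothetical helper name, and it reads the /proc/sys files directly rather than calling sysctl, on the assumption that this is less awkward for interface names containing dots, such as ge-0.0.0-Iosv6.

```shell
# Hypothetical helper: print the current values of the toggles discussed
# above for one interface (default: eth0, the egress interface in the question).
show_policy_routing_toggles() {
    iface=${1:-eth0}
    for key in fwmark_reflect tcp_fwmark_accept \
               conf/all/rp_filter "conf/$iface/rp_filter" \
               conf/all/src_valid_mark "conf/$iface/src_valid_mark"; do
        path=/proc/sys/net/ipv4/$key
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(cat "$path")"
        else
            # Missing interface, older kernel, or a non-Linux system.
            printf '%s = (not available)\n' "$key"
        fi
    done
}

show_policy_routing_toggles eth0
```

For rp_filter, the kernel uses the higher of the conf/all and per-interface values, which is why both are printed. Pass another interface name, for example `show_policy_routing_toggles ge-0.0.0-Iosv6`, to inspect the ingress side.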
