
Background

I have a Linux machine with two bridge interfaces, as shown below...

            ---{prenat}-->         ---{postnat}-->
          source: 172.25.0.3      source: 192.0.2.1

+--------------------+                 +-----------------------+
|   br-2f0c8e39d468  |------{linux}----|   br-dee49672169b     |
|   172.25.0.0/16    |                 |   192.0.2.0/24        |
|   Docker Compose   |                 |   containerlab Docker |
|       Hosts        |                 |        Hosts          |
+--------------------+                 +-----------------------+

I want to NAT the IPv4 source address of any traffic from br-2f0c8e39d468 to br-dee49672169b using the interface IP address of br-dee49672169b (192.0.2.1).

$ ip route show
default via 10.100.50.1 dev ens160 proto static
10.100.50.0/28 dev ens160 proto kernel scope link src 10.100.50.5
172.16.0.0/16 via 192.0.2.2 dev br-dee49672169b
172.25.0.0/16 dev br-2f0c8e39d468 proto kernel scope link src 172.25.0.1
192.0.2.0/24 dev br-dee49672169b proto kernel scope link src 192.0.2.1
$

Docker Compose bridge

This is my Docker Compose YAML for the Zabbix br-2f0c8e39d468 segment...

version: '3.3'

services:
  # Zabbix database
  zabbix-db:
    container_name: zabbix-db
    image: mariadb:10.11.4
    restart: always
    volumes:
      - ${ZABBIX_DATA_PATH}/zabbix-db/mariadb:/var/lib/mysql:rw
      - ${ZABBIX_DATA_PATH}/zabbix-db/backups:/backups
    command:
      - mariadbd
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_bin
      - --default-authentication-plugin=mysql_native_password
    environment:
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    stop_grace_period: 1m
    networks:
      - statics

  # Zabbix server
  zabbix-server:
    container_name: zabbix-server
    image: zabbix/zabbix-server-mysql:ubuntu-6.4-latest
    restart: always
    ports:
      - 10051:10051
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/alertscripts:/usr/lib/zabbix/alertscripts:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/externalscripts:/usr/lib/zabbix/externalscripts:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/dbscripts:/var/lib/zabbix/dbscripts:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/export:/var/lib/zabbix/export:rw
      - ${ZABBIX_DATA_PATH}/zabbix-server/modules:/var/lib/zabbix/modules:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/enc:/var/lib/zabbix/enc:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/ssh_keys:/var/lib/zabbix/ssh_keys:ro
      - ${ZABBIX_DATA_PATH}/zabbix-server/mibs:/var/lib/zabbix/mibs:ro
    environment:
      - MYSQL_ROOT_USER=root
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
      - DB_SERVER_HOST=zabbix-db
      - ZBX_STARTPINGERS=${ZBX_STARTPINGERS}
    depends_on:
      - zabbix-db
    stop_grace_period: 30s
    sysctls:
      - net.ipv4.ip_local_port_range=1024 65000
      - net.ipv4.conf.all.accept_redirects=0
      - net.ipv4.conf.all.secure_redirects=0
      - net.ipv4.conf.all.send_redirects=0
    networks:
      - statics

  # Zabbix web UI
  zabbix-web:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: zabbix-web
    image: zabbix/zabbix-web-nginx-mysql:ubuntu-6.4-latest
    restart: always
    ports:
      - 9000:8080
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ${ZABBIX_DATA_PATH}/zabbix-web/nginx:/etc/ssl/nginx:ro
      - ${ZABBIX_DATA_PATH}/zabbix-web/modules/:/usr/share/zabbix/modules/:ro
    environment:
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - DB_SERVER_HOST=zabbix-db
      - ZBX_SERVER_HOST=zabbix-server
      - ZBX_SERVER_NAME=Zabbix Docker
      - PHP_TZ=America/Chicago
    depends_on:
      - zabbix-db
      - zabbix-server
    stop_grace_period: 10s
    networks:
      - statics

networks:
  statics:
    driver: macvlan

containerlab bridge

This is the YAML for the Cisco CSR1000V containerlab bridged segment on 192.0.2.0/24...

name: rr

mgmt:
  network: statics
  ipv4-subnet: 192.0.2.0/24
  ipv4-range: 192.0.2.0/24

# ACCESS for linux:
#     docker exec -it <container_name> bash
# ACCESS for frr:
#     docker exec -it <container_name> vtysh
# ACCESS for srlinux:
#     docker exec -it <container_name> sr_cli
# ACCESS for vr-csr:
#     telnet <container_ip> 5000
topology:
  nodes:
    csr01:
      kind: vr-csr
      image: vrnetlab/vr-csr:16.12.08
      startup-config: config/csr01/config.txt
      mgmt-ipv4: 192.0.2.2
    csr02:
      kind: vr-csr
      image: vrnetlab/vr-csr:16.12.08
      startup-config: config/csr02/config.txt
      mgmt-ipv4: 192.0.2.3
    csr03:
      kind: vr-csr
      image: vrnetlab/vr-csr:16.12.08
      startup-config: config/csr03/config.txt
      mgmt-ipv4: 192.0.2.6
    PC01:
      kind: linux
      image: ubuntu:22.04
      mgmt-ipv4: 192.0.2.4
    PC02:
      kind: linux
      image: ubuntu:22.04
      mgmt-ipv4: 192.0.2.5
      #image: netenglabs/suzieq:latest
    # Manual creation of bridge required before deploying the topology
    #     sudo brctl addbr br-clab
    br-clab:
      kind: bridge
  links:
    - endpoints: ["csr01:eth3", "csr02:eth3"]
    - endpoints: ["csr01:eth4", "csr02:eth4"]
    - endpoints: ["csr03:eth3", "csr01:eth5"]
    - endpoints: ["csr03:eth4", "csr02:eth5"]
    - endpoints: ["PC01:eth1", "csr01:eth6"]
    - endpoints: ["PC02:eth1", "csr02:eth6"]
    - endpoints: ["br-clab:eth1", "csr01:eth2"]
    - endpoints: ["br-clab:eth2", "csr02:eth2"]
    - endpoints: ["br-clab:eth3", "csr03:eth2"]

What I tried

To circumvent the DOCKER-ISOLATION-* chains, I used this...

sudo iptables -I INPUT -i br-2f0c8e39d468 -j ACCEPT
sudo iptables -I FORWARD -i br-2f0c8e39d468 -j ACCEPT
sudo iptables -I FORWARD -o br-2f0c8e39d468 -j ACCEPT

This results in the following iptables rules:

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (4 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.25.0.2           tcp dpt:http-alt
ACCEPT     tcp  --  anywhere             172.25.0.3           tcp dpt:zabbix-trapper

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (4 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             /* set by containerlab */

I used:

$ sudo sysctl net.bridge.bridge-nf-call-iptables=1
$ sudo sysctl net.bridge.bridge-nf-call-arptables=1
$ sudo sysctl -w net.ipv4.ip_forward=1
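
(These sysctl settings do not persist across reboots; if that matters, a drop-in file like the hypothetical /etc/sysctl.d/99-bridge-nat.conf below would make them permanent.)

# /etc/sysctl.d/99-bridge-nat.conf -- hypothetical file name
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.ipv4.ip_forward = 1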

I also used sudo iptables -t nat -A POSTROUTING -o br-dee49672169b -j MASQUERADE.
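
For reference, I believe the more specific form of that rule, scoped to the Compose subnet and using 192.0.2.1 explicitly, would be something like the following, though I have not verified that it behaves any differently:

# Untested sketch: rewrite only traffic sourced from the Compose subnet as it leaves br-dee49672169b
sudo iptables -t nat -A POSTROUTING -s 172.25.0.0/16 -o br-dee49672169b -j SNAT --to-source 192.0.2.1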

When I ping the containerlab container at 192.0.2.2 from the Docker Compose host at 172.25.0.3 and sniff the 192.0.2.0/24 bridge interface on the {linux} host, I see:

19:32:31.870772 IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 10775, seq 0, length 64
19:32:31.870807 IP 192.0.2.2 > 172.25.0.3: ICMP echo reply, id 10775, seq 0, length 64

19:32:32.871777 IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 10775, seq 1, length 64
19:32:32.871811 IP 192.0.2.2 > 172.25.0.3: ICMP echo reply, id 10775, seq 1, length 64

19:32:33.871761 IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 10775, seq 2, length 64
19:32:33.871794 IP 192.0.2.2 > 172.25.0.3: ICMP echo reply, id 10775, seq 2, length 64

As you can see, the source NAT is applied to the request sent to 192.0.2.2, but the reply on this bridge is already addressed to 172.25.0.3, so something is not working the way I expect.

What commands should I use to implement this NAT correctly?

  • Where/how are you running tcpdump? Where are you running the ping command?
    – larsks
    Commented Jun 21 at 21:00
  • @larsks I updated the question
    – mc1
    Commented Jun 22 at 18:36
  • Apparently it's because the "reverse DNAT" is applied once the replies have entered the system, regardless of whether the first interface they "got through" is a "bridge port/slave". (I do wonder if it's implemented like this for a certain reason, or if it's just some sort of "miss" that happens to "still work"...)
    – Tom Yan
    Commented Jun 23 at 10:32
  • Wait, why is your Docker network using the macvlan driver (without specifying a parent device explicitly)? You neglected to mention that in your question earlier. That is critical information that completely changes the traffic flow. Using the macvlan driver means there isn't any bridge device associated with that network. (I'm not even sure that using macvlan without an explicit parent device leads to a useful configuration.)
    – larsks
    Commented Jun 24 at 13:54

1 Answer


Docker by default masquerades all traffic leaving a Docker network.
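
You can see the rules responsible on the host (the bridge names will of course differ on your system):

# Show the nat-table POSTROUTING rules Docker manages
iptables -t nat -S POSTROUTING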

I've set up an environment to reproduce your configuration, like this:

networks:
  net172:
    driver_opts:
      com.docker.network.bridge.name: br172
    ipam:
      config:
        - subnet: 172.25.0.0/16
  net192:
    driver_opts:
      com.docker.network.bridge.name: br192
    ipam:
      config:
        - subnet: 192.0.2.0/24

services:
  node-172-0:
    image: docker.io/alpine:latest
    networks:
      net172:
        ipv4_address: 172.25.0.2
    init: true
    command:
    - sh
    - -c
    - |
      apk add tcpdump
      sleep inf
  node-192-0:
    image: docker.io/alpine:latest
    networks:
      net192:
        ipv4_address: 192.0.2.2
    init: true
    command:
    - sh
    - -c
    - |
      apk add tcpdump
      sleep inf

Initially, if I try to ping 192.0.2.2 from node-172-0, it will simply fail:

/ # ping -c1 192.0.2.2
PING 192.0.2.2 (192.0.2.2): 56 data bytes

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

This is because the packets never make it to the 192.0.2.0/24 network. Running tcpdump -nn -i br172, we see:

23:01:31.473515 IP 172.25.0.2 > 192.0.2.2: ICMP echo request, id 22, seq 0, length 64

But running tcpdump -nn -i br192, we never see the packet arrive. That's because of the rules Docker sets up in the FORWARD chain. First we hit this rule:

-A DOCKER-ISOLATION-STAGE-1 -i br172 ! -o br172 -j DOCKER-ISOLATION-STAGE-2

Which leads us to:

-A DOCKER-ISOLATION-STAGE-2 -o br192 -j DROP
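
(You can dump these chains on your own host to find the equivalent rules for your bridge names:)

# List Docker's isolation chains; your bridge names will differ
iptables -S DOCKER-ISOLATION-STAGE-1
iptables -S DOCKER-ISOLATION-STAGE-2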

So the first thing we need to do is allow the kernel to forward packets between these two networks. We can do that by adding a rule to the DOCKER-USER chain, since that gets called before any other Docker rules:

iptables -A DOCKER-USER -i br172 -o br192 -j ACCEPT

We might as well add one for the reverse direction as well, since we know we're going to need it:

iptables -A DOCKER-USER -i br192 -o br172 -j ACCEPT
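
If you'd rather not allow new connections to originate from the 192.0.2.0/24 side, a narrower sketch of that reverse rule would only match reply traffic:

# Narrower alternative: only permit return traffic for connections initiated via br172
iptables -A DOCKER-USER -i br192 -o br172 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT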

And double check to see if the first rule in the DOCKER-USER chain is -j RETURN; if you see this:

# iptables -S DOCKER-USER
-N DOCKER-USER
-A DOCKER-USER -j RETURN

Then you'll need to remove it:

iptables -D DOCKER-USER -j RETURN
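
Alternatively, instead of deleting the RETURN rule, you can insert the two ACCEPT rules above it so they are evaluated first:

# Insert at position 1 so these run before the default -j RETURN
iptables -I DOCKER-USER 1 -i br172 -o br192 -j ACCEPT
iptables -I DOCKER-USER 1 -i br192 -o br172 -j ACCEPT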

With these changes in place, we can now successfully ping from node-172-0 to node-192-0:

/ # ping -c1 192.0.2.2
PING 192.0.2.2 (192.0.2.2): 56 data bytes
64 bytes from 192.0.2.2: seq=0 ttl=63 time=0.233 ms

--- 192.0.2.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.233/0.233/0.233 ms

If on the host we run tcpdump -nn -i any icmp, we see:

  1. The packet comes in on br172:

    23:13:54.292707 vethce5d047 P   IP 172.25.0.2 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
    23:13:54.292707 br172 In  IP 172.25.0.2 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
    
  2. It gets masqueraded when it exits on br192 (because masquerading happens in the POSTROUTING chain, which is part of the output path, not the input path):

    23:13:54.292776 br192 Out IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
    23:13:54.292783 veth6c3d3ea Out IP 192.0.2.1 > 192.0.2.2: ICMP echo request, id 33, seq 0, length 64
    

    That's due to the rule that Docker added to the nat POSTROUTING chain:

    -A POSTROUTING -s 172.25.0.0/16 ! -o br172 -j MASQUERADE
    
  3. Container node-192-0 sends a reply to the (masqueraded) source address:

    23:13:54.292815 veth6c3d3ea P   IP 192.0.2.2 > 192.0.2.1: ICMP echo reply, id 33, seq 0, length 64
    
  4. But when this enters br192, the destination address gets de-masqueraded, since it's a reply to the original request (the conntrack note after this list shows where that translation state lives):

    23:13:54.292815 br192 In  IP 192.0.2.2 > 172.25.0.2: ICMP echo reply, id 33, seq 0, length 64
    23:13:54.292830 br172 Out IP 192.0.2.2 > 172.25.0.2: ICMP echo reply, id 33, seq 0, length 64
    23:13:54.292832 vethce5d047 Out IP 192.0.2.2 > 172.25.0.2: ICMP echo reply, id 33, seq 0, length 64
    

(NB: The above demonstrates the configuration set up by Docker version 26.1.4.)
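
The reverse translation seen in step 4 comes from the connection-tracking entry created when the request was masqueraded. If conntrack-tools is installed on the host, you can inspect that entry while the ping is running with something like:

# List ICMP conntrack entries; the masqueraded flow shows the original
# (172.25.0.2 -> 192.0.2.2) tuple alongside the translated reply tuple
conntrack -L -p icmp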

  • Unfortunately there is no POSTROUTING chain in my topology. Perhaps this is because you're simulating with two Docker Compose bridges, but there is only one Docker Compose bridge in my topology. The br-dee49672169b bridge in my question comes from containerlab and it's running some Cisco CSR1000V containers. I will update my question with the full dump of my iptables rules.
    – mc1
    Commented Jun 24 at 12:41
  • This would all be correct if you weren't using the macvlan driver on your Docker network (which you hadn't mentioned in your original question). Because the masquerade rules we care about are those for the source network, it wouldn't matter that your second network is managed with containerlab.
    – larsks
    Commented Jun 24 at 13:56
  • The device is eth0 on {linux} and this actually works without specifying a parent. There is quite a bit of weird behavior that containerlab introduces, and I'm not sure we can solve this on the site; after working on this for a while, I think the details of the CSR1000V containers are pretty relevant. I'll accept your answer though.
    – mc1
    Commented Jun 24 at 14:04
