The Proxy ARP method of routing subnets to solve the Docker networking problem
Recently I discovered something called Proxy ARP. I had seen it among the sysctl options before, but I never understood what it was or why anyone would need it. Then I worked on a networking setup that used it to route traffic from a machine to the Internet. It's an interesting technique. It solves a big problem when you want to run Docker, the currently popular tool, inside a LAN subnet that uses DHCP, and you want to give other people access to your containers without extra work such as port forwarding.
In the standard setup, Docker creates its own bridge, named docker0, with a private IP address from the class B private range (172.17.0.0/16 by default). You can specify the subnet for your Docker containers. Let's say my LAN subnet is 192.168.0.0/20 (i.e. netmask 255.255.240.0), which gives me IP addresses from 192.168.0.1 to 192.168.15.254. When I tried this with Docker earlier (I'm not sure whether it still behaves the same way), giving it such a subnet made it start assigning IP addresses from the first available address in the subnet, i.e. 192.168.0.1. That's a recipe for disaster, because the first addresses of a subnet usually belong to routers. You definitely do not want a Docker container hoarding your router's IP address.
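The subnet arithmetic above can be double-checked with Python's standard `ipaddress` module. This is only a quick sanity check of the numbers and nothing Docker itself needs. The variable names are made up for illustration.

```python
import ipaddress

# The LAN subnet from the example above.
lan = ipaddress.ip_network("192.168.0.0/20")

netmask = lan.netmask
first_host = lan.network_address + 1    # the network address itself is not usable
last_host = lan.broadcast_address - 1   # neither is the broadcast address
usable = lan.num_addresses - 2          # 4096 addresses minus network and broadcast

print(f"netmask {netmask}, hosts {first_host} - {last_host} ({usable} usable)")
```

Running it prints `netmask 255.255.240.0, hosts 192.168.0.1 - 192.168.15.254 (4094 usable)`, which matches the range given above.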
So I came up with this plan. First, configure the router's DHCP server so that it does not allocate any IP address after 192.168.14.255, even though the subnet goes up to 192.168.15.254. Then use 192.168.15.0/24 (i.e. netmask 255.255.255.0) as the Docker subnet. The machine running Docker has an Ethernet interface, eth0, with the IP address 192.168.1.204/20. Note that this interface keeps the original /20 subnet. I don't have to bridge the NIC to the Docker bridge or mess with static routes.
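The split only works if a few conditions hold at the same time. The Docker /24 must lie inside the LAN /20. The DHCP pool must end below the Docker range. The host's eth0 address must keep the /20 mask and stay outside the Docker range. The function below is a minimal sketch that checks those conditions. `split_is_consistent` is a hypothetical helper name, and the addresses are the ones from this post.

```python
import ipaddress


def split_is_consistent(lan: str, docker_net: str, dhcp_last: str, host_if: str) -> bool:
    """Check the split-subnet plan: Docker range inside the LAN, above the DHCP pool."""
    lan_net = ipaddress.ip_network(lan)
    dock = ipaddress.ip_network(docker_net)
    last_lease = ipaddress.ip_address(dhcp_last)
    eth0 = ipaddress.ip_interface(host_if)
    return bool(
        dock.subnet_of(lan_net)                 # Docker range is carved out of the LAN subnet
        and last_lease < dock.network_address   # DHCP never hands out a Docker address
        and eth0.network == lan_net             # eth0 keeps the original /20 mask
        and eth0.ip not in dock                 # the host's LAN address is outside the Docker range
    )


print(split_is_consistent("192.168.0.0/20", "192.168.15.0/24",
                          "192.168.14.255", "192.168.1.204/20"))
```

With the values above it prints `True`. Moving the Docker subnet down to 192.168.14.0/24 would make it return `False`, because that range would overlap the DHCP pool.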
Next, I change three parameters in sysctl:
sysctl net.ipv4.conf.eth0.proxy_arp=1
sysctl net.ipv4.conf.docker0.proxy_arp=1
sysctl net.ipv4.ip_forward=1
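You can confirm that the three settings actually took effect by reading them back from /proc/sys. Each sysctl key is a file there, and its path is the key with the dots turned into slashes. This is a small sketch that assumes the interface names eth0 and docker0 from this setup. `REQUIRED`, `sysctl_path` and `missing_settings` are names made up for illustration.

```python
from __future__ import annotations

from pathlib import Path

# The three settings from the commands above and the value each should have.
REQUIRED = {
    "net.ipv4.conf.eth0.proxy_arp": "1",
    "net.ipv4.conf.docker0.proxy_arp": "1",
    "net.ipv4.ip_forward": "1",
}


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    # Dots in the key become directory separators under /proc/sys.
    # (Assumes interface names contain no dots, e.g. not a VLAN like eth0.100.)
    return Path(root, *key.split("."))


def missing_settings(root: str = "/proc/sys") -> list[str]:
    """Return the keys that are absent or not set to the required value."""
    missing = []
    for key, want in REQUIRED.items():
        path = sysctl_path(key, root)
        if not path.is_file() or path.read_text().strip() != want:
            missing.append(key)
    return missing


if __name__ == "__main__":
    bad = missing_settings()
    print("all set" if not bad else "not set: " + ", ".join(bad))
```

The docker0 entry only exists once Docker has created the bridge. Run before Docker starts, the check will report that key as missing.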
Once Docker starts up, it assigns 192.168.15.1 to the host itself (the docker0 bridge). It then hands out the remaining IP addresses to containers one by one as you start them. If I ping 192.168.15.1 from any other machine on the LAN, it works. Similarly, I can ping any machine in the 192.168.0.0/20 subnet from inside a Docker container.
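The allocation order described above can be modeled in a few lines. This is a simplified model of the observed behavior and not Docker's actual address allocator. It ignores what happens to addresses when containers stop. `docker_addresses` is a hypothetical helper.

```python
import ipaddress
from itertools import islice


def docker_addresses(subnet: str, containers: int):
    """Simplified model: the bridge takes the first host address, containers take the next ones in order."""
    hosts = iter(ipaddress.ip_network(subnet).hosts())
    bridge = next(hosts)                             # docker0 on the host, e.g. 192.168.15.1
    return bridge, list(islice(hosts, containers))   # one address per container, in start-up order


bridge, containers = docker_addresses("192.168.15.0/24", 3)
print("docker0:", bridge)
print("containers:", ", ".join(map(str, containers)))
```

This prints `docker0: 192.168.15.1` and `containers: 192.168.15.2, 192.168.15.3, 192.168.15.4`. A /24 has 254 usable addresses. One of them goes to the bridge, so this subnet has room for at most 253 containers.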
To understand why this works, we need to go back to the data link layer. Whenever a frame is to be transmitted to a machine on the same subnet, the sender needs to look up the physical (MAC) address of the destination. This is done using ARP (Address Resolution Protocol). Let's say I send a packet from 192.168.3.24 to 192.168.15.2. As far as 192.168.3.24 is concerned, 192.168.15.2 is part of its own 192.168.0.0/20 subnet. So it broadcasts an ARP request on the network asking for the MAC address of 192.168.15.2. Proxy ARP is enabled in the kernel, and the host's routing table says 192.168.15.2 is reachable through docker0 rather than through eth0, the interface the request arrived on. So my machine replies to the ARP broadcast with its own MAC address. It doesn't matter what MAC address the container has. The frame from 192.168.3.24 therefore arrives at my machine, which forwards the packet to 192.168.15.2. The /24 route via docker0 is more specific than the /20 route via eth0, so it wins. Replies travel back the same way: the container's default gateway is 192.168.15.1, and the host reaches 192.168.3.24 through eth0. So basically, this is proxy ARP with a split subnet. We split the large 192.168.0.0/20 subnet into two parts, one holding all the other machines and one holding the Docker containers.
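The decision the host makes can be sketched as a toy model. It has a routing table with the two connected routes, a longest-prefix-match lookup, and a proxy ARP check. The check roughly mirrors the conditions Linux looks at: forwarding enabled, proxy_arp set on the incoming interface, and the route to the target going out through a different interface. The real kernel logic has more cases. `ROUTES`, `route_lookup` and `answers_proxy_arp` are hypothetical names used only for illustration.

```python
import ipaddress

# Hypothetical model of the Docker host's routing table (connected routes only).
ROUTES = [
    (ipaddress.ip_network("192.168.0.0/20"), "eth0"),
    (ipaddress.ip_network("192.168.15.0/24"), "docker0"),
]
PROXY_ARP = {"eth0": True, "docker0": True}   # net.ipv4.conf.<iface>.proxy_arp
IP_FORWARD = True                             # net.ipv4.ip_forward


def route_lookup(dst: str):
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in ROUTES if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def answers_proxy_arp(target: str, in_iface: str) -> bool:
    """Would the host answer 'who has <target>?' arriving on in_iface with its own MAC?"""
    out_iface = route_lookup(target)
    return bool(
        IP_FORWARD
        and PROXY_ARP.get(in_iface, False)
        and out_iface is not None
        and out_iface != in_iface   # only proxy for addresses reached via another interface
    )


print(route_lookup("192.168.15.2"))               # docker0: the /24 beats the /20
print(route_lookup("192.168.3.24"))               # eth0: the return path to the LAN host
print(answers_proxy_arp("192.168.15.2", "eth0"))  # True: the host answers for the container
print(answers_proxy_arp("192.168.3.50", "eth0"))  # False: that LAN host answers for itself
```

The last check shows why the trick does not disturb the rest of the LAN. The host only answers for addresses it routes out of a different interface, so ordinary LAN machines keep answering their own ARP requests.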
Now you might ask: what is the advantage of this over static routing? This approach needs no extra configuration on the other machines or on the router: no static routes, no port forwarding and no NAT. The IPv6 counterpart of this technique is NDP (Neighbor Discovery Protocol) proxying.
You can also add your NIC to the docker0 bridge and avoid all of this. But with mixed workloads, for example when the machine is a KVM host as well as a Docker host, things can easily get complicated. That is where this approach came to the rescue for me.
Neat trick. Is there a way to keep static IPs when Docker containers restart? If so, this trick would make it easy to build a small PaaS setup.
It seems there's no direct option for this. Previously, I managed it by giving a container the NET_ADMIN capability and setting its IP address from inside the container. It's a bit of a security risk, though.