Avoid downtime while DNS changes propagate with Linux

Your business grows, you need more power, you want to change your hosting provider: there can be many reasons to move production applications from one server to another. If there is no simple way to scale an application and you also need to update DNS entries, there may be downtime while these DNS changes propagate. Perhaps some proxy servers or resolvers, at Google, Microsoft or elsewhere, ignore your shortened DNS TTL? I'll combine some nice and uncommon Linux networking features to avoid that downtime completely.

Scenario and Prerequisites

Assume that your application runs on server old inside a Docker container, listening on ports 587 and 25 for incoming TCP connections. It should move to server new.

There is an important prerequisite: both servers need to be connected to each other on Layer 2 (Data Link Layer). If you have two arbitrary servers that are not part of the same network, you will need a VPN. With OpenVPN, choose a tap device (dev tap) for a Layer 2 connection; a tun device only carries Layer 3 (IP) traffic. The following table summarises all assumptions in this example.

Property	Old Server	New Server
Hostname	old	new
VPN device IP address	10.31.104.1	10.31.104.2
VPN device name	tap0	tap0
Application ports	TCP 587 and 25	TCP 587 and 25
Docker bridge network	172.18.0.0/16	172.19.0.0/16

In our scenario, we first set up traffic forwarding from new to old. The application still sees the clients' real source IP addresses, even though connections are routed through new; with simple proxying, all passed-through connections would have new's IP address as their source. The second step is to update the DNS entries to point to new and to wait until all clients connect to your application through new. Finally, we schedule a downtime and actually move the application from old to new.
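For the second step, one way to see whether the DNS change has reached the big public resolvers is to ask them directly. A minimal sketch, assuming `dig` (from the BIND utilities) is installed; the function name `dns_points_to_new`, the resolver list and the hostname and IP address in the usage example are hypothetical placeholders. This only shows what those resolvers return; clients behind caches that ignore your TTL may still connect to old for a while, which is exactly why the forwarding is set up first.

```shell
# Hypothetical helper: succeed only if every resolver listed below already
# includes NEW_IP among the A records it returns for NAME.
# Usage: dns_points_to_new NAME NEW_IP
dns_points_to_new() {
  name=$1
  new_ip=$2
  # A few public resolvers (Google, Cloudflare, Quad9); adjust as needed
  for resolver in 8.8.8.8 1.1.1.1 9.9.9.9; do
    answer=$(dig +short "@$resolver" "$name" A)
    # dig +short may print several records, one per line
    if ! printf '%s\n' "$answer" | grep -qxF "$new_ip"; then
      echo "$resolver still answers: $answer" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `dns_points_to_new mail.example.com 203.0.113.20 && echo "DNS updated"` prints the message only once all three resolvers return the new address.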

Set up new server

First, new gets a simple and common NAT setup:

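# Add DNAT rule to route incoming traffic for the application ports to old
# (restricted to the public interface eth0, so that outgoing connections to
# port 25 or 587 coming from old or from local containers are not redirected)
iptables -t nat -A PREROUTING -i eth0 -p tcp -m multiport --dports 587,25 \
  -j DNAT --to-destination 10.31.104.1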

# Add MASQUERADE rule for traffic coming from old's Docker
iptables -t nat -A POSTROUTING -s 172.18.0.0/16 -o eth0 -j MASQUERADE

# Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1
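If Docker is already installed on new (the table lists a Docker network for it), keep in mind that Docker by default sets the policy of the iptables FORWARD chain to DROP, so the packets DNAT'ed towards old may be discarded before they ever reach tap0. Docker processes the DOCKER-USER chain before its own forwarding rules, so a sketch of accepting the forwarded traffic there could look like this, assuming eth0 is new's public interface and tap0 the VPN device; the function name `allow_forwarding_to_old` is hypothetical.

```shell
# Hypothetical helper for new: let the forwarded traffic pass Docker's
# default FORWARD policy (DROP). Assumes eth0 = public interface, tap0 = VPN.
allow_forwarding_to_old() {
  # New client connections to the application ports, forwarded to old
  iptables -I DOCKER-USER -i eth0 -o tap0 -p tcp -m multiport --dports 587,25 \
    -m conntrack --ctstate NEW -j ACCEPT
  # New outgoing connections from old's Docker network (see MASQUERADE rule)
  iptables -I DOCKER-USER -i tap0 -o eth0 -s 172.18.0.0/16 \
    -m conntrack --ctstate NEW -j ACCEPT
  # Packets of already accepted connections, in any direction
  iptables -I DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
}
```

Since each rule is inserted at the top of DOCKER-USER, their relative order does not matter here; all of them only ever accept packets.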

Set up old server

Now, packets forwarded by new arrive at old, but any replies to the sender will be routed through old's public network interface instead of going back through new. The sender will most likely reject these replies, because it sent its packets to new and does not expect replies from old. We cannot simply make new the default gateway for old either: that would cause a similar problem the other way round, for packets arriving directly at old.

We’re going to use connection marking and multiple routing tables to solve this and route all packets correctly on old. By the way: if you run a system with two public network interfaces, you’ll need a similar solution.

The following rule gives each connection coming from new (via tap0) the connection mark 2 when it is established:

iptables -t mangle -A PREROUTING -i tap0 -p tcp -m state --state NEW \
  -j CONNMARK --set-mark 2

The next one copies the connection mark, if present, onto each packet of an established connection that arrives on any other interface, for example the replies coming from the Docker container (you may narrow this rule down further):

iptables -t mangle -A PREROUTING ! -i tap0 -p tcp -m state \
  --state RELATED,ESTABLISHED -j CONNMARK --restore-mark
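If you put these rules into a script, running it twice appends duplicate rules. A minimal sketch of an idempotent variant: it first checks for each rule with `iptables -C` and appends it only when it is missing. It uses the `conntrack` match, of which the `state` match used above is a subset according to the iptables-extensions man page; both behave the same here. The helper names `ensure_rule` and `mark_connections_from_new` are hypothetical.

```shell
# Hypothetical helper: append an iptables rule only if it is not present yet.
# Usage: ensure_rule TABLE CHAIN RULE-SPEC...
ensure_rule() {
  table=$1
  chain=$2
  shift 2
  # iptables -C exits non-zero if no matching rule exists
  if ! iptables -t "$table" -C "$chain" "$@" 2>/dev/null; then
    iptables -t "$table" -A "$chain" "$@"
  fi
}

# Hypothetical setup function for old (tap0 = VPN device towards new)
mark_connections_from_new() {
  # Remember that a connection arrived via new: connection mark 2
  ensure_rule mangle PREROUTING -i tap0 -p tcp -m conntrack --ctstate NEW \
    -j CONNMARK --set-mark 2
  # Copy the connection mark onto later packets arriving on other interfaces
  ensure_rule mangle PREROUTING ! -i tap0 -p tcp \
    -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
}
```

Run `mark_connections_from_new` as root on old; a second run leaves the mangle table unchanged.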

You can read more about the CONNMARK target in the iptables-extensions(8) man page.

Now, we set up a second routing table. All packets marked with 2 should be routed using this second table instead of old’s main routing table.

# Setup routing table 2
ip route add default via 10.31.104.2 dev tap0 table 2
ip route add 10.31.104.0/24 dev tap0 table 2

# Use table 2 to route packets marked with '2'
ip rule add fwmark 2 table 2
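To check the policy routing before real traffic depends on it, `ip route get` accepts a `mark` argument and reports the route a packet with that firewall mark would take. A small sketch, assuming the addresses from the table; the function name `check_mark_routing` and the destination 198.51.100.7 (a documentation address) are arbitrary placeholders.

```shell
# Hypothetical check for old: a packet carrying fwmark 2 should leave via new
check_mark_routing() {
  # Any address outside 10.31.104.0/24 works; 198.51.100.7 is a documentation IP
  route=$(ip route get 198.51.100.7 mark 2)
  case $route in
    *"via 10.31.104.2 dev tap0"*)
      echo "ok: marked packets are routed through new" ;;
    *)
      echo "unexpected route: $route" >&2
      return 1 ;;
  esac
}
```

Without the mark, `ip route get 198.51.100.7` should still show old’s normal default route through its public interface.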

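Finally, we have to turn off Reverse Path Filtering. Basically, Reverse Path Filtering does the following: if the reply to a packet wouldn’t go out the interface the packet came in on, the packet is considered bogus and ignored (taken from The Linux Documentation Project, where you can find more information as well). That is exactly our situation: packets forwarded by new arrive on tap0 with the clients’ public source addresses, but old’s main routing table would send traffic to those addresses out of its public interface, so these packets would be dropped.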

sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.tap0.rp_filter=0
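According to the kernel’s ip-sysctl documentation, the maximum of conf/all/rp_filter and the interface’s own value is used for source validation, which is why both values are set. A small sketch that prints the effective value for an interface; the function name `effective_rp_filter` is hypothetical. If you would rather not disable the check entirely, loose mode (value 2), which only requires the source address to be reachable through some interface, may be enough here because old’s default route covers arbitrary Internet sources; treat that as something to verify on your system.

```shell
# Hypothetical check: print the rp_filter value that actually applies to an
# interface, i.e. the maximum of conf.all and conf.<interface>.
# Usage: effective_rp_filter [INTERFACE]   (defaults to tap0)
effective_rp_filter() {
  dev=${1:-tap0}
  all=$(sysctl -n net.ipv4.conf.all.rp_filter)
  own=$(sysctl -n "net.ipv4.conf.$dev.rp_filter")
  if [ "$all" -gt "$own" ]; then
    echo "$all"
  else
    echo "$own"
  fi
}
```

After the two commands above, `effective_rp_filter tap0` should print 0. Note that `sysctl -w` only changes the running kernel; to keep the settings after a reboot, put them into a file under /etc/sysctl.d/ on distributions that use it.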

Now, all incoming traffic routes as expected. If you want to route outgoing traffic from your Docker container through new, too, here are two more rules for old to achieve that:

iptables -t mangle -A PREROUTING -s 172.18.0.14/32 -p tcp -m state --state NEW \
  -j CONNMARK --set-mark 2
iptables -t mangle -A PREROUTING -s 172.18.0.14/32 -p tcp -m state --state NEW \
  -j MARK --set-mark 2

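We need both CONNMARK and MARK here: CONNMARK marks the connection, as we did before, so that the --restore-mark rule above marks all later packets of that connection. MARK marks this first packet itself, because the --restore-mark rule only handles packets of already established connections.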

Note that the Docker container’s IP address is hard-coded in these two rules. That’s because you probably don’t want all outgoing Docker container traffic to be routed through new. The downside is that this requires a Docker setup with fixed container IP addresses.
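Docker only lets you pin a container’s address with `--ip` on a user-defined network, not on the default bridge. A sketch, assuming the 172.18.0.0/16 network from the table is such a user-defined network; the network name `appnet`, the container name `app`, the image `example/app` and both function names are hypothetical placeholders. The second function uses `docker inspect` to confirm that the container really got 172.18.0.14, the address the mangle rules rely on.

```shell
# Hypothetical names: network "appnet", container "app", image "example/app".
start_app_with_fixed_ip() {
  docker run -d --name app \
    --network appnet --ip 172.18.0.14 \
    -p 25:25 -p 587:587 \
    example/app
}

# Verify the address used in the mangle rules (assumes a single network)
check_app_ip() {
  ip_addr=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' app)
  if [ "$ip_addr" != "172.18.0.14" ]; then
    echo "container app has IP '$ip_addr', expected 172.18.0.14" >&2
    return 1
  fi
}
```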

I hope these steps are a good preparation before your application actually moves from one server to another. By forwarding traffic from the new server to the old one through the tunnel, you can make the required DNS changes in advance.