My NixOS Router Journey

Previously, I used the great OpenBSD Router Guide to turn a laptop into an OpenBSD router. The guide itself is great - however, OpenBSD hardware support is lacking - the drivers for the external NIC I am using simply broke at some point, requiring a reboot. Old connections were maintained just fine - matrix still worked, and if a ping command was running, it continued to ping just fine - but if I tried to run a new ping command, or open a website, or do anything else I wasn't already doing, it printed "sendmsg: permission denied". Essentially, it dropped most new connections at that point, while keeping established ones. Perhaps I could attempt to debug it - but I simply didn't feel like the investment would've been worth the time, especially since I wanted btrfs support - I even went as far as successfully compiling lkl for OpenBSD, though it failed at runtime... but I digress.

Anyway, I decided to switch to NixOS on my router as part of my crusade to switch most of my devices to it. So, here is how I did it! This post is more or less a raw thought stream from during the setup process. It reads like a tutorial purely because that makes it easier for me to write - it isn't really intended as one, more as an explanation of what I did. You're free to use it as a reference though!


First, I need to make sure I have internet during router maintenance, as well as the ability to do maintenance at all - my ISP requires routers to have a certain mac address, so I can't just plug in the USB drive and press "install NixOS". That is fairly simple to solve - I have a "plain old" router connected as a wireless AP, so I will simply use it as my main router for the short while.

At first I wanted to set the router up before even booting it, but I quickly abandoned the idea. NixOS has the great feature that allows you to boot a previous generation - so if I iterate step-by-step, I can always return to an earlier generation in case of an error and try again. That would be impossible if I simply created a config and it suddenly failed to boot or connect to the network. So, I simply installed a minimal NixOS system.

The first thing I want to do is to set the network interface names so I can rest assured they won't suddenly break if I connect something to the router, or if a system update happens, or in any other case. Furthermore, I want to make sure I have the ability to set an interface's mac address. Granted, I can't set it to the correct address just yet - if I did that, it would conflict with my router's mac address - and bad things would happen.

So, let's set it all up!

NixOS doesn't really support changing interface names as-is, but you can add custom udev rules so it's simple enough:

# replace with actual values
let
  lan_mac = "11:11:11:11:11:11";
  wan_mac = "22:22:22:22:22:22";
  wan_target_mac = "11:22:33:44:55:66";
  wlan_mac = "33:33:33:33:33:33";
in
{
  services.udev.extraRules = ''
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="${lan_mac}", NAME="lan0"
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="${wan_mac}", NAME="wan0"
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="${wlan_mac}", NAME="wlan0"
  '';

  networking.interfaces.wan0 = {
    useDHCP = true;
    macAddress = wan_target_mac;
  };
}

Now reboot, and...

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 11:11:11:11:11:11 brd ff:ff:ff:ff:ff:ff
3: wan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 22:22:22:22:22:22 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.69/24 brd 192.168.1.255 scope global dynamic noprefixroute wan0
       valid_lft 86075sec preferred_lft 75275sec
4: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 33:33:33:33:33:33 brd ff:ff:ff:ff:ff:ff

Success? Wait! wan0's mac address is still not 11:22:33:44:55:66! Something went wrong... Let's debug it!

systemd-udevd: wan0: Could not set Alias=, MACAddress=, TransmitQueues=, ReceiveQueues=, TransmitQueueLength=, MTU=, GenericSegmentOffloadMaxBytes= or GenericSegmentOffloadMaxSegments=, ignoring: Cannot assign requested address

...huh? Why? It worked just fine on OpenBSD, clearly the hardware at least supports changing the mac address!.. Maybe the device gets initialized after network configuration? Anyway, let's try to figure it out!

But to do that, I need to know how to change the mac address manually! Apparently, you can do it using ip link set wan0 address 11:22:33:44:55:66? Let's try it out!

# ip link set wan0 down
# ip link set wan0 address 11:22:33:44:55:66
RTNETLINK answers: Cannot assign requested address

Fine... Let's try something else?

# systemctl stop network-addresses-wan0
# ip link set wan0 address 11:22:33:44:55:66
RTNETLINK answers: Cannot assign requested address

...Maybe this time, third time's the charm!

# systemctl stop network-addresses-wan0
# ip link set wan0 down
# ip link set wan0 address 11:22:33:44:55:66
RTNETLINK answers: Cannot assign requested address

...Nope. Let's look it up?

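...Oh. Apparently, the two lowest bits of the first octet of a mac address are special: the lowest one marks multicast addresses, and the next one marks locally administered addresses. 0x11 has the multicast bit set, so the kernel refuses to assign that address to an interface. You learn something new every day I guess... Anyway, let's change that to 00:11:22:33:44:55 (remember, I still can't use the real mac address as it would conflict with my current router).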
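
To see why the first address was rejected while the second one works, here's a quick Python sketch. It isn't part of the router config, and the helper name mac_flags is made up for this example. It only decodes the two special bits of the first octet:

```python
def mac_flags(mac: str) -> dict:
    """Decode the two special bits in the first octet of a MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return {
        # bit 0 (least significant): 0 = unicast, 1 = multicast/group
        "multicast": bool(first_octet & 0b01),
        # bit 1: 0 = globally unique (vendor-assigned), 1 = locally administered
        "locally_administered": bool(first_octet & 0b10),
    }


for mac in ("11:22:33:44:55:66", "00:11:22:33:44:55"):
    print(mac, mac_flags(mac))
```

The first address comes out as multicast (0x11 is odd), the second one as a plain unicast address, which matches what ip link accepted.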

nixos-rebuild boot, and... reboot!

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: lan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 11:11:11:11:11:11 brd ff:ff:ff:ff:ff:ff
3: wan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.69/24 brd 192.168.1.255 scope global dynamic noprefixroute wan0
       valid_lft 86093sec preferred_lft 75293sec
4: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 33:33:33:33:33:33 brd ff:ff:ff:ff:ff:ff

Thank god! Now I can start actually setting up the router...

First things first - I need to let the kernel forward packets.

# set the kernel parameters necessary to let us forward packets
boot.kernel.sysctl = {
  "net.ipv4.conf.all.forwarding" = true;
  "net.ipv6.conf.all.forwarding" = true;
};

Now I need to set the lan address space (using temporary ip addresses):

networking.interfaces.lan0 = {
  ipv4.addresses = [{
    address = "10.0.0.1";
    prefixLength = 24;
  }];
  ipv6.addresses = [{
    address = "fd00::1";
    prefixLength = 48;
  }];
};

For real config, remember to generate a unique local address according to RFC 4193. tl;dr: first 2 (hex) digits are "fd", then 10 random digits follow, then 4 digits specify the subnet. As an example:

fd01:2345:6789:0000:0000:0000:0000:0000
  ^^^^^^^^^^^^ ^^^^ ^^^^^^^^^^^^^^^^^^^
  |            |    |
  |            |    |> the actual device id
  |            |
  |            |> subnet id, picked by you
  |
  |> global site id, should be picked randomly

or, fd01:2345:6789:: for short (:: means "fill with zeroes" if you didn't know)

You don't exactly have to, unless you plan to interact with other networks (e. g. via a VPN) using the same IP subnet, but it's an RFC, why wouldn't you follow it????? If you need a random online tool to do that for you, use this.
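
If you'd rather not rely on an online tool, the layout described above is easy to produce yourself. The following is a minimal Python sketch under my own assumptions: the function names are invented, and it simply pulls 40 random bits from the OS instead of following the hash-based sample algorithm that RFC 4193 describes. For a home network that should be random enough, but treat it as an illustration rather than a reference implementation.

```python
import ipaddress
import secrets


def random_ula_prefix() -> ipaddress.IPv6Network:
    """Return a random /48 of the form fdXX:XXXX:XXXX::/48."""
    global_id = secrets.randbits(40)
    # 0xfd in the top 8 bits, then the 40-bit global ID, then 80 zero bits
    prefix_int = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((prefix_int, 48))


def subnet(site: ipaddress.IPv6Network, subnet_id: int) -> ipaddress.IPv6Network:
    """Pick the /64 with the given 16-bit subnet ID inside a /48 site prefix."""
    if not 0 <= subnet_id <= 0xFFFF:
        raise ValueError("subnet ID must fit in 16 bits")
    return ipaddress.IPv6Network((int(site.network_address) | (subnet_id << 64), 64))


site = random_ula_prefix()
print(site)             # different on every run
print(subnet(site, 1))  # the /64 you would put on a LAN interface
```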

And, finally, make it actually function as a router:

networking = {
  firewall.enable = false;
  nftables.enable = true;
  # Be careful - this won't allow you to revert to old config! I simply use this for quick iterations, make sure to switch to plain "ruleset" later.
  nftables.rulesetFile = "/etc/nixos/nftables.conf";
};

You could use networking.nat to do it declaratively via iptables, but I wanted to use the new and shiny nftables instead.

Network rules are complicated! Let's create a basic config for now:

flush ruleset

define LAN_SPACE = 10.0.0.0/24
define LAN6_SPACE = fd00::/64

table inet global {
  chain inbound_wan {
    # https://shouldiblockicmp.com/
    # that said, icmp has some dangerous packet types, so limit it to
    # some extent
    ip protocol icmp icmp type { destination-unreachable, echo-request, time-exceeded, parameter-problem } accept
    ip6 nexthdr icmpv6 icmpv6 type { destination-unreachable, echo-request, time-exceeded, parameter-problem, packet-too-big } accept
  }
  chain inbound_lan {
    # I trust my LAN, however you might have different requirements
    accept
  }
  chain inbound {
    type filter hook input priority 0; policy drop;

    ct state vmap { established : accept, related : accept, invalid : drop }

    iifname vmap { lo : accept, wan0 : jump inbound_wan, lan0 : jump inbound_lan, wlan0 : jump inbound_lan }
  }
  chain forward {
    type filter hook forward priority 0; policy drop;

    ct state vmap { established : accept, related : accept, invalid : drop }

    iifname lan0 accept
  }
  chain postrouting {
    type nat hook postrouting priority 100; policy accept;
    ip saddr $LAN_SPACE oifname wan0 masquerade
    ip6 saddr $LAN6_SPACE oifname wan0 masquerade
  }
}

and finally, a DHCP server:

services.dhcpd4 = {
  enable = true;
  interfaces = [ "lan0" ];
  extraConfig = ''
    option routers 10.0.0.1;
    option domain-name-servers 8.8.8.8, 8.8.4.4;
    option domain-name "local";
    subnet 10.0.0.0 netmask 255.255.255.0 {
      range 10.0.0.2 10.0.0.254;
    }
  '';
};
services.dhcpd6 = {
  enable = true;
  interfaces = [ "lan0" ];
  extraConfig = ''
    option dhcp6.name-servers 2001:4860:4860::8888, 2001:4860:4860::8844;
    option domain-name "local";
    subnet6 fd00::/64 {
      range6 fd00::2 fd00::ff00;
    }
  '';
};
# advertise the router, required for ipv6
services.radvd = {
  enable = true;
  config = ''
    interface lan0 {
      AdvSendAdvert on;
      AdvManagedFlag on;
      prefix 1111:2222:3333:4444::/64 {
        AdvAutonomous off;
      };
    };
  '';
};

Now let's connect my Thinkpad to it... And it works! There is internet, IPs 10.0.0.2 and fd00::<random gibberish> are issued correctly.

Though, this is just the beginning. First, the natural thing to do is to make sure the configuration is secure.

For starters, let's make sure no martian gets through! I feel a little bad for the poor Mars citizens, but this is a necessary measure for my peace of mind.

As usual, there's a kernel option for that - net.ipv4.conf.<iface>.rp_filter. Let's check the current value:

# sysctl -r '\.rp_filter' net.ipv4.conf
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.lan0.rp_filter = 2
net.ipv4.conf.lo.rp_filter = 2
net.ipv4.conf.wan0.rp_filter = 2
net.ipv4.conf.wlan0.rp_filter = 2

It's set to 2 for all of my interfaces. According to the kernel docs (ip-sysctl), 0 means no source validation, 1 is strict mode, and 2 is loose mode, both as defined in RFC 3704.

Essentially, what it means is that 2 will only discard packets whose source IP isn't reachable via any interface, while 1 will also discard packets whose source IP would be reached via a different interface than the one they arrived on. 1 seems better, so let's change it to 1!
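
The difference between the two modes is easier to see with a toy model. This Python sketch is NOT how the kernel implements it. The routing table is hypothetical (loosely based on the addresses in this post) and the function names are my own. It just shows which packets each mode would let through:

```python
import ipaddress

# Toy routing table: (destination network, outgoing interface).
# Hypothetical values, including a default route via wan0.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),
    (ipaddress.ip_network("192.168.1.0/24"), "wan0"),
    (ipaddress.ip_network("10.0.0.0/24"), "lan0"),
]


def best_route(addr):
    """Longest-prefix match; returns the interface name or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, dev) for net, dev in ROUTES if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_filter_accepts(src, in_iface, mode):
    """mode 0 = off, 1 = strict, 2 = loose."""
    if mode == 0:
        return True
    dev = best_route(src)
    if mode == 2:
        return dev is not None  # source reachable via *any* interface
    return dev == in_iface      # strict: replies must leave via in_iface


# A packet claiming to come from a LAN address, but arriving on wan0:
print(rp_filter_accepts("10.0.0.7", "wan0", mode=2))
print(rp_filter_accepts("10.0.0.7", "wan0", mode=1))
```

With a default route in the table, loose mode accepts pretty much anything, while strict mode rejects the spoofed LAN address arriving from the WAN side.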

boot.kernel.sysctl = {
  "net.ipv4.conf.all.forwarding" = true;
  "net.ipv6.conf.all.forwarding" = true;
  "net.ipv4.conf.default.rp_filter" = 1;
  "net.ipv4.conf.lan0.rp_filter" = 1;
  "net.ipv4.conf.wan0.rp_filter" = 1;
  "net.ipv4.conf.wlan0.rp_filter" = 1;
};

By the way, I'm renaming the interfaces, will it work fine after a reboot? Let's reboot and see!

# nixos-rebuild switch
# reboot
...
# sysctl -r '\.rp_filter' net.ipv4.conf
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.lan0.rp_filter = 1
net.ipv4.conf.lo.rp_filter = 2
net.ipv4.conf.wan0.rp_filter = 1
net.ipv4.conf.wlan0.rp_filter = 1

Okay, great! (As you can see, I decided not to change it for the loopback adapter)

And in case you were wondering - the router still works just fine!

Did you think that's all? Think again! IPv6 was left as-is, because rp_filter isn't implemented for it! This requires us to add some nftables rules. While we're at it, let's see what else we can do!

After some searching, I decided to add the following rules:

# These run before any other chains
table netdev filter {
  chain ingress {
    type filter hook ingress devices = { wan0, lan0, wlan0 } priority -500;

    # drop fin and syn at the same time
    tcp flags & (fin|syn) == (fin|syn) drop
    # same for syn and rst
    tcp flags & (syn|rst) == (syn|rst) drop

    # XMAS packets
    tcp flags & (fin|syn|rst|psh|ack|urg) == (fin|syn|rst|psh|ack|urg) drop
    # NULL packets
    tcp flags & (fin|syn|rst|psh|ack|urg) == 0 drop
    # reject packets with irregular MSS
    tcp flags syn tcp option maxseg size 0-500 drop

    # Spoofing protection - protect against others pretending to be the router
    ip saddr 10.0.0.1 drop
    ip6 saddr fd00::1 drop

    # drop if coming from wrong interface
    fib saddr . iif oif missing drop
  }
  chain ingress_wan {
    type filter hook ingress device wan0 priority -500;
    # rate limit icmp
    ip protocol icmp limit rate 5/second accept
    ip protocol icmp counter drop
    ip6 nexthdr icmpv6 limit rate 5/second accept
    ip6 nexthdr icmpv6 counter drop
    # only accept packets to local (i.e. our own) addresses from wan
    # in case of lan, we WANT non-local packets - we will be forwarding them!
    fib daddr . iif type != local drop
  }
}

And also the following rule:

# new packet but no syn
tcp flags & syn != syn ct state new drop

to the inbound chain that I added previously. It relies on connection tracking state, which isn't available yet at the ingress hook, so it can't be added to ingress.
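
In case the flag expressions in the ingress chain look cryptic: they are plain bitmask comparisons. Here's a small Python sketch of the same four checks. The function name is mine, and the constants are the standard TCP header flag bits. It only models the flag rules, not the MSS or spoofing checks:

```python
# Standard TCP flag bits as they appear in the TCP header
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
ALL = FIN | SYN | RST | PSH | ACK | URG


def dropped_by_flag_rules(flags: int) -> bool:
    """Mirror the four 'tcp flags & mask == value' rules from the ingress chain."""
    return (
        (flags & (FIN | SYN)) == (FIN | SYN)     # fin and syn at the same time
        or (flags & (SYN | RST)) == (SYN | RST)  # syn and rst at the same time
        or (flags & ALL) == ALL                  # XMAS: everything lit up
        or (flags & ALL) == 0                    # NULL: no flags at all
    )


print(dropped_by_flag_rules(SYN))        # a normal connection attempt
print(dropped_by_flag_rules(SYN | FIN))  # nonsense combination
```

A regular SYN, SYN+ACK or FIN+ACK passes, while the contradictory combinations get dropped before they reach any other chain.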

Alright, enough with the hardening! Let's set a DNS server up next. Except, wouldn't it be boring if it were just a plain old recursive DNS server? Let's spice it up by making it forward mDNS!

First, let's set Avahi up to gain access to mDNS.

services.avahi = {
  enable = true;
  hostName = "router";
  interfaces = [ "lan0" "wlan0" ];
  publish = {
    enable = true;
    addresses = true;
    domain = true;
    userServices = true;
  };
};

In order to give Unbound access to Avahi, we need a Python plugin. That plugin also requires pydbus and dnspython. However, NixOS doesn't currently support Python plugins for Unbound! Let's override it:

services.unbound.package =
  # Use python with pydbus and dnspython for Unbound
  let python = pkgs.python3.withPackages (pkgs: with pkgs; [ pydbus dnspython ]);
in pkgs.unbound-with-systemd.overrideAttrs(old: {
  preConfigure = "export PYTHON_VERSION=${python.pythonVersion}";
  # swig is needed for bindings generation
  nativeBuildInputs = old.nativeBuildInputs ++ [ pkgs.swig ];
  buildInputs = old.buildInputs ++ [ python ];
  configureFlags = old.configureFlags ++ [ "--with-pythonmodule" ];
  # Patch makefile to use correct output directory
  postPatch = (old.postPatch or "") + ''
    substituteInPlace Makefile.in \
      --replace "\$(DESTDIR)\$(PYTHON_SITE_PKG)" "$out/${python.sitePackages}"
  '';
  # Export correct PYTHONPATH for the resulting unbound binary
  # Namely, export both the output module generated by Unbound,
  # and the modules bundled with the Python defined above
  postInstall = old.postInstall + ''
    wrapProgram $out/bin/unbound \
      --prefix PYTHONPATH : "$out/${python.sitePackages}" \
      --prefix PYTHONPATH : "${python}/${python.sitePackages}" \
      --argv0 $out/bin/unbound
  '';
});

And the actual configuration:

services.unbound = {
  enable = true;
  # Setting this allows using unbound-control
  localControlSocketPath = "/run/unbound/unbound.ctl";
  settings = {
    server = {
      # Listen on loopback to ensure we can access the DNS server locally,
      # and also expose it to LAN
      interface = [ "127.0.0.1" "::1" "10.0.0.1" "fd00::1" ];
      access-control =  [
        "0.0.0.0/0 refuse"
        "127.0.0.0/8 allow"
        "10.0.0.0/24 allow"
        "::0/0 refuse"
        "::1 allow"
        "fd00::/64 allow"
      ];
      aggressive-nsec = true;
      # Enable Python module
      module-config = ''"validator python iterator"'';
      # Hardcode some address records
      local-zone = ''"local." static'';
      local-data = [
        ''"local. A 10.0.0.1"''
        ''"local. AAAA fd00::1"''
        ''"router.local. A 10.0.0.1"''
        ''"router.local. AAAA fd00::1"''
      ];
    };
    # Load Python plugin
    python.python-script = "/path/to/avahi-resolver.py";
    # Enable unbound-control
    remote-control.control-enable = true;
  };
};
# Only attempt to resolve .local domains through Avahi
# The backslashes are doubled because Nix drops a single backslash before "."
systemd.services.unbound.environment.MDNS_ACCEPT_NAMES = "^.*\\.local\\.$";

I could wrap it up at this point... But that isn't all just yet. I also use a WireGuard VPN, and I prefer to route all of my traffic through it.

The logical first step would be to add the Wireguard network interface. But how? Do I use networking.wireguard, networking.wg-quick, or something else entirely? What's the difference between them?

Apparently, wireguard uses the kernel module directly (more or less), while wg-quick uses wireguard-tools. There are probably pros and cons to each approach, so I'll simply use wireguard because it seems like it will work well enough.

networking.wireguard.interfaces.wg0 = {
  # interface IPs
  ips = [ "192.168.100.13/32" "fd55:aaaa::1/128" ];
  peers = [{
    allowedIPs = [ "0.0.0.0/0" "::/0" ];
    publicKey = "bm9wZSwgbm8ga2V5cyBoZXJlLCB0cnkgYWdhaW4gbGF0ZXIK";
    endpoint = "13.37.10.10:420";
    persistentKeepalive = 60;
  }];
  # Keep in mind the file needs to contain the key in base64, not binary
  privateKeyFile = "/etc/nixos/wireguard_key";
};

And... no internet connection? Perhaps it's because I didn't mention wg0 in nftables config? Let's fix that! Also, let's switch NAT from wan0 to wg0, so router clients automatically have VPN enabled.

...However, that didn't fix it. There's no internet, neither locally on the router, nor on the client devices. wg-quick didn't work either.

After loads of troubleshooting, it turned out net.ipv4.conf.wan0.rp_filter and the nftables anti-spoofing options actually prevented some possible solutions from working! I temporarily relaxed the rules so I could harden them later.

Everything works fine when I change allowedIPs to a small address range. So, a routing loop? I could fix it by hardcoding the main route, but I don't like that solution - what if the ISP suddenly changes the route? I could use wg-quick, which allegedly solves that issue by defining multiple routing tables, but I'd like to be able to control these tables myself. Besides, I want to route some traffic via the physical interface too. What if the VPN goes down, but everything still expects wg0 to exist? I won't be able to route anything at all! So I decided to try to find another solution.
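
If the routing loop guess is right, this is roughly what it would look like. The Python sketch below is a toy route lookup with made-up metrics and a hypothetical routing table, not something I dumped from the router. It shows that once 0.0.0.0/0 points at wg0, even the encrypted packets addressed to the VPN endpoint get sent back into wg0, and that a pinned host route (the "hardcoding" option) would break the loop:

```python
import ipaddress


def lookup(routes, dst):
    """Pick a route roughly like a routing table does:
    longest prefix first, lowest metric as the tie-breaker."""
    ip = ipaddress.ip_address(dst)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r["net"])]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["net"]).prefixlen, r["metric"]),
    )
    return best["dev"]


ENDPOINT = "13.37.10.10"  # the peer endpoint from the WireGuard config

# Hypothetical main table after wg0 comes up with allowedIPs = 0.0.0.0/0.
# The metrics are invented; the point is only that the wg0 route wins.
looping = [
    {"net": "0.0.0.0/0", "dev": "wg0", "metric": 0},
    {"net": "0.0.0.0/0", "dev": "wan0", "metric": 1000},
    {"net": "192.168.1.0/24", "dev": "wan0", "metric": 1000},
]

# Same table plus a pinned host route for the endpoint
pinned = looping + [{"net": ENDPOINT + "/32", "dev": "wan0", "metric": 0}]

print(lookup(looping, ENDPOINT))  # where tunnel packets go without the fix
print(lookup(pinned, ENDPOINT))   # where they go with the host route
```

Narrowing allowedIPs has the same effect as the pinned route, because the endpoint then no longer matches anything that points at wg0.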

In order to maintain WAN connection even when Wireguard is down, the best option is probably isolating it in a namespace.

Namespaces are a Linux feature that allows isolating routing tables, network interfaces, etc. To create one, run ip netns add. Alternatively, systemd can create them (of course it can). At its core, a named network namespace is just a mount. ip netns add test will create the following mount:

# mount | grep netns
nsfs on /run/netns/test type nsfs (rw)

(Frankly, at this point, I tried out loads of possible solutions, got a segfault in nft, didn't have internet for 99% of the time, if I wrote about all of that it would turn out way too long, but anyway, now that I have a working configuration, here you go)

First, I need to set the namespaces up. NixOS doesn't really have a native way to do that, so let's create a systemd unit.

systemd.services.custom-network-setup = {
  description = "custom network setup";
  # before nftables, because it might depend on the configuration we changed
  # before wireguard-wg0 because it *will* depend on the config we changed
  # before dhcpcd because it needs to run in the namespace we will create here
  before = [ "nftables.service" "wireguard-wg0.service" "dhcpcd.service" ];
  wantedBy = [ "network.target" ];
  unitConfig = {
    StopWhenUnneeded = true;
  };
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
    ExecStart = with pkgs; writeScript "custom-network-setup-start" ''
      #! ${bash}/bin/bash
      # create namespaces
      ${iproute2}/bin/ip netns add vpn
      ${iproute2}/bin/ip netns add wan
      # move wan0 into the wan namespace
      ${iproute2}/bin/ip link set wan0 netns wan

      # make sure all sysctl variables are set correctly in the new namespaces
      ${iproute2}/bin/ip netns exec wan ${procps}/bin/sysctl net.ipv4.conf.wan0.rp_filter=1
      ${iproute2}/bin/ip netns exec wan ${procps}/bin/sysctl net.ipv4.conf.all.forwarding=1
      ${iproute2}/bin/ip netns exec wan ${procps}/bin/sysctl net.ipv6.conf.all.forwarding=1
    '';
    ExecStop = with pkgs; writeScript "custom-network-setup-stop" ''
      #! ${bash}/bin/bash
      ${iproute2}/bin/ip -4 route del default via 10.10.10.2
      ${iproute2}/bin/ip -6 route del default via fd01::2
      ${iproute2}/bin/ip rule del fwmark 1 table wan_table
      ${iproute2}/bin/ip rule del fwmark 2 table vpn_table
      ${iproute2}/bin/ip netns exec wan ${iproute2}/bin/ip link del veth-wan-b
      ${iproute2}/bin/ip link del veth-wan-a
      ${iproute2}/bin/ip netns exec vpn ${iproute2}/bin/ip link del veth-vpn-b
      ${iproute2}/bin/ip link del veth-vpn-a
      ${iproute2}/bin/ip link del br0
      ${iproute2}/bin/ip netns exec wan ${iproute2}/bin/ip link set wan0 netns 1
      ${iproute2}/bin/ip netns del wan
      ${iproute2}/bin/ip netns del vpn
    '';
  };
};

I also need to set the kernel parameter net.netfilter.nf_log_all_netns to 1, so that I can properly read the logs from nft rules in the other namespaces.

Now that the namespaces are ready, I need to move the services to them accordingly. Let's start with dhcpcd, so wan0 can configure itself again. Should be simple, right?

systemd.services.dhcpcd.serviceConfig.NetworkNamespacePath = "/var/run/netns/wan";

Now switch... Oh?

dhcpcd-8.1.4 starting                                                                                                          
udev: starting                                                                                                                 
dev: loaded udev                                                                                                               
no valid interfaces found

I'm not the first one to face this issue! The fix is easy enough - select the device manually by specifying it in the argument list.

Let's copy the default dhcpcd config to /etc/nixos/dhcpcd.conf so I could keep using it. Now, override the dhcpcd.service's executable:

systemd.services.dhcpcd.serviceConfig.ExecStart = lib.mkForce "@${pkgs.dhcpcd}/sbin/dhcpcd dhcpcd --quiet  --config /etc/nixos/dhcpcd.conf wan0";

This is essentially the default, but with a hardcoded config path and the interface name appended.

Anyway, switch, and... what now?

nixos systemd[1]: dhcpcd.service: Can't open PID file /run/dhcpcd.pid (yet?) after start: Operation not permitted

Ah, I see. /run/dhcpcd.pid doesn't exist, so, naturally, systemd can't open it. Well, /run/ has dhcpcd-wan0.pid now, so I guess the filename changes depending on the arguments...

systemd.services.dhcpcd.serviceConfig.PIDFile = lib.mkForce "/run/dhcpcd-wan0.pid";

Now, let's make the VPN create a socket in the wan namespace:

networking.wireguard.interfaces.wg0 = {
  ...
  socketNamespace = "wan";
  interfaceNamespace = "vpn";
  postSetup = with pkgs; ''
    # after the interface gets brought up, set sysctl vars correctly
    ${iproute2}/bin/ip netns exec vpn ${procps}/bin/sysctl net.ipv4.conf.wg0.rp_filter=1
  '';
};

Let's test it...

# ip netns exec wan ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=112 time=59.0 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=109 time=65.6 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=109 time=58.8 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=109 time=58.8 ms
...
# ip netns exec vpn ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=120 time=91.4 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=120 time=91.4 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=120 time=91.3 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=120 time=91.4 ms
...

It works! A small change in the VPN configuration allowed me to use it with namespaces - it was much easier than expected!

While we're at it, let's reboot to make sure things work as expected...

systemd[1]: sys-subsystem-net-devices-wan0.device: Job sys-subsystem-net-devices-wan0.device/start timed out.
systemd[1]: Timed out waiting for device RTL8153 Gigabit Ethernet Adapter.
systemd[1]: Dependency failed for Address configuration of wan0
systemd[1]: network-addresses-wan0.service: Job network-addresses-wan0.service/start failed with result 'dependency'.
systemd[1]: sys-subsystem-net-devices-wan0.device: Job sys-subsystem-net-devices-wan0.device/start failed with result 'timeout'.

Hm, the only search result is from 2012, and it links to a bug that got fixed already... But whatever! Let's try the solution - which is to bind to the device by its PCI path instead.

systemd.services.network-addresses-wan0.bindsTo = lib.mkForce [ "sys-devices-pci0000:00-0000:00:1c.5-0000:03:00.0-net-lan0.device" ];

Reboot... And it's fixed! The proper solution might be to move network-addresses-wan0.service to the wan namespace, but I've yet to test it - and since it works anyway, I don't really care.

Well, now I have to bridge the network, which I expect to be harder.

I need to create the bridges themselves first, and then set up the routing. Furthermore, I need to switch the clients between two separate routes - with VPN and without VPN. How do I do that? Here's how.

First, let's create the bridge. It's a fairly long chain of commands, but it works...

# add to custom-network-setup

# create a bridge - which is like a virtual switch
${iproute2}/bin/ip link add br0 type bridge
# enable it
${iproute2}/bin/ip link set br0 up

# set bridge ip
${iproute2}/bin/ip addr add 10.10.10.1/24 dev br0
${iproute2}/bin/ip addr add fd01::1/64 dev br0

# create a veth device pair, which is like two ends of a virtual ethernet cable
${iproute2}/bin/ip link add veth-vpn-a type veth peer name veth-vpn-b
# attach the first "end" to br0 by setting it as the master bridge, and enable it
${iproute2}/bin/ip link set veth-vpn-a master br0 up
# move the other end to the vpn namespace
${iproute2}/bin/ip link set veth-vpn-b netns vpn
# turn the other end on
${iproute2}/bin/ip netns exec vpn ${iproute2}/bin/ip link set veth-vpn-b up
# then set the ip
${iproute2}/bin/ip netns exec vpn ${iproute2}/bin/ip addr add 10.10.10.2/24 dev veth-vpn-b
${iproute2}/bin/ip netns exec vpn ${iproute2}/bin/ip addr add fd01::2/64 dev veth-vpn-b

# now do the same for the other namespace
${iproute2}/bin/ip link add veth-wan-a type veth peer name veth-wan-b
${iproute2}/bin/ip link set veth-wan-a master br0 up
${iproute2}/bin/ip link set dev veth-wan-b netns wan
${iproute2}/bin/ip netns exec wan ${iproute2}/bin/ip link set veth-wan-b up
${iproute2}/bin/ip netns exec wan ${iproute2}/bin/ip addr add 10.10.10.3/24 dev veth-wan-b
${iproute2}/bin/ip netns exec wan ${iproute2}/bin/ip addr add fd01::3/64 dev veth-wan-b

Hopefully the comments are clear enough.

Finally, add the following nftables rules to the main config:

ip saddr { 10.10.10.0/24 } jump inbound_lan;
ip6 saddr { fd01::/64 } jump inbound_lan;

These ensure the services running in the main namespace are reachable from the other namespaces, i.e. the other namespaces are considered part of the LAN.

I also added the bridge IP range to Unbound's whitelist, and changed the default DNS from 127.0.0.1 to 10.10.10.1, so the address doesn't depend on the namespace:

# don't use 127.0.0.1 as the nameserver
services.unbound.resolveLocalQueries = false;

networking.resolvconf.extraConfig = ''
  name_servers="10.10.10.1 fd01::1"
'';

Now, looking into the future, let's add src_valid_mark=1 alongside every rp_filter=1 setting from before, so that any marks we set to route traffic through different tables are taken into account by the kernel's reverse path filter. That also probably fixes any other issues related to using Wireguard with rp_filter=1.

I now need to set the routes up. custom-network-setup.service runs before bash.service, which means it runs before NixOS assigns ip addresses to any interfaces - so I can't add routes in custom-network-setup.service unless I set the IPs manually. That won't do - I want to let NixOS do as much as possible! So, let's add another service, which will only set the routes up:

systemd.services.custom-network-setup-2 = {
  description = "custom network setup 2";
  wantedBy = [ "network.target" ];
  after = [ "custom-network-setup.service" "network-addresses-lan0.service" ];
  unitConfig = {
    StopWhenUnneeded = true;
  };
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
    ExecStart = with pkgs; writeScript "custom-network-setup-2-start" ''
      #! ${bash}/bin/bash
      ${iproute2}/bin/ip -4 route add default via 10.10.10.2
      ${iproute2}/bin/ip -6 route add default via fd01::2
    '';
    ExecStop = with pkgs; writeScript "custom-network-setup-2-stop" ''
      #! ${bash}/bin/bash
      ${iproute2}/bin/ip -4 route del default via 10.10.10.2
      ${iproute2}/bin/ip -6 route del default via fd01::2
    '';
  };
};

Let's execute the commands. (I did add an ExecStop script that reverses them, but I found it pretty unreliable, so I chose to simply reboot whenever I want to apply changes instead.)

There's still no internet, even though I added the rules! Why?

That's because 10.10.10.2 and 10.10.10.3 are now glorified routers! I only set one router up so far, so I can't really expect 3 routers to work now without changing anything.

I need to create more nftables rules!

I wrote the rules in the files /etc/nixos/vpn.conf and /etc/nixos/wan.conf. The rules themselves are basically identical to the main rules, except with different interface names and IP addresses.

Now, how do I apply the rules?

I could (probably) do some systemd magic, I could copy-paste NixOS's nftables service definition and add network namespace support. But I prefer to do it the simplest way there is - which is to set up a service myself:

systemd.services.vpn-nftables = {
  after = [ "network.target" "network-online.target" "wireguard-wg0.service" ];
  requires = [ "network-online.target" "wireguard-wg0.service" ];
  wantedBy = [ "default.target" ];
  unitConfig = {
    StopWhenUnneeded = true;
  };
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = true;
    NetworkNamespacePath = "/var/run/netns/vpn";
    ExecStart = "${pkgs.nftables}/bin/nft -f /etc/nixos/vpn.conf";
    ExecReload = "${pkgs.nftables}/bin/nft -f /etc/nixos/vpn.conf";
  };
};

And a similar config for wan-nftables (but without the wireguard-wg0 dependency).

After a reboot, everything will, hopefully, work. If it doesn't work in your case, may God help you...

By "everything" I mean that there should finally be internet on the router, and the clients should use the router's default route and connect successfully. But that's not what I wanted, is it? I want separate routes for different clients!

To do that, I need to create multiple routing tables. First, let's actually create them. Or, rather, "table" is just a number, tables already exist - let's just give them nice names.

networking.iproute2 = {
  enable = true;
  rttablesExtraConfig = ''
    1 wan_table
    2 vpn_table
  '';
};

Now table 1 will be named wan_table, and table 2 will be named vpn_table! Not sure if a reboot is needed, though I did it anyway for good measure ;)

Let's configure the tables now. First, let's create the rules to forward packets based on a "mark":

# add to an inet table
chain prerouting {
  type filter hook prerouting priority 0; policy accept;
  # set meta mark to conntrack mark (if already set for this connection)
  meta mark set ct mark
  # if already marked, just use that mark
  mark != 0x0 accept
  # by default, mark LAN traffic with 2 (VPN)
  ip saddr $LAN_SPACE meta mark set 0x2
  ip6 saddr $LAN6_SPACE meta mark set 0x2
  # your rules to choose the route (mark 2 is VPN, mark 1 is no VPN) go here...

  # example to route 10.0.0.5 without vpn:
  ip saddr 10.0.0.5 meta mark set 0x1

  # set conntrack mark (for this connection)
  ct mark set mark 
}

And the commands:

# add to custom-network-setup
${iproute2}/bin/ip rule add fwmark 1 table wan_table
${iproute2}/bin/ip rule add fwmark 2 table vpn_table
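One detail that may matter for the IPv6 side: ip rule operates on the IPv4 rule database unless -6 is given, so the two commands above only affect IPv4 lookups. Below is a sketch of the IPv6 counterparts, assuming they aren't already added somewhere else in the setup script and that IPv6 should use the same marks and table names:

```nix
# add to custom-network-setup, next to the IPv4 rules
# (assumption: IPv6 uses the same fwmarks and the same named tables)
${iproute2}/bin/ip -6 rule add fwmark 1 table wan_table
${iproute2}/bin/ip -6 rule add fwmark 2 table vpn_table
```

Without such rules, marked IPv6 packets would presumably be routed through the main table only.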

...And now actually set the routing tables:

# add to custom-network-setup-2

# set the default route for the tables
${iproute2}/bin/ip -4 route add default via 10.10.10.2 table vpn_table
${iproute2}/bin/ip -6 route add default via fd01::2 table vpn_table
${iproute2}/bin/ip -4 route add default via 10.10.10.3 table wan_table
${iproute2}/bin/ip -6 route add default via fd01::3 table wan_table

# now set the routes *inside* the tables so that the default gateway can even be reached!
# I don't know what any of that means, I just copied it from the default rules on the default routing table
${iproute2}/bin/ip -4 route add 10.10.10.0/24 dev br0 proto kernel scope link src 10.10.10.1 table vpn_table
${iproute2}/bin/ip -6 route add fd01::/64 dev br0 proto kernel metric 256 pref medium table vpn_table
${iproute2}/bin/ip -4 route add 10.10.10.0/24 dev br0 proto kernel scope link src 10.10.10.1 table wan_table
${iproute2}/bin/ip -6 route add fd01::/64 dev br0 proto kernel metric 256 pref medium table wan_table

# Finally, make LAN routable within that table. Don't know what the options mean here either.
${iproute2}/bin/ip -4 route add 10.0.0.0/24 dev lan0 proto kernel scope link src 10.0.0.1 table vpn_table
${iproute2}/bin/ip -6 route add fd00::/64 dev lan0 proto kernel metric 256 pref medium table vpn_table
${iproute2}/bin/ip -4 route add 10.0.0.0/24 dev lan0 proto kernel scope link src 10.0.0.1 table wan_table
${iproute2}/bin/ip -6 route add fd00::/64 dev lan0 proto kernel metric 256 pref medium table wan_table

Let's try rebooting, and then connecting to the internet from an external device again...

Success! For me, anyway. YMMV.

Now, let's do some regular configuration.

CUPS! Who doesn't love printers? I don't!

Personally, I have an HP printer, so I'll have to use the hplip driver.

services.printing = {
  enable = true;
  allowFrom = [ "localhost" lan4Cidr lan6Cidr ];
  browsing = true;
  clientConf = ''
    ServerName router.local
  '';
  defaultShared = true;
  drivers = [ pkgs.hplip ];
  # start on boot, not on socket activation
  startWhenNeeded = false;
};

This should be it! It uses mDNS, and nowadays even Windows supports mDNS printers, so it should hopefully be visible on all other PCs that can connect to mDNS printers (install Avahi if it's a Linux desktop).

Let's also set fail2ban up!

services.fail2ban = {
  enable = true;
  packageFirewall = pkgs.nftables;
  banaction = "nftables-multiport";
  banaction-allports = "nftables-allport";
};

Since I live in Russia, I want VPN to be used for all clients in case of a site block - even if the traffic for said client is normally routed without VPN. Let's automate it using RosKomSvoboda's API.

First, let's create an IP set for the inet table:

set force_vpn4 {
  type ipv4_addr;
  # allow ip ranges
  flags interval;
  # allow overlapping ip ranges
  auto-merge;
}
set force_vpn6 {
  type ipv6_addr;
  flags interval;
  auto-merge;
}
chain prerouting {
  ...
  ip daddr @force_vpn4 counter meta mark set 0x2
  ip6 daddr @force_vpn6 counter meta mark set 0x2
  ct mark set mark
}

Unlike OpenBSD's pf, Linux's NetFilter doesn't hold a persistent state. Which is fine with me, but it's something to keep in mind anyway - you have to apply the rules again after every reboot. I opted to do it with a daily timer instead; I don't care if the rules get applied immediately.

systemd.services.update-rkn-blacklist =
  let updateRknBlacklist = with pkgs; writeScript "update-rkn-blacklist" ''
    #! ${bash}/bin/bash
    BLACKLIST=$(${coreutils}/bin/mktemp) || exit 1
    RULESET=$(${coreutils}/bin/mktemp) || exit 1

    ${curl}/bin/curl "https://reestr.rublacklist.net/api/v2/ips/csv/" > $BLACKLIST || (${coreutils}/bin/rm $BLACKLIST && exit 1) || exit 1
    ${coreutils}/bin/echo "add element inet global force_vpn4 {" > $RULESET || (${coreutils}/bin/rm $BLACKLIST && exit 1) || exit 1
    ${gnugrep}/bin/grep '\.' $BLACKLIST >> $RULESET
    ${coreutils}/bin/echo "};" >> $RULESET
    ${coreutils}/bin/echo "add element inet global force_vpn6 {" >> $RULESET
    ${gnugrep}/bin/grep '\:' $BLACKLIST >> $RULESET
    ${coreutils}/bin/echo "};" >> $RULESET
    ${coreutils}/bin/rm $BLACKLIST
    ${nftables}/bin/nft -f $RULESET || (${coreutils}/bin/rm $RULESET && exit 1) || exit 1
    ${coreutils}/bin/rm $RULESET
    exit 0
  '';
in {
  serviceConfig = {
    Type = "oneshot";
    ExecStart = updateRknBlacklist;
  };
};
systemd.timers.update-rkn-blacklist = {
  wantedBy = [ "timers.target" ];
  partOf = [ "update-rkn-blacklist.service" ];
  # Use slightly unusual time to reduce network load,
  # since most people probably set their timers at :00.
  # Note: "*:00:20" matches every hour at :00:20;
  # use "00:00:20" instead to run only once a day.
  timerConfig.OnCalendar = [ "*-*-* *:00:20" ];
};
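Since the sets start out empty after every reboot, one optional tweak is to let the timer also fire once shortly after boot. This is only a sketch: OnBootSec is a standard systemd timer setting, but the 2 minute delay is an arbitrary placeholder, and it assumes the VPN and network are up by then.

```nix
# Optional addition, merged with the timer definition above.
systemd.timers.update-rkn-blacklist.timerConfig = {
  # also run once after boot so the sets don't stay empty until
  # the next calendar trigger; "2min" is an arbitrary placeholder
  OnBootSec = "2min";
};
```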

With this, all IPs from the registry should be automatically routed via VPN! The registry actually has some outdated IP data, because some of the domains in the registry changed their IP address since the last time they were resolved - maybe at some point I'll write an Unbound plugin to check whether a domain is blocked and add it to the list. For now though, I'm content with this solution.

Wait, let's try IPv6, I set that up, right?

...Nope, doesn't connect, what's wrong? Let's ping the VPN gateway via IPv6... It works. Wait, it started connecting properly again?

...Turns out IPv6 doesn't work unless I constantly ping the gateway. More specifically, packets get sent - but the response gets dropped after the "forwarding" stage. I could try to figure out why that happens - but I'm too tired to do that at this point, so I'll just use this hack:

systemd.services = {
  ping-ipv6 = {
    after = [ "network.target" "network-online.target" ];
    wantedBy = [ "default.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.iputils}/bin/ping fd01::2";
      Restart = "on-failure";
      RestartSec = "30s";
    };
  };
  # Just in case... what if IPv4 actually has the
  # same problem, but is simply being used often
  # enough for me not to notice?
  ping-ipv4 = {
    after = [ "network.target" "network-online.target" ];
    wantedBy = [ "default.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.iputils}/bin/ping 10.10.10.2";
      Restart = "on-failure";
      RestartSec = "30s";
    };
  };
  ...
};

I would be grateful if somebody told me why that happens and how to fix it. My suspicion is that somehow it can't really trace the route back, but by pinging it I'm "reminding" it of that route - but I'm really not enough of a Linux networking expert to tell if that's indeed the case.

Wrapping up

Don't forget to run nix-collect-garbage -d to clean up old system generations! Personally, it took me 200 generations to finish setting it up, the command cleaned up 2900 store paths and 800MB.

With this, the configuration is finally complete! I did some cleaning up at the end, you can check out the final (for now) result here. I'm a NixOS beginner so I didn't do anything special with it, it's just a single file with a few options to change at the beginning.

Was NixOS easier than other Linux distros for the purpose of setting up a router? Perhaps, perhaps not. What really matters is that by setting up this one router, I also set up all of my future routers at the same time - and that's what I really love about NixOS.

The file doesn't have many options to change, but if you read this article carefully you should be able to adapt the config into whatever you want, or even try and build your own router! Regardless, so far the results have been great for me.

Speaking of NixOS configs - this server is running NixOS as well - so feel free to check that out too! Just recently I set up an authoritative DNS server, so now even DNS records are part of that configuration LOL.

