Margaret is the name of my new core router in the home lab. It is named after:

Margaret Elaine Hamilton (née Heafield; born August 17, 1936) is an American computer scientist, systems engineer, and business owner. She was director of the Software Engineering Division of the MIT Instrumentation Laboratory, which developed on-board flight software for NASA's Apollo program. She later founded two software companies—Higher Order Software in 1976 and Hamilton Technologies in 1986, both in Cambridge, Massachusetts.

Hamilton has published more than 130 papers, proceedings, and reports, about sixty projects, and six major programs. She invented the term "software engineering". -- Wikipedia

Hamilton wrote the software that landed men on the moon, yet no woman has had that privilege so far.

I began to use the term 'software engineering' to distinguish it from hardware and other kinds of engineering, yet treat each type of engineering as part of the overall systems engineering process. -- Margaret Hamilton

Specifications

The machine is currently implemented using a Protectli FW2B with the following specifications:

It's basically a small black box with two network ports, 8GB of RAM, 64GB of storage, and that's it.

Bootstrapping

To boot from the USB stick, I stuck a cable in the serial console port with a DB9 to USB-A adapter, then booted the machine. I got served with the prompt, which looked like this after pressing F11:

SeaBIOS (version v1.0.4-0-g5137b91)
coreboot version v4.9.0.3
Press F11 key for boot menu
Select boot device:

1. AHCI/0: Protectli 64GB mSATA ATA-11 Hard-Disk (61057 MiBytes)

That is: I didn't see the USB stick. Strangely, moving it to the bottom USB port then worked, after rebooting:

SeaBIOS (version v1.0.4-0-g5137b91)
coreboot version v4.9.0.3
Press F11 key for boot menu
Select boot device:

1. AHCI/0: Protectli 64GB mSATA ATA-11 Hard-Disk (61057 MiBytes)
2. USB MSC Drive Kingston DataTraveler 3.0

Booting from Hard Disk...

ISOLINUX 6.04 20200816 EHDD Copyright (C) 1994-2015 H. Peter Anvin et al

But then I was stuck at that prompt. grml, or more exactly ISOLINUX, somehow didn't manage to display its terminal properly. I could see the cursor moving, but it would just display a blank screen.

The grml cheatcodes say you should just be able to type serial then ENTER but that didn't work in my tests. I had to type TAB then SPACE then console=ttyS0,115200n8 to get the serial console to work.

Obviously, the machine also ships with HDMI and USB, so I could have used a monitor instead, but I wanted to test the serial console... Not sure if this is a bug in the serial console or the (coreboot) BIOS.

Installation

For installation, I'm reusing the installer I built for Tor, from the fabric-tasks repository.

I first configure the network over DHCP with:

netcardconfig

that could have simply been:

killall dhclient ; dhclient -d eth0 &

... as well.

Then I set my SSH key:

cat > ~/.ssh/authorized_keys
service ssh restart

... and dump the host keys for Fabric to use after:

for key in /etc/ssh/ssh_host_*_key; do
    ssh-keygen -E md5 -l -f "$key"
done

The installer has a bug where it no longer uses the right fingerprint format, so we need to remove some colons. The magic incantation is now:

./install -H root@192.168.0.221 \
          --fingerprint b41e:db22:5576:2168:e694:bb59:6934:cad2 \
          hetzner-robot \
          --fqdn=margaret.torproject.org \
          --fai-disk-config=installer/disk-config/single-disk-plaintext \
          --package-list=installer/packages \
          --post-scripts-dir=installer/post-scripts/ \
          --ipv4-address 192.168.0.2 \
          --ipv4-subnet 24 \
          --ipv4-gateway 192.168.0.1
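
The colon removal can be scripted. This is a minimal sketch, assuming the installer wants the MD5 fingerprint with its bytes grouped in pairs, as in the --fingerprint value above. pair_fingerprint is a hypothetical helper name, not part of the installer:

```shell
# Hypothetical helper: turn an ssh-keygen MD5 fingerprint such as
# MD5:b4:1e:db:22:... into the paired form b41e:db22:...
pair_fingerprint() {
    # Drop the optional "MD5:" prefix, then remove every other colon
    # by joining each pair of adjacent bytes.
    printf '%s\n' "${1#MD5:}" |
        sed -E 's/([0-9a-fA-F]{2}):([0-9a-fA-F]{2})/\1\2/g'
}
```

The second field of the ssh-keygen -E md5 -l output is the fingerprint, so something like pair_fingerprint "$(ssh-keygen -E md5 -l -f "$key" | awk '{print $2}')" should yield a value in the shape the command above uses.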

The resulting SSH keys were:

ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFAtMq1j+tznnC0Tlf3oYtlyY28yMELX7E0tVAyOHlvv+Wvr+1sGbHq3fHG+qBvzjcKZz+KJzqKlgfc+zfGl4d8= root@margaret
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILjhQGDtv+c8zOkdJe8OR5483QbZeA8jEaKS7PZKhnLS root@margaret
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCteIzijiRrq0P94WeZ8F0GS64u58zA+9Alpc9OfPbKST7oSjmml/u30bIJxDGuumVp9dbVqRcp54lK/DUHGBacJQdNy4kV0tacQbUr6F2LNXlZWgSafYoLaZdywi3hcTqjtNxl/j/8iIVp8xBk0dcXAOfaH9726dkaSZOUUBQoG6itsefW1kg0QIfVU5kNhvWkO9BSRvXjGNx/KmVzCK9vvRDrQsZSp41uRIJTS5sSd8UkT5/qsfbx3zyRMfqHZVAu50FV+P2IZVt3UN3Qxqys+oqmL6GB0GhT6kPAP65Ja2xPeexptjBE1ddOPi/YPKpM8NqrqlH8FGG/dIkTk4WqaT5kdD7j09GjUAD49HFtymxSy0JJZ4awTH9FE+mliOzqUqJJZub9cTtuICy4dxKcUFqpm3Wf5XtZl7YBIkJC/zEhE68JXXxxsRhUREECXL8/4ERLmTw2TnedoI5Pt58JUcVNoplo/pL27fx4OGF3bVFJZQSlLlmdNR/Xd+ElrXs= root@margaret

This required a lot of changes to the installer: basically all commits on January 21st were for this project.

Remaining work

  1. puppet bootstrap ✅
  2. systemd-networkd ✅
  3. dump fw rules on belleville ✅ and omnia ✅
  4. firewall with nftables (forwarding and NAT) ✅
  5. DHCP assignments and configuration (dnsmasq) ✅
  6. recursive DNS (dnsmasq) ✅
  7. bufferbloat tests ✅
  8. swap omnia and router ✅ (see octavia for part of that procedure)
  9. re-do bufferbloat tests ✅
  10. micah backup move ✅
  11. forward tests ✅
  12. reflection ✅ (upstream discussion)
  13. APU move
  14. marcos move
  15. monitoring: octavia (collectd + prometheus?) and margaret
  16. mail relay

Tests to run

From octavia:

  1. Web, on another host:

    curl https://anarc.at/
    

    If no shell access, try Webbloatscore since it produces a screenshot.

  2. DNS: local network should resolve locally, outside should show CNAMEs

    $ host shell.anarc.at
    shell.anarc.at is an alias for marcos.anarc.at.
    marcos.anarc.at has address 206.248.172.91
    
  3. SSH: test if we can reach the inside server from the outside of the network (and not the router)

    $ nc -v shell.anarc.at 22
    Connection to shell.anarc.at 22 port [tcp/ssh] succeeded!
    SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
    

Other ports are assumed to be correctly configured unless otherwise noticed during later use.
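
The three checks above can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, the CNAME check only makes sense when run from outside the network, and the nc invocation assumes a netcat that supports -w as an idle timeout.

```shell
#!/bin/sh
# Hypothetical helpers; the names are not from any existing tool.

# Succeeds if `host` output on stdin shows a CNAME ("is an alias for").
shows_cname() {
    grep -q ' is an alias for '
}

# Succeeds if the first line on stdin is an SSH protocol banner.
is_ssh_banner() {
    head -n 1 | grep -q '^SSH-2\.0-'
}

# Run the web, DNS and SSH checks; nothing runs until this is called.
run_checks() {
    if curl -fsS -o /dev/null https://anarc.at/; then
        echo "web: ok"
    else
        echo "web: FAILED"
    fi
    if host shell.anarc.at | shows_cname; then
        echo "dns: ok"
    else
        echo "dns: FAILED"
    fi
    if nc -w 5 shell.anarc.at 22 </dev/null | is_ssh_banner; then
        echo "ssh: ok"
    else
        echo "ssh: FAILED"
    fi
}
```

Calling run_checks after sourcing the file prints one line per test. Note that it only confirms that an SSH server answers, not that it is the inside server rather than the router, so the banner still needs a human look.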

failed flent bufferbloat tests

OpenWRT has this guide to configure SQM to fight bufferbloat.

octavia was tested with https://dslreports.com/speedtest, but that doesn't seem to work anymore. https://fast.com/ and https://speed.cloudflare.com/ both feature latency tests. Before deployment, both showed a +40-50ms bufferbloat on download, less so on upload, but that test was done over wifi...

The octavia page also recommends flent as a testing tool:

flent rrul netperf.bufferbloat.net

... but here it fails with a large amount of error messages, ending with:

ERROR: No data to aggregate. Run with -L and check log file to investigate.

And that's after installing netperf, which it otherwise complains is missing.

Installing flent on the pristine router yields:

0 upgraded, 390 newly installed, 0 to remove and 0 not upgraded.
Need to get 309 MB of archives.
After this operation, 1,425 MB of additional disk space will be used.

And that doesn't include netperf.

bufferbloat tests

Tests were performed on https://speed.cloudflare.com/ using Firefox ESR 115.6 on Debian bookworm around 2024-01-23 and -24, with too many tabs opened (so possibly interference).

Test              Down  Up    Latency  Loaded     Jitter  Loaded     Notes
octavia           133   22.2  20.5     34.5/53.5  4.32    17.9/11.2  old setup CSV
margaret-staging  131   61.2  21       40.5/56.5  2.58    14.4/5.53  margaret as a client CSV
margaret-prod     131   33.7  20.0     42.5/72.0  3.37    12.8/4.32  margaret as a router CSV
margaret-direct   132   22.1  20.0     42.5/26.5  3.63    14.1/5.79  same, no switch CSV
direct-2.5g       131   21.7  21.0     39.5/26.0  3.74    12.8/19.3  same, framework card CSV

All tests reported 0% packet loss. Down and Up are bandwidth in megabits per second. Latency, Loaded, and Jitter are in milliseconds. Each "Loaded" column repeats the metric in the column to its left, measured while downloading/uploading (e.g. the first one is the latency, in milliseconds, when downloading / uploading content).
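
To make the numbers easier to compare, the latency added under load can be computed per test. This is a sketch that assumes "bufferbloat" here means loaded latency minus idle latency. The latency_increase name and the three-column input format are made up for this example:

```shell
# Hypothetical helper. Each stdin line is:
#   <test> <idle_latency_ms> <loaded_down_ms>/<loaded_up_ms>
# Prints the latency added under load as +down/+up, in milliseconds.
latency_increase() {
    awk '{
        split($3, loaded, "/")
        printf "%s +%.1f/+%.1f\n", $1, loaded[1] - $2, loaded[2] - $2
    }'
}

# Values copied from the Latency and first Loaded columns above.
latency_increase <<'EOF'
octavia 20.5 34.5/53.5
margaret-staging 21 40.5/56.5
margaret-prod 20.0 42.5/72.0
margaret-direct 20.0 42.5/26.5
direct-2.5g 21.0 39.5/26.0
EOF
```

With the table's values, this prints +14.0/+33.0 for octavia, +19.5/+35.5 for margaret-staging, +22.5/+52.0 for margaret-prod, +22.5/+6.5 for margaret-direct and +18.5/+5.0 for direct-2.5g.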

Observations:

Theories:

Disaster recovery procedures

We do not have an exact replica of the Protectli to recover the network in case of a hardware failure. Two options are possible:

Alternatives

In the wifi replacement project, I evaluated a bunch of options for core router replacement.

Qotom

Qotom might be cheaper, and the Q190G4U S01 is about as simple as it gets, but it means buying on Amazon.com, which refuses to ship to Canada for this product, or Aliexpress (280$, so not actually cheaper). The problem with Qotom is that their model line is utterly confusing: for example, I found the above on their site, but Aliexpress has this model for 233$. Go figure. ServeTheHome has a good review of the Qotom Q20332G9-S10 (4x2.5G 4xSFP+ 10G). I was also recommended this 4x2.5G router.

They have been negatively reviewed on OpenWRT forums.

Turris

The Turris Omnia is the device that was used as a core router before (octavia), so getting a second device might have made sense here. Unfortunately, they were too hard to find (e.g. back-ordered at Amazon).

Turris are saying they will publish a new "enterprise-ready" board soon. In the meantime, Discomp has some in stock and should ship internationally for 262.95EUR or 390$CAD, quite reasonable, actually...

Others

If we fail to get an Omnia, we need to find an OpenWRT-supported SFP router. The MikroTik hAP ac maybe? Nope. Other options:

One option is to move the Omnia to the office and replace the core router with something beefier, and add a new AP downstairs.

Another possible Omnia replacement is the Sophos series, of which we were recommended the Sophos 105w Rev 3 and so on. It's surprisingly similar to the Omnia...

This 2021 review also includes Protectli and Qotom products, among others.
