margaret
Margaret is the name of my new core router in the home lab. It is named after:
Margaret Elaine Hamilton (née Heafield; born August 17, 1936) is an American computer scientist, systems engineer, and business owner. She was director of the Software Engineering Division of the MIT Instrumentation Laboratory, which developed on-board flight software for NASA's Apollo program. She later founded two software companies: Higher Order Software in 1976 and Hamilton Technologies in 1986, both in Cambridge, Massachusetts.
Hamilton has published more than 130 papers, proceedings, and reports, about sixty projects, and six major programs. She invented the term "software engineering". -- Wikipedia
Hamilton wrote the software that landed men on the moon, yet no woman has had that privilege so far.
I began to use the term 'software engineering' to distinguish it from hardware and other kinds of engineering, yet treat each type of engineering as part of the overall systems engineering process. -- Margaret Hamilton
Specifications
The machine is currently implemented using a Protectli FW2B with the following specifications:
- Processor: Intel J3060 (64 Bit, 1.6 GHz, Turbo 2.48 GHz, 2MB L2 Cache)
- Processor Cores: 2
- Network: 2x Intel 1G Ethernet, RJ-45
- Graphics: Intel Clear Video HD, 2x HDMI 1.4
- Audio: HDMI, 1x 3.5mm Audio Jack
- Memory: 1x SO-DIMM DDR3L-1600, 1.35v, 8GB (max)
- Storage: 1x mSATA (Protectli 64GB SSD)
- Optional Storage: 1x Internal SATA 3.0 (unused)
- 2x USB 3.0 Type A, 4x USB 2.0 Type A
- 2x HDMI
- 2x WiFi/LTE Antenna Mounting Holes
- 1x 12V DC Power Jack
- 1x Full Height mPCIe (USB/PCIe 2.0) for WiFi or LTE
- 1x USB 2.0 Header
- 1x CMOS Reset (2 pin)
- 1x CPU Fan Header (4 pin)
- 1x Front Panel Header (9 pin)
- BIOS: coreboot v4.9.0.3, SeaBIOS v1.0.4-0-g5137b91
- Indicators: 1x LED Power Button (Blue), 1x LED Power Indicator (Green), 1x LED Disk Activity Indicator (Red), 1x LED Disk Activity Indicator (Yellow)
- Power Usage: Max 16W
- Chassis: Fanless, Aluminum, Black, 4.5 x 4.3 x 1.5 in, 115 x 107.5 x 39 mm
- Mounting Options: Desktop, VESA Bracket, Optional 1RU Rack Mount
- Weight: 1.1 lbs, .50 Kg
- Operating Temperature: +14° - +122° F, -10° - +50° C
- Operating Humidity: 0 – 95% relative humidity, non-condensing
- Approvals: UL (Power Supply), FCC Part 15 Class B, CE, RoHS
- Country of Origin: Made in China, Assembled in USA
It's basically a small black box with two network ports, 8GB of RAM, 64GB of storage, and that's it.
Bootstrapping
To boot from the USB stick, I stuck a cable in the serial console port with a DB9 to USB-A adapter, then booted the machine. I got served with the prompt, which looked like this after pressing F11:
SeaBIOS (version v1.0.4-0-g5137b91)
coreboot version v4.9.0.3
Press F11 key for boot menu
Select boot device:
1. AHCI/0: Protectli 64GB mSATA ATA-11 Hard-Disk (61057 MiBytes)
That is, the USB stick was not listed. Strangely, moving it to the bottom USB port worked after a reboot:
SeaBIOS (version v1.0.4-0-g5137b91)
coreboot version v4.9.0.3
Press F11 key for boot menu
Select boot device:
1. AHCI/0: Protectli 64GB mSATA ATA-11 Hard-Disk (61057 MiBytes)
2. USB MSC Drive Kingston DataTraveler 3.0
Booting from Hard Disk...
ISOLINUX 6.04 20200816 EHDD Copyright (C) 1994-2015 H. Peter Anvin et al
But then I was stuck at that prompt. grml, or more exactly ISOLINUX, somehow didn't manage to display its terminal properly. I could see the cursor moving, but it would just display a blank screen.
The grml cheatcodes say you should just be able to type serial then ENTER, but that didn't work in my tests. I had to type TAB, then SPACE, then console=ttyS0,115200n8 to get the serial console to work.
Obviously, the machine also ships with HDMI and USB, so I could have used a monitor instead, but I wanted to test the serial console... Not sure if this is a bug in the serial console or the (coreboot) BIOS.
Installation
For installation, I'm reusing the installer I built for Tor, from the fabric-tasks repository.
I first configure the network over DHCP with:
netcardconfig
that could have simply been:
killall dhclient ; dhclient -d eth0 &
... as well.
Then I set my SSH key:
cat > ~/.ssh/authorized_keys
service ssh restart
... and dump the host keys for Fabric to use after:
for key in /etc/ssh/ssh_host_*_key; do
    ssh-keygen -E md5 -l -f "$key"
done
There's a bug in the installer: it doesn't use the right fingerprint format anymore, so we need to remove some colons. The magic incantation is now:
./install -H root@192.168.0.221 \
--fingerprint b41e:db22:5576:2168:e694:bb59:6934:cad2 \
hetzner-robot \
--fqdn=margaret.torproject.org \
--fai-disk-config=installer/disk-config/single-disk-plaintext \
--package-list=installer/packages \
--post-scripts-dir=installer/post-scripts/ \
--ipv4-address 192.168.0.2 \
--ipv4-subnet 24 \
--ipv4-gateway 192.168.0.1
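The colon removal can be scripted. The sketch below assumes that ssh-keygen -E md5 -l prints the digest as MD5: followed by colon-separated byte pairs, and that the installer wants the same digest in 4-character groups, as in the --fingerprint value above. The function name regroup_fingerprint is made up for this example, and the sample input is simply the value above split back into byte pairs.

```shell
#!/bin/sh
# Hypothetical helper: turn "MD5:b4:1e:db:22:..." (ssh-keygen -E md5 -l style)
# into "b41e:db22:..." (the grouping used in the ./install call above).
regroup_fingerprint() {
    # drop the "MD5:" prefix, drop all colons, re-insert one every 4 characters,
    # then remove the trailing colon left by the regrouping
    printf '%s\n' "$1" | sed -e 's/^MD5://' -e 's/://g' -e 's/..../&:/g' -e 's/:$//'
}

# The fingerprint is the second field of the ssh-keygen output line, e.g.:
#   ssh-keygen -E md5 -l -f /etc/ssh/ssh_host_ed25519_key | awk '{print $2}'
regroup_fingerprint "MD5:b4:1e:db:22:55:76:21:68:e6:94:bb:59:69:34:ca:d2"
```

If the installer bug gets fixed, this step presumably becomes unnecessary.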
The resulting SSH keys were:
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFAtMq1j+tznnC0Tlf3oYtlyY28yMELX7E0tVAyOHlvv+Wvr+1sGbHq3fHG+qBvzjcKZz+KJzqKlgfc+zfGl4d8= root@margaret
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILjhQGDtv+c8zOkdJe8OR5483QbZeA8jEaKS7PZKhnLS root@margaret
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCteIzijiRrq0P94WeZ8F0GS64u58zA+9Alpc9OfPbKST7oSjmml/u30bIJxDGuumVp9dbVqRcp54lK/DUHGBacJQdNy4kV0tacQbUr6F2LNXlZWgSafYoLaZdywi3hcTqjtNxl/j/8iIVp8xBk0dcXAOfaH9726dkaSZOUUBQoG6itsefW1kg0QIfVU5kNhvWkO9BSRvXjGNx/KmVzCK9vvRDrQsZSp41uRIJTS5sSd8UkT5/qsfbx3zyRMfqHZVAu50FV+P2IZVt3UN3Qxqys+oqmL6GB0GhT6kPAP65Ja2xPeexptjBE1ddOPi/YPKpM8NqrqlH8FGG/dIkTk4WqaT5kdD7j09GjUAD49HFtymxSy0JJZ4awTH9FE+mliOzqUqJJZub9cTtuICy4dxKcUFqpm3Wf5XtZl7YBIkJC/zEhE68JXXxxsRhUREECXL8/4ERLmTw2TnedoI5Pt58JUcVNoplo/pL27fx4OGF3bVFJZQSlLlmdNR/Xd+ElrXs= root@margaret
This required a lot of changes to the installer; basically all commits on January 21st were for this project.
Remaining work
- puppet bootstrap ✅
- systemd-networkd ✅
- dump fw rules on belleville ✅ and omnia ✅
- firewall with nftables (forwarding and NAT) ✅
- DHCP assignments and configuration (dnsmasq) ✅
- recursive DNS (dnsmasq) ✅
- bufferbloat tests ✅
- swap omnia and router ✅ (see octavia for part of that procedure)
- re-do bufferbloat tests ✅
- micah backup move ✅
- forward tests ✅
- reflection ✅ (upstream discussion)
- APU move
- marcos move
- monitoring: octavia (collectd + prometheus?) and margaret
- mail relay
Tests to run
From octavia:
Web, on another host:
curl https://anarc.at/
If no shell access, try Webbloatscore since it produces a screenshot.
DNS: local network should resolve locally, outside should show CNAMEs
$ host shell.anarc.at
shell.anarc.at is an alias for marcos.anarc.at.
marcos.anarc.at has address 206.248.172.91
SSH: test if we can reach the inside server from the outside of the network (and not the router)
$ nc -v shell.anarc.at 22
Connection to shell.anarc.at 22 port [tcp/ssh] succeeded!
SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
Other ports are assumed to be correctly configured unless otherwise noticed during later use.
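The three checks above could be wrapped in a small script, roughly like this. It is only a sketch: the function names and the timeouts are assumptions, and the CNAME check is only meaningful from outside the network, since the local network is expected to resolve locally. nc variants also differ in how they behave once stdin is closed, so the SSH check may need adjusting.

```shell
#!/bin/sh
# Sketch only: wraps the manual checks in functions. Timeouts are assumptions.

# Succeeds if `host` output on stdin contains a CNAME line.
has_cname() {
    grep -q ' is an alias for '
}

# Succeeds if the text on stdin contains an SSH protocol banner.
has_ssh_banner() {
    grep -q '^SSH-2\.0-'
}

# Print "<label>: ok" or "<label>: FAILED" based on the given exit status.
report() {
    if [ "$2" -eq 0 ]; then echo "$1: ok"; else echo "$1: FAILED"; fi
}

# Run all three checks against the hosts named in the text.
run_checks() {
    curl --silent --fail --max-time 10 --output /dev/null https://anarc.at/
    report web $?
    host shell.anarc.at | has_cname
    report dns-cname $?
    # -w 5: give up after 5 seconds of inactivity
    nc -w 5 shell.anarc.at 22 </dev/null | has_ssh_banner
    report ssh $?
}
```

Calling run_checks from a host outside the network prints one line per check.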
failed flent bufferbloat tests
OpenWRT has this guide to configure SQM to fight bufferbloat.
octavia was tested with https://dslreports.com/speedtest but that doesn't seem to work anymore. https://fast.com/ and https://speed.cloudflare.com/ both feature latency tests. Before deployment, both showed a +40-50ms bufferbloat on download, less so on upload, but that test was done over wifi...
The octavia page also recommends flent as a testing tool:
flent rrul netperf.bufferbloat.net
... but here it fails with a large number of error messages, ending with:
ERROR: No data to aggregate. Run with -L and check log file to investigate.
And that's after installing netperf, which it complains is missing otherwise.
Installing flent on the pristine router yields:
0 upgraded, 390 newly installed, 0 to remove and 0 not upgraded.
Need to get 309 MB of archives.
After this operation, 1,425 MB of additional disk space will be used.
And that doesn't include netperf.
bufferbloat tests
Tests were performed on https://speed.cloudflare.com/ using Firefox ESR 115.6 on Debian bookworm around 2024-01-23 and -24, with too many tabs open (so possible interference).
Test | Down (Mbps) | Up (Mbps) | Latency (ms) | Loaded latency, down/up (ms) | Jitter (ms) | Loaded jitter, down/up (ms) | Notes |
---|---|---|---|---|---|---|---|
octavia | 133 | 22.2 | 20.5 | 34.5/53.5 | 4.32 | 17.9/11.2 | old setup CSV |
margaret-staging | 131 | 61.2 | 21 | 40.5/56.5 | 2.58 | 14.4/5.53 | margaret as a client CSV |
margaret-prod | 131 | 33.7 | 20.0 | 42.5/72.0 | 3.37 | 12.8/4.32 | margaret as a router CSV |
margaret-direct | 132 | 22.1 | 20.0 | 42.5/26.5 | 3.63 | 14.1/5.79 | same, no switch CSV |
direct-2.5g | 131 | 21.7 | 21.0 | 39.5/26.0 | 3.74 | 12.8/19.3 | same, framework card CSV |
All tests reported 0% packet loss. Down and Up are bandwidth in megabits per second; latency and jitter are in milliseconds. Each "loaded" column gives the value of the column to its left while downloading / uploading (e.g. the first one is the latency, in milliseconds, when downloading / uploading content).
Observations:
- ingress bandwidth is steady at 131-133 Mbps
- egress bandwidth is strangely variable; the spike at 61 Mbps is particularly odd
- latency is stable at 20-21ms, all within 1ms of each other
- ingress load adds about +20ms of buffer bloat, possibly +5ms more with margaret
- egress is much more variable: +30ms on octavia, first test +50ms, but direct tests are only +6ms
- ISP package is 120/20, so bandwidth is about 10% above spec
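The buffer bloat figures in these observations are the loaded latency minus the idle latency, per direction. Here is a small sketch of that arithmetic, with the values copied from the table; the bloat function name and the whitespace-separated input layout are just for this example.

```shell
#!/bin/sh
# Buffer bloat = loaded latency minus idle latency, per direction.
# Input columns: test name, idle latency, loaded latency (down), loaded latency (up), in ms.
bloat() {
    awk '{ printf "%s +%.1f +%.1f\n", $1, $3 - $2, $4 - $2 }'
}

# Values copied from the table above.
bloat <<'EOF'
octavia 20.5 34.5 53.5
margaret-staging 21 40.5 56.5
margaret-prod 20.0 42.5 72.0
margaret-direct 20.0 42.5 26.5
direct-2.5g 21.0 39.5 26.0
EOF
```

For example, the margaret-prod row comes out at +22.5 down and +52.0 up, while margaret-direct gives +6.5 up, which is where the rounded +50ms and +6ms figures come from.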
Theories:
- switch is crap and possibly introduces +50ms buffer bloat on upload
- cable modem introduces +15-20ms buffer bloat on download
- margaret reduces egress buffer bloat down to 6ms
- margaret increases ingress bloat by +5-7ms
Disaster recovery procedures
We do not have an exact replica of the Protectli to recover the network in case of a hardware failure. Two options are possible:
- deploy the Omnia, rolling back to a previous snapshot from before ~2024-01-23
- deploy dal-rescue-02, reinstalling it from scratch with the profile::router Puppet class
  - use any darn PC with two network cards with the above
Alternatives
In the wifi replacement project, I evaluated a bunch of options for core router replacement.
Qotom
Qotom might be cheaper, and the Q190G4U S01 is about as simple as it gets, but it means buying on Amazon.com, which refuses to ship this product to Canada, or on Aliexpress (280$, so not actually cheaper). The problem with Qotom is that their model line is utterly confusing: for example, I found the above on their site, but Aliexpress has this model for 233$. Go figure. ServeTheHome has a good review of the Qotom Q20332G9-S10 (4x2.5G 4xSFP+ 10G). I was also recommended this 4x2.5G router.
They have been negatively reviewed on OpenWRT forums.
Turris
The Turris Omnia is the device that was used as a core router before (octavia), so getting a second device might have made sense here. Unfortunately, they were too hard to find (e.g. back-ordered at Amazon).
Turris are saying they will publish a new "enterprise-ready" board soon. In the meantime, Discomp has some in stock and should ship internationally for 262.95EUR or 390$CAD, quite reasonable, actually...
Others
If we fail to get an Omnia, we need to find an OpenWRT-supported SFP router. The MikroTik hAP ac maybe? Nope. Other options:
- SuperMicro has a series they call "IoT", e.g. 2 gbit 2SFP Xeon SATA PCIe, a bit overkill, and not enough ports to act as a switch
- Protectli has an interesting series, e.g. 4x2.5gbit switch + wifi and coreboot, but no SFP (that's what we ended up going with here)
- Qotom has a 4xSFP+ 5x2.5gbit beast, but no wifi
- MikroTik has sturdy routers and switches. The latter are often locked in their proprietary hardware, but their routers are a little better: noodles says he uses a MikroTik RB5009 in this blog post about DNS. Surprisingly, I don't see any MikroTik entry in InstallingDebianOn, but in this post noodles says the MikroTik runs mainline, so that's really encouraging
One option is to move the Omnia to the office and replace the core router with something beefier, and add a new AP downstairs.
Another Omnia replacement is the Sophos series; we were recommended the Sophos 105w Rev 3 and so on. It's surprisingly similar to the Omnia...
This 2021 review also includes Protectli and Qotom products, among others.