Switching from OpenNTPd to Chrony
A friend recently reminded me of the existence of chrony, a "versatile implementation of the Network Time Protocol (NTP)". The excellent introduction is worth quoting in full:
It can synchronise the system clock with NTP servers, reference clocks (e.g. GPS receiver), and manual input using wristwatch and keyboard. It can also operate as an NTPv4 (RFC 5905) server and peer to provide a time service to other computers in the network.
It is designed to perform well in a wide range of conditions, including intermittent network connections, heavily congested networks, changing temperatures (ordinary computer clocks are sensitive to temperature), and systems that do not run continuously, or run on a virtual machine.
Typical accuracy between two machines synchronised over the Internet is within a few milliseconds; on a LAN, accuracy is typically in tens of microseconds. With hardware timestamping, or a hardware reference clock, sub-microsecond accuracy may be possible.
Now that's already great documentation right there. What it is, why it's good, and what to expect from it. I want more. They have a very handy comparison table between chrony, ntp and openntpd.
My problem with OpenNTPd
Following concerns surrounding the security (and complexity) of the venerable ntp program, I have, a long time ago, switched to using openntpd on all my computers. I hadn't thought about it until I recently noticed a lot of noise on one of my servers:
jan 18 10:09:49 curie ntpd[1069]: adjusting local clock by -1.604366s
jan 18 10:08:18 curie ntpd[1069]: adjusting local clock by -1.577608s
jan 18 10:05:02 curie ntpd[1069]: adjusting local clock by -1.574683s
jan 18 10:04:00 curie ntpd[1069]: adjusting local clock by -1.573240s
jan 18 10:02:26 curie ntpd[1069]: adjusting local clock by -1.569592s
You read that right, openntpd was constantly rewinding the clock, sometimes in less than two minutes. The above log was taken while doing diagnostics, looking at the last 30 minutes of logs. So, on average, one 1.5-second rewind every 6 minutes!
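You can tally numbers like these without eyeballing the journal. Here is a small Python sketch that does it by parsing OpenNTPd's "adjusting local clock by" messages. It assumes the exact message format shown above and that all lines fall on the same day, because only the time of day is parsed. The function name `summarize_adjustments` is made up for this example.

```python
import re
from datetime import datetime

# Matches lines like:
#   jan 18 10:09:49 curie ntpd[1069]: adjusting local clock by -1.604366s
ADJUST_RE = re.compile(
    r"^(?P<month>\w+) (?P<day>\d+) (?P<time>\d\d:\d\d:\d\d) \S+ ntpd\[\d+\]: "
    r"adjusting local clock by (?P<offset>-?\d+\.\d+)s$"
)


def summarize_adjustments(lines):
    """Return (count, total_offset_seconds, mean_interval_seconds).

    Only the time of day is parsed, so all lines are assumed to be from the
    same day. The mean interval is None when there are fewer than two events.
    """
    events = []
    for line in lines:
        m = ADJUST_RE.match(line.strip())
        if m:
            when = datetime.strptime(m["time"], "%H:%M:%S")
            events.append((when, float(m["offset"])))
    events.sort()  # journal excerpts may be newest-first
    total = sum(offset for _, offset in events)
    if len(events) < 2:
        return len(events), total, None
    span = (events[-1][0] - events[0][0]).total_seconds()
    return len(events), total, span / (len(events) - 1)
```

Fed the five log lines quoted above, it reports 5 adjustments adding up to about -7.9 seconds.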
That might be due to a dying real-time clock (RTC) or some other hardware problem. I know for a fact that the CMOS battery on that computer (curie) died and I wasn't able to replace it (!). So that's partly garbage-in, garbage-out here. But still, I was curious to see how chrony would behave... (Spoiler: much better.)
But I also had trouble on another workstation, a much more recent machine (angela). First, OpenNTPd seemed to simply fail at boot time:
anarcat@angela:~(main)$ sudo systemctl status openntpd
● openntpd.service - OpenNTPd Network Time Protocol
Loaded: loaded (/lib/systemd/system/openntpd.service; enabled; vendor pres>
Active: inactive (dead) since Sun 2022-01-23 09:54:03 EST; 6h ago
Docs: man:openntpd(8)
Process: 3291 ExecStartPre=/usr/sbin/ntpd -n $DAEMON_OPTS (code=exited, sta>
Process: 3294 ExecStart=/usr/sbin/ntpd $DAEMON_OPTS (code=exited, status=0/>
Main PID: 3298 (code=exited, status=0/SUCCESS)
CPU: 34ms
jan 23 09:54:03 angela systemd[1]: Starting OpenNTPd Network Time Protocol...
jan 23 09:54:03 angela ntpd[3291]: configuration OK
jan 23 09:54:03 angela ntpd[3297]: ntp engine ready
jan 23 09:54:03 angela ntpd[3297]: ntp: recvfrom: Permission denied
jan 23 09:54:03 angela ntpd[3294]: Terminating
jan 23 09:54:03 angela systemd[1]: Started OpenNTPd Network Time Protocol.
jan 23 09:54:03 angela systemd[1]: openntpd.service: Succeeded.
After a restart, somehow it worked, but it took a long time to sync the clock. At first, it would just not consider any peer at all:
anarcat@angela:~(main)$ sudo ntpctl -s all
0/20 peers valid, clock unsynced
peer
wt tl st next poll offset delay jitter
159.203.8.72 from pool 0.debian.pool.ntp.org
1 5 2 6s 6s ---- peer not valid ----
138.197.135.239 from pool 0.debian.pool.ntp.org
1 5 2 6s 7s ---- peer not valid ----
216.197.156.83 from pool 0.debian.pool.ntp.org
1 4 1 2s 9s ---- peer not valid ----
142.114.187.107 from pool 0.debian.pool.ntp.org
1 5 2 5s 6s ---- peer not valid ----
216.6.2.70 from pool 1.debian.pool.ntp.org
1 4 2 2s 8s ---- peer not valid ----
207.34.49.172 from pool 1.debian.pool.ntp.org
1 4 2 0s 5s ---- peer not valid ----
198.27.76.102 from pool 1.debian.pool.ntp.org
1 5 2 5s 5s ---- peer not valid ----
158.69.254.196 from pool 1.debian.pool.ntp.org
1 4 3 1s 6s ---- peer not valid ----
149.56.121.16 from pool 2.debian.pool.ntp.org
1 4 2 5s 9s ---- peer not valid ----
162.159.200.123 from pool 2.debian.pool.ntp.org
1 4 3 1s 6s ---- peer not valid ----
206.108.0.131 from pool 2.debian.pool.ntp.org
1 4 1 6s 9s ---- peer not valid ----
205.206.70.40 from pool 2.debian.pool.ntp.org
1 5 2 8s 9s ---- peer not valid ----
2001:678:8::123 from pool 2.debian.pool.ntp.org
1 4 2 5s 9s ---- peer not valid ----
2606:4700:f1::1 from pool 2.debian.pool.ntp.org
1 4 3 2s 6s ---- peer not valid ----
2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org
1 4 2 5s 9s ---- peer not valid ----
2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org
1 4 4 1s 6s ---- peer not valid ----
209.115.181.110 from pool 3.debian.pool.ntp.org
1 5 2 5s 6s ---- peer not valid ----
205.206.70.42 from pool 3.debian.pool.ntp.org
1 4 2 0s 6s ---- peer not valid ----
68.69.221.61 from pool 3.debian.pool.ntp.org
1 4 1 2s 9s ---- peer not valid ----
162.159.200.1 from pool 3.debian.pool.ntp.org
1 4 3 4s 7s ---- peer not valid ----
Then it would accept them, but still wouldn't sync the clock:
anarcat@angela:~(main)$ sudo ntpctl -s all
20/20 peers valid, clock unsynced
peer
wt tl st next poll offset delay jitter
159.203.8.72 from pool 0.debian.pool.ntp.org
1 8 2 5s 6s 0.672ms 13.507ms 0.442ms
138.197.135.239 from pool 0.debian.pool.ntp.org
1 7 2 4s 8s 1.260ms 13.388ms 0.494ms
216.197.156.83 from pool 0.debian.pool.ntp.org
1 7 1 3s 5s -0.390ms 47.641ms 1.537ms
142.114.187.107 from pool 0.debian.pool.ntp.org
1 7 2 1s 6s -0.573ms 15.012ms 1.845ms
216.6.2.70 from pool 1.debian.pool.ntp.org
1 7 2 3s 8s -0.178ms 21.691ms 1.807ms
207.34.49.172 from pool 1.debian.pool.ntp.org
1 7 2 4s 8s -5.742ms 70.040ms 1.656ms
198.27.76.102 from pool 1.debian.pool.ntp.org
1 7 2 0s 7s 0.170ms 21.035ms 1.914ms
158.69.254.196 from pool 1.debian.pool.ntp.org
1 7 3 5s 8s -2.626ms 20.862ms 2.032ms
149.56.121.16 from pool 2.debian.pool.ntp.org
1 7 2 6s 8s 0.123ms 20.758ms 2.248ms
162.159.200.123 from pool 2.debian.pool.ntp.org
1 8 3 4s 5s 2.043ms 14.138ms 1.675ms
206.108.0.131 from pool 2.debian.pool.ntp.org
1 6 1 0s 7s -0.027ms 14.189ms 2.206ms
205.206.70.40 from pool 2.debian.pool.ntp.org
1 7 2 1s 5s -1.777ms 53.459ms 1.865ms
2001:678:8::123 from pool 2.debian.pool.ntp.org
1 6 2 1s 8s 0.195ms 14.572ms 2.624ms
2606:4700:f1::1 from pool 2.debian.pool.ntp.org
1 7 3 6s 9s 2.068ms 14.102ms 1.767ms
2607:5300:205:200::1991 from pool 2.debian.pool.ntp.org
1 6 2 4s 9s 0.254ms 21.471ms 2.120ms
2607:5300:201:3100::345c from pool 2.debian.pool.ntp.org
1 7 4 5s 9s -1.706ms 21.030ms 1.849ms
209.115.181.110 from pool 3.debian.pool.ntp.org
1 7 2 0s 7s 8.907ms 75.070ms 2.095ms
205.206.70.42 from pool 3.debian.pool.ntp.org
1 7 2 6s 9s -1.729ms 53.823ms 2.193ms
68.69.221.61 from pool 3.debian.pool.ntp.org
1 7 1 1s 7s -1.265ms 46.355ms 4.171ms
162.159.200.1 from pool 3.debian.pool.ntp.org
1 7 3 4s 8s 1.732ms 35.792ms 2.228ms
It took a solid five minutes to sync the clock, even though the peers were considered valid within a few seconds:
jan 23 15:58:41 angela systemd[1]: Started OpenNTPd Network Time Protocol.
jan 23 15:58:58 angela ntpd[84086]: peer 142.114.187.107 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 198.27.76.102 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 207.34.49.172 now valid
jan 23 15:58:58 angela ntpd[84086]: peer 209.115.181.110 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 159.203.8.72 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 138.197.135.239 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 162.159.200.123 now valid
jan 23 15:58:59 angela ntpd[84086]: peer 2607:5300:201:3100::345c now valid
jan 23 15:59:00 angela ntpd[84086]: peer 2606:4700:f1::1 now valid
jan 23 15:59:00 angela ntpd[84086]: peer 158.69.254.196 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 216.6.2.70 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 68.69.221.61 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.40 now valid
jan 23 15:59:01 angela ntpd[84086]: peer 205.206.70.42 now valid
jan 23 15:59:02 angela ntpd[84086]: peer 162.159.200.1 now valid
jan 23 15:59:04 angela ntpd[84086]: peer 216.197.156.83 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 206.108.0.131 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 2001:678:8::123 now valid
jan 23 15:59:05 angela ntpd[84086]: peer 149.56.121.16 now valid
jan 23 15:59:07 angela ntpd[84086]: peer 2607:5300:205:200::1991 now valid
jan 23 16:03:47 angela ntpd[84086]: clock is now synced
That seems kind of odd. It was also frustrating to have very little information from ntpctl about the state of the daemon. I understand it's designed to be minimal, but it could at least tell me its own estimated offset, for example. It does tell me the offset to each peer, but not as clearly as one would expect. It's also unclear how, or whether, it disciplines the RTC at all.
Compared to chrony
Now compare with chrony:
jan 23 16:07:16 angela systemd[1]: Starting chrony, an NTP client/server...
jan 23 16:07:16 angela chronyd[87765]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
jan 23 16:07:16 angela chronyd[87765]: Initial frequency 3.814 ppm
jan 23 16:07:16 angela chronyd[87765]: Using right/UTC timezone to obtain leap second data
jan 23 16:07:16 angela chronyd[87765]: Loaded seccomp filter
jan 23 16:07:16 angela systemd[1]: Started chrony, an NTP client/server.
jan 23 16:07:21 angela chronyd[87765]: Selected source 206.108.0.131 (2.debian.pool.ntp.org)
jan 23 16:07:21 angela chronyd[87765]: System clock TAI offset set to 37 seconds
First, you'll notice there's none of that "clock synced" nonsense: it picks a source, and then... it's just done. That's because the clock on this computer is not drifting that much, and openntpd had (presumably) just sync'd it anyway. And indeed, if we look at detailed stats from the powerful chronyc client:
anarcat@angela:~(main)$ sudo chronyc tracking
Reference ID : CE6C0083 (ntp1.torix.ca)
Stratum : 2
Ref time (UTC) : Sun Jan 23 21:07:21 2022
System time : 0.000000311 seconds slow of NTP time
Last offset : +0.000807989 seconds
RMS offset : 0.000807989 seconds
Frequency : 3.814 ppm fast
Residual freq : -24.434 ppm
Skew : 1000000.000 ppm
Root delay : 0.013200894 seconds
Root dispersion : 65.357254028 seconds
Update interval : 1.4 seconds
Leap status : Normal
We see that we are only a few hundred nanoseconds away from NTP time. That was run very quickly after starting the server (literally in the same second as chrony picked a source), so the stats are a bit weird (e.g. the Skew is huge). After a minute or two, it looks more reasonable:
Reference ID : CE6C0083 (ntp1.torix.ca)
Stratum : 2
Ref time (UTC) : Sun Jan 23 21:09:32 2022
System time : 0.000487002 seconds slow of NTP time
Last offset : -0.000332960 seconds
RMS offset : 0.000751204 seconds
Frequency : 3.536 ppm fast
Residual freq : +0.016 ppm
Skew : 3.707 ppm
Root delay : 0.013363549 seconds
Root dispersion : 0.000324015 seconds
Update interval : 65.0 seconds
Leap status : Normal
Now it's learning how good or bad the local clock is ("Frequency"), and is smoothly adjusting the System time to follow the average offset (RMS offset, more or less). You'll also notice the Update interval has risen (from 1.4 to 65 seconds), and will keep expanding as chrony learns more about the internal clock, so it doesn't need to constantly poll the NTP servers to sync the clock. In the above, we're 487 microseconds (less than a millisecond!) away from NTP time.
(People interested in the explanation of every single one of those fields can read the excellent chronyc manpage. That thing made me want to nerd out on NTP again!)
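The same fields can also be read from a script. This Python sketch parses the human-readable `chronyc tracking` text shown above. It turns "System time" into a signed offset and converts "Frequency" from ppm into seconds per day. One ppm is one microsecond of error per second, or 86.4 ms per day. The helper names and the trimmed `SAMPLE` are illustrative, and the layout is assumed from the samples in this post. For real scripting, `chronyc -c tracking` (CSV output) is probably sturdier than scraping this format.

```python
# Hypothetical helpers for the human-readable `chronyc tracking` output;
# field names are taken from the sample output in this post.


def parse_tracking(text):
    """Split each "Name : value" line on its first colon into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def system_offset_seconds(fields):
    """Offset of the system clock from NTP time, negative when it is slow."""
    amount, _, direction = fields["System time"].partition(" seconds ")
    sign = -1.0 if direction.startswith("slow") else 1.0
    return sign * float(amount)


def uncorrected_drift_per_day(fields):
    """Seconds per day the clock would gain (+) or lose (-) if left
    uncorrected, derived from the "Frequency" field (1 ppm = 1e-6 s/s)."""
    ppm, _, direction = fields["Frequency"].partition(" ppm ")
    sign = 1.0 if direction.startswith("fast") else -1.0
    return sign * float(ppm) * 1e-6 * 86400


SAMPLE = """\
Reference ID : CE6C0083 (ntp1.torix.ca)
System time : 0.000487002 seconds slow of NTP time
Frequency : 3.536 ppm fast
Update interval : 65.0 seconds
"""

fields = parse_tracking(SAMPLE)
print(system_offset_seconds(fields))
print(round(uncorrected_drift_per_day(fields), 3))
```

On the sample, the 487 µs "slow" offset comes out negative, and "3.536 ppm fast" works out to about 0.306 seconds per day of drift that chrony is correcting for.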
On the machine with the bad clock, chrony also did a 1.5-second adjustment, but just once, at startup:
jan 18 11:54:33 curie chronyd[2148399]: Selected source 206.108.0.133 (2.debian.pool.ntp.org)
jan 18 11:54:33 curie chronyd[2148399]: System clock wrong by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock was stepped by -1.606546 seconds
jan 18 11:54:31 curie chronyd[2148399]: System clock TAI offset set to 37 seconds
Then it would still struggle to keep the clock in sync, but not as badly as openntpd. Here's the offset a few minutes after the startup above:
System time : 0.000375352 seconds slow of NTP time
And again a few seconds later:
System time : 0.001793046 seconds slow of NTP time
I don't currently have access to that machine, and will update this post with the latest status, but so far I've had a very good experience with chrony on that machine, which is a testament to its resilience, and it also just works on my other machines.
Extras
On top of "just working" (as demonstrated above), I feel that chrony's feature set is so much superior... Here's an excerpt of the extras in chrony, taken from the comparison table:
- source frequency tracking
- source state restore from file
- temperature compensation
- ready for next NTP era (year 2036)
- replace unreachable / falseticker servers
- aware of jitter
- RTC drift tracking
- RTC trimming
- Restore time from file w/o RTC
- leap seconds correction, in slew mode
- drops root privileges
I even understand some of that stuff. I think.
So kudos to the chrony folks, I'm switching.
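The "ready for next NTP era (year 2036)" item comes down to arithmetic. NTP timestamps count seconds since 1900-01-01 UTC in an unsigned 32-bit field, so the counter wraps after 2**32 seconds. This Python sketch computes when that happens and shows why the era number has to come from outside the timestamp itself. It only illustrates the problem and is not how chrony handles it internally. The function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# NTP's prime epoch: seconds are counted from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
# The on-wire seconds field is an unsigned 32-bit integer.
ERA_SECONDS = 2 ** 32


def era_rollover(era):
    """Instant at which NTP era `era` ends and the seconds field wraps to 0."""
    return NTP_EPOCH + timedelta(seconds=ERA_SECONDS * (era + 1))


def ntp_seconds_to_datetime(seconds, era=0):
    """Map a 32-bit seconds value to UTC, given an era number.

    The era cannot be recovered from the 32-bit field alone: a value of 0
    means both 1900-01-01 (era 0) and the 2036 rollover (era 1).
    """
    if not 0 <= seconds < ERA_SECONDS:
        raise ValueError("NTP seconds field must fit in 32 unsigned bits")
    return NTP_EPOCH + timedelta(seconds=ERA_SECONDS * era + seconds)


print(era_rollover(0).isoformat())
```

It prints 2036-02-07T06:28:16+00:00, the moment era 0 ends. An implementation that assumes era 0 would then jump back to 1900.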
Caveats
One thing to keep in mind in the above, however, is that it's quite possible chrony does as bad a job as openntpd on that old machine, and just doesn't tell me about it. For example, here's another log sample from another server (marcos):
jan 23 11:13:25 marcos ntpd[1976694]: adjusting clock frequency by 0.451035 to -16.420273ppm
I get those basically every day, which seems to show that it's at least trying to keep track of the hardware clock.
In other words, it's quite possible I have no idea what I'm talking about and you definitely need to take this article with a grain of salt. I'm not an NTP expert.
Update: I should also mention that I haven't evaluated systemd-timesyncd, for a few reasons:
- I have enough things running under systemd
- I wasn't aware of it when I started writing this
- I couldn't find good documentation on it... later I found the above manpage and of course the Arch Wiki but that is very minimal
- therefore I can't tell how it compares with chrony or (open)ntpd, so I don't see an enticing reason to switch
It has a few things going for it though:
- it's likely shipped with your distribution already
- it drops privileges (possibly like chrony, unclear if it also has seccomp filters)
- it's minimalist: it only does SNTP, and only the client side
- the status command is good enough that you can tell the clock frequency, precision, and so on (especially when compared to openntpd's ntpctl)
So I'm reserving judgement on it, but I'd certainly note that I'm always a little wary of trusting systemd daemons with the network, and would prefer to keep that attack surface to a minimum. Diversity is a good thing, in general, so I'll keep chrony for now.
It would certainly be nice to see it added to chrony's comparison table.
Switching to chrony
Because the default configuration in chrony (at least as shipped in Debian) is sane (good default peers, no open network by default), installing it is as simple as:
apt install chrony
And because it somehow conflicts with openntpd, that also takes care of removing that cruft.
Update: Debian defaults
So it seems like I managed to write this entire blog post without relating it to the original reason I had to think about this in the first place, which is odd and should be corrected.
This conversation came about on an IRC channel that mentioned that the ntp package (and upstream) is in bad shape in Debian. In that discussion, chrony and ntpsec were discussed as possible replacements, but when we had the discussion on chat, I mentioned I was using openntpd, and promptly realized I was actually unhappy with it. A friend suggested chrony, I tried it, and it worked amazingly, I switched, wrote this blog post, end of story.
Except today (2022-02-07, two weeks later), I actually read that thread and realized that something had happened in Debian that I wasn't aware of. In bookworm, systemd-timesyncd was not only shipped, but it was installed by default, as it was marked as a hard dependency of systemd. That was "fixed" in systemd-247.9-2 (see bug 986651), but only by making the dependency a Recommends and marking it as Priority: important.
So in effect, systemd-timesyncd became the default NTP daemon in Debian in bookworm, which I find somewhat surprising. timesyncd has many things going for it (as mentioned above), but I do find it a bit annoying that systemd is replacing all those utilities in such a way. I also wonder what is going to happen on upgrades. This is all a little frustrating too because there is no good comparison between the other NTP daemons and timesyncd anywhere. The chrony comparison table doesn't mention it, and an audit by the Core Infrastructure Initiative from 2017 doesn't mention it either, even though timesyncd was announced in 2014. (Same with this blog post from Facebook.)
Update: now (2023-03-28, about a year later), bookworm is about to ship with ntpsec as a replacement package for ntp. There is a discussion about an update to the release notes that is relevant here; it seems we never clearly announced that timesyncd was the default, and that there are actually serious issues with it.
It's probably not necessary to compare chrony with timesyncd since it's, well, even simpler than openntpd because it's pure SNTP and only a client. Everything that applies to openntpd on chrony's comparison page will apply, minus the aforementioned feature differences.
Not quite true since SNTP does not preclude serving time. Not that you want to trust such a server for giving accurate times...