A look at terminal emulators, part 2
This article is the second in a two-part series about terminal emulators.
- part one: features
- part two: performance
A comparison of the feature sets for a handful of terminal emulators was the subject of a recent article; here I follow that up by examining the performance of those terminals. This might seem like a lesser concern, but as it turns out, terminals exhibit surprisingly high latency for such fundamental programs. I also examine what is traditionally considered "speed" (but is really scroll bandwidth) and memory usage, with the understanding that the impact of memory use is less than it was when I looked at this a decade ago (in French).
Latency
After thorough research on terminal emulator performance, I have come to the conclusion that its most important aspect is latency. In his Typing with pleasure article, Pavel Fatin reviewed the latency of various text editors and hinted that terminal emulators might be slower than the fastest text editors. That is what eventually led me to run my own tests on terminal emulators and write this series.
But what is latency and why does it matter? In his article, Fatin defined latency as "a delay between the keystroke and corresponding screen update" and quoted the Handbook of Human-Computer Interaction which says: "Delay of visual feedback on a computer display have important effects on typist behavior and satisfaction."
Fatin explained that latency has more profound effects than just satisfaction: "typing becomes slower, more errors occur, eye strain increases, and muscle strain increases". In other words, latency can lead to typos but also to lower code quality, as it imposes extra cognitive load on the brain. Worse, "eye and muscle strain increase" seems to imply that latency can also lead to physical repetitive strain injuries.
Some of those effects have been known for a long time, with some results published in the Ergonomics journal in 1976 showing that a hundred-millisecond delay "significantly impairs the keying speed". More recently, the GNOME Human Interface Guidelines set the acceptable response time at ten milliseconds (archive.org link; the new HIG has no such guideline) and, pushing this limit down even further, this video from Microsoft Research shows that the ideal target might even be as low as one millisecond.
Fatin performed his tests on editors, but he created a portable tool called Typometer that I used to test latency in terminal emulators. Keep in mind that the test is a simulation: in reality, we also need to take into account input (keyboard, USB controller, etc.) and output (graphics card and monitor buffers) latency. Those add up to more than 20ms in typical configurations, according to Fatin. With more specialized "gaming" hardware, the bare minimum is around three milliseconds. There is therefore little room for applications to add any latency to the pipeline. Fatin's goal is to bring that extra latency down to one millisecond, or even to zero-latency typing, which was released as part of IntelliJ IDEA 15. Here are my measurements, which include some text editors, showing that my results are consistent with Fatin's (all times in milliseconds):
Program | mean | std | min | 90% | max |
---|---|---|---|---|---|
uxterm | 1.7 | 0.3 | 0.7 | 2 | 2.4 |
mlterm | 1.8 | 0.3 | 0.7 | 2.2 | 2.5 |
Vim (Athena) | 2.8 | 1.1 | 0.4 | 3.5 | 12.7 |
Vim (GTK2) | 3.9 | 1.2 | 0.7 | 4.8 | 11.9 |
Emacs | 4.8 | 2.3 | 0.5 | 5.8 | 32.5 |
gedit | 8.9 | 3.4 | 2.8 | 12.5 | 14.2 |
Konsole | 13.4 | 1.2 | 11.5 | 15 | 16.1 |
Alacritty | 15.1 | 1.2 | 12.8 | 15.9 | 26.3 |
st | 15.7 | 3.9 | 10.6 | 19.4 | 19.6 |
Vim (GTK3) | 16.5 | 7.9 | 0.4 | 21.9 | 27.2 |
urxvt | 18.3 | 0.3 | 17.3 | 18.7 | 19 |
pterm | 23.4 | 0.9 | 21.7 | 24.5 | 25.4 |
GNOME Terminal | 27.1 | 1 | 25.9 | 27.5 | 39.3 |
Xfce Terminal | 27.4 | 0.4 | 26.4 | 27.9 | 28.7 |
Terminator | 28.1 | 0.7 | 26.4 | 29 | 29.4 |
The first thing that struck me is that old programs like xterm and mlterm have the best response time, with a worst-case latency (2.4ms for xterm, 2.5ms for mlterm) better than the best case of all other terminals (10.6ms for st). No modern terminal gets below the ten-millisecond threshold. In particular, Alacritty doesn't seem to live up to its "fastest terminal emulator in existence" claims either, although results have improved since I first tested the program in July 2017. Indeed, the project seems to be aware of the situation and is working on improving the display pipeline with threads. We can also note that Vim using GTK3 is roughly four times slower than its GTK2 counterpart (16.5ms against 3.9ms on average). It might therefore be possible that the GTK3 framework introduces extra latency, as we can also observe that other GTK3-based terminals (Terminator, Xfce4 Terminal, and GNOME Terminal, for example) have higher latency.
You might not notice those differences. As Fatin explains: "one does not necessarily need to perceive latency consciously to be affected by it". Fatin also warns about standard deviation (the std column above and the width of error bars in the graph): "any irregularities in delay durations (so called jitter) pose additional problem because of their inherent unpredictability".
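To make the table columns concrete, here is a minimal sketch of how the mean, std, min, 90% and max values could be derived from a list of raw latency samples. It is not Typometer's own code: the `summarize` helper is hypothetical, and the choice of population standard deviation and nearest-rank percentile is an assumption, since the exact formulas the tool uses are not stated here.

```python
import statistics


def summarize(samples_ms):
    """Summarize latency samples (in milliseconds) into the table's columns."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    # Nearest-rank 90th percentile: the smallest sample with at least 90% of
    # all samples at or below it. Integer arithmetic computes ceil(0.9 * n).
    rank = (9 * n + 9) // 10
    return {
        "mean": statistics.mean(ordered),
        # Assumption: population standard deviation; a sample estimate
        # (statistics.stdev) would be slightly larger.
        "std": statistics.pstdev(ordered),
        "min": ordered[0],
        "90%": ordered[rank - 1],
        "max": ordered[-1],
    }
```

A large gap between the 90% and max columns, as with Emacs in the table, points to occasional spikes, while a large std points to jitter spread across many keystrokes.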
The graph above is from a clean Debian 9 (stretch) profile with the i3 window manager. That environment gives the best results in the latency test: as it turns out, GNOME introduces about 20ms of latency to all measurements. A possible explanation could be that there are programs running that synchronously handle input events: Fatin gives the example of Workrave, which adds latency by processing all input events synchronously. By default, GNOME also includes a compositing window manager (Mutter), an extra buffering layer that adds at least eight milliseconds in Fatin's tests.
In the graph above, we can see the same tests performed on Fedora 27 with GNOME running on X.org. The change is drastic: latency at least doubled and in some cases is ten times larger. Forget the one-millisecond target: all terminals go far beyond the ten-millisecond budget. The VTE family gets closer to fifty milliseconds, with Terminology and GNOME Terminal having spikes well above that threshold. We can also see there's more jitter in those tests. Even with the added latency, we can see that mlterm and, to a lesser extent, xterm still perform better than their closest competitors, Konsole and st.
Scrolling speed
The next test is the traditional "speed" or "bandwidth" test that measures how fast the terminal can scroll by displaying a large amount of text on the terminal at once. The mechanics of the test vary; the original test I found was simply to generate the same test string repeatedly using the seq command. Other tests include one from Thomas E. Dickey, the xterm maintainer, which dumps the terminfo.src file repeatedly. In another review of terminal performance, Dan Luu uses a base32-encoded string of random bytes that is simply dumped on the terminal with cat. Luu considers that kind of test to be "as useless a benchmark as I can think of" and suggests using the terminal's responsiveness during the test as a metric instead. Dickey also dismisses that test as misleading. Yet both authors recognize that bandwidth can be a problem: Luu discovered the Emacs Eshell hangs while showing large files and Dickey implemented an optimization to work around the perceived slowness of xterm. There is therefore still some value in this test as the rendering process varies a lot between terminals; it also serves as a good test harness for testing other parameters.
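As an illustration of this kind of test, here is a minimal Python sketch that writes a fixed string many times and times how long the writes take. It is a stand-in for the seq-based test, not the benchmark actually used for the results below; the `dump_lines` name, its default string, and the line count in the usage note are all hypothetical.

```python
import sys
import time


def dump_lines(count, out=sys.stdout, line="the same test string, over and over"):
    """Write `count` numbered lines to `out`; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for number in range(1, count + 1):
        out.write(f"{number} {line}\n")
    # Flush so that buffered output is handed to the terminal before the
    # timer stops.
    out.flush()
    return time.perf_counter() - start
```

Calling `dump_lines(100_000)` inside each terminal and printing the result to stderr gives a rough comparison. The timer only covers how long the writes took: a terminal that drops screen updates or keeps rendering after the last write will look faster than it feels, which is part of why Luu and Dickey distrust this metric.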
Here we can see rxvt and st are ahead of all others, closely followed by the much newer Alacritty, expressly designed for speed. Xfce (representing the VTE family) and Konsole are next, taking almost twice as long, while xterm comes last, almost five times as slow as rxvt. During the test, xterm also had jitter in the display: it was difficult to see the actual text going by, even if it was always the same string. Konsole was fast, but it was cheating: the display would hang from time to time, showing a blank or partially drawn display. Other terminals generally display all lines faithfully, including st, Alacritty, and rxvt.
Dickey explains that performance variations are due to the design of scrollback buffers in the different terminals. Specifically, he blames the disparity on rxvt and other terminals "not following the same rules":
Unlike xterm, rxvt did not attempt to display all updates. If it fell behind, it would discard some of the updates, to catch up. Doing that had a greater effect on the apparent scrolling speed than its internal memory organization, since it was useful for any number of saved-lines. One drawback was that ASCII animations were somewhat erratic.
To fix this perceived slowness of xterm, Dickey introduced the fastScroll resource to allow xterm to drop some screen updates to catch up with the flow and, indeed, my tests confirm the resource improves performance to match rxvt. It is, however, a rather crude implementation as Dickey explains: "sometimes xterm — like konsole — appears to stop, since it is waiting for a new set of screen updates after having discarded some". In this case, it seems that other terminals found a better compromise between speed and display integrity.
Resource usage
Regardless of the worthiness of bandwidth as a performance metric, it does provide a way to simulate load on the terminals, which in turn allows us to measure other parameters like memory or disk usage. The metrics here were obtained by running the above seq benchmark under the supervision of a Python process that collected the results of getrusage() counters for ru_maxrss, the sum of ru_oublock and ru_inblock, and a simple timer for wall clock time.
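A supervisor along those lines could look like the following sketch. It is not the script used for these results: the `measure` helper and the command in the usage note are hypothetical, and only the standard `resource`, `subprocess` and `time` modules are used.

```python
import resource
import subprocess
import time


def measure(command):
    """Run `command` to completion; return wall time and child resource counters."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(command, check=True)
    wall_seconds = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_seconds": wall_seconds,
        # ru_maxrss is a high-water mark (kilobytes on Linux) over every
        # child waited for so far, so it cannot be reset between runs.
        "max_rss": after.ru_maxrss,
        # Block I/O counters are cumulative; subtract the earlier snapshot.
        "block_io": (after.ru_inblock - before.ru_inblock)
        + (after.ru_oublock - before.ru_oublock),
    }
```

For example, `measure(["xterm", "-e", "seq", "1", "100000"])` would report the figures for one xterm run. Because the peak memory value covers all children of the supervising process, each terminal should be measured from a fresh supervisor process to keep the memory numbers independent.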
St comes first in this benchmark with the smallest memory footprint, 8MB on average, which was no surprise considering the project's focus on simplicity. Slightly larger are mlterm, xterm, and rxvt at around 12MB. Another notable result is Alacritty, which takes a surprising amount of memory at 30MB. Next come the VTE family members, which vary between 40 and 60MB, a higher result that could be explained by those programs' use of higher-level libraries like GTK. Konsole comes last with a whopping 65MB of memory usage during the tests, although that might be excused due to its large feature set.
Compared with the results I had a decade ago, all programs take up much more memory. Xterm used to take 4MB of memory, but now takes 15MB just on startup. A similar increase also affects rxvt, which now takes 16MB of memory out of the box. The Xfce Terminal now takes 34MB, a three-fold increase, yet GNOME Terminal only takes 20MB on startup. Of course, the previous tests were done on a 32-bit architecture. At LCA 2012, Rusty Russell also explained there are many more subtle reasons that could explain such an increase. Besides, maybe this is something we can live with in this modern day and age of multi-gigabyte core memory sizes.
Yet I can't help but feel there's a waste of resources for something as fundamental as a terminal. Those programs should be the smallest of the small and should be able to run in a shoe box, when those eventually run Linux (you know they will). Yet with those numbers, memory usage would only be a concern when running multiple terminals in the most limited of environments. To compensate, GNOME Terminal, Konsole, urxvt, Terminator, and Xfce Terminal feature a daemon mode that manages multiple terminals through a single process, which limits the impact of their larger memory footprint.
Another result I have found surprising in my tests is actual disk I/O: I did not expect any, yet some terminals write voluminous amounts of data to disk. It turns out the VTE library actually writes the scrollback buffer to disk, a "feature" that was noticed back in 2010 and that is still present in modern implementations. At least the file contents are now encrypted with AES256 GCM since 0.39.2, but this raises the question of what's so special about the VTE library that it requires such an exotic approach.
Conclusion
In the previous article, we found that VTE-based terminals have a good feature set, yet here we see that this comes with some performance costs. Memory isn't a big issue since all VTE terminals are spawned from a single daemon process that limits memory usage. Old systems tight on core memory might still need older terminals with lower memory usage, however. While VTE terminals behave well in bandwidth tests, their latency is higher than the criterion set in the GNOME Human Interface Guidelines, which is probably something that the developers of the VTE library should look into. Considering how inevitable the terminal is even for novice users in Linux, those improvements might make the experience slightly less traumatic. For seasoned geeks, changing from the default terminal might even mean quality improvements and fewer injuries during long work sessions. Unfortunately, only the old xterm and mlterm get us to the magic 10ms range, which might involve unacceptable compromises for some.
The latency benchmarks also show there are serious tradeoffs that came with the introduction of compositors in Linux graphical environments. Some users might want to take a look at conventional window managers, since they provide significant latency improvements. Unfortunately, it was not possible to run latency tests in Wayland: the Typometer program does exactly the kind of things Wayland was designed to prevent, namely inject keystrokes and spy on other windows. Hopefully, Wayland compositors are better than X.org at performance and someone will come up with a way of benchmarking latency in those environments in the future.
This article first appeared in the Linux Weekly News.
Updates
Keyboard latency hardware experiments
I have had many comments elsewhere about how latency shouldn't matter so much, particularly from keyboard hardware providers. I wasn't actually convinced until I saw this video from Ben Eater (who has some serious oscilloscope skills). His experiment shows that high-speed, high quality USB keyboards shouldn't have to worry about this, because they have a 1ms latency -- which is still high, but not much higher than the PS/2 latency (0.7ms). Basically, the 16ms figure comes from the low-speed USB poll interval; as long as your keyboard is high speed, it should be okay.
I checked, and my keyboard is not a low-speed device (the kernel reports it as full-speed):
aoû 23 16:32:52 angela kernel: usb 1-6.1: new full-speed USB device number 24 using xhci_hcd
aoû 23 16:32:52 angela kernel: usb 1-6.1: New USB device found, idVendor=0c45, idProduct=7692, bcdDevice= 3.0e
aoû 23 16:32:52 angela kernel: usb 1-6.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
aoû 23 16:32:52 angela kernel: usb 1-6.1: Product: USB Keyboard
aoû 23 16:32:52 angela kernel: usb 1-6.1: Manufacturer: SONiX
Interestingly, my mouse is a low-speed device, so that might mean more latency there:
aoû 23 16:32:53 angela kernel: usb 1-6.2: new low-speed USB device number 25 using xhci_hcd
aoû 23 16:32:53 angela kernel: usb 1-6.2: New USB device found, idVendor=047d, idProduct=1020, bcdDevice= 1.06
aoû 23 16:32:53 angela kernel: usb 1-6.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
aoû 23 16:32:53 angela kernel: usb 1-6.2: Product: Kensington Expert Mouse
But it seems I would need an oscilloscope (and know how to use it!) to debug this.
Note that some people have actually built physical test harnesses for this kind of stuff, with open firmware so it should be possible to reproduce this at home.
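Short of an oscilloscope, the speed class the kernel negotiated can at least be read back from log lines like the ones quoted above. Here is a minimal sketch; the `usb_speeds` helper and its regular expression are my own illustration, based only on the message format visible in those excerpts, and may not match every kernel version. For reference, USB defines low-speed as 1.5 Mbit/s, full-speed as 12 Mbit/s and high-speed as 480 Mbit/s.

```python
import re

# Matches kernel messages such as:
#   "usb 1-6.1: new full-speed USB device number 24 using xhci_hcd"
NEW_DEVICE = re.compile(
    r"usb (?P<port>[\d.-]+): new (?P<speed>.+?) USB device number \d+"
)


def usb_speeds(log_lines):
    """Map each USB port seen in `log_lines` to the reported speed class."""
    speeds = {}
    for line in log_lines:
        match = NEW_DEVICE.search(line)
        if match:
            # A later line for the same port (e.g. after replugging) wins.
            speeds[match.group("port")] = match.group("speed")
    return speeds
```

Fed the kernel log (for example the output of `journalctl -k`), this returns one entry per port, which makes it easy to spot input devices that enumerated as low-speed.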
zsh latency
Some people seem to care about zsh latency as well.
Wayland terminal emulators
I have started some notes on reviewing the terminal emulators available in Wayland, a move which significantly narrows the range of applications available. See 2022-09-19-wayland-terminal-emulators.
Similar research
Latency benchmarks similar to the above were done with Typometer on X11 by beuke.org. Their results are different on some points: xterm's maximum latency (9.8ms) is much higher than ours (2.4ms), which makes me think there's something wrong with their test bench. But other results (rxvt, st, Terminator) are strikingly similar. One notable change is how well Alacritty performs, probably because it improved in the 6 years since I ran those benchmarks.
I'm still waiting for someone to figure out how to perform those tests under Wayland and compare against foot. Right now it's really hard to tell, but I get the feeling Alacritty and xterm are pretty close, and that foot and gnome-terminal are slower.
Update: 9 days later, I just found out about Ivan Molodetskikh's VTE end-to-end tests, which show precisely how much VTE has improved over the years, to be on par with Alacritty (which, somehow, managed to become a reference after lagging behind). Excellent work! My only criticism is that the article focuses exclusively on VTE, but the author also made other benchmarks, including of Foot, the terminal emulator I'm currently using and that, above, felt slower to me, but that tests show is the fastest on the block, which is really nice to hear.
They also made compositor tests which show Sway (~12ms) is ahead of Mutter (~14ms, GNOME's simplest compositor), itself ahead of normal GNOME (~16ms). Only X11/i3 goes below the 10ms mark there, which is a bit depressing, but the author is quick to point out that "work to add tearing flips to kernel and Wayland is ongoing".
Oh, and they don't test Emacs in their editors, arguing it lacks a good editor, ha ha.
As with the previous article in this series, a lengthy discussion about this article has taken place on the LWN.net article, which might be interesting to readers here.
As usual, I have published extensive documentation on the process by which those benchmarks were created here:
https://gitlab.com/anarcat/terms-benchmarks
... with a (potentially out of date) mirror on GitHub here:
https://github.com/anarcat/terms-benchmarks
For those who still doubt how latency can affect cognitive load, I strongly recommend trying out those tests:
https://input-delay.glitch.me/
https://aresluna.org/keyboard-secrets/typing-delay/
Just mind blowing.
If I were to do this review again today, I would definitely need to include a terminal emulator called Zutty. It checks all the boxes for me (almost):
Major blockers for adoption:
But the author seems open to improvements, so who knows.
Update: both were implemented! Very nice! I guess my only concern about switching now would be whether it will survive the Wayland apocalypse (whether that will come or not...). In theory, since it relies so much on OpenGL, Wayland shouldn't be "that hard"...
Anyways, the author did an excellent latency review and general comparison that is definitely worth a read if you liked this article.