1. A look at terminal emulators, part 2
    1. Latency
    2. Scrolling speed
    3. Resource usage
    4. Conclusion

This article is the second in a two-part series about terminal emulators.

A comparison of the feature sets for a handful of terminal emulators was the subject of a recent article; here I follow that up by examining the performance of those terminals. This might seem like a lesser concern, but as it turns out, terminals exhibit surprisingly high latency for such fundamental programs. I also examine what is traditionally considered "speed" (but is really scroll bandwidth) and memory usage, with the understanding that the impact of memory use is less than it was when I looked at this a decade ago (in French).

Latency

After thorough research on terminal emulators performance, I have come to the conclusion that its most important aspect is latency. In his Typing with pleasure article, Pavel Fatin reviewed the latency of various text editors and hinted that terminal emulators might be slower than the fastest text editors. That is what eventually led me to run my own tests on terminal emulators and write this series.

But what is latency and why does it matter? In his article, Fatin defined latency as "a delay between the keystroke and corresponding screen update" and quoted the Handbook of Human-Computer Interaction which says: "Delay of visual feedback on a computer display have important effects on typist behavior and satisfaction."

Fatin explained that latency has more profound effects than just satisfaction: "typing becomes slower, more errors occur, eye strain increases, and muscle strain increases". In other words, latency can lead to typos but also to lesser code quality as it imposes extra cognitive load on the brain. But worse, "eye and muscle strain increase" seems to imply that latency can also lead to physical repetitive strain injuries.

Some of those effects have been known for a long time, with some results published in the Ergonomics journal in 1976 showing that a hundred-millisecond delay "significantly impairs the keying speed". More recently, the GNOME Human Interface Guidelines set the acceptable response time at ten milliseconds and, pushing this limit down even further, this video from Microsoft Research shows that the ideal target might even be as low as one millisecond.

Fatin performed his tests on editors, but he created a portable tool called Typometer that I used to test latency in terminal emulators. Keep in mind that the test is a simulation: in reality, we also need to take into account input (keyboard, USB controller, etc.) and output (graphics card and monitor buffers) latency. Those typically add up to more than 20ms in typical configurations, according to Fatin. With more specialized "gaming" hardware, the bare minimum is around three milliseconds. There is therefore little room for applications to add any latency to the pipeline. Fatin's goal is to bring that extra latency down to one millisecond or even zero-latency typing, which was released as part of IntelliJ IDEA 15. Here are my measurements, which include some text editors, showing that my results are consistent with Fatin's (all times in miliseconds):

Program mean std min 90% max
uxterm 1.7 0.3 0.7 2 2.4
mlterm 1.8 0.3 0.7 2.2 2.5
Vim (Athena) 2.8 1.1 0.4 3.5 12.7
Vim (GTK2) 3.9 1.2 0.7 4.8 11.9
Emacs 4.8 2.3 0.5 5.8 32.5
gedit 8.9 3.4 2.8 12.5 14.2
Konsole 13.4 1.2 11.5 15 16.1
Alacritty 15.1 1.2 12.8 15.9 26.3
st 15.7 3.9 10.6 19.4 19.6
Vim (GTK3) 16.5 7.9 0.4 21.9 27.2
urxvt 18.3 0.3 17.3 18.7 19
pterm 23.4 0.9 21.7 24.5 25.4
GNOME Terminal 27.1 1 25.9 27.5 39.3
Xfce Terminal 27.4 0.4 26.4 27.9 28.7
Terminator 28.1 0.7 26.4 29 29.4

The first thing that struck me is that old programs like xterm and mlterm have the best response time, having worse case latency (2.4ms) better than the best case for all other terminals (10.6ms for st). No modern terminal crosses the ten milliseconds threshold. In particular, Alacritty doesn't seem to live up to his "fastest terminal emulator in existence" claims either, although results have improved since I first tested the program in July 2017. Indeed, the project seems to be aware of the situation and is working on improving the display pipeline with threads. We can also note that Vim using GTK3 is slower than its GTK2 counterpart by an order of magnitude. It might therefore be possible that the GTK3 framework introduces extra latency, as we can also observe other that GTK3-based terminals (Terminator, Xfce4 Terminal, and GNOME Terminal, for example) have higher latency.

You might not notice those differences. As Fatin explains: "one does not necessarily need to perceive latency consciously to be affected by it". Fatin also warns about standard deviation (the std column above and the width of error bars in the graph): "any irregularities in delay durations (so called jitter) pose additional problem because of their inherent unpredictability".

Terminal latency compared on Debian with i3 on Xorg

The graph above is from a clean Debian 9 (stretch) profile with the i3 window manager. That environment gives the best results in the latency test: as it turns out, GNOME introduces about 20ms of latency to all measurements. A possible explanation could be that there are programs running that synchronously handle input events: Fatin gives the example of Workrave, which adds latency by processing all input events synchronously. By default, GNOME also includes compositing window manager (Mutter), an extra buffering layer that adds at least eight milliseconds in Fatin's tests.

Terminal latency compared on Fedora with GNOME on Xorg

In the graph above, we can see the same tests performed on Fedora 27 with GNOME running on X.org. The change is drastic; latency at least doubled and in some cases is ten times larger. Forget the one millisecond target: all terminals go far beyond the ten milliseconds budget. The VTE family gets closer to fifty milliseconds with Terminology and GNOME Terminal having spikes well above that threshold. We can also see there's more jitter in those tests. Even with the added latency, we can see that mlterm and, to a lesser extent xterm still perform better than their closest competitors, Konsole and st.

Scrolling speed

The next test is the traditional "speed" or "bandwidth" test that measures how fast the terminal can scroll by displaying a large amount of text on the terminal at once. The mechanics of the test vary; the original test I found was simply to generate the same test string repeatedly using the seq command. Other tests include one from Thomas E. Dickey, the xterm maintainer, which dumps the terminfo.src file repeatedly. In another review of terminal performance, Dan Luu uses a base32-encoded string of random bytes that is simply dumped on the terminal with cat. Luu considers that kind of test to be "as useless a benchmark as I can think of" and suggests using the terminal's responsiveness during the test as a metric instead. Dickey also dismisses that test as misleading. Yet both authors recognize that bandwidth can be a problem: Luu discovered the Emacs Eshell hangs while showing large files and Dickey implemented an optimization to work around the perceived slowness of xterm. There is therefore still some value in this test as the rendering process varies a lot between terminals; it also serves as a good test harness for testing other parameters.

Scrolling speed

Here we can see rxvt and st are ahead of all others, closely followed by the much newer Alacritty, expressly designed for speed. Xfce (representing the VTE family) and Konsole are next, running at almost twice the time while xterm comes last, almost five times as slow as rxvt. During the test, xterm also had jitter in the display: it was difficult to see the actual text going by, even if it was always the same string. Konsole was fast, but it was cheating: the display would hang from time to time, showing a blank or partially drawn display. Other terminals generally display all lines faithfully, including st, Alacritty, and rxvt.

Dickey explains that performance variations are due to the design of scrollback buffers in the different terminals. Specifically, he blames the disparity on rxvt and other terminals "not following the same rules":

Unlike xterm, rxvt did not attempt to display all updates. If it fell behind, it would discard some of the updates, to catch up. Doing that had a greater effect on the apparent scrolling speed than its internal memory organization, since it was useful for any number of saved-lines. One drawback was that ASCII animations were somewhat erratic.

To fix this perceived slowness of xterm, Dickey introduced the fastScroll resource to allow xterm to drop some screen updates to catch up with the flow and, indeed, my tests confirm the resource improves performance to match rxvt. It is, however, a rather crude implementation as Dickey explains: "sometimes xterm — like konsole — appears to stop, since it is waiting for a new set of screen updates after having discarded some". In this case, it seems that other terminals found a better compromise between speed and display integrity.

Resource usage

Regardless of the worthiness of bandwidth as a performance metric, it does provide a way to simulate load on the terminals, which in turn allows us to measure other parameters like memory or disk usage. The metrics here were obtained by running the above seq benchmark under the supervision of a Python process that collected the results of getrusage() counters for ru_maxrss, the sum of ru_oublock and ru_inblock, and a simple timer for wall clock time.

Memory use

St comes first in this benchmark with the smallest memory footprint, 8MB on average, which was no surprise considering the project's focus on simplicity. Slightly larger are mlterm, xterm, and rxvt at around 12MB. Another notable result is Alacritty, which takes a surprising amount of memory at 30MB. Next comes the VTE family members which vary between 40 and 60MB, a higher result that could be explained by those programs use of higher-level libraries like GTK. Konsole comes last with a whopping 65MB of memory usage during the tests, although that might be excused due to its large feature set.

Compared with the results I had a decade ago, all programs take up much more memory. Xterm used to take 4MB of memory, but now takes 15MB just on startup. A similar increase also affects rxvt, which now takes 16MB of memory out of the box. The Xfce Terminal now takes 34MB, a three-fold increase, yet GNOME Terminal only takes 20MB on startup. Of course, the previous tests were done on a 32-bit architecture. At LCA 2012, Rusty Russell also explained there are many more subtle reasons that could explain such an increase. Besides, maybe this is something we can live with in this modern day and age of multi-gigabyte core memory sizes.

Yet I can't help but feel there's a waste of resources for something so fundamental as a terminal. Those programs should be the smallest of the small and should be able to run in a shoe box, when those eventually run Linux (you know they will). Yet with those numbers, memory usage would be a concern only when running multiple terminals in anything but the most limited of environments. To compensate, GNOME Terminal, Konsole, urxvt, Terminator, and Xfce Terminal feature a daemon mode that manages multiple terminals through a single process which limits the impact of their larger memory footprint.

Disk I/O

Another result I have found surprising in my tests is actual disk I/O: I did not expect any, yet some terminals write voluminous amounts of data to disk. It turns out the VTE library actually writes the scrollback buffer to disk, a "feature" that was noticed back in 2010 and that is still present in modern implementations. At least the file contents are now encrypted with AES256 GCM since 0.39.2, but this raises the question of what's so special about the VTE library that it requires such an exotic approach.

Conclusion

In the previous article, we found that VTE-based terminals have a good feature set, yet here we see that this comes with some performance costs. Memory isn't a big issue since all VTE terminals are spawned from a single daemon process that limits memory usage. Old systems tight on core memory might still need older terminals with lower memory usage, however. While VTE terminals behave well in bandwidth tests, their latency is higher than the criterion set in the GNOME Human Interface Guidelines, which is probably something that the developers of the VTE library should look into. Considering how inevitable the terminal is even for novice users in Linux, those improvements might make the experience slightly less traumatic. For seasoned geeks, changing from the default terminal might even mean quality improvements and less injuries during long work sessions. Unfortunately, only the old xterm and mlterm get us to the magic 10ms range, which might involve unacceptable compromises for some.

The latency benchmarks also show there are serious tradeoffs that came with the introduction of compositors in Linux graphical environments. Some users might want to take a look at conventional window managers, since they provide significant latency improvements. Unfortunately, it was not possible to run latency tests in Wayland: the Typometer program does exactly the kind of things Wayland was designed to prevent, namely inject keystrokes and spy on other windows. Hopefully, Wayland compositors are better than X.org at performance and someone will come up with a way of benchmarking latency in those environments in the future.

This article first appeared in the Linux Weekly News.

detailed results and discussions

As with the previous article in this series, a lengthy discussion about this article has taken place on the LWN.net article, which might be interesting to readers here.

As usual, I have published extensive documentation on the process by which those benchmarks were created here:

https://gitlab.com/anarcat/terms-benchmarks

... with a (potentially out of date) mirror on GitHub here:

https://github.com/anarcat/terms-benchmarks

Comment by anarcat
Created . Edited .