I have been using Workrave to try to force me to step away from the computer regularly to work around Repetitive Strain Injury (RSI) issues that have plagued my life on the computer intermittently in the last decade.

Workrave itself is only marginally efficient at getting me away from the machine: as any warning systems, it suffers from alarm fatigue as you frenetically click the dismiss button every time a Workrave warning pops up. However, it has other uses.

Analyzing data input

In the past, I have used Workrave to document how I work too much on the computer, but never went through more serious processing of the vast data store that Workrave accumulates about mouse movements and keystrokes. Interested in knowing how much my leave from Koumbit affected time spent on the computer, I decided to look into this again.

It turns out I am working as much, if not more, on the computer since I took that "time off":

per machine keystrokes per day and average

We can see here that I type a lot on the computer. Normal days range from 10 000 to 60 000 keystrokes, with extremes at around 100 000 keystrokes per day. The average seem to fluctuate around 30 to 40 000 keystrokes per day, but rises sharply around the end of the second quarter of this very year. For those unfamiliar with the underlying technology, one keystroke is roughly one byte I put on the computer. So the average of 40 000 keystrokes is 40 kilobyte (KB) per day on the computer. That means 15 MB over a year or about 150MB (or 100 MiB if you want to be picky about it) over the course of the last decade.

That is a lot of typing.

I originally thought this could have been only because I type more now, as opposed to use more the mouse previously. Unfortunately, Workrave also tracks general "active time" which we can also examine:

per machine keystrokes per day and average

Here we see that I work around 4 hours a day continuously on the computer. That is active time: not just login, logout time. In other words, the time where i look away from the computer and think for a while, jot down notes in my paper agenda or otherwise step away from the computer for small breaks is not counted here. Notice how some days go up to 12 hours and how recently the average went up to 7 hours of continuous activity.

So we can clearly see that I basically work more on the computer now than I ever did in the last 7 years. This is a problem - one of the reasons of this time off was to step away from the computer, and it seems I have failed.

Update: it turns out the graph was skewed towards the last samples. I went more easy on the keyboard in the last few days and things have significantly improved:

per machine keystrokes per day and average, 3 days later

Another interesting thing we can see is when I switched from using my laptop to using the server as my main workstation, around early 2011, which is about the time marcos was built. Now that marcos has been turned into a home cinema downstairs, I went back to using my laptop as my main computing device, in late 2015. We can also clearly see when I stopped using Koumbit machines near the end of 2015 as well.

Further improvements and struggle for meaning

The details of how the graph was produced are explained at the end of this article.

This is all quite clunky: it doesn't help that the Workrave data structure is not easily parsable and so easily corruptible. It would be best if each data point was on its own separate line, which would be long, granted, but so easier to parse.

Furthermore, I do not like the perl/awk/gnuplot data processing pipeline much. It doesn't allow me to do interesting analysis like averages, means and linear regressions easily. It could be interesting to rewrite the tools in Python to allow better graphs and easier data analysis, using the tools I learned in 2015-09-28-fun-with-batteries.

Finally, this looks only at keystrokes and non-idle activity. It could be more interesting to look at idle/active times and generally the amount of time spent on the computer each day. And while it is interesting to know that I basically write a small book on the computer every day (according to Wikipedia, 120KB is about the size of a small pocket book), it is mostly meaningless if all that stuff is machine-readable code.

Where is, after all, the meaning in all those shell commands and programs we constantly input on our keyboards, in the grand scheme of human existence? Most of those bytes are bound to be destroyed by garbage collection (my shell's history) or catastrophic backup failures.

While the creative works from the 16th century can still be accessed and used by others, the data in some software programs from the 1990s is already inaccessible. - Lawrence Lessig

But is my shell history relevant? Looking back at old posts on this blog, one has to wonder if the battery life of the Thinkpad 380z laptop or how much e-waste I threw away in 2005 will be of any historical value in 20 years, assuming the data survives that long.

How this graph was made

I was happy to find that Workrave has some contrib scripts to do such processing. Unfortunately, those scripts are not shipped with the Debian package, so I requested that to be fixed (#825982). There were also some fixes necessary to make the script work at all: first, there was a syntax error in the Perl script. But then since my data is so old, there was bound to be some data corruption in there: incomplete entries or just plain broken data. I had lines that were all NULL characters, typical of power failures or disk corruptions. So I have made a patch to fix that script (#826021).

But this wasn't enough: while this processes data on the current machine fine, it doesn't deal with multiple machines very well. In the last 7 years of data I could find, I was using 3 different machines: this server (marcos), my laptop (angela) and Koumbit's office servers (koumbit). I ended up modifying the contrib scripts to be able to collate that data meaningfully. First, I copied over the data from Koumbit in a local fake-koumbit directory. Second, I mounted marcos home directory locally with SSHFS:

sshfs anarc.at:/home/anarcat marcos

I also made this script to sum up datasets:

#!/usr/bin/perl -w

use List::MoreUtils 'pairwise';

$| = 1;
my %data = ();
while (<>) {
    my @fields = split;
    my $date = shift @fields;
    if (defined($data{$date})) {
        my @t = pairwise { $a + $b } @{$data{$date}}, @fields;
        $data{$date} = \@t;
    }
    else {
        $data{$date} = \@fields;
    }

}
foreach my $d ( sort keys %data ) {
    print "$d @{$data{$d}}\n";
}

Then I made a modified version of the Gnuplot script that processes all those files together:

#!/usr/bin/gnuplot

set title "Workrave"
set ylabel "Keystrokes per day"
set timefmt "%Y%m%d%H%M"
#set xrange [450000000:*]
set format x "%Y-%m-%d"
set xtics rotate
set xdata time
set terminal svg
set output "workrave.svg"
plot "workrave-angela.dat" using 1:28 title "angela", \
     "workrave-marcos.dat" using 1:28 title "marcos", \
     "workrave-koumbit.dat" using 1:28 title "koumbit", \
     "workrave-sum.dat" using 1:2 smooth sbezier linewidth 3 title "average"
#plot "workrave-angela.dat" using 1:28 smooth sbezier title "angela", \
#     "workrave-marcos.dat" using 1:28 smooth sbezier title "marcos", \
#     "workrave-koumbit.dat" using 1:28 smooth sbezier title "koumbit"

And finally, I made a small shell script to glue this all together:

#!/bin/sh

perl workrave-dump > workrave-$(hostname).dat
HOME=$HOME/marcos perl workrave-dump > workrave-marcos.dat
HOME=$PWD/fake-koumbit perl workrave-dump > workrave-koumbit.dat
# remove idle days as they skew the average
sed -i '/ 0$/d' workrave-*.dat
# per-day granularity
sed -i 's/^\(........\)....\? /\1 /' workrave-*.dat
# sum up all graphs
cat workrave-*.dat | sort | perl sum.pl > workrave.dat
./gnuplot-workrave-anarcat

I used a different gnuplot script to generate the activity graph:

#!/usr/bin/gnuplot

set title "Workrave"
set ylabel "Active hours per day"
set timefmt "%Y%m%d%H%M"
#set xrange [450000000:*]
set format x "%Y-%m-%d"
set xtics rotate
set xdata time
set terminal svg
set output "workrave.svg"
plot "workrave-angela.dat"  using 1:($23/3600) title "angela", \
     "workrave-marcos.dat"  using 1:($23/3600) title "marcos", \
     "workrave-koumbit.dat" using 1:($23/3600) title "koumbit", \
     "workrave.dat"         using 1:($23/3600) title "average" smooth sbezier linewidth 3
#plot "workrave-angela.dat" using 1:28 smooth sbezier title "angela", \
#     "workrave-marcos.dat" using 1:28 smooth sbezier title "marcos", \
#     "workrave-koumbit.dat" using 1:28 smooth sbezier title "koumbit"
svg URL & Lawrence Lessig

the graphs URLs in your RSS feed are already 404... https://anarc.at/tag/debian-planet/workrave.svg

Comment by M Noit
well that was quick...

that will teach me to use arbitrary html, damnit... i think i fixed the issue by using the !img directive instead of directly using a relative link in a <img> tag.

unfortunately, the debian planet fix is probably permanently broken...

Comment by anarcat [id.koumbit.net]
Created . Edited .