Building a status page service with Hugo
The Tor Project now has a status page which shows the state of our major services.
You can check status.torprojet.org for news about major outages in Tor services, including v3 and v2 onion services, directory authorities, our website (torproject.org), and the check.torproject.org tool. The status page also displays outages related to Tor internal services, like our GitLab instance.
This post documents why we launched status.torproject.org, how the service was built, and how it works.
Why a status page
The first step in setting up a service page was to realize we needed one in the first place. I surveyed internal users at the end of 2020 to see what could be improved, and one of the suggestions that came up was to "document downtimes of one hour or longer" and generally improve communications around monitoring. The latter is still on the sysadmin roadmap, but a status page seemed like a good solution for the former.
We already have two monitoring tools in the sysadmin team: Icinga (a fork of Nagios) and Prometheus, with Grafana dashboards. But those are hard to understand for users. Worse, they also tend to generate false positives, and don't clearly show users which issues are critical.
In the end, a manually curated dashboard provides important usability benefits over an automated system, and all major organisations have one.
Picking the right tool
It wasn't my first foray in status page design. In another life, I had setup a status page using a tool called Cachet. That was already a great improvement over the previous solutions, which were to use first a wiki and then a blog to post updates. But Cachet is a complex Laravel app, which also requires a web browser to update. It generally requires more maintenance than what we'd like, needing silly things like a SQL database and PHP web server.
So when I found cstate, I was pretty excited. It's basically a theme for the Hugo static site generator, which means that it's a set of HTML, CSS, and a sprinkle of Javascript. And being based on Hugo means that the site is generated from a set of Markdown files and the result is just plain HTML that can be hosted on any web server on the planet.
Deployment
At first, I wanted to deploy the site through GitLab CI, but at that time we didn't have GitLab pages set up. Even though we do have GitLab pages set up now, it's not (yet) integrated with our mirroring infrastructure. So, for now, the source is hosted and built in our legacy git and Jenkins services.
It is nice to have the content hosted in a git repository: sysadmins can just edit Markdown in the git repository and push to deploy changes, no web browser required. And it's trivial to setup a local environment to preview changes:
hugo serve --baseUrl=http://localhost/
firefox https://localhost:1313/
Only the sysadmin team and gitolite administrators have access to the repository, at this stage, but that could be improved if necessary. Merge requests can also be issued on the GitLab repository and then pushed by authorized personnel later on, naturally.
Availability
One of the concerns I have is that the site is hosted inside our normal mirror
infrastructure. Naturally, if an outage occurs there, the site goes down. But
I figured it's a bridge we'll cross when we get there. Because it's so easy to
build the site from scratch, it's actually trivial to host a copy of the site
on any GitLab server, thanks to the .gitlab-ci.yml
file shipped (but not
currently used) in the repository. If push comes to shove, we can just publish
the site elsewhere and point DNS there.
And, of course, if DNS fails us, then we're in trouble, but that's the situation anyway: we can always register a new domain name for the status page when we need to. It doesn't seem like a priority at the moment.
Comments and feedback are welcome!
This article was first published on the Tor Project Blog.