The rise of Linux-based networking hardware
This article is part of a larger series about NetConf/NetDev 2.1.
- A report from Netconf: Day 1
- A report from Netconf: Day 2
- New approaches to network fast paths
- The rise of Linux-based networking hardware
Linux usage in networking hardware has been on the rise for some time. During the latest Netdev conference held in Montreal this April, people talked seriously about Linux running on high end, "top of rack" (TOR) networking equipment. Those devices have long been the realm of proprietary hardware and software companies like Cisco or Juniper, but Linux seems to be making some significant headway into the domain. According to Shrijeet Mukherjee, VP of Engineering at Cumulus Networks: "we are seeing a 28% adoption rate in the Fortune 50" companies.
As someone who has worked in system administration and networking for over a decade, I was surprised by this: switches have always been black boxes of proprietary hardware that we barely get a shell into. But as more and more TOR hardware gets Linux support, some of that work trickles down outside of that niche. Are we really seeing the rise of Linux in high-end networking hardware?
Linux as the standard interface
During his keynote at Netdev, Jesse Brandeburg explained that traffic is exploding on the Internet: "From 2006 to 2016 the compound annual growth rate was 78% of network traffic. So the network traffic is growing like crazy. In 2010 to 2023, it's going to grow by a thousand times."
He also mentioned Intel was working on devices that could do up to 400 Gbps. In his talk, he argued that Linux has a critical place in the modern world by repeating the mantra that "everything is on the network, and the network runs Linux", but does it really? Through Android, Linux has become the most popular end-user device operating system and is also dominant on the server — but what about the actual fabric of the network: the routers and switches that interconnect all those machines?
Mukherjee, in his own keynote, argued that even though Linux has achieved dominance in the server and virtualization markets, there is a "discontinuity [...] at the boundary where the host and the network meet" and argued "that the host without the network will not survive". In other words, proprietary hardware and software in the network threaten free software in the server. Even though some manufacturers are already providing a "Linux interface" in their hardware, it is often only some sort of compatibility shell which might be compared with the Ubuntu compatibility layer in Windows: it's not a real Linux.
Mukherjee pushed this idea further by saying that those companies are limiting themselves by not using the full Linux stack. He presented Linux as the "top vehicle for innovation" that provides a featureful network stack, citing VXLAN, eBPF, and Quagga as prime features used on switches. Mukherjee also praised the diversity of the user-space Linux ecosystem as something that commercial alternatives can't rival; he compared the faster Linux development to the commercial sector where similar top features stay in the beta stage for up to 3 years.
Because of its dominance in the server market, consumers now expect a Linux-like interface on their networking gear, which means Linux could be the standard interface all providers strive toward. As a Debian developer, I can't help but smile at the thought; if there's one thing we have not been able to do among Linux distributions, it's standardize user space into a consistent interface. POSIX is old and incomplete, the FHS is showing its age, and most distributions have abandoned the LSB. Yet the idea is certainly enticing: there is a base set of tools and applications, especially for the Linux kernel, that are standardized: iproute2, ethtool, and iptables are generally consistent across distributions, even though each distribution has its own way of using them.
Yet Linux is not dominant in networking hardware, so why is that? Mukherjee identified the problem as "packaging issues" and listed a set of features he would like Linux to improve:
- Standardization of the ethtool interface. The idea is to make ethtool a de facto standard for managing switches and ports. Mukherjee gave the example that data centers spend more money on cables than on any other hardware, so making it easier to identify cables is a key functionality. Getting consistent interface naming was also a key problem identified by numerous participants at the conference: while systemd tried to fix this with its predictable network interface names specification, the names are not necessarily predictable across virtual machines or on special hardware; in fact, this was the topic of the first talk of the conference. ethtool also needs to support interfaces that run faster than 1 Gbps, something that still has limited support in Linux at the moment. (A minimal sketch of the kernel interface behind ethtool follows this list.)
- Scaling of the Linux bridge. Through the rise of "software-defined networking", we are likely to see multi-switch virtual environments that need to scale to hundreds of interfaces and devices easily. This is something the Linux bridge was never designed to do and it is showing scalability issues. During the conference, there was hope that the new XDP and eBPF developments could help, but also concern that this would create yet another bridge layer inside the kernel.
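To make "the ethtool interface" a bit more concrete, here is a minimal user-space sketch, not taken from the talks, that asks the kernel for a port's speed and duplex through the SIOCETHTOOL ioctl, the same interface the ethtool command uses. It relies on the legacy ETHTOOL_GSET command for brevity; the interface name is a command-line argument, with "eth0" as a placeholder default.

```c
/* Minimal illustration: query link speed/duplex via the SIOCETHTOOL
 * ioctl, the kernel interface behind the ethtool command. Uses the
 * legacy ETHTOOL_GSET command for brevity. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
	const char *ifname = argc > 1 ? argv[1] : "eth0";
	struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (void *)&ecmd;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("SIOCETHTOOL");
		close(fd);
		return 1;
	}
	printf("%s: %u Mb/s, %s duplex\n", ifname,
	       ethtool_cmd_speed(&ecmd),
	       ecmd.duplex == DUPLEX_FULL ? "full" : "half");
	close(fd);
	return 0;
}
```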
Cumulus's goal seems to be to establish Linux as the industry standard for this new era of networking and it is not alone. Through its Open Compute project, Facebook is sharing open designs of data center products and, while we have yet to see commercial off-the-shelf (COTS) 24 and 48-port gigabit switches trickle down to the consumer market, the company is definitely deploying new hardware based on those specifications in its own data centers, and those devices are often based on Linux.
The Linux switch implementation
So how exactly do switches work in Linux?
The Linux kernel manipulates switches with three different operation structures: switchdev_ops, which we previously covered, ethtool_ops, and netdev_ops. Certain switches, however, also need distributed switch architecture (DSA) features to be properly handled. DSA is a more obscure part of the network stack that allows Linux to represent hardware switches or chains of switches using regular Linux tools like bridge, ifconfig, and so on. While switchdev is a new layer, DSA has been in the kernel since 2.6.28 in 2008. Originally developed to support Marvell switches, DSA is now a generic layer deployed in WiFi access points, set-top boxes, on-board flight entertainment systems, trains, and other industrial equipment. Switches that have an Ethernet controller need DSA, whereas the kernel can support switches without Ethernet controllers directly with switchdev drivers.
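To give a rough idea of what those operation structures look like from a driver's perspective, here is a schematic sketch rather than a working driver, using field names from 4.x-era kernels; a real switch driver would also fill in switchdev_ops (or register DSA hooks) alongside these two structures.

```c
/*
 * Schematic sketch only: how a switch-port driver wires up the
 * per-netdevice operation structures (4.x-era field names).
 */
#include <linux/netdevice.h>
#include <linux/ethtool.h>
#include <linux/skbuff.h>

static netdev_tx_t port_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* a real driver would tag the frame and hand it to the switch */
	dev_kfree_skb_any(skb);
	return NETDEV_TX_OK;
}

static int port_open(struct net_device *dev) { return 0; }
static int port_stop(struct net_device *dev) { return 0; }

static const struct net_device_ops port_netdev_ops = {
	.ndo_open	= port_open,
	.ndo_stop	= port_stop,
	.ndo_start_xmit	= port_xmit,
};

static const struct ethtool_ops port_ethtool_ops = {
	.get_link	= ethtool_op_get_link,	/* standard link-state helper */
};

/* at probe time:
 *	dev->netdev_ops  = &port_netdev_ops;
 *	dev->ethtool_ops = &port_ethtool_ops;
 */
```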
The first years of DSA's development consisted only of basic maintenance but, in the last three years, DSA has seen a resurgence of contributions as part of the Linux networking push to support hardware offloading and network switches. Between 2014 and 2015, DSA added support for Broadcom hardware, wake-on-LAN, and hardware port bridging, among other features.
DSA's development ran in parallel with that of swconfig, which the OpenWrt project wrote to support the small office and home office (SOHO) routers it targets. The main difference between swconfig and DSA is that DSA-supported switches show one network interface per port, whereas swconfig-configured switches show up as a single port, which limits the amount of information that can be extracted from the switch. For example, you cannot have per-port traffic statistics with swconfig. That limitation is what led to the creation of the switchdev framework, when swconfig was proposed (then refused) for inclusion in mainline. Another goal of switchdev was to support bridge hardware offloading and network interface card (NIC) virtualization.
Also, whereas swconfig uses virtual LAN (VLAN) tagging to address ports, DSA uses device-specific tagging headers to address different ports, which gives DSA better control over the switches. This allows DSA, for example, to do internet group management protocol (IGMP) snooping or implement the spanning tree protocol, whereas swconfig doesn't have those features. Some switches are actually connected to the host CPU through an Ethernet interface instead of a regular PCI-Express interface, and DSA supports this as well.
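As a purely conceptual illustration of such tagging (the structure and function below are hypothetical, not an actual kernel tagger), a tagging protocol essentially pushes a small vendor-specific header onto each outgoing frame so the switch knows which port it is destined for; the real implementations live in the kernel's net/dsa/ directory and each use their own format and placement.

```c
/*
 * Purely illustrative sketch: what a DSA tagging protocol does on
 * transmit. The tag layout and names here are hypothetical; real
 * taggers in net/dsa/ each define their own format and placement.
 */
#include <linux/types.h>
#include <linux/skbuff.h>

struct example_dsa_tag {		/* hypothetical 4-byte tag */
	__u8   dest_port;		/* egress port on the switch */
	__u8   flags;
	__be16 proto;			/* vendor-specific EtherType */
};

static struct sk_buff *example_tag_xmit(struct sk_buff *skb, int port)
{
	struct example_dsa_tag *tag;

	/* make sure there is headroom for the extra header */
	if (skb_cow_head(skb, sizeof(*tag)) < 0)
		return NULL;		/* caller drops the frame */

	/* prepend the tag; real taggers may instead insert it after the
	 * MAC addresses or append it as a trailer */
	tag = (struct example_dsa_tag *)skb_push(skb, sizeof(*tag));
	tag->dest_port = port;
	tag->flags = 0;
	tag->proto = cpu_to_be16(0xDADA);	/* hypothetical value */

	return skb;
}
```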
One advantage that remains in the swconfig approach is that it treats the internal switch as a simple external switch, and addresses ports with standard VLAN tags. This is something DSA could do, as well, but no one has bothered implementing this just yet. For now, DSA drivers use device-specific tagging mechanisms that limit the number of supported devices. Other areas of future improvement for DSA are better multi-chip support, IGMP snooping, and bonding, as well as firewall, NAT, and TC offloading.
Where is the freedom?
Given all those technical improvements, you might rightfully wonder if your own wireless router or data center switch runs Linux.
In recent years, we have seen more and more networking devices, especially SOHO routers, shipped with Linux and sometimes even OpenWrt (e.g. the Turris Omnia, which we previously covered), but that sometimes means a crippled operating system that only offers you a proprietary web interface. At least those efforts make it easier to deploy free operating systems on those devices.
Based on my experience running OpenWrt on wireless routers to build the Montreal mesh network, deploying Linux on routers and switches has always been a manual process. The Ubiquiti hardware being used in the mesh network comes with an OpenWrt derivative, but it includes proprietary drivers and a proprietary web interface. To use the mesh networking protocol that was chosen, it was necessary to deploy custom OpenWrt images by hand. For years, it was a continuous struggle for OpenWrt developers to liberate generation after generation of proprietary hardware with companies like Cisco locking down the venerable WRT platform in 2006 and the US Federal Communications Commission (FCC) rules that forced TP-Link to block free software on its routers, a change that was later reverted.
Most hardware providers are obviously not dedicated to software freedom: deploying Linux on their hardware is, for them, an economic choice rather than a political one. As you might expect, a "Linux router" these days often means a closed device and operating system that uses Linux as its only free component. I had the privilege of doing some reverse engineering on the SmartRG SR603n VDSL modem, which doubles as a WiFi router and VoIP phone adapter. I was able to get a shell on that machine, and what I found was a custom operating system built on top of the Linux kernel. I wrote a detailed report about this two years ago and the situation then was pretty grim.
My other experience, working for over a decade in data centers, tells an even worse story: most switches and routers there are not running free software at all. We deployed HP ProCurve switches that provide free (as in beer) software updates and struggled for years to find free (as in speech) software alternatives for them. We also built our own routers out of COTS server hardware, at a significant performance cost compared to the dedicated application-specific integrated circuits (ASICs) built into commercial routers, because those routers did not offer the trust and flexibility we were looking for.
But Linux is definitely making some headway, and has been for a while. When we covered switchdev in February 2016, it was just getting started, but now vendors like Mellanox, Broadcom, Cumulus, and Intel are writing and shipping code using the framework. Cumulus, in particular, is developing a Debian-based distribution (Cumulus Linux) that it deploys for clients on targeted hardware. Most of the hardware in that list, however, is not open in the more radical sense: those are devices that can run free software but are not generally open-source hardware. There are some exceptions, but they sit at the higher end of the spectrum: most organizations probably don't need 100 Gbps ports, let alone the 128 ports in the Backpack switch that Cumulus is shipping.
How much this translates to actual freedom for the end-user is therefore questionable. While I have seen encouraging progress on the high end of the hardware spectrum at Netdev, I'm not sure this will translate into immediate wins in the data center or even for home users in the short term. In the long term, however, we will hopefully see some progress in Linux's rise in general-purpose networking hardware following its dominance in general-purpose computing.
The author would like to thank the Netdev organizers for travel assistance. Also, thanks to Andrew Lunn for a technical review of this article.
Note: this article first appeared in the Linux Weekly News.