Notes about virtual machine and container hosting.

  1. hosting
  2. KVM bootstrap with libvirt
    1. Bridge configuration
    2. Base image build
      1. Autopkg builders
    3. Virtual machine creation
      1. IP address discovery
    4. Maintenance
    5. Remaining tasks
    6. References
  3. Container notes
    1. Docker
    2. Rocket

KVM bootstrap with libvirt

I got tired of dealing with VirtualBox and Vagrant: those tools work well, but they are too far from datacenter-level hosting primitives, which right now converge towards KVM (or maybe Xen, but that didn't seem to recover from the Meltdown attacks). VirtualBox was also not shipped in stretch because "upstream doesn't play in a really fair mode wrt CVEs" and simply ships updates in bulk.

So I started looking into KVM. It seems a common way to get started with this without setting up a whole cluster management system (e.g. Ganeti) is to use libvirt. The instructions here also include bridge setup information for Debian stretch since that makes it easier to host services inside the virtual machines than a clunky NAT setup.

Bridge configuration

Assuming the local Ethernet interface is called eno1, the following configuration, in /etc/network/interfaces.d/br0, enables a bridge on the host (the bridge_ports option requires the bridge-utils package):

iface eno1 inet manual

auto br0
iface br0 inet static
    # really necessary?
    #hwaddress ether f4:4d:30:66:14:9a
    address 192.168.0.7
    netmask 255.255.255.0
    gateway 192.168.0.1
    dns-nameservers 8.8.8.8

    bridge_ports eno1

iface br0 inet6 auto

Then disable other networking interfaces and enable the bridge:

ifdown eno1
service NetworkManager restart
ifup br0

Finally, bridged traffic may be blocked by the host firewall: when the br_netfilter module is loaded, frames crossing the bridge also traverse the iptables FORWARD chain, so a restrictive FORWARD policy will drop guest traffic. This is independent of the net.ipv[46].conf.all.forwarding setting, which should stay turned off unless we actually want to route packets for the network (as opposed to the guests). This can be tweaked by talking to iptables directly:

iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT

Or, preferably, by disabling the firewall on the bridge completely, by adding the following to /etc/sysctl.d/br0-nf-disable.conf (these keys only exist once the br_netfilter module is loaded):

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

This was discovered in the libvirt wiki.

Base image build

Then we can build an image using virt-builder:

virt-builder debian-9 --size=10G --format qcow2 \
  -o /var/lib/libvirt/images/stretch-amd64.qcow2 \
  --update \
  --firstboot-command "dpkg-reconfigure openssh-server" \
  --network --edit /etc/network/interfaces:s/ens2/ens3/ \
  --ssh-inject root:string:'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7CY6+aTLlk6epl1+TK6wIaHg1fageEfmKFgn+Yov+2lKFIhNRkcWznQVcyViVmC7iaZkEIei1gP9+0lrsdhewtTBjvkDNxR18aIORJsiH95FFjFIuJ0HQjrM1jOxiXhQZ0xLlnhFkxxa8j9l52HTutpYUU63e3lvY0CBuqh7QtkH3un7iT6EaqMR34yFa2ym35ag8ugMbczBwnTDJYn3qpL8gKuw3JnIp+qdSQb1sGdLcC4JN02E2/IY7iw8lzM9xVab1IgvemCJwS0C/Bt9LsmhCy9AMpaVFaAYjepgdBpSqIMa/8VcoVOrhdJWfIc7fLtt+njN1qojsPmuhsr1n' \
  --hostname stretch-amd64 --timezone UTC

This is not ideal, as it fetches the base image from libguestfs.org, in the clear (as opposed to debian.org infrastructure):

[   1.9] Downloading: http://libguestfs.org/download/builder/debian-9.xz

There is, fortunately, an OpenPGP signature on those images, but it might be better to bootstrap with debootstrap (although the above might be much faster).

Also notice how we edit the interfaces file to fix the interface name. For some reason, the interface name detected by virt-builder (ens2) isn't the same as the one that shows up when running under virt-install, below (ens3). The symlink trick does not work either: adding --link /dev/null:/etc/systemd/network/99-default.link to the virt-builder incantation does not disable those funky interface names. So we simply rewrite the file.

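Finally, we inject our SSH key into the root account. The build process will show a root password, but thanks to the key we won't need it. The key can also be read from a file with --ssh-inject root:file:FILENAME instead of being pasted inline.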

If the build fails with this error:

[ 156.9] Resizing (using virt-resize) to expand the disk to 10.0G
virt-resize: error: libguestfs error: /usr/bin/supermin exited with error status 1.

It might be that you ran out of space in /var/tmp. You can set the TMPDIR environment variable to point at a directory on a larger filesystem.
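To catch this before a long build, a quick free-space check can help. This is a minimal sketch: check_tmp_space is a hypothetical helper, and the gigabyte threshold you pass it is a guess, not a documented virt-builder requirement. It looks at TMPDIR, falling back to /var/tmp as described above.

```shell
# Hypothetical helper: fail (and warn) when the temporary directory used by
# libguestfs has less than MIN_GB gigabytes free. The threshold is an
# assumption; virt-builder does not document an exact requirement.
check_tmp_space() {
    local min_gb=$1
    local dir=${TMPDIR:-/var/tmp}
    local avail_kb
    # df -P prints one line per filesystem; column 4 is available 1K blocks
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -lt $((min_gb * 1024 * 1024)) ]; then
        echo "only $((avail_kb / 1024 / 1024))G free in $dir" >&2
        return 1
    fi
}
```

For example, `check_tmp_space 5 || export TMPDIR=/srv/tmp` before running virt-builder, where /srv/tmp stands for some existing directory on a larger filesystem.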

Autopkg builders

Images can also be built with autopkgtest, which delegates the job to vmdb2, with something like:

sudo autopkgtest-build-qemu stable /var/lib/libvirt/images/debian9-amd64-autopkgtest.qcow2

There are obviously many, many more options for building such images; those are just the ones I found most practical.

Virtual machine creation

Then the virtual machine can be created and started with:

virt-install --virt-type kvm --name stretch-amd64 --memory 512 \
  --import --disk path=/var/lib/libvirt/images/stretch-amd64.qcow2 \
  --os-variant=debian9 --network bridge=br0 --noautoconsole

The path argument can be simplified by using existing volume pools, which can be listed with:

# virsh pool-list
 Name                 State      Autostart 
-------------------------------------------
 boot-scratch         active     yes
 default              active     yes

Notice how the virsh command is called as root. That's not absolutely necessary, but by default when called as a user, it will connect to the user-specific session (qemu:///session) instead of the system-level one (qemu:///system). This can be worked around by using the --connect qemu:///system argument or by changing the default URI.
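Changing the default URI can be done through the LIBVIRT_DEFAULT_URI environment variable, which libvirt clients honour. A minimal sketch, assuming it goes in a shell startup file such as ~/.bashrc; the user still needs permission to talk to the system instance, typically through membership in the libvirt group on Debian.

```shell
# Point virsh (and other libvirt clients such as virt-install) at the
# system-level instance instead of the per-user session
export LIBVIRT_DEFAULT_URI=qemu:///system
```

Running `virsh uri` afterwards should print qemu:///system.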

The actual path of the volume pool can be found with:

# virsh pool-dumpxml default | grep path
<path>/var/lib/libvirt/images</path>
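To get just the directory without the surrounding XML tags, the output can be filtered. This is a sed sketch with a hypothetical pool_path helper; it assumes the path element sits on a single line, as above, and an XML tool like xmllint would be more robust.

```shell
# Hypothetical filter: read "virsh pool-dumpxml" output on stdin and print
# the contents of the <path> element (assumes it fits on one line)
pool_path() {
    sed -n 's:.*<path>\(.*\)</path>.*:\1:p'
}
```

Usage: `virsh pool-dumpxml default | pool_path`. For a single volume, `virsh vol-path --pool default debian9-amd64-autopkgtest.qcow2` prints its full path directly.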

Then a machine can be created in the pool with the --disk vol=default/debian9-amd64-autopkgtest.qcow2 argument.

Note that the virtual machine writes directly to the qcow2 image file. To work on a throwaway copy instead, you can create a copy-on-write overlay backed by that image:

cd /var/lib/libvirt/images/
qemu-img create -f qcow2 -b debian9-amd64-autopkgtest.qcow2 -F qcow2 overlay.qcow2

Then start the machine with:

virt-install --virt-type kvm --name stretch-amd64 --memory 512 \
  --import --disk vol=default/overlay.qcow2 \
  --os-variant=debian9 --network bridge=br0 --noautoconsole

IP address discovery

The VM will be created with an IP address allocated by the DHCP server. The DHCP server's logs (or tcpdump -n -i any -s 1500 '(port 67 or port 68)') will show the IP address; otherwise the root password will be necessary to log in and discover it.

Alternatively, the IPv6 address of the guest can be deduced from the addresses of the host's vnet0 interface. For example, here's the interface as viewed from the host:

45: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:1e:c2:48 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe1e:c248/64 scope link 
       valid_lft forever preferred_lft forever

And from the guest:

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:1e:c2:48 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.216/24 brd 192.168.0.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fd05:5f2d:569f:0:5054:ff:fe1e:c248/64 scope global mngtmpaddr dynamic 
       valid_lft 7054sec preferred_lft 1654sec
    inet6 2607:f2c0:f00f:8f00:5054:ff:fe1e:c248/64 scope global mngtmpaddr dynamic 
       valid_lft 7054sec preferred_lft 1654sec
    inet6 fe80::5054:ff:fe1e:c248/64 scope link 
       valid_lft forever preferred_lft forever

Notice how the MAC addresses are almost identical? Only the first octet differs: fe on the host and 52 on the guest. This can be used to guess the guest's IPv6 address in order to administer the machine. The link-local all-nodes multicast address (ff02::1) can be used to confirm the IP address:

# ping6 -I br0 ff02::1
ping6: Warning: source address might be selected on device other than br0.
PING ff02::1(ff02::1) from :: br0: 56 data bytes
[...]
64 bytes from fe80::5054:ff:fe1e:c248%br0: icmp_seq=1 ttl=64 time=0.281 ms (DUP!)
[...]
^C
--- ff02::1 ping statistics ---
1 packets transmitted, 1 received, +4 duplicates, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.049/0.339/0.515/0.166 ms
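The address arithmetic above can be sketched as two small shell functions (both hypothetical helpers). They assume libvirt's default MAC layout (52:54:00:... in the guest, fe:54:00:... on the host tap device) and a guest that derives its IPv6 interface identifier from its MAC with modified EUI-64: flip the universal/local bit (0x02) of the first octet and insert ff:fe in the middle. Guests using privacy or stable-privacy addresses will not match.

```shell
# Guess the guest MAC from the host's vnetN MAC: only the first octet differs
host_to_guest_mac() {
    echo "52:${1#*:}"
}

# Derive the EUI-64 link-local IPv6 address from a MAC address. Leading
# zeros are dropped in each group, but runs of zero groups are not
# compressed to "::".
mac_to_linklocal() {
    local IFS=:
    set -- $1                       # split the MAC into six octets
    local first=$(( 0x$1 ^ 0x02 ))  # flip the universal/local bit
    printf 'fe80::%x:%x:%x:%x\n' \
        $(( first << 8 | 0x$2 )) $(( 0x$3 << 8 | 0xff )) \
        $(( 0xfe << 8 | 0x$4 )) $(( 0x$5 << 8 | 0x$6 ))
}
```

With the example above, `mac_to_linklocal "$(host_to_guest_mac fe:54:00:1e:c2:48)"` gives fe80::5054:ff:fe1e:c248, the guest's link-local address. The same interface identifier (5054:ff:fe1e:c248) appended to the /64 prefix gives the global addresses shown in the guest output.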

The guest's MAC address is also known by libvirt, so this command will show it:

# virsh domiflist stretch-amd64
Interface  Type       Source     Model       MAC
-------------------------------------------------------
vnet0      bridge     br0        virtio      52:54:00:55:44:73
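Given that MAC, the host's neighbour table (ip neigh) can also be searched for matching addresses. A minimal sketch with a hypothetical addrs_for_mac filter; it assumes the cache has been populated by traffic with the guest, for example by the ff02::1 ping above.

```shell
# Hypothetical filter: read "ip neigh" output on stdin and print every
# address whose link-layer address (the field after "lladdr") matches $1
addrs_for_mac() {
    awk -v mac="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == "lladdr" && $(i + 1) == mac) print $1
    }'
}
```

Usage: `ip neigh show dev br0 | addrs_for_mac 52:54:00:55:44:73`, using the MAC from the domiflist output.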

And of course, connecting to the console and running ip a will show the right IP address; see below for console usage.

Maintenance

List running VMs (add --all to also show stopped ones):

virsh list

To start a VM:

virsh start stretch-amd64

Get a console:

virsh console stretch-amd64

To stop a VM (this asks the guest to shut down cleanly and returns without waiting for it):

virsh shutdown stretch-amd64

To kill a VM that's hung:

virsh destroy stretch-amd64

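To reinstall a VM, the machine needs to be stopped (see above) and its definition removed so the name can be reused (source):

virsh undefine stretch-amd64

This leaves the disk image in place; virsh undefine --remove-all-storage also deletes the storage volumes attached to the machine.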
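Because virsh shutdown returns immediately, a script that undefines a machine right after stopping it may need to wait for the guest to actually power off. A minimal polling sketch, with a hypothetical wait_shutoff helper that relies on virsh domstate reporting "shut off"; the 60 second default timeout is an arbitrary assumption.

```shell
# Hypothetical helper: poll "virsh domstate" until the domain reports
# "shut off", giving up after TIMEOUT seconds (default 60)
wait_shutoff() {
    local name=$1 timeout=${2:-60}
    while [ "$(virsh domstate "$name")" != "shut off" ]; do
        if [ "$timeout" -le 0 ]; then
            echo "$name still running, consider: virsh destroy $name" >&2
            return 1
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
}
```

For example: `virsh shutdown stretch-amd64 && wait_shutoff stretch-amd64 120 && virsh undefine stretch-amd64`.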

Remaining tasks

References

Container notes

Those are notes and reminders of how to do "things" with containers, regardless of technology. They are not a replacement for the official documentation and may only be useful to me.

Docker

To build an image:

docker build --tag foo .

That will build the Dockerfile in the current directory into an image named "foo" (despite its name, --tag sets the image name, optionally followed by a :tag suffix such as foo:latest).

To start a new container from that image with an interactive shell:

docker run --tty --interactive foo /bin/bash

To map volumes into containers (for example when an image declares VOLUME directories), first create a named volume:

docker volume create foo

Then use it in the container:

docker run --tty --interactive --volume foo:/srv/foo foo /bin/bash

Volumes are basically directories stored in /var/lib/docker/volumes (with the default local driver, the data of volume foo lives in /var/lib/docker/volumes/foo/_data), which can be copied around normally.
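To experiment with the commands above, a minimal Dockerfile declaring a VOLUME can be written from the shell. This is only a sketch: the debian:stretch base image and the /srv/foo path are illustrative assumptions.

```shell
# Write a minimal Dockerfile whose /srv/foo directory is declared as a volume
cat > Dockerfile <<'EOF'
FROM debian:stretch
# Data written here goes to a volume rather than the container's layer
VOLUME /srv/foo
CMD ["/bin/bash"]
EOF
```

Then `docker build --tag foo .` builds it. Without a --volume argument, Docker creates an anonymous volume for /srv/foo at run time; with `--volume foo:/srv/foo`, the named volume is used instead.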

To restart a container on reboot, use --restart=unless-stopped or --restart=always, as documented.

Rocket

Running docker containers:

$ sudo rkt run --insecure-options=image --interactive docker://busybox -- /bin/sh

Those get resolved using the rkt image resolution.

Re-running:

$ sudo rkt run registry-1.docker.io/library/debian:latest --interactive --exec /bin/bash --net=host

Building images requires the separate acbuild command, which builds "standard" ACI images rather than Docker images. Other tools are available, like Packer, umoci or Buildah, although only Buildah can use Dockerfiles to build images.
