random complexity

This is part 2 of the story (part 1 is here). The existing repos I found for a/b image management were all forked from one initial repo that had not been updated for a few years. But it was literally a single script and a readme, so that’s ok. It was a bit more manual than I liked (you edit some environment vars before use), so I started off by making it auto detect those values.

Probably worth stopping here and saying I’m using GPT partitioning, since we live in the present, not the past. So the base image I start my build from is the generic-ext4-combined-efi.img.gz. Partition layout of this image is:

$ parted ./ext4-combined-efi.img --script --fix unit MiB print
Disk ./ext4-combined-efi.img: 120MiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start    End      Size     File system  Name  Flags
128     0.02MiB  0.25MiB  0.23MiB                     bios_grub
 1      0.25MiB  16.2MiB  16.0MiB  fat16              legacy_boot
 2      16.3MiB  120MiB   104MiB   ext4

We do have the dead simple “boot” on partition 1, and “root” on partition 2. We also have the bios boot partition with id 128 right at the start of the disk (and you can pretty much just ignore it). When you write this to a real device you’d normally grow out partition 2 and resize the filesystem and go on with your life. This free space after partition 2 is where we’d put our “other” root partition.

The upgrade script (from the various repos) basically figures out which root you’re using, then wipes the other partition and unpacks the provided rootfs into it. To maintain the config it runs a sysupgrade backup and unpacks the result into the other partition too. Rewriting grub.cfg then makes us boot the other partition. The kernels and grub share the same boot partition (so I resized it just in case).
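The "which root am I on, and which is the other one" step can be sketched in a few lines. This is a minimal Python illustration, not the actual upgrade script; it assumes the kernel command line uses root=/dev/... style names and that partitions 2 and 3 hold the two roots (the SLOTS table and function names are hypothetical).

```python
import re
from typing import Optional, Tuple

# Assumption: partition 2 is slot "a" and partition 3 is slot "b".
SLOTS = {"a": 2, "b": 3}


def root_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the root= argument, or None if it is absent."""
    for token in cmdline.split():
        if token.startswith("root="):
            return token[len("root="):]
    return None


def partition_number(device: str) -> int:
    """Extract the trailing partition number from a /dev/... name."""
    match = re.search(r"(\d+)$", device)
    if match is None:
        raise ValueError(f"no partition number in {device!r}")
    return int(match.group(1))


def pick_slots(cmdline: str) -> Tuple[str, str]:
    """Return (active, inactive) slot names for the given kernel cmdline."""
    root = root_from_cmdline(cmdline)
    if root is None or not root.startswith("/dev/"):
        raise ValueError("expected a root=/dev/... argument")
    number = partition_number(root)
    for name, part in SLOTS.items():
        if part == number:
            inactive = next(other for other in SLOTS if other != name)
            return name, inactive
    raise ValueError(f"partition {number} is not a known root slot")


if __name__ == "__main__":
    # On a real system the string would come from /proc/cmdline.
    print(pick_slots("console=tty0 root=/dev/sda2 rootwait"))
```

The inactive slot is the one that gets wiped and receives the new rootfs; the active one is never touched, which is what makes rollback possible.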

Some of the original issues remained: packages need to be in the rootfs used, and sysupgrade depends on the backup working.

The obvious solution for the first is to use the image builder (which isn’t a build root, it’s just for making images). I used this for the non-x86 builds previously, so I was familiar with it (and had scripts already). This let me bake in the packages I wanted, add in the upgrade script, and also make the boot partition a little bigger. While doing this I opted to label the partitions and filesystems so the upgrade script can safely auto detect them. The official image uses “kernel” for the boot label and “rootfs” for the root. I changed the root labels to “rootfs_a” and “rootfs_b”. As I’m using GPT I also labelled the partitions “root_a” and “root_b” (different in case they get mixed up later). Changing the names let me handle “migrating” from the official image in the upgrade script (maybe useful to someone else).
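Label based detection could look roughly like the sketch below. It assumes blkid-style text output (device, then KEY="value" pairs) and the filesystem labels named above; the sample text and the function names are made up for illustration, and this is not the real upgrade script.

```python
import re
from typing import Dict

# Made-up sample in blkid-style format, using the labels from the text.
SAMPLE = '''/dev/sda1: LABEL="kernel" TYPE="vfat" PARTLABEL="boot"
/dev/sda2: LABEL="rootfs_a" TYPE="ext4" PARTLABEL="root_a"
/dev/sda3: LABEL="rootfs_b" TYPE="ext4" PARTLABEL="root_b"
'''


def parse_blkid(text: str) -> Dict[str, Dict[str, str]]:
    """Map each device to its KEY="value" tags."""
    devices = {}
    for line in text.splitlines():
        device, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "device: tags"
        devices[device.strip()] = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return devices


def find_by_label(devices: Dict[str, Dict[str, str]], label: str) -> str:
    """Return the single device carrying this filesystem label."""
    matches = [dev for dev, tags in devices.items() if tags.get("LABEL") == label]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one {label!r}, found {len(matches)}")
    return matches[0]


def layout_kind(devices: Dict[str, Dict[str, str]]) -> str:
    """Classify the disk as 'ab', 'official' (needs migrating) or 'unknown'."""
    labels = {tags.get("LABEL") for tags in devices.values()}
    if {"rootfs_a", "rootfs_b"} <= labels:
        return "ab"
    if "rootfs" in labels:
        return "official"
    return "unknown"


if __name__ == "__main__":
    devs = parse_blkid(SAMPLE)
    print(layout_kind(devs), find_by_label(devs, "rootfs_b"))
```

Refusing to continue unless exactly one device carries a label is the "safely" part: an ambiguous or unlabelled disk stops the upgrade instead of wiping the wrong partition.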

The sysupgrade backup thing I’d tackle by using sysupgrade’s configuration properly (or validating it was working already).

At this point some quick testing showed this worked great. Base deployment worked, “upgrading” (to the same kernel+rootfs) deployed to the other partition, updated grub and booted into it. The configuration persisted. I then noticed /boot was not mounted, but only on image B. Booting back to image A it mounts again.

Down the rabbithole I went

OpenWRT is a very cut down, simplified Linux distribution. It has a text file based config system (probably legacy from the original GPL code roots) and a very simple init system. It doesn’t use an initrd, so it should be possible to follow the init sequence and see where boot gets mounted (or not). Obviously /etc/fstab was empty, but it was on both roots so that wasn’t it. I noticed the /boot directory was not in the rootfs tarballs, so something makes the directory. Digging into the boot scripts docs I discovered the system runs /etc/preinit before the normal full init system, and found a bunch more stuff to look at. Grepping around for /boot in the pre-init stuff I found /lib/preinit/79_move_config, which makes the /boot dir and mounts it - all I had to do was fix that (or understand why it’s breaking to work around it).

79_move_config - doesn’t sound like anything to do with mounting filesystems. This seems to exist for handling a backup file saved to the boot partition, and moving it to root. Maybe part of config-restore for squashfs based systems, so the config gets injected before init starts. This script also performed a mount --bind /boot/boot /boot which explained why previously a df showed /boot twice. Clearly the conditions in this script were not working for image B. Later I found I wasn’t the first person to hit this issue.

Reading into what the functions called from 79_move_config do, I discovered they are for finding the device and partition that boot is on (makes sense). To find the device it uses export_bootdevice(), which parses /proc/cmdline to get the root= value and handles it in different ways depending on the style. We have root=PARTUUID=<a guid>, but it’s only matching UUIDs ending in 02, and then even more curiously it’s truncating that to end in 00 AND MATCHING ON THAT. Taking a step back and checking ALL the partition UUIDs I see that they all start with the same thing: root ends in 02, boot ends in 01 and the GPT table itself in 00. WTAF. Digging into the image builder I see the partition GUID is set to a reproducible value (based on the version string and date), but it’s using a special tool, ptgen, to make the partition table. (Side note: this probably makes perfect sense for an MBR partition table, since the ID there is the table’s, and each partition is a number suffixed on.)
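To make the failure concrete, here is a small Python restatement of the check as described above. It is not the actual preinit shell code, only the behaviour I observed, and the GUID prefix is made up for the example.

```python
from typing import Optional

# Made-up GUID prefix; only the last two hex digits matter here.
PREFIX = "11111111-2222-3333-4444-5555555555"
DISK_GUID = PREFIX + "00"    # the GPT table itself
BOOT_GUID = PREFIX + "01"    # boot partition
ROOT_A_GUID = PREFIX + "02"  # first root partition
ROOT_B_GUID = PREFIX + "03"  # hypothetical second root (image B)


def disk_guid_from_root(partuuid: str) -> Optional[str]:
    """Accept only a PARTUUID ending in 02 and rewrite that ending to 00."""
    value = partuuid.lower()
    if not value.endswith("02"):
        return None  # any other root partition is rejected outright
    return value[:-2] + "00"


def boot_disk_found(partuuid: str) -> bool:
    """True when the derived GUID matches the disk's GUID."""
    return disk_guid_from_root(partuuid) == DISK_GUID


if __name__ == "__main__":
    print("image A:", boot_disk_found(ROOT_A_GUID))
    print("image B:", boot_disk_found(ROOT_B_GUID))
```

Image A passes because its PARTUUID ends in 02; any second root partition necessarily has a different ending, so the boot disk is never found and /boot is never mounted.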

How to work around this? Well, a duplicate partition UUID won’t work as the kernel wouldn’t find the right root partition. Not wanting to modify a core file, I had to find another way around, which in this case was reverting to root=/dev/sda2 style device names. A quick test of editing grub.cfg showed that booting image B with root=/dev/sda3 successfully mounted boot (twice due to the bind), so we had a workaround that was ok. That can’t be done in the image builder as the target system’s boot disk type isn’t known (eg nvme or msata), so I opted to fix that in the upgrade script.
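The disk type matters because Linux names partitions differently depending on the disk name: a disk whose name ends in a digit (nvme0n1, mmcblk0) gets a "p" before the partition number, while sda style disks do not. A minimal Python sketch of that rule, with hypothetical function names:

```python
import re
from typing import Tuple


def partition_device(disk: str, number: int) -> str:
    """Build a partition device name from a disk name and partition number."""
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"


def split_partition(device: str) -> Tuple[str, int]:
    """Split a partition device name back into (disk, partition number)."""
    # nvme0n1p2 / mmcblk0p2 style: the disk ends in a digit, then "p", then the number.
    match = re.match(r"^(.*\d)p(\d+)$", device)
    if match is None:
        # sda2 style: the number follows the disk name directly.
        match = re.match(r"^(.*?)(\d+)$", device)
    if match is None:
        raise ValueError(f"{device!r} does not look like a partition")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    disk, number = split_partition("/dev/nvme0n1p2")
    # Same disk, the other root partition.
    print(partition_device(disk, 3))
```

With this the upgrade script can take whatever root device the running system reports and work out the matching name for the other root on the same disk.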

Remember me mentioning “migrating” from the official image? Well, that’s exactly how we can fix this too. The grub.cfg the upgrade script writes uses the disk type based device name. Since the first time the issue would present is after the first upgrade is applied to a system (when you want to boot from image B), it would be fine, and subsequent upgrades would continue with this device naming. For the migration to work smoothly I also had to be careful with kernel naming, as initially there is no partition suffix in the name.
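Swapping the root= argument when writing grub.cfg is a small text transform. The sketch below is an assumption about how that could be done in Python, not the real upgrade script; the sample grub.cfg is a simplified, made-up entry and the PARTUUID value in it is a placeholder.

```python
# Simplified, made-up grub.cfg entry; real files carry more options.
SAMPLE_CFG = (
    'menuentry "OpenWrt" {\n'
    "\tlinux /boot/vmlinuz root=PARTUUID=EXAMPLE-02 rootwait\n"
    "}\n"
)


def set_root_arg(grub_cfg: str, new_root: str) -> str:
    """Replace the root= token on every 'linux' line, leaving other lines alone."""
    out = []
    for line in grub_cfg.splitlines():
        if line.strip().startswith("linux "):
            # Splitting on single spaces keeps the leading indentation intact.
            tokens = [
                f"root={new_root}" if token.startswith("root=") else token
                for token in line.split(" ")
            ]
            line = " ".join(tokens)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(set_root_arg(SAMPLE_CFG, "/dev/sda3"), end="")
```

Only the root= token changes, so any other kernel arguments on the line carry over as they were.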

The result was this. In theory (and from testing so far) you can safely start with the ext4-combined.img or ext4-combined-efi.img on an empty disk (so GPT or MBR). You can resize the original root partition to be a bit bigger and add another after it. Then this takes care of the rest. One remaining possible issue is both partitions share a boot partition, but the only things there are the kernel images and grub.cfg. For my use though, I’m using image builder which spits out the already customised image with 2nd root, and named partitions and filesystem labels. I will be testing this as each new release comes out and see how it goes.

So I’m still using OpenWRT after many years for a variety of reasons, but mostly because dns ad blocking and vpn solution support work well. Over the years this has been either physical (router station pro, edgerouter poe) or a VM, or most recently back to bare metal on x86. Because of my interest in (need for) scripting/automating everything, I’ve had exposure to the various build systems and to baking config into the image to make deployment/replacement easy, or in times of moving platforms, to make migration easy too.

Moving back to x86, initially in a VM, allowed me to simplify the build and try out other ways of automating the upgrade. I was using a packer built image, with an ansible playbook to deploy and swap the running router to the new version with minimal network drop. Via ansible I could deploy “router new” from template while “router” was running, power off “router”, power up “router new” and 30sec later had a working router again (then if successful, delete the old, rename the new to be ready for next time). Using the same mac address made the wan side switchover very fast too. It worked well enough on esxi, even without vsphere server running for both the image build and the deploy.

However, the future of vmware being on the rocks, me not using it for anything else anymore, and the hardware I was using no longer being on the HCL all motivated me to go back to bare metal, on a newer lower power box.

Going to a bare metal x86 box I was able to use the same image building methodology to build a disk image (in virtualbox now), but the initial deploy of it would change a lot. Writing a raw image over the network to another machine without ipmi/ilo/idrac to help was a bit more manual. Initially I used a usb boot drive and NVMEoTCP to write it to the raw block device - this was great, except the usb boot needs dhcp and if dhcp is down that’s not going to work (yes I could bake a static ip version, but didn’t). Another option was to pull the m2 out to image it, but again the disruption would be longer than I’d like. There was also a risk of it not coming back up with these approaches, and without a router, being unable to build a “new” fixed image.

Unless I wanted to have a cold standby router (or temporary travel router) to facilitate upgrades, I needed to handle it within the system not as an image deployment style upgrade. This is fine though, since OpenWRT has been around for ages, runs on embedded devices so would have this all sorted… right? On embedded devices it usually uses a squashfs root with read-write overlay over the top (saves space, saves flash endurance) - and makes upgrades a bit easier as the whole base of the system is a single file you can swap over. On x86 there is a squashfs image or an ext4 image to choose from. Since write endurance isn’t a concern I defaulted to the ext4 image, and it’s just a basic kernel + rootpart linux box. Crazy simple and easy to work with.

However, reviewing the built in upgrade methods I hit a few snags that I didn’t like.

Sysupgrade

The original method, which is well suited to flash devices running from ram. This method relies on the backup tool to capture all config/changes and blows away the root and redeploys from a rootfs image, then restores config. The snag is the rootfs image has base packages, so you need to add your packages back in after the new version boots.

There are various scripts to help with that, nothing too official. Also, as this is replacing the running root with a new system, it sounds more risky than swapping out a squashfs image would have been (think what happens if it gets interrupted part way through).
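The package problem boils down to a set difference: what was installed before the upgrade minus what the fresh rootfs ships with. A minimal Python sketch, assuming opkg list-installed style text ("name - version" per line); the package lists are made-up examples and this is not one of the existing helper scripts.

```python
from typing import List, Set

# Made-up "name - version" listings, captured before and after an upgrade.
BEFORE = """base-files - 1
dnsmasq - 2.0
wireguard-tools - 1.0
adblock - 4.0
"""
AFTER = """base-files - 2
dnsmasq - 2.1
"""


def parse_installed(text: str) -> Set[str]:
    """Collect package names from 'name - version' lines."""
    names = set()
    for line in text.splitlines():
        name, sep, _version = line.partition(" - ")
        if sep and name.strip():
            names.add(name.strip())
    return names


def packages_to_reinstall(before: str, after: str) -> List[str]:
    """Packages present before the upgrade but missing afterwards."""
    return sorted(parse_installed(before) - parse_installed(after))


if __name__ == "__main__":
    print(" ".join(packages_to_reinstall(BEFORE, AFTER)))
```

The catch is that reinstalling needs a working network on the freshly booted system, which is not a given when the packages you lost are the ones that make the router route.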

Attended Sysupgrade

This is the solution to the packages in image problem. The tool requests from a build server an image with your package list baked in, and then downloads it. It’s attended as you need to click go between downloading and upgrading. This still relies on the backup capturing all the files. Unfortunately in my testing, this had a long queue on the build server (over 2 hours) and timed out building it, so gave me nothing. Unusable, sad pandas.

The idea was pretty cool of course. But I already knew how quick and easy the image builder was from previous iterations.

So thinking more broadly of the problem I saw a few options.

Consider swapping to the squashfs based x86 image and go via sysupgrade. For the package list I’d have to build my own image. This doesn’t account for rollback either.
Thinking more like a commercial appliance, they all use A/B image partitions for this exact reason - to give you rollback, but also gives you a work area for the upgrade - the not in use partition.

Looking for this I found a few blogs and some repos on github which looked promising. The system being so basic was helpful in this type of hack.

(Continues in part 2)
