OpenWRT with a/b image

This is part 2 of the story (part 1 is here). The existing repos I found for a/b image management were all forks of one original that had not been updated for a few years. It was literally a single script and a readme, so that’s ok. It was a bit more manual than I liked (you had to edit some environment vars before use), so I started off by making it auto-detect those values.

Probably worth stopping here and saying I’m using GPT partitioning, since we live in the present, not the past. So the base image I start my build from is the generic-ext4-combined-efi.img.gz. Partition layout of this image is:

$ parted ./ext4-combined-efi.img --script --fix unit MiB print
Disk ./ext4-combined-efi.img: 120MiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start    End      Size     File system  Name  Flags
128     0.02MiB  0.25MiB  0.23MiB                     bios_grub
 1      0.25MiB  16.2MiB  16.0MiB  fat16              legacy_boot
 2      16.3MiB  120MiB   104MiB   ext4

We have the dead simple “boot” on partition 1 and “root” on partition 2. We also have the BIOS boot partition with id 128 right at the start of the disk (you can pretty much just ignore it). When you write this to a real device you’d normally grow partition 2, resize the filesystem and go on with your life. On a real device, the free space after partition 2 is where we’d put our “other” root partition.

The upgrade script (from the various repos) basically figures out which root you’re using, then wipes the other partition and unpacks the provided rootfs into it. Rewriting grub.cfg then makes us boot the other partition. The kernels and grub share the same boot partition (so I resized it just in case). The script also runs sysupgrade and unpacks the result into the other partition to carry the config over.
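As a rough sketch of the "figure out which root you’re using" step (not the actual upgrade script), the active root can be read from the kernel command line and the other slot picked from it. The helper names and the sample command line are hypothetical, and the sketch assumes the two roots are partitions 2 and 3, as in the layout discussed in this post.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from the upgrade script.

# Print the value of the root= argument from a kernel command line string.
root_arg() {
    for word in $1; do
        case "$word" in
            root=*) printf '%s\n' "${word#root=}"; return 0 ;;
        esac
    done
    return 1
}

# Print the trailing partition number of a device path
# such as /dev/sda2 or /dev/nvme0n1p3.
part_number() {
    printf '%s\n' "$1" | sed 's/.*[^0-9]\([0-9][0-9]*\)$/\1/'
}

# Given the active root partition number, print the other slot.
# Assumes the two roots live on partitions 2 and 3.
other_root() {
    case "$1" in
        2) echo 3 ;;
        3) echo 2 ;;
        *) return 1 ;;
    esac
}

# Sample command line (made up); on a live system use "$(cat /proc/cmdline)".
cmdline='BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 rootwait console=tty0'
active=$(part_number "$(root_arg "$cmdline")")
echo "active root: partition $active, upgrade target: partition $(other_root "$active")"
```

This only handles device-name style root= values; a PARTUUID style value would need a lookup step first.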

Some of the original issues remained: packages need to be in the rootfs used, and sysupgrade depends on the backup working.

The obvious solution for the first is to use the image builder (which isn’t a build root, it just makes images). I had used this for the non-x86 builds previously, so I was familiar with it (and had scripts already). This let me bake in the packages I wanted, add the upgrade script, and make the boot partition a little bigger. While doing this I opted to label the partitions and filesystems so the upgrade script could safely auto-detect them. The official image uses “kernel” for the boot filesystem label and “rootfs” for the root. I changed the root labels to “rootfs_a” and “rootfs_b”. As I’m using GPT I also named the partitions “root_a” and “root_b” (different from the filesystem labels in case they get mixed up later). Changing the names let the upgrade script handle “migrating” from the official image (maybe useful to someone else).

The sysupgrade backup thing I’d tackle by using sysupgrade’s configuration properly (or validating it was working already).

At this point some quick testing showed this worked great. Base deployment worked, “upgrading” (to the same kernel+rootfs) deployed to the other partition, updated grub and booted into it. The configuration persisted. I then noticed /boot was not mounted, but only on image B. Booting back to image A it mounts again.

Down the rabbithole I went

OpenWRT is a very cut down, simplified Linux distribution. It has a text file based config system (probably a legacy of the original GPL code roots) and a very simple init system. It doesn’t use an initrd, so it should be possible to follow the init sequence and see where boot gets mounted (or not). Obviously /etc/fstab was empty, but it was empty on both roots so that wasn’t it. I noticed the /boot directory was not in the rootfs tarballs, so something creates it. Digging into the boot scripts docs I discovered the system runs /etc/preinit before the normal full init system, which gave me a bunch more stuff to look at. Grepping around for /boot in the pre-init stuff I found /lib/preinit/79_move_config, which makes the /boot dir and mounts it - all I had to do was fix that (or understand why it was breaking so I could work around it).

79_move_config - doesn’t sound like anything to do with mounting filesystems. This seems to exist for handling a backup file saved to the boot partition, and moving it to root. Maybe part of config-restore for squashfs based systems, so the config gets injected before init starts. This script also performed a mount --bind /boot/boot /boot which explained why previously a df showed /boot twice. Clearly the conditions in this script were not working for image B. Later I found I wasn’t the first person to hit this issue.

Reading into what the functions called from 79_move_config do, I discovered they find the device and partition that boot is on (makes sense). To find the device it uses export_bootdevice(), which parses /proc/cmdline to get the root= value and handles it differently depending on the style. We have root=PARTUUID=<a guid>, but it only matches UUIDs ending in 02, and then even more curiously it rewrites that to end in 00 AND MATCHES ON THAT. Taking a step back and checking ALL the partition UUIDs, I see that they all start with the same thing: root ends in 02, boot ends in 01 and the GPT table itself in 00. WTAF. Digging into the image builder I see the partition GUID is set to a reproducible value (based on the version string and date), but it uses a special tool, ptgen, to make the partition table. (Side note: this probably makes perfect sense for an MBR partition table, since the ID there is the table’s, and each partition is a number suffixed on.)
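To make the failure mode concrete, here is a small illustration of the matching behaviour described above. It is not the OpenWRT source: the function name and the GUIDs are made up, and it assumes the second root’s PARTUUID ends in something other than 02 (03 here).

```shell
#!/bin/sh
# Illustration only; mimics the described behaviour, not the real preinit code.

# Accept only PARTUUIDs ending in 02 and print the GUID with that
# suffix swapped for 00, which is what then gets matched against the disk.
disk_uuid_from_partuuid() {
    case "$1" in
        *02) printf '%s00\n' "${1%02}" ;;
        *)   return 1 ;;
    esac
}

# Made-up GUIDs: the first stands in for image A's root, the second for image B's.
for id in 12345678-0000-0000-0000-000000000002 \
          12345678-0000-0000-0000-000000000003; do
    if disk=$(disk_uuid_from_partuuid "$id"); then
        echo "$id -> look for disk $disk"
    else
        echo "$id -> no match, boot device not found"
    fi
done
```

With the first root the lookup has something to search for; with the second it never gets that far, which is consistent with /boot only going missing on image B.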

How to work around this? Well, giving the second root a PARTUUID that also ends in 02 would mean a duplicate partition UUID, which won’t work as the kernel wouldn’t find the right root partition. Not wanting to modify a core file, I had to find another way around, which in this case was reverting to root=/dev/sda2 style device names. A quick test of editing grub.cfg showed that booting image B with root=/dev/sda3 successfully mounted boot (twice due to the bind), so we had a workaround that was ok. That can’t be done in the image builder as the target system’s boot disk type isn’t known (e.g. NVMe or mSATA), so I opted to fix that in the upgrade script.
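The disk type matters because Linux names partitions differently depending on the disk name: sda becomes sda3, while disks whose name ends in a digit (nvme0n1, mmcblk0) get a "p" separator. A minimal sketch of building the name, using a hypothetical helper that is not from the upgrade script:

```shell
#!/bin/sh
# Hypothetical helper: build a partition device path from a disk and a number.
# Disk names ending in a digit (nvme0n1, mmcblk0) take a "p" before the number.
part_dev() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
        *)      printf '%s%s\n' "$1" "$2" ;;
    esac
}

part_dev /dev/sda 3        # prints /dev/sda3
part_dev /dev/nvme0n1 3    # prints /dev/nvme0n1p3
```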

Remember I mentioned “migrating” from the official image? That’s exactly how we can fix this too. The grub.cfg the upgrade script writes uses the disk type based device name. The issue first shows up after the first upgrade is applied to a system (when you want to boot from image B), and by then the upgrade script has already written the device name, so it is fine. Subsequent upgrades continue with this device naming. For the migration to work smoothly I also had to be careful with kernel naming, as initially there is no partition suffix in the name.
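One way to picture the grub.cfg change is a filter that swaps the root= argument on the kernel line. This is only a sketch under assumptions: the function name is hypothetical, the sample line is made up and much shorter than a real grub.cfg entry, and the actual upgrade script may write the file quite differently.

```shell
#!/bin/sh
# Hypothetical filter: replace the root= argument on lines read from stdin.
# "#" is used as the sed delimiter because device paths contain "/".
set_root_arg() {
    sed "s#root=[^ ]*#root=$1#"
}

# Made-up sample kernel line using a PARTUUID style root.
printf '%s\n' 'linux /boot/vmlinuz root=PARTUUID=12345678-0000-0000-0000-000000000002 rootwait' \
    | set_root_arg /dev/sda3
# prints: linux /boot/vmlinuz root=/dev/sda3 rootwait
```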

The result was this. In theory (and from testing so far) you can safely start with the ext4-combined.img or ext4-combined-efi.img on an empty disk (so MBR or GPT). You can resize the original root partition to be a bit bigger and add another after it. The script then takes care of the rest. One remaining possible issue is that both roots share a boot partition, but the only things there are the kernel images and grub.cfg. For my use though, I’m using the image builder, which spits out an already customised image with the 2nd root, named partitions and filesystem labels. I will be testing this as each new release comes out to see how it goes.
