This is part 2 of the story (part 1 is here). The existing repos I found for A/B image management were all forked from an initial one that hadn't been updated in a few years. But it was literally a single script and a readme, so that's OK. It was a bit more manual than I liked (edit some environment vars before use), so I started off by making it possible to auto-detect those.
Probably worth stopping here and saying I'm using GPT partitioning, since we live in the present, not the past. So the base image I start my build from is `generic-ext4-combined-efi.img.gz`. The partition layout of this image is:
| Partition | Label  | Purpose                            |
|-----------|--------|------------------------------------|
| 128       | -      | BIOS boot, right at the disk start |
| 1         | kernel | boot (kernel images and grub)      |
| 2         | rootfs | root filesystem                    |
We do have the dead simple "boot" on partition 1 and "root" on partition 2. We also have the BIOS boot partition with id 128 right at the start of the disk (you can pretty much just ignore it). When you write this to a real device you'd normally grow partition 2, resize the filesystem, and go on with your life. The free space after partition 2 is where we'd put our "other" root partition.
The upgrade script (from the various repos) basically figures out which root you're booted from, then wipes the other partition and unpacks the provided rootfs into it. Rewriting the grub.cfg then makes us boot the other partition. The kernels and grub share the same boot partition (so I resized it, just in case). It runs sysupgrade's config backup and then unpacks the result into the other partition to maintain the config.
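A minimal sketch of that detect-and-target step, assuming slot A is partition 2 and slot B is partition 3 as above (the function name and the echo are mine, not the actual script's):

```shell
#!/bin/sh
# Given the root device we booted from, return the other slot.
# Assumes slot A is partition 2 and slot B is partition 3.
other_root() {
    case "$1" in
        *2) echo "${1%2}3" ;;    # booted A -> deploy to B
        *3) echo "${1%3}2" ;;    # booted B -> deploy to A
        *)  return 1 ;;          # anything else: bail out
    esac
}

# pull the root= value off the kernel command line
booted=$(sed -n 's/.*root=\([^ ]*\).*/\1/p' /proc/cmdline)
target=$(other_root "$booted") || target="(unknown)"
echo "booted from $booted; would wipe $target, unpack the rootfs, rewrite grub.cfg"
```

The suffix trick also works for NVMe-style names (`/dev/nvme0n1p2` pairs with `/dev/nvme0n1p3`), which matters later when device naming comes back into the story.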
Some of the original issues remained: packages need to be present in the rootfs used, and sysupgrade depends on the backup working.
The obvious solution for the first is to use the image builder (which isn't a buildroot, it's just for making images). I used this for the non-x86 builds previously, so I was familiar with it (and had scripts already). This let me bake in the packages I wanted, add in the upgrade script, and also resize the boot partition a little bigger. While doing this I opted to label the partitions and filesystems so the upgrade script could safely auto-detect them. The official image uses "kernel" for the boot label and "rootfs" for the root. I changed to "rootfs_a" and "rootfs_b". As I'm using GPT I also labelled the partitions "root_a" and "root_b" (different names in case they get mixed up later). Changing the names let me handle "migrating" from the official image in the upgrade script (maybe useful to someone else).
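As an aside, here's how a filesystem label like those can be set and read back, using a scratch image file in place of the real partition (standard e2fsprogs/util-linux tools, nothing OpenWrt-specific; the GPT partition names would be set separately, e.g. with `sgdisk --change-name`):

```shell
# Label a scratch ext4 image the way the build labels the real root
# partitions, then resolve the label back -- this is what lets the
# upgrade script auto-detect slots instead of trusting device names.
truncate -s 16M rootfs_b.img
mkfs.ext4 -q -F -L rootfs_b rootfs_b.img
blkid -o value -s LABEL rootfs_b.img
```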
The sysupgrade backup thing I'd tackle by using sysupgrade's configuration properly (or validating it was already working).
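Concretely that means checking what `sysupgrade -l` says will be preserved, and listing anything extra in `/etc/sysupgrade.conf` (the paths below are made-up examples, not from the actual setup):

```
# /etc/sysupgrade.conf -- extra files to keep across sysupgrade backups
/etc/dropbear/          # example: SSH host keys
/root/.profile          # example: shell tweaks
```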
At this point some quick testing showed this worked great. Base deployment worked, "upgrading" (to the same kernel+rootfs) deployed to the other partition, updated grub and booted into it. The configuration persisted. I then noticed `/boot` was not mounted, but only on image B. Booting back to image A, it mounted again.
Down the rabbit hole I went
OpenWrt is a very cut-down, simplified Linux distribution. It has a text-file-based config system (probably a legacy of the original GPL code roots) and a very simple init system. It doesn't use an initrd, so it should be possible to follow the init sequence and see where boot gets mounted (or not). Obviously `/etc/fstab` was empty, but it was on both roots, so that wasn't it. I noticed the `/boot` directory was not in the rootfs tarballs, so something makes that directory.
Digging into the boot scripts docs I discovered the system runs `/etc/preinit` before the normal full init system, and found a bunch more stuff to look at. Grepping around for `/boot` in the preinit stuff I found `/lib/preinit/79_move_config`, which makes the `/boot` dir and mounts it. All I had to do was fix that (or understand why it was breaking and work around it).
`79_move_config` doesn't sound like anything to do with mounting filesystems. It seems to exist to handle a backup file saved to the boot partition and move it to root, maybe as part of config-restore for squashfs-based systems, so the config gets injected before init starts. This script also performs a `mount --bind /boot/boot /boot`, which explained why a `df` had previously shown `/boot` twice. Clearly the conditions in this script were not working for image B.
Later I found I wasn’t the first person to hit this issue.
Reading into what the functions called from `79_move_config` do, I discovered they are for finding the device and partition that boot is on (makes sense). To find the device, `export_bootdevice()` parses `/proc/cmdline` to get the `root=` value, and depending on its style it's handled in different ways. We have `root=PARTUUID=<a guid>`, but it only matches UUIDs ending in 02, and then, even more curiously, it truncates that to end in 00 AND MATCHES ON THAT. Taking a step back and checking ALL the partition UUIDs, I saw that they all start with the same thing: root ends in 02, boot ends in 01, and the GPT table itself in 00. WTAF. Digging into the image builder I saw the partition GUID is set to a reproducible value (based on the version string and date), but it's using a special tool, `ptgen`, to make the partition table. (Side note: this probably makes perfect sense for an MBR partition table, since the ID there belongs to the table, and each partition is referenced by a number suffix.)
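The matching logic, re-sketched in plain shell (UUIDs are made up; the real code lives in OpenWrt's preinit/upgrade helpers): a root PARTUUID ending in 02 is assumed to be partition 2 of a disk whose own identifier ends in 00 — true for the official image, false for a root on partition 3.

```shell
# Mimic the suffix matching described above (simplified, made-up UUIDs).
# For MBR the trick is sound: the "PARTUUID" there is the 32-bit disk
# signature plus a partition-number suffix (e.g. 12345678-02), so
# stripping the suffix really does identify the disk.
boot_disk_id() {
    case "$1" in
        *02) echo "${1%02}00" ;;   # truncate to the disk's identifier
        *)   return 1 ;;           # anything else: no boot device found
    esac
}

boot_disk_id "c1d2e3f4-0000-0000-0000-000000000002"   # image A: matches
boot_disk_id "c1d2e3f4-0000-0000-0000-000000000003" \
    || echo "image B: no match, so /boot never gets mounted"
```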
How to work around this? Well, a duplicate partition UUID won't work, as the kernel wouldn't find the right root partition. Not wanting to modify a core file, I had to find another way around, which in this case was reverting to `root=/dev/sda2`-style device names. A quick test of editing `grub.cfg` showed that booting image B with `root=/dev/sda3` successfully mounted boot (twice, due to the bind), so we had a workaround that was OK. That can't be done in the image builder, as the target system's boot disk type isn't known (e.g. NVMe or mSATA), so I opted to fix it in the upgrade script.
Remember me mentioning "migrating" from the official image? Well, that's exactly how we can fix this too. The grub.cfg the upgrade script writes uses the disk-type-based device name. Since the first time the issue would present is after the first upgrade applied to a system (when you want to boot from image B), the initial boot from the official image is fine, and subsequent upgrades continue with this device naming. For the migration to work smoothly I also had to be careful with kernel naming, as initially there is no partition suffix in the name.
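The rewritten grub.cfg entry ends up along these lines (a hand-written sketch, not the script's actual output; the kernel filename, console arguments, and menu title are all assumptions):

```
# /boot/grub/grub.cfg (sketch): device-name root= instead of PARTUUID,
# so the preinit boot-device detection works from either slot
menuentry "OpenWrt (root_b)" {
        linux /boot/vmlinuz-3 root=/dev/sda3 rootwait console=ttyS0,115200
}
```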
The result was this. In theory (and from testing so far) you can safely start with the `ext4-combined.img` or `ext4-combined-efi.img` on an empty disk (so MBR or GPT). You can resize the original root partition to be a bit bigger and add another after it, and this takes care of the rest. One remaining possible issue is that both roots share a boot partition, but the only things there are the kernel images and grub.cfg. For my use, though, I'm using the image builder, which spits out an already customised image with the second root and named partitions and filesystem labels. I will be testing this as each new release comes out and see how it goes.