OpenWrt: the journey

So I’m still using OpenWrt after many years, for a variety of reasons but mostly because DNS ad blocking and VPN support work well. Over the years it has run on physical hardware (RouterStation Pro, EdgeRouter PoE), in a VM, and most recently back on bare metal x86. Because of my interest in (need to) script and automate everything, I’ve had exposure to the various build systems and to baking config into the image, which makes deployment and replacement easy and, when moving platforms, makes migration easy too.

Moving back to x86, initially in a VM, allowed me to simplify the build and try out other ways of automating the upgrade. I was using a Packer-built image, with an Ansible playbook to deploy it and swap the running router to the new version with minimal network drop. Via Ansible I could deploy “router new” from a template while “router” was running, power off “router”, power up “router new”, and 30 seconds later have a working router again (then, if successful, delete the old one and rename the new one, ready for next time). Using the same MAC address made the WAN-side switchover very fast too. It worked well enough on ESXi, for both the image build and the deploy, even without a vSphere server running.
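To make the ordering of that swap concrete, here is a rough Python sketch of the sequence. It is not my actual playbook: the Hypervisor interface, the RecordingHypervisor stand-in, the is_healthy check and the “router-new” name are all hypothetical, and the rollback branch is an assumption about what you would want on failure rather than something described above.

```python
from typing import Callable, List, Protocol, Tuple


class Hypervisor(Protocol):
    """Hypothetical minimal VM interface; not a real ESXi or Ansible API."""

    def clone_from_template(self, template: str, name: str) -> None: ...
    def power_off(self, name: str) -> None: ...
    def power_on(self, name: str) -> None: ...
    def delete(self, name: str) -> None: ...
    def rename(self, old: str, new: str) -> None: ...


class RecordingHypervisor:
    """Stand-in that only records calls, so the ordering can be inspected."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def clone_from_template(self, template: str, name: str) -> None:
        self.calls.append(("clone", template, name))

    def power_off(self, name: str) -> None:
        self.calls.append(("off", name))

    def power_on(self, name: str) -> None:
        self.calls.append(("on", name))

    def delete(self, name: str) -> None:
        self.calls.append(("delete", name))

    def rename(self, old: str, new: str) -> None:
        self.calls.append(("rename", old, new))


def swap_router(
    hv: Hypervisor,
    template: str,
    is_healthy: Callable[[str], bool],
    live: str = "router",
    staged: str = "router-new",
) -> bool:
    """Replace the live router VM with a fresh one; return True on success."""
    # Deploy the replacement while the live router keeps serving traffic.
    hv.clone_from_template(template, staged)
    # The outage window starts here and ends once the new VM has booted.
    hv.power_off(live)
    hv.power_on(staged)
    if is_healthy(staged):
        # Success: drop the old VM and reuse its name, ready for next time.
        hv.delete(live)
        hv.rename(staged, live)
        return True
    # Assumed rollback: bring the old router back, keep the staged VM around.
    hv.power_off(staged)
    hv.power_on(live)
    return False


if __name__ == "__main__":
    hv = RecordingHypervisor()
    swap_router(hv, "openwrt-template", lambda name: True)
    for call in hv.calls:
        print(call)
```

The point of the ordering is that the slow part (cloning from the template) happens while the old router is still up, so the outage is only the power off and boot.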

However, the future of VMware being on the rocks, me not using it for anything else anymore, and my hardware no longer being on the HCL all motivated me to go back to bare metal, on a newer, lower-power box.


Going to a bare metal x86 box, I was able to use the same image building methodology to build a disk image (in VirtualBox now), but the initial deploy of it would change a lot. Writing a raw image over the network to another machine without IPMI/iLO/iDRAC to help was a bit more manual. Initially I used a USB boot drive and NVMe over TCP to write it to the raw block device. This was great, except the USB boot needs DHCP, and if DHCP is down that’s not going to work (yes, I could bake a static IP version, but didn’t). Another option was to pull the M.2 drive out to image it, but again the disruption would be longer than I’d like. There was also a risk of the box not coming back up with these approaches and, without a router, of being unable to build a “new” fixed image.

Unless I wanted to have a cold standby router (or temporary travel router) to facilitate upgrades, I needed to handle the upgrade within the system rather than as an image-deployment-style upgrade. This is fine though, since OpenWrt has been around for ages and runs on embedded devices, so it would have this all sorted… right? On embedded devices it usually uses a squashfs root with a read-write overlay over the top (saves space, saves flash endurance), which also makes upgrades a bit easier as the whole base of the system is a single file you can swap over. On x86 there is a squashfs image or an ext4 image to choose from. Since write endurance isn’t a concern I defaulted to the ext4 image, and it’s just a basic kernel + root partition Linux box. Crazy simple and easy to work with.

However, reviewing the built-in upgrade methods, I hit a few snags that I didn’t like.

Sysupgrade

This is the original method, and it is well suited to flash devices running from RAM. It relies on the backup tool to capture all config/changes, then blows away the root, redeploys from a rootfs image and restores the config. The snag is that the rootfs image only has the base packages, so you need to add your own packages back in after the new version boots.

There are various scripts to help with that, but nothing too official. Also, since this replaces the running root with a new system, it sounds riskier than swapping out a squashfs image would have been (think what happens if it gets interrupted part way through).
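The package re-add step is basically a set difference: what is installed now, minus what the base image ships with. Here is a minimal Python sketch of that idea, not one of those scripts. It assumes a listing in the “name - version” shape that opkg list-installed prints on opkg-based releases, and it assumes you already have the base image’s package names from somewhere. The function names and the sample data are made up for illustration.

```python
from __future__ import annotations


def parse_installed(listing: str) -> set[str]:
    """Parse lines shaped like 'name - version' into a set of package names."""
    names: set[str] = set()
    for line in listing.splitlines():
        line = line.strip()
        if line:
            # Keep only the part before the first ' - ' separator.
            names.add(line.split(" - ", 1)[0])
    return names


def packages_to_reinstall(installed: set[str], base: set[str]) -> list[str]:
    """Packages present on the running system but absent from the base image."""
    return sorted(installed - base)


if __name__ == "__main__":
    # Illustrative data only; the versions are invented.
    listing = "base-files - 1\nadblock - 4.1\nwireguard-tools - 1.0\n"
    base = {"base-files"}
    print(packages_to_reinstall(parse_installed(listing), base))
```

Saving the resulting list alongside the config backup before upgrading means it is still there to feed back to the package manager once the new version boots.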

Attended Sysupgrade

This is the solution to the packages-in-image problem. The tool asks a build server for an image with your package list baked in, and then downloads it. It’s “attended” because you need to click go between downloading and upgrading. This still relies on the backup capturing all the files. Unfortunately, in my testing the build server had a long queue (over 2 hours) and the build timed out, so it gave me nothing. Unusable, sad pandas.

The idea was pretty cool of course. But I already knew how quick and easy the image builder was from previous iterations.

So, thinking more broadly about the problem, I saw a few options.

Swap to the squashfs-based x86 image and upgrade via sysupgrade. For the package list I’d have to build my own image. This doesn’t account for rollback either.
Think more like a commercial appliance: they all use A/B image partitions for this exact reason. That gives you rollback, and it also gives you a work area for the upgrade in the partition that’s not in use.

Looking for this, I found a few blogs and some repos on GitHub which looked promising. The system being so basic was helpful for this type of hack.
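The core of the A/B idea is tiny: work out which root partition you are running from, and treat the other one as the upgrade target. Here is a Python sketch of just that decision, with made-up device names. It is not taken from any of those repos, and it skips the actual image writing and bootloader switch.

```python
from __future__ import annotations


def inactive_slot(current_root: str, slots: tuple[str, str]) -> str:
    """Return the slot that is not currently booted, i.e. the upgrade target."""
    slot_a, slot_b = slots
    if current_root == slot_a:
        return slot_b
    if current_root == slot_b:
        return slot_a
    # Refuse to guess: writing an image to the wrong device would be destructive.
    raise ValueError(f"{current_root!r} is not one of the A/B slots {slots!r}")


if __name__ == "__main__":
    # Hypothetical layout: two root partitions on the same disk.
    slots = ("/dev/sda2", "/dev/sda3")
    print(inactive_slot("/dev/sda2", slots))
```

Rollback falls out of the same layout: the old root is left untouched, so booting the previous slot again gets you back to where you were.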

(Continues in part 2)