random complexity

Splat.cx: a long winded history lesson

So after writing a few posts in February the only feedback I received was that it doesn’t render well on a phone. Nothing about the content or the opinions expressed, just that. Oh well. That jab in the guts led me to revisit migrating to a newer codebase and, more importantly, a codebase someone else maintains. That makes deploying a new theme much easier, and the theme was the primary complaint.

This feels like something I’ve done before, so let’s cast an eye back into the history books.

dictionary page showing history

Part 1: Archaeology of a code base

So here we go for a history lesson. I’ve moved between a fair few content systems over the years. Some home grown, some from elsewhere and even that pattern seems cyclical.

From	To	Name	Notes
Jan 2000	Oct 2000	Manually edited	Manual html and basic php templater
Oct 2000	Nov 2001	SIPS	Polls, user accounts, comments etc
Nov 2001	(at least) Apr 2005	SIPS majorly refactored	User accounts and comments removed, added lists, story trees in 2004
(not earlier than) Apr 2005	Oct 2009	bBlog	(export tool from SIPS(refactor) to sql targets wrong schema)
Oct 2009	May 2011	Wordpress	Tags not in use before wordpress
May 2011	Apr 2022	StaticBlog	My C# based tool
Apr 2016		Hugo	Initial migration attempt, not completed
Jun 2017		Hugo	Another migration attempt, not completed
Apr 2022	Future	Hugo	Another migration attempted and cut over

SIPS (Simple Internet Publishing System)

It was a php based blog tool along the lines of what the slash engine could do, except it stored data in flat text files. The last release was in 2002 but I had majorly refactored/rewritten it before that release (referenced in a post 11 Nov 2001).

The rewrite ripped out user accounts and comments, added ordered lists, and later on (Jan 2004) added support for an alt/parallel blog (story tree). That idea never continued on to future versions.

The code includes an exporter script which outputs sql insert lines. This showed up in Aug 2004, so a migration was being worked on at the time; the last alt/parallel post was Sep 2004. I haven’t isolated the cutover date precisely but I know it was after April 2005.

bBlog

bBlog was also a php based blog tool, which used smarty templating and mysql for data storage. It had a plugin capability and was fairly full featured. Not much remains from this era in my backup archives.

There’s also a gap on archive.org for this entire time, so there was probably an overly aggressive robots.txt present. The SIPS exporter produced sql inserts but the column names and order didn’t match any version of bBlog I could get today. It’s possible it was exported to b2/cafelog, which is a direct predecessor of both bBlog and wordpress, or I just manually massaged the data in.

An sql dump from Oct 2009 indicates bBlog 0.7.6 was in use at the time, which was the final release (back in 2005 however). The same backup contains an sql dump from wordpress with the same posts present and prior to this point in time posts were not tagged, so I’m fairly certain of the end date for bBlog. A post drawing a line in the sand marked this transition, 24 Oct 2009.

Wordpress

Wordpress was the in platform at the time. Lots of themes and plugins was the idea, but the reality of wordpress is you’re a honeypot for all the malware of the world. You had to keep on top of that, and auto updates couldn’t be relied on either. This led me to move away from a dynamic site entirely, as the maintenance was more time consuming than I wanted and the dynamic features didn’t justify it. I can remember two themes in use at this time, but haven’t found any archives containing the complete site code. Around this time (2011) I decided to move to a static file based solution, and ended up writing my own.

StaticBlog (private)

This home grown tool uses markdown for file storage on disk (with metadata headers per file) and a C# command line app to generate all the files. Templates are compiled at runtime and executed. One advanced feature of this codebase was the tag cloud. The data was imported from wordpress’s mysql schema via a little tool I knocked up (also in C#) which spat out the text files with various wordpress-isms formatted out and the metadata headers at the top. This codebase saw a few themes over the years. The original was a pixel perfect copy of the previous wordpress theme, done intentionally so no one would notice the change. Later (2013) a bootstrap based theme was built from scratch and that is essentially what was present right through into 2022.

This tool ran in mono for cross platform support, and was self contained (exe and a few dlls in a folder). The biggest issue with it was the templates. It was written to be like a T4 template (so dotnet native), but compiled at runtime. The exact library was TemplateMaschine. This was fine (and fast) when they worked, but any syntax error was a total pain to track down - and worse, any parser errors necessitated step through debugging to isolate the issue (I found a new one just last month when banging away at a new theme).

Based on the converter scripts I found in my source tree, I had two separate initial attempts at migrating to hugo, both quite a while ago though, so the itch has been there for ages.

Hugo

In 2016 and 2017 I looked at migrating to hugo as it ticked many boxes of what I wanted in a static blog tool - single binary with no dependencies. That ruled out tools written in python/ruby or any other similar language which is an abomination of package garbage on a modern OS. I wrote new converters for each attempt to translate the flavours of markdown and dump files on disk, however I didn’t swap over either time because of niggling issues in the tool or theme. It just couldn’t do what I wanted - and having not learned go at that stage I wasn’t going to dive in and extend it.

The tool migration was not without issues - needing to upgrade the current layout and not fully grasping the apparently very complex templating environment in use I opted to go and find one that I liked instead. Then I only had to bend it to suit exactly what I wanted in the site. In doing this I came across a handful of bugs (either still open or auto-closed as stale even though they’re still present). Fortunately I was able to resolve or work around these issues by editing the template source. This is slightly streamlined by hugo’s ability to inherit/stack themes so you can use someone else’s theme and then layer on top a subset for your personalisation of the theme. Doing this with hugo modules makes it even more streamlined.

Part 2: Post formatting clean up

So the C# code lasted until today when I’ve actually cut over to hugo. In the years between when the C# tool was written and now the style of markdown in use has evolved substantially, which made a lot of what the old code parsed actually not work in a modern (strict) markdown parser. The one major thing I especially wanted to maintain was the ability to keep post images and text together (or very close), rather than having them in a parallel structure a few levels deep - to me it mattered that the content was not split. The content is king, and its layout on disk should make sense to the author - and in my opinion - not necessarily relate to the published layout in any way (because the rendering engine is responsible for and controls the finished layout).

Hugo can do this with page bundles - however the compromise is your post has to be a folder and its filename has to be index.md. To migrate I had to do some translation on the markdown dialect too. So for this third attempt I wrote another converter from scratch (but did find the 2016 and 2017 ones after it was working) - which was a good exercise as the methodology I used changed over the years. The bulk of the markdown format changes I was able to clean up with some regular expressions and some content aware parsing (where changes implied formatting intent). The headers were a simple translation just to turn the old key=value pairs into yaml. All that remained at the end was a bunch of one off corrections, so I just did those manually. It became apparent too that the previous migrations were not perfect either, so I fixed up a bunch of mangled html inside markdown and half translated bits which caused rendering errors. I did translate all of the posts, not just what is visible on the site today, and the bulk of these major errors were in the much older posts.
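The header translation and the page bundle layout can be sketched roughly like this. It is not the converter I actually wrote, just a minimal illustration. It assumes a hypothetical old format of key=value lines, then a blank line, then the markdown body, and it assumes header values contain no double quotes. The function names and the content/posts path are made up for the example.

```shell
#!/bin/bash
# Sketch only: assumes the old post format is "key=value" header lines,
# a blank line, then the markdown body. Values must not contain double quotes.

# Print the post at $1 with its header rewritten as YAML front matter.
convert_header() {
  awk '
    BEGIN { in_header = 1; print "---" }
    in_header && /^$/ { print "---"; print ""; in_header = 0; next }
    in_header {
      eq  = index($0, "=")             # split on the first "=" only
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      printf "%s: \"%s\"\n", key, val
      next
    }
    { print }                          # body lines pass through untouched
    END { if (in_header) print "---" } # header-only file: still close the block
  ' "$1"
}

# Write the converted post into a page bundle: <dest_root>/<slug>/index.md
to_page_bundle() {
  local src=$1 dest_root=$2 slug
  slug=$(basename "$src" .md)
  mkdir -p "$dest_root/$slug"
  convert_header "$src" > "$dest_root/$slug/index.md"
}
```

Something like `to_page_bundle old/hello.md content/posts` would then leave the post at content/posts/hello/index.md, with room for its images alongside in the same folder.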

Fixing up these posts meant skimming or reading them. It meant checking included images showed up and captions were correct. It meant changing previously ambiguous/custom markdown syntax into commonmark’s more defined syntax. One thing I didn’t want to do was to reformat posts. Most of the limitations I faced were due to old markdown allowing html pass through for language deficiencies, and now having html pass through disabled (by default) I had to make some adjustments. A quick braindump of these were:

Superscript. Mainly used for footnotes, so they were updated to footnote syntax. For other superscript use I went to more traditional markup, as the new standard isn’t supported. The new standard is to wrap in carets: x^2^ would render as x^2^ (x squared) if it worked.
Subscript. Used in fewer places and to indicate number base, so really needs math markup. Reverted to more basic markup.
Strikeout. Fortunately wrapping in double tildes is now widespread for strikeout. ~~example~~ renders as ~~example~~
Underline. Would you believe this isn’t defined? Traditionally markdown uses single and double asterisk for emphasis, single is italics, double is bold. Alternatively you were allowed to use underscore the same way. The Markdown parser I used in C# varied this, single underscore meant underline. Double was still bold. So I had to fix them up. These days using underscore triggers a markdown lint warning.
Code fences have improved a lot, so html inlined pre blocks had to be removed. But I also had inconsistent use of block tabbed indents for preformatted text. Sometimes without blank lines before or after. Bunch of combinations to clean up but easy enough in the converter.
Emphasis on a link. Do you put the *’s inside or outside the []’s. #OCDthings
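A couple of the clean ups in this list are simple enough to show as regular expressions. This is a sketch rather than the real converter: it assumes the old posts used the s, strike or del tags for strikeout and numeric sup tags for footnote markers, which is a guess at what html pass through was being used for. The function name is made up.

```shell
#!/bin/bash
# Sketch only: the tag names handled here are assumptions about the old posts.

# Read markdown on stdin, write cleaned markdown on stdout.
fix_inline_html() {
  sed -E \
    -e 's#<(s|strike|del)>([^<]*)</(s|strike|del)>#~~\2~~#g' \
    -e 's#<sup>([0-9]+)</sup>#[^\1]#g'
}
```

The first expression turns inline strikeout html into double tildes, the second turns a numeric superscript into a footnote reference. The footnote definitions themselves would still need writing by hand.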

A more complete list of the markdown features supported by hugo is here.

Part 3: Post content clean up

Old posts are a window to a previous simpler time.

Probably a more interesting thought than the codebase or markdown syntax changes, aren’t you glad you skimmed over all that down here?

The simplest statement would be the posts varied in quality, size and frequency. But the real eye opener was the posts’ content. Ranging from late night gibberish during uni assignment cram time, through some deep and meaningful but probably misinterpreted or misunderstood feelings, a fair amount of semi technical challenges (with some MacGyverisms) and too many one liners about film or tv. If I had to select a word to sum it up, it would be crap.

At times there were huge gaps between posts too, when interest just wasn’t there or I was simply too busy. For a long time the longest quiet stretch was Jun 2007 through Oct 2009 (28 months), but that’s now been blown away by the latest 55 month gap from Jul 2017 to Feb 2022. But rewinding from there makes it worse - Jul 2017 was a burst of 7 posts in a week following a 14 month gap from a single post in May 2016, which itself followed a 22 month gap. That sequence was Jul 2014 to Feb 2022 and included 8 posts in the 91 months (7 years 7 months). Hopefully I can improve on that posting average without a terrible drop off in quality. Outside of those times it was common for a post gap to be from a few weeks to 5-10 months. I’m not the posting every day type (and when I tried that in Apr 2001 the quality and subject matter died faster than, umm, perhaps this download of q3test over dialup).

q3test download failing

A few years back I did a great purge and hid the least professional posts, which hopefully improved the average - what you’re seeing now is the bit above the water on this iceberg. There are 30 visible posts today, and 246 that didn’t make the cut.

Part 4: How deep does this rabbit hole go?

To come up with this garbage I actually put a boat load of time into figuring out when various cutovers occurred. I even considered building a git repo showing the evolution over time from all of the backups. Similar to (and inspired by) the ancient unix history repo. I didn’t go quite that far (yet) but did manage to locate 29 snapshots of the site from Aug 2000 through to Feb 2005. From there through Feb 2011 the site was dynamic and needed the database dump also to be useful - of those I found only a single backup containing both wordpress and bblog dumps from 2009. This gap I can only attribute to me moving from haphazard and disorganised backups to more structured backups. But also between moving hosting providers.

For a static file based site, the disorganised random zip of a folder stashed somewhere survived really well. The only snag was some of them lost dot files in the top level folder. These archives also preserved the file timestamps really well. However they were random and disorganised, so there were huge gaps in their dates and they were not located in a single well organised place (data management is hard). With these dumps I can also derive which upstream codebase was still in use and any custom modifications or bug fixes I might have done.

The database driven dynamic sites were hosted on a more professional platform, where I had cron’d up backup jobs shipping tarballs and sql dumps off box regularly, with basic retention rules. They were backups and not archives, so not kept long term (but also because there were paying customers at the time the size was much bigger). When that hosting platform shut down and I moved servers again those backups aged off. Eventually the final backups from there were dumped, possibly to save disk space, and all that remains is a single snapshot from 2012 which still had the files from 2009 in it. So although disk space is a solved problem, it can be an expensive problem, and backups are no replacement for archives.

Looking forward perhaps I should pay closer attention to things like archive.org to ensure snapshots are saved semi-regularly, and that I keep a regular archive tarball in a sensible place. Since the static blog C# tool was in use, I can re-generate the site with any number of the posts, in any of the themes. I’ve carefully archived that tool, all the posts, its source repo and some notes into a sensible place just in case. The content itself was not kept in a repository though, so didn’t have version history attached (1 post per file sort of doesn’t need it).

With hugo I’m also keeping the site content in a git repo, so as long as I can keep some kind of archive for it maybe in 10 years time there won’t be a gaping black hole of nothing in this decade. Oh but snafu, using hugo modules for the theme include means it depends on an external repo. An alternative there was git-submodule which has the same limitation. Another hope is with a more modern markup language the posts appearance can be better formatted for a more professional look, who knows, maybe I’ll post something worth saving one day.

No new mail alert

Or perhaps not. This did distract me from writing about all the things that have changed over the past few years. Oh well, there will be another time for that. 🤷‍♂️

Packer hcl2 configuration migration

Maybe before I diverge into an old story I should barf up a recent one.

Packer HCL2 configuration migration

But first I should set the scene - I went all in on using packer to build all the things. Some work very well (Fedora and Centos), some work pretty well (Windows2019 and OmniOS), some you wouldn’t even expect to work can actually be coaxed into behaving quite well (Netapp Ontap), and others are just a basket case needing a nice dose of hack (openwrt). I’ve even kicked the tyres on using multiple builders (targets) in the same file - totally supported mind you. I used Virtualbox initially, then added KVM/QEMU and eventually I needed to target Vmware (which has a host builder esxi and a vsphere builder just to keep you on your toes - incompatible with each other and all the variables are named differently).

So you could say I’ve tried a lot of things in there, and for whatever reasons ended up with some out there configuration in the files. When I started with Packer, the files were json only with no comments. :( json is fine, but not having a comment tag in the schema meant even adding a name:value pair for a comment resulted in a validation error and sad pandas all around. Once I got over that (comments in a file next to the json, and a variable for comments at the top of the file) I added more and more standardisation to my files, to make it easier to up rev something, say Centos 7 to 8. Each packer config had its own directory (and comments file), so a new rev was a copy/paste, some renaming and changing a few lines in the file. Variables at the top made this easy for all the common things (like iso URL and iso checksum, and guest name) which the multiple builders in the file used - no duplication was important. This worked well for a while, and I built up quite the catalog of a variety of OSes and versions. Each spits out a packer box, locally namespaced and versioned so they can all exist and be consumable concurrently, and vagrant allows the same box name with different providers too.

copy pasta

I got quite advanced with the Netapp Ontap one, as I wanted to produce a vagrant environment of a multiple node simulator. Step one was taking the simulator that targets vmware, and converting it to run in VirtualBox. Step two was to packer build it to a point that ansible could talk to it (add an IP and password). Then repeat step two for multiple nodes (different IPs and serial numbers). I was quite pleased with the result and was able to iterate very rapidly on some zero day building playbooks. It eventually got to the point I outgrew what virtualbox could do (I wanted more than 8 nics) and its networking was limiting me, so I started moving that to QEMU. However that’s not what this post/rant is about. BACK TO HCL2.

So fast forward some packer versions (I started with 1.1 and 1.7.10 is current now). This new HCL2 thing comes along in beta, and I mostly ignore it. I’m using jq to query my packer json files in the tooling that I’ve built up around it (to better handle the vagrant and later libvirt parts). But by the time 1.7 comes out the writing is on the wall - HCL2 is the future of the packer config format and I’ll need to migrate eventually. FORTUNATELY, it’s been made easy with a migration tool. LUCKY!

But in usual hashicorp fashion it seems - 80/20 rule is alive and well. The 80% that everyone has used will work fine, but the 20% that not so many use might not. As is my usual tradition - around this tool I build up some scraps (shell script in this case) to handle the rest. See I want my output files to be orderly and tidy. Not a code generated mess, after all the input files were curated with json fields in a sensible order (say all the disk things grouped together, and the network things together). But as you’d expect the migration tool is the result of a config parse and then export. So the unimportant details like field order get lost and they come out alphabetical.

Other deficiencies in the migration also show up as the templating/variable handling has changed. HCL2 allows for typed variables (good), and has locals which are like a constant (handy). It also lets you name a source (formerly builder), and gives you an autogenerated one by default. But the way it handled some of the magic hasn’t made this conversion yet. So now it’s a mixture of HCL2 variables and templating, with a reasonable chunk of the old skool go templating still there. Some specific examples might help.

Builder json "cpus": "{{user `cpus`}}", becomes source hcl2 cpus = "${var.cpus}" though I prefer the cleaner cpus = var.cpus. That’s fixable with sed.

Builder json "iso_checksum": "{{user `iso_checksum`}}", becomes source hcl2 iso_checksum = "${var.iso_checksum}" though let’s make it a local since it’s constant: iso_checksum = local.iso_checksum. That’s fixable with sed (and the above one would run first to simplify). The actual edit to the header of the file to turn the variable into a local also needs doing, but that’s a copy/paste so skip over that one. Also repeat this for a few vars.
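Those two sed fixes can be tried on a few sample lines before pointing them at a real file. This is a sketch assuming GNU sed (for -E and the \b word boundary). The function name and the LOCALS list are mine for illustration, with the names taken from the examples in this post.

```shell
#!/bin/bash
# Sketch only, GNU sed assumed. Names listed here get demoted from var.* to local.*
LOCALS="iso_checksum iso_url codename brand"

# Read HCL2 on stdin, write tidied HCL2 on stdout.
tidy_refs() {
  # Step 1: a value that is nothing but one interpolation loses its quotes,
  # so "${var.cpus}" at the end of a line becomes var.cpus
  local script='s/"\$\{([^}"]*)\}"$/\1/'
  local name
  # Step 2: var.<name> becomes local.<name>; \b stops var.iso_url from
  # also matching a longer name such as var.iso_url_mirror
  for name in $LOCALS; do
    script="$script;s/\\bvar\\.$name\\b/local.$name/g"
  done
  sed -E "$script"
}
```

Piping a generated file through it, for example `tidy_refs < blah.pkr.hcl`, prints the tidied version without touching the original, which makes it easy to diff before committing to sed -i.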

So far so good. Cooking with gas.

Builder json "guest_additions_path": "VBoxGuestAdditions_{{.Version}}.iso", becomes source hcl2 guest_additions_path = "VBoxGuestAdditions_{{ .Version }}.iso". Now that’s surprising, maybe the internal variables that start with dots get left alone.

Builder json "boot_command": ["<up><tab> inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/kickstart.cfg<enter>"], becomes source hcl2 boot_command = ["<up><tab> inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/kickstart.cfg<enter>"]. Yeah maybe that’s it.

Post processor json "output": "../boxes/{{user `codename`}}_{{ .Provider }}.box", becomes post processor hcl2 output = "../boxes/${var.codename}_<no value>.box". grumbles back to the documentation to try and find the answer, and that link saves you the trouble. But the trick is to know it’s not the build.type you want but the source.type. So what I wanted in hcl2 was output = "../boxes/${local.codename}_${source.type}.box". Fortunately sed can save the day here too. Strictly speaking these are not the same though. .Provider returned virtualbox or libvirt where source.type returns virtualbox-iso and qemu. Ta da!
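Because .Provider and source.type give different strings, any wrapper tooling that still expects the old provider names in the box filename needs a translation step. A minimal sketch of that, using only the two pairs mentioned above. The function names are hypothetical and anything unlisted is passed through unchanged.

```shell
#!/bin/bash
# Sketch only: maps a packer source type back to the old .Provider style name.
provider_for_source() {
  case "$1" in
    virtualbox-iso) echo virtualbox ;;
    qemu)           echo libvirt ;;
    *)              echo "$1" ;;   # unknown source type: pass through unchanged
  esac
}

# Build the old style box path from a codename and a source type.
box_path() {
  echo "../boxes/${1}_$(provider_for_source "$2").box"
}
```

So `box_path centos7 qemu` gives the filename the json era tooling would have produced, while the hcl2 output itself is named with qemu in it.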

Also while on the topic of the builder/source differences there, it also changes how you call from the cli. I was calling packer with the desired builder on the cli, say packer build -timestamp-ui -only=virtualbox-iso blah.json for virtualbox, but now I have to use the name of the source and its identifier (or a wildcard), so packer build -timestamp-ui -only=virtualbox-iso.centos7 blah.pkr.hcl for example, or generic using a wildcard which then means quoting -only="virtualbox-iso.*" blah.pkr.hcl. Oh dear. These other changes were not mentioned anywhere in the migration article. The 80% out there must only use a single builder anyway, so it wouldn’t ever come up.
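While both formats are still lying around, a wrapper can pick the right -only form from the file extension. A small sketch (the function name is made up) which only prints the argument, so it can be tested without packer installed.

```shell
#!/bin/bash
# Sketch only: print the -only argument suited to the config file format.
only_arg() {
  local config=$1 builder=$2
  case "$config" in
    *.pkr.hcl) printf -- '-only=%s.*\n' "$builder" ;;  # HCL2: type.name, wildcard on name
    *)         printf -- '-only=%s\n' "$builder" ;;    # json: builder type alone
  esac
}
```

Used as packer build -timestamp-ui "$(only_arg blah.pkr.hcl virtualbox-iso)" blah.pkr.hcl, where the double quotes keep the shell from expanding the wildcard.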

So then you end up with a migration script that looks a bit like this (and still has some manual bits that aren’t scripted, like the vars to locals at the top)

#!/bin/bash
# Convert a packer json template to HCL2, then tidy up the generated file.
if [ "$#" -ne 1 ]; then
  echo "parameter is json file to convert" >&2
  exit 1
fi
FILENAME=$1
FILENAMEHCL=${FILENAME%.json}.pkr.hcl
# source name: the file name without directory or extension
BUILDNAME=$(basename "${FILENAME}" .json)
echo "About to convert ${FILENAME} into ${FILENAMEHCL}"

# hcl2_upgrade writes <input>.pkr.hcl next to the input file
packer hcl2_upgrade "${FILENAME}" || exit 1
mv "${FILENAME}.pkr.hcl" "${FILENAMEHCL}"

# these are constants, so reference them as locals (dots escaped so they only match a dot)
sed -i -e 's/var\.brand/local.brand/g' "${FILENAMEHCL}"
sed -i -e 's/var\.codename/local.codename/g' "${FILENAMEHCL}"
sed -i -e 's/var\.iso_checksum/local.iso_checksum/g' "${FILENAMEHCL}"
sed -i -e 's/var\.iso_url/local.iso_url/g' "${FILENAMEHCL}"
# the lost {{ .Provider }} comes out as <no value>
sed -i -e 's/_<no value>/_\${source.type}/' "${FILENAMEHCL}"

#and turn "${var.disk_size}" into var.disk_size
sed -i -r 's/"\$\{([^}]*)\}"$/\1/' "${FILENAMEHCL}"

#replace autogenerated tag with buildname
sed -i -e "s/autogenerated_1/${BUILDNAME}/" "${FILENAMEHCL}"
sed -i -e "s/autogenerated_2/${BUILDNAME}/" "${FILENAMEHCL}"
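Since some bits are still manual, it helps to have something that points at whatever the script missed. A rough sketch of such a check, with a made up function name. It looks for the three leftovers described in this post: a lost template value, an unconverted user variable and an autogenerated source name. Internal templating like {{ .Version }} is deliberately left alone since packer still handles that.

```shell
#!/bin/bash
# Sketch only: list lines in a converted file that still need attention.
# Returns 0 when the file is clean, 1 when leftovers were found, 2 if unreadable.
check_leftovers() {
  [ -r "$1" ] || return 2
  if grep -n -E '<no value>|\{\{ *user |autogenerated_[0-9]+' "$1"; then
    return 1
  fi
  return 0
}
```

Running check_leftovers blah.pkr.hcl after the conversion prints the offending lines with line numbers, or nothing at all if the file is tidy.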

But really this is just scratching the surface. Now to rewrite my wrapper scripts to handle the vagrant box add/remove bits, and the libvirt storage pool delete bits. It feels like make work.

It’s just part of the feeding and watering maintenance of the automation journey. Who knows what the next flavour of the month tool will be?
