Containers and boxes

A long time ago in a blog not very far away, I wrote about containers, and at the time I was struggling to see the actual point of them. For a one-man-band type deployment they bring a lot of additional places to update (inside the container, and outside) and extra complexity to handle. The new abstraction was a new way of thinking, and thinking of it like a VM can only get you so far. It also pushes new tooling and ways of doing things, which might not stand the test of time. It seemed like a solution looking for a problem, and it needed a mental shift just to get onto the learning curve.

However I also missed a large part of the benefit - by trying to boil the ocean as I usually do - and that is delegation of responsibilities. If you take an off-the-shelf container, someone else is (hopefully) managing the contents of that container. You can also use the same container image for multiple environments, which is better for testing (and even testing on demand). You also get clear separation of application from data (and better still, temporary and persistent data kept separate too). It also gives you nice version rollback: the previous container image is likely still on the system, so if an upgrade fails you can just restart with the previous image (oh, but more on that below). You hopefully gain some security benefits too, from only allowing certain ports to be opened between certain containers (or to the world), and the same for disk access - no more apps sharing the same files for poorly-thought-out integrations. These additional layers in the security stack help.

That’s assuming your container comes from a trusted source, or a source you trust. Unless you build it yourself (boil the ocean), that is. Oh, and restarting from a previous container image for rollback isn’t quite so simple with Docker, as you have to redeploy the container. Oh, and not all containers seem to like it when the underlying OS has SELinux enabled. But tooling can help there, right?
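
For what it’s worth, the usual fix I’ve seen for the SELinux grumbles is just relabelling the bind mounts. A minimal compose-style sketch (the service name, image and paths here are made up):

```yaml
# Hypothetical compose snippet: ask Docker to relabel a bind mount for SELinux
# using the :Z (private) or :z (shared) suffix. Service name, image and paths
# are placeholders.
services:
  myapp:
    image: example/myapp:latest
    volumes:
      - /srv/apps/myapp/data:/data:Z   # relabelled so the container can access it
```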

The new tooling is always changing too. I went very heavily into Ansible for configuration management, and therefore had to make containers (Docker at the time) work in that framework. This forced me to go further down that rabbit hole. As docker-compose couldn’t do everything I wanted per container, I had to roll my own and either wrap compose or reimplement what it does. I ended up not wrapping compose, as it was another dependency and I could build everything directly with Ansible modules anyway - one tool is better than two. Later on I did see many different projects go the other route - using Ansible Jinja templates to render docker-compose files and then bringing them up with the docker_compose module. Between this and a later project I very much leveled up in Ansible and still use it for most automation type tasks. Possible side rant about YAML (I started with the more condensed short form in Ansible, which got deprecated and forced a migration, but I eventually embraced the full form and now everything just works. All those YAML haters haven’t reached enlightenment yet).
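
For reference, the template-then-compose route those other projects took looks roughly like this. A minimal sketch only - the paths and template name are invented, and this older docker_compose module has since been superseded by a v2 variant:

```yaml
# Hypothetical playbook excerpt: render a docker-compose file from a Jinja
# template, then bring the stack up with the compose module.
- name: Render docker-compose file from template
  ansible.builtin.template:
    src: docker-compose.yml.j2          # hypothetical template name
    dest: /opt/myapp/docker-compose.yml
    mode: "0640"

- name: Bring the stack up
  community.docker.docker_compose:      # the original (compose v1) module
    project_src: /opt/myapp
    state: present
```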

Of course I had to embrace other tools too. I swapped to using Vagrant for throwaway test machines (way faster than VMware), and even for a heavily customised ONTAP lab (for automation work). Not that Vagrant is free of issues - it has plenty of gotchas (and Ruby). It’s fine if you stick within the 80% that everyone uses, but if you try to stray towards the edge of the path (not even off it) you can hit issues which are just not possible to work around. Oh, so you want to swap VirtualBox out for QEMU? Well, you can’t do multiple disks in a box any more. Oh, but if you want multiple machines behind a NAT-gateway style network config, you can’t do that with VirtualBox. TA DA!

Then to improve the Vagrant tooling (and return to boiling the ocean, of course) I started using Packer to build the base images, and again you hit snags. The config file format has even changed during the time I’ve used it (1.1.3 through the current 1.7.10): from JSON (which sucked, as there were no comment fields in the schema) to their own language (urgh!), HCL2 - but the transition is incomplete and not all features work yet! TA DA! Packer can build Vagrant images nicely and lets you push to the cloud (their cloud, not an S3 bucket), but hosting locally means more hacky bits (no versioning or local repository). So you end up wrapping that in more shell scripting to handle the extra bits (two providers, same box), just like I used Ansible to handle the extra bits for containers. It’s all just circular cycles of the same thing; each loop just changes subtly from the previous one.

So you end up with a box running containers and another virtual box running containers. Sounds almost like a meme. Oh wait, it’s been done LIKE EVERYTHING.

[Image: cat plus cat, boxed, boxed]

But so many other things have changed too, which should be the topics of future posts.


Containers or hell?

I’ve been looking at mixing it all up - completely - post 2 of 2.

Mixing things up

The other part to look at is the VMs and app hosting. If I end up running Linux on a box to serve it all, I can host many/all of the basic apps I use directly on there. But should they be isolated, and to what degree?

All the hipsters are into containers today, but with them I see a common problem around software security/patching. On the operations side we’re trading in a full-blown VM with a guest OS we support and patch for a black-box container running whatever the developer put in it. People also push/pull these container images from a global repository all the time; at least there is a reputation system, but we know how those can be gamed. I’m just concerned about the contents of the image, as it’s possible the container maintainer is not the same team that writes the software you actually want. The container could contain any code, which will then run on your network. You’re putting trust in an additional party - or taking on the container packaging role yourself.

[Image: Hipster containers]

If you take on the role yourself, then you need to ask yourself what you are gaining or protecting yourself from anyway. I run a few Python-based webapps, each as a service out of systemd on a CentOS VM. One VM, several services, each as a separate user. This VM only has read-only NFS access, except for a folder where each app’s config/database resides (or where it needs to drop or write files). This level of isolation isn’t too dissimilar to containers within Docker. With one exception - with Docker you create a container per app; it is true application isolation.
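
As a rough illustration of that kind of isolation, deployed the Ansible way (a sketch only - the app name, user and paths are invented, and in my case the read-only part actually comes from the NFS export rather than systemd directives):

```yaml
# Hypothetical sketch: one webapp as a locked-down systemd service,
# deployed via Ansible. Names, user and paths are placeholders.
- name: Install systemd unit for the webapp
  ansible.builtin.copy:
    dest: /etc/systemd/system/mywebapp.service
    content: |
      [Unit]
      Description=My Python webapp
      After=network.target

      [Service]
      User=mywebapp
      ExecStart=/usr/bin/python3 /srv/apps/mywebapp/app.py
      # Roughly the container idea: read-only system, one writable data folder
      ProtectSystem=full
      ReadWritePaths=/srv/apps/mywebapp/data

      [Install]
      WantedBy=multi-user.target

- name: Enable and start the webapp
  ansible.builtin.systemd:
    name: mywebapp.service
    daemon_reload: true
    enabled: true
    state: started
```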

This led me to wonder how far you should take it. I run Observium to monitor everything, and it uses MySQL (MariaDB) for some things. Should this database engine be within the container (a self-contained single-app container), or should the database be separate and use linked containers so the app server can find its DB server if that ever moves to a separate host? The usual googling turned up a few answers, but none that made it totally clear one way or the other. It always depended on x, y or z.

If it’s all self-contained, then the external attack footprint is smaller (fewer ports opened), but you lose the ability to scale the app server separately from the DB server, or even run them on separate hosts. Not a huge issue for me to be honest - but let’s do things properly and over-engineer.
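
The over-engineered split version would look roughly like this in compose (a sketch only - image names, ports, variables and paths are all placeholders, and the variables Observium actually expects may differ):

```yaml
# Hypothetical docker-compose sketch: app and database in separate containers
# on a private network, with only the web port exposed externally.
version: "3"
services:
  observium:
    image: example/observium:latest   # placeholder image name
    ports:
      - "8080:80"                     # only the web UI is exposed to the outside
    environment:
      DB_HOST: db                     # app finds the DB by service name
    depends_on:
      - db
    networks:
      - backend
  db:
    image: mariadb:10                 # DB port stays internal to the network
    environment:
      MYSQL_ROOT_PASSWORD: changeme   # placeholder secret
    volumes:
      - ./db-data:/var/lib/mysql      # data kept outside the container
    networks:
      - backend
networks:
  backend:
```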

Putting the database in a container of its own has similar shortcomings to the integrated one. The database files need to be external to the container, which is fine - we want clear separation of data from code, so that’s OK. Then what about backups? Is there a scheduled task to connect to the container and run the backup (or is the job inside the container), again writing outside the container? How it connects would vary between the integrated and separate container cases, due to which ports are opened. And in that case, do we share this DB container with another application which might also need the same database engine? Suggestions say no, due to version dependencies possibly differing between the applications. Yikes. Now we’re running multiple DB instances on potentially the same hardware. It’s also not clear to what degree memory deduplication works with Docker - if at all. This quote sealed the deal for me: “If you have a docker host running on top of a hypervisor, then it should be possible for either the docker host or the hypervisor to do memory deduplication and compression.” So we’re back to running a hypervisor to make up for a kernel feature which exists but doesn’t work with Docker due to process isolation (cgroups). Oops.
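
One way to do that scheduled backup from the host side, writing outside the container (a sketch only - container name, credentials and paths are invented):

```yaml
# Hypothetical sketch: a host-side cron job that reaches into the DB container
# and dumps the database to a path outside the container.
- name: Nightly database dump from the DB container
  ansible.builtin.cron:
    name: "observium db backup"
    hour: "2"
    minute: "30"
    # Redirection happens on the host, so the dump lands outside the container.
    job: "docker exec db sh -c 'mysqldump -u root -pchangeme --all-databases' > /srv/backups/observium-$(date +\\%F).sql"
```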

Docker also seems to go against my Ansible quest. Since the Docker way of updating is to throw the instance away and start a new one, the data you need to keep is not touched, as it sits outside of the container. I do like this bit, but I’ve already done that by having the apps sit on an NFS export. This approach does have merit, as the Dockerfile is a top-down script describing how to build the container’s contents. Being focused on a single goal, some I’ve looked at are quite concise; others, however, are hugely complicated. YMMV.
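
The throw-away-and-replace update can at least be driven from Ansible directly, something like this (a sketch only - names, image, ports and paths are made up):

```yaml
# Hypothetical sketch: pull a newer image and recreate the container,
# leaving the bind-mounted data untouched.
- name: Pull the latest image and recreate the container
  community.docker.docker_container:
    name: mywebapp
    image: example/mywebapp:latest    # placeholder image
    pull: true                        # fetch a newer image if one exists
    recreate: true                    # throw the old instance away
    restart_policy: unless-stopped
    published_ports:
      - "8080:8080"
    volumes:
      - /srv/apps/mywebapp/data:/data   # persistent data lives outside the container
```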

Oh, and then don’t forget you can run containers on ESX now with vSphere Integrated Containers.

I’ve said many times before, the plot chickens.

[Image: The Plot Chickens]

