random complexity

with a side of crazy

Containers or hell?

I've been looking at mixing it all up - completely - post 2 of 2.

Mixing things up

The other part to look at is the VMs and app hosting. If I end up running Linux on a box to serve it all, I can host many/all of the basic apps I use directly on there. But should they be isolated, and to what degree?

All the hipsters are into containers today, but with them I see a common problem of software security/patching. On the operations side we're trading a full-blown VM with a guest OS we support and patch for a black-box container running whatever the developer put in it. People also push/pull these container images from a global repository all the time; at least there is a reputation system, but we know how those can be gamed. I'm mostly concerned about the contents of the image, as the container maintainer may not be the same team that develops the software you actually want inside it. The container could contain any code, and it will run on your network. You're putting trust in an additional party - or taking on the container packaging role yourself.

Hipster containers

If you take on the role yourself, then you need to ask yourself what you are gaining or protecting yourself from anyway. I run a few Python-based webapps, each as a service out of systemd on a CentOS VM. One VM, several services, each as a separate user. This VM only has read-only NFS access except for a folder where their config/database resides (or where they need to drop or write files). This level of isolation isn't too dissimilar to containers within Docker, with one exception - with Docker you create a container per app. It is true application isolation.
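
For illustration, laying one of those per-app services down with Ansible looks roughly like this - a minimal sketch where the user name, unit template and service name are made up, not my actual config:

- name: create a dedicated user for the webapp   # illustrative name only
  user: name=webapp1 system=yes shell=/sbin/nologin

- name: install the systemd unit for it (runs as that user)
  template: src=webapp1.service.j2 dest=/etc/systemd/system/webapp1.service

- name: start and enable the service
  service: name=webapp1 state=started enabled=yes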

This led me to wonder how far you should take it. I run Observium to monitor everything and it uses MySQL (MariaDB) for some things. Should this database engine be within the container (a self-contained single-app container), or should the database be separate and use linked containers so the app server can find its DB server if it ever moves to a separate host? The usual googling turned up a few answers but none that made it totally clear one way or the other. It always depended on x, y or z.

If it's all self-contained, then the external attack footprint is smaller (fewer ports opened), but you lose the ability to scale the app server separately from the DB server, or even run them on separate hosts. Not a huge issue for me to be honest - but let's do things properly and over-engineer.

Putting the database in a container of its own has similar shortcomings to the integrated one. The database files need to be external to the container, which is fine - we want clear separation of data from code, so that's OK. Then what about backups? Is there a scheduled task to connect to the container and run the backup (or does the job live inside the container), again writing outside the container? How it connects would vary between the integrated and separate-container cases due to which ports are opened. And do we share this DB container with another application which might also need the same database engine? Suggestions say no, due to version dependencies possibly differing between the applications. Yikes. Now we're running multiple DB instances on potentially the same hardware.

It's also not clear to what degree memory deduplication works with Docker - if at all. This quote sealed that deal for me: "If you have a docker host running on top of a hypervisor, then it should be possible for either the docker host or the hypervisor to do memory deduplication and compression." So we're back to running a hypervisor to make up for a kernel feature which exists but doesn't work with Docker due to process isolation (cgroups). Oops.
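
For what it's worth, the linked, separate-DB layout would look something like this in docker-compose terms. This is a sketch only: the app image name and DB_HOST variable are placeholders, while the MariaDB image, its MYSQL_ROOT_PASSWORD variable and the /var/lib/mysql data path are real.

version: "2"
services:
  app:
    image: example/observium        # placeholder image name, not a real published image
    ports:
      - "8080:80"
    environment:
      - DB_HOST=db                  # placeholder; however the app is told where its DB lives
    depends_on:
      - db
  db:
    image: mariadb:10.1
    environment:
      - MYSQL_ROOT_PASSWORD=change-me
    volumes:
      - ./mysql-data:/var/lib/mysql # data kept outside the container

Note the DB port isn't published here; it's only reachable on the compose network unless the DB ever does move to a separate host.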

Docker also seems to go against my Ansible quest. Since the Docker way of updating is to throw the instance away and start a new one, the data you need to keep is not touched, as it's outside the container. I do like this bit, but I've already done that by having the apps sit on an NFS export. The approach does have merit, as the Dockerfile is a top-down script for how to build the container contents. Being focused on a single goal, some I've looked at are quite concise; others, however, are hugely complicated. YMMV.
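
The two aren't mutually exclusive though - Ansible does have a docker_container module, so the throw-away-and-recreate update can itself be a task. A rough sketch, with the image name, port and path made up:

- name: pull the image and stand the container back up
  docker_container:
    name: observium
    image: example/observium:latest   # placeholder image
    pull: yes                         # grab the newest image first
    recreate: yes                     # the docker way: throw it away and start fresh
    state: started
    published_ports:
      - "8080:80"
    volumes:
      - /srv/observium/data:/data     # the bits worth keeping live outside the container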

Oh, and then don't forget you can run containers on ESX now with vSphere Integrated Containers.

I've said many times before, the plot chickens.

The Plot Chickens

Storage and/or Cloud

I've been looking at mixing it all up - completely - post 1 of 2.

For a storage box I've been running OmniOS for a while now (previously OpenSolaris), however OmniOS is now at a critical juncture in its community. The corporate sponsor has decided to no longer support it, intending to force the community to step up and participate more. They took a gamble on the project dying or regrouping and moving on; so far we're yet to see the outcome. There is a new group trying to carry on as OmniOSCE, which I'll keep an eye on.

I have just upgraded to the latest (and last) release of the original, so I'll be OK for a while, and I build off a local repo anyway - it just means no more upstream updates until the fork takes off. In the meantime, though, I've been looking at switching distros for it. I don't run any add-ons, and just have a basic (Ansible'd) setup applied to a PXE-boot-installed system. This does mean a few VMs - a package server and a test NAS box VM - just to support the two physical boxes which are "production" in comparison.

Originally, running OpenSolaris was about getting the latest ZFS release possible - initially in a VM due to hardware compatibility, and then natively on bare metal. Since then the ZFS landscape has changed a lot. OpenZFS is a thing now, ZFS on Linux is packaged for most if not all distros and considered stable, and it runs the current version that the other platforms run. FreeBSD also has the latest, which flows into FreeNAS. So now I have a fair few potential NAS OSes to select from for my modest requirements (NFS and CIFS is all I use now, though I have used LUNs in the past). Supporting one fewer bespoke distro would help my Ansible problem - a big plus for running CentOS on the NAS box.

One thing that hasn't changed is the ZFS reshaping problem. If you want to expand your raidz2 stripe size you're still stuck with a dump-and-reload operation. I've solved this by having an offline second copy, so reshaping means update the mirror, destroy, create, and sync it back. Now, however, the mirror is approaching the capacity limit where consuming any more space will destroy performance - it's nearly too full, so it will need reshaping. But that means new drives, and the chassis is full, so it's a painful problem. This needs out-of-the-box thinking. There are some tricks people use around smaller raid groups and in-place size upgrades, but that consumes more parity disks, so it actually makes the problem worse, not better.

Out-of-the-box thinking starts with rationalising (delete/cleanup) and ends up with wanting to stash it in the cloud with a local cache. The data access pattern is quite predictable. Some areas are hot (new stuff), some are cold with predictable warming (rewatching an old show, for example), and it's mostly write once, read sometimes. You could say the data is fairly cold in nature. The idea of leveraging the cloud would be for one of the copies - most likely the online one - and then just have an offline second copy locally, just in case.

Ages ago I looked at and played with S3QL, storing data in S3 and GCS. At the time I had latency issues (on a cloud instance too), and the author didn't want to even consider supporting Backblaze B2 (which was heaps cheaper) purely because they only had one datacenter (now they have two). It looks like someone else has since completed the B2 code, but it has yet to be merged. I'll have to look at it again - though I noticed the RPMs have fallen out of the repo due to neglect. I might have to see if I can clean that up.

Playing with this idea was good timing, as Plex Cloud happened, and it works quite well. The only gotcha is you have to store the data unmodified - raw and cleartext in the cloud, not encrypted and chunked. So that's a risk, to be honest. It's also limited in the types of cloud it supports: Dropbox/Drive rather than object stores. So the pricing model is different, and the account is more likely to be closed for storing excessive data rather than simply billed for more, as bucket storage does with utility pricing. Ignoring that, I tested with some files hosted in Google Drive and was very impressed with the Plex Cloud server streaming back down, even over 4G to a phone. This could work, so it will need further research.

I have already mentioned NetApp AltaVault in a previous post. I'm quite happy with this product as long as I have VMware running, except for the memory consumption of the 40TiB model. It's a commercial solution to the same problem that S3QL tries to solve - a big file system to write stuff into, which it dedupes and encrypts. It does work with Backblaze B2, but not directly, so I had to use s3proxy to interface with it. With this setup I had terrible performance which I never got to the bottom of; it was either a threading issue or ISP throttling. Upstream was fine, just downstream was unusable.

For the second copy, I've considered using SnapRAID, as it would work acceptably with my irregularly synced, infrequently accessed second copy. No reason not to run it on CentOS too. This also solves the drive upgrade issue, as it doesn't require all drives to be the same size (the parity drives need to be the largest; that's the only restriction). It would be possible to add a few cheap 10TB archive drives in as the parity drives and gain some capacity that way.

This is just part of the problem - it's a big part, but still just a part.

You're part of the problem

Virtual Complexity Insanity

Over the years my environment has grown in leaps and bounds for various reasons. Many years ago everything just ran off one Linux box: NAS, downloading and backups. Everything. Over time this has swelled up and is now beyond a joke.

There was a time when I ran a single ESX host with a passthrough PCIe card to an OpenSolaris VM for NAS, and a Linux VM for everything else. Maybe a Windows VM for the vSphere client, and that was it.

Now I'm at a stage where two decently specced hosts are overloaded (always RAM) and a collection of supporting VMs is eating up a substantial amount of those resources. Part of the reason is to keep my skills current and ahead of my workplace - since I don't get adequate time to learn at work, and the environments available there aren't suited to some of the experimenting that's really needed. Also, anything cloud related is impossible there due to network security and network performance.

However, I have labbed up VMware vSAN and learned a heap over the 13 months I've been running it - yeah, it's been that long. It's a 2-node ROBO deployment with a witness appliance (on a third host). This has improved in leaps and bounds from the 6.1 vSAN release I started on up to the 6.6 I'm on today. It's not without issues, of course. I've corrupted VMDKs and in at least one instance lost a VMDK entirely. I would NOT recommend running the 2-node ROBO design on a business site. Compared to a standalone host it's probably still worth a go, but be aware of the limits, stick to the HCL and watch the patch releases closely - many have been for data corruption issues. Fortunately patching is simple now that the vCenter Server Appliance (VCSA) has Update Manager built in. For now though, the vSAN UI is entirely in the old Flash UI, not the new HTML5 UI. vSphere 6.5 is a great improvement in every way on the versions before it.

I've also labbed up OnCommand Insight, which is an amazing product. Its only issue is that it's way too expensive. It has a front-end real-time UI and a back-end data warehouse for scheduled or ad hoc reports. I've only labbed up the front end, as it's great for identifying issues in the VMware stack and just general poking around at where your resources have gone. For home though, the VM does eat heaps of resources - 24GB RAM and 8 cores for the main server, and 24GB RAM and 2 cores for the anomaly detection engine (I should see if I can lower that RAM usage).

OCI vsan

vRealize Log Insight is similar to Splunk but free(ish) from VMware (depending on your licensing). This eats up lots of resources at home too - 20% CPU all the time (2 cores assigned). Its default sizing retains nearly 12 months of logs, which is way more than I could ever need.

Other NetApp bits I've labbed up are the NetApp simulator and associated pieces - OnCommand Unified Manager and Workflow Automation. That's a handful more VMs, and I've got two versions of the simulator too, for testing upgrades and compatibility. I just don't run both at once except when I need to test something specific.

NetApp AltaVault is also one I've been playing with. This gives you a CIFS/NFS target locally with a cache and stashes it all in the cloud (S3 bucket-style storage). For a while I was keen to utilise this for a cloud backup of my data, however the VM is pretty heavy (24GB RAM, and a minimum 2TiB cache disk, with 8TiB recommended for the 40TiB model) and the egress pricing out of the cloud is still higher than I'd like. Still, it's a great product and works fine.

At one stage I had labbed up VMware NSX too, but due to some issues (which I now believe have been addressed) I had to remove it. I haven't returned to have another go since.

Obviously not all of this needs to run all the time, but in many ways they're less useful when not running constantly, due to gaps in the data, the time needed to start the environment up again before testing, or daily tasks within the tools that won't run if they're not left on. Yeah yeah, another automation problem.

Automation Problem

OK, so far that's just a numbers game. Too many VMs, too much disk. Trying to do too much at once. Can't fault that logic.

The downside is that this situation has occurred only because I had the capacity for it. If I didn't have two decent ESX hosts and a few TB spare for VMs, this would never have happened. The ongoing challenge now is to rationalise the VMs down to their minimum size and keep things up to date (more pets again, by the looks).

Or do I just toss it all in the bin and go back to basics in the interests of costs, time and overheads?

Dumpster Option

Pets vs Cattle vs complexity

So, way back in November 2014 I started on an experiment around configuration management. It might have started earlier, but that's the first commit date in the git repo. Basically I was motivated (somehow) by the realisation that the pets vs cattle analogy worked really well for me. My handful of machines were more bespoke and unique (pets) than they could have been, and it would be a good idea to make them more throwaway (cattle).

At the time I was already using PXE-booted kickstart scripts, with a fairly complete base build coming out of the kickstart process. My media PC was entirely configured this way and could be rebuilt - on demand - in about 15 minutes elapsed time. So if anything went bad with a package update, it was already cattle and not a pet. Other machines (desktop and VMs) were built with kickstart but were less cattle and more pet-like. So the pets vs cattle thing had room for improvement, and the other thing this methodology needed was a configuration management tool. Kickstart scripts were not it, as they don't work for cloud, where you build from a cloned image.

So I did some reading around and talked to people. Some loved Puppet, others liked Chef, and a newcomer (at the time) was Salt, which was gaining some interest. All of these needed agents installed on the destination, and I think all needed a server (application) to drive them. That's ignoring that one was written in Java, one in Ruby (and Erlang) and one in Python - so they also needed their base language installed on the destination to function. This meant, to me, that I couldn't escape the kickstart script completely, as it would need to install some software beyond the minimum, plus the agent software too.

Then I found Ansible, which Red Hat was sponsoring and Fedora was using. Ansible only needed SSH on the destination to work - no agent at all. However, it did benefit from having Python on the destination for most of its functionality.

The methodology of each of these tools varied a bit.

  • Puppet worked on a model approach and tried to make the destination realise the model. Scripts were written in a custom language and called manifests.
  • Chef used the model idea too and applied the recipe to make the target fit the model. Recipes were written in a custom Ruby-style language.
  • Salt I think was the same again, so I didn't look too closely.
  • Ansible was pretty much a top-down script of custom modules. The modules (mostly) had checks so they could flag whether they needed to do anything, and track success - idempotent scripts were the key. Your stuff is written in YAML documents called plays, and they are arranged into playbooks.

So I started off with Ansible, trying to translate my kickstart scripts into Ansible roles and playbooks. I split out common bits which apply to all machines into a common role, which even worked across software versions and distributions (various releases of CentOS and Fedora, and later Debian). Each system type then had several roles assigned, which apply the steps in the playbook in a top-down fashion. My kickstart script shrank to a totally minimal CentOS/Fedora install which adds a user and SSH key only. From there, Ansible could connect and run the playbooks to turn a machine into any system type.
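
The top of the site playbook ends up looking something like this (the host and role names here are made up for illustration):

# site.yml - every host gets "common", then its system-type role(s)
- hosts: mediapc
  roles:
    - common
    - mediapc

- hosts: nas
  roles:
    - common
    - nas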

Early teething issues annoyed me, like not being able to have multiple things done in a single task step. So you end up with heaps of tasks in a playbook, each doing one thing - the exception being anything that could be done repetitively from a list (multiple calls to the same module could be parameterised from a list/dict of items). Playbooks could be included and passed variables, so some high-level automation was possible. Ultimately it was a very verbose way of doing things.
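
That list-driven exception looks roughly like this (the package names are just for illustration):

- name: install the packages this role needs
  yum: name={{ item }} state=present
  with_items:
    - privoxy
    - nfs-utils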

For everything else, you end up having to do this:

- name: setup privoxy
  lineinfile: dest=/etc/privoxy/config state=present regexp="^listen-address" line="listen-address {{ ansible_default_ipv4.address }}:8118"

- name: insert firewalld rule for privoxy
  firewalld: service=privoxy  permanent=yes state=enabled immediate=yes

- name: enable privoxy
  service: name=privoxy state=started enabled=yes

rather than what made more sense

- name: setup privoxy
  lineinfile: dest=/etc/privoxy/config state=present regexp="^listen-address" line="listen-address {{ ansible_default_ipv4.address }}:8118"
  firewalld: service=privoxy  permanent=yes state=enabled immediate=yes
  service: name=privoxy state=started enabled=yes

though there's a new keyword since 2.x, "block", which I need to look at. It might let me do this.
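
From what I can tell, block groups tasks rather than merging them into one, but it does mean the related steps travel together and can share things like a condition - something like this sketch:

- block:
    - lineinfile: dest=/etc/privoxy/config state=present regexp="^listen-address" line="listen-address {{ ansible_default_ipv4.address }}:8118"
    - firewalld: service=privoxy permanent=yes state=enabled immediate=yes
    - service: name=privoxy state=started enabled=yes
  when: ansible_distribution == "CentOS"   # one condition covering the whole group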

As time went on (I think I started with Ansible 1.6), I hit issues where modules lacked the one ability I needed, or changed in behaviour. Then other system things changed - yum to dnf, iptables to firewalld. These necessitated putting conditions on tasks to check the distribution or release version (which was easy, but meant doubling up tasks, one for each way of doing it). It seemed OK, and I plodded on. Each release of Ansible got better: 1.9 was good, 2.0 was a big improvement, and now I'm on 2.3. Each iteration more modules have been added, issues have been fixed and it's got more powerful, which is great.
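
The doubling-up looks like this - one task per package manager, switched on the distribution fact (privoxy again, purely as an example):

- name: install privoxy (CentOS, yum)
  yum: name=privoxy state=present
  when: ansible_distribution == "CentOS"

- name: install privoxy (Fedora, dnf)
  dnf: name=privoxy state=present
  when: ansible_distribution == "Fedora"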

I expanded my playbooks to include my OmniOS host server and the package repositories on there. I created a parameterised play which was given a release name and a TCP port, and it would create the source repo, populate it and start the service for it. Rerunning the playbook would update the repo. Happy days.
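
The parameterised call looked something along these lines (the included file name, release name and port here are placeholders, not my actual values):

- include: omnios-repo.yml   # creates, populates and serves one repo
  vars:
    repo_release: r151022
    repo_port: 10022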

iPXE worked really well. Simply include another playbook and the iPXE boot menu was updated. Change a variable for which Fedora release I wanted and it would download the pxeboot files (kernel + initrd) to the appropriate web server (iPXE rocks by booting over HTTP) and update the menu. Easy. Except it wouldn't clean up the old files unless you wrote a task to do that - disk is cheap anyway.
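
In spirit it's two tasks - fetch the pxeboot files and re-template the menu. A rough sketch, where the mirror variable, web server paths and template name are placeholders:

- name: download the Fedora pxeboot kernel and initrd
  get_url:
    url: "{{ fedora_mirror }}/releases/{{ fedora_release }}/os/images/pxeboot/{{ item }}"
    dest: "/var/www/html/fedora{{ fedora_release }}/{{ item }}"
  with_items:
    - vmlinuz
    - initrd.img

- name: regenerate the ipxe boot menu
  template: src=boot.ipxe.j2 dest=/var/www/html/boot.ipxe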

Then I tried my router - a VyOS VM. I had a templated config for this, so I looked at applying a script via a playbook. Some initial success spurred me on, but eventually I hit an issue with changes. The script just couldn't apply in an idempotent way. The router needed to delete ALL firewall rules and run the script, inside one transaction, to handle deletes or changes. This meant EVERY run would dump and reload the firewall, even if no change was present. So I stopped there and kept on elsewhere.

Ansible modules had changed over this time (two years), so I could clean up some old hacks that were there. I'd marked them so they were easy to find. firewalld now didn't need the service reloaded; the change was immediate. Clean up here and there. Now that I wasn't using "old" CentOS, I could dump the old hacks I had in place for CentOS 6, since everything worked on CentOS 7. This still left CentOS on yum and Fedora on dnf for packages - the "unified" package module didn't exist yet.

More apps came along and it was easy to automate them. OnCommand Insight was easy to install without interaction on CentOS. I even got the playbook to hit the API to install the license key.
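
That last bit is just the uri module doing a POST - something like the below, though the endpoint and payload here are placeholders rather than the real OCI API:

- name: install the OCI license via the API
  uri:
    url: "https://{{ oci_server }}/rest/v1/licenses"   # placeholder path, not the documented endpoint
    method: POST
    user: admin
    password: "{{ oci_admin_password }}"
    body: "{{ lookup('file', 'licenses.txt') }}"       # placeholder payload
    status_code: 200
    validate_certs: no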

Sounds like a great success. Except now I have a mess of playbooks written in YAML which need testing regularly to ensure upstream changes don't break them - changes both in Ansible modules and in distribution packages. So I set up a good way to test them on VMware: clone a base image and apply the playbook, over and over. This way I didn't need to PXE boot the VM manually to test. I never got to the point where I felt comfortable that rerunning a playbook had no risk of damaging/trashing the proper machine, so testing was required. I'm not sure how close I got either; it might have been just one more round of cleanup and happy days, or it could have been heaps - I just didn't have any data points to draw a conclusion from.
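
The clone step itself can be driven from Ansible as well - something like this with the vmware_guest module, where the vCenter details and VM/template names are made up:

- name: clone a throwaway test VM from the base template
  vmware_guest:
    hostname: "{{ vcenter_host }}"
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: no
    datacenter: homelab            # made-up names from here down
    name: ansible-test-01
    template: centos7-base
    state: poweredon
    wait_for_ip_address: yes
  delegate_to: localhost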

And I still netboot and kickstart ESX host builds.

I'd failed. I'd automated my pets again.

Automatic cat feeder
