random complexity

with a side of crazy

Storage and/or Cloud

I've been looking at mixing it all up - completely - post 1 of 2.

For a storage box I've been running OmniOS for a while now (previously OpenSolaris), but OmniOS is now at a critical juncture. The corporate sponsor has decided to no longer support it, a move intended to force the community to step up and participate more. They took a gamble on the project either dying or regrouping and moving on; so far we're yet to see the outcome. There is a new group trying to continue on as OmniOSCE, which I'll keep an eye on.

I have just upgraded to the latest (last) release of the original, so I will be OK for a while, and I build off a local repo anyway; it just means no more upstream updates until the fork takes off. In the meantime, however, I've been looking at switching distros for it. I don't run any add-ons, and just have a basic (ansible'd) setup applied to a PXE-boot installed system. This does mean a few VMs - a package server and a test NAS box VM - just to support the two physical boxes, which are "production" in comparison.

Originally, running OpenSolaris was to ensure I got the latest ZFS release possible, initially in a VM due to hardware compatibility, and then natively on bare metal. Since then the ZFS landscape has changed a lot. OpenZFS is a thing now, and ZFS on Linux is packaged for most if not all distros and considered stable. It's also running the current version that the other platforms run. FreeBSD has the latest too, which is in FreeNAS. So now I have a fair few potential NAS OSes to select from for my modest requirements (NFS and CIFS are all I use now, but I have used LUNs in the past). Supporting one fewer bespoke distro would help with my ansible problem - this is a big plus for running CentOS on the NAS box.

One thing that hasn't changed is the ZFS reshaping problem. If you want to expand your raidz2 stripe size you're still screwed with a dump and reload operation. I've solved this problem by having an offline second copy, so reshaping means update the mirror, destroy, create, and sync it back. Now, however, the mirror is approaching the capacity limit where consuming any more space will destroy performance - it's nearly too full, so it will need reshaping. But that means new drives, and the chassis is full, so it's a painful problem. This needs out-of-the-box thinking. There are some tricks people use around smaller raid groups and in-place size upgrades, but this consumes more parity disks, so it actually makes the problem worse, not better.
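To put rough numbers on the parity cost of smaller raid groups, here is a minimal sketch in Python. The 12-bay chassis and 4 TB drives are made-up figures for illustration only, not my actual hardware, and the function name is my own. It only counts whole disks lost to parity and ignores ZFS metadata, padding and free-space headroom.

```python
def raidz2_usable_tb(total_disks, groups, disk_tb):
    """Approximate usable TB when total_disks are split evenly into raidz2 groups.

    Each raidz2 group gives up two disks' worth of space to parity.
    Metadata, padding and free-space headroom are deliberately ignored.
    """
    if groups < 1 or total_disks % groups != 0:
        raise ValueError("disks must divide evenly into one or more groups")
    if total_disks // groups <= 2:
        raise ValueError("each raidz2 group needs more than 2 disks")
    data_disks = total_disks - 2 * groups  # two parity disks per group
    return data_disks * disk_tb


# Hypothetical chassis: 12 bays filled with 4 TB drives.
for groups in (1, 2, 3):
    usable = raidz2_usable_tb(12, groups, 4)
    print(f"{groups} raidz2 group(s): {usable} TB usable")
```

With these assumed numbers, every extra group costs another two drives of capacity, which is why splitting into smaller groups to make upgrades easier works against a chassis that is already full.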

Out-of-the-box thinking starts with rationalizing (delete/cleanup) and ends up with wanting to stash it in the cloud with a local cache. The data access pattern is quite predictable. Some areas are hot (new stuff), some are cold with predictable warming (rewatching an old show, for example), and it's mostly write once, read sometimes. You could say the data is fairly cold in nature. The idea of leveraging the cloud would be for one of the copies, most likely the online one, and then just have an offline second copy locally just in case.

Ages ago I looked at and played with S3QL storing data in S3 and GCS. At the time I had latency issues (even from a cloud instance), and the author didn't want to even consider supporting Backblaze B2 (which was heaps cheaper) purely because they only had one datacenter (now they have 2). Now it looks like someone else has completed the B2 code, but it has yet to be merged. I'll have to look at it again - though I noticed the RPMs have fallen out of the repo due to neglect. I might have to see if I can clean that up.

Playing with this idea was good timing as Plex Cloud happened, and it works quite well. The only gotcha is you have to store the data unmodified - raw and in clear text in the cloud, not encrypted and chunked. So that's a risk, to be honest. It is also limited in the types of cloud it supports: Dropbox/Drive-style services rather than object stores. So pricing is different, and the account is more likely to be closed due to excessive data stored rather than just being charged more, as bucket storage does with utility pricing. Ignoring that, I tested with some test files hosted in Google Drive, and was very impressed with the Plex Cloud server streaming back down, even via 4G to a phone. This could work, so it will need further research.

I have already mentioned NetApp AltaVault in a previous post. I'm quite happy with this product as long as I have VMware running, except for the memory consumption of the 40TiB model. It's a commercial solution to the same problem that S3QL tries to solve - a big file system to write stuff into, which it dedupes and encrypts. It does work on Backblaze B2 but not directly, so I had to use s3proxy to interface it. With this setup I had terrible performance which I never got to the bottom of; it was either a threading issue or ISP throttling. Upstream was fine, just downstream was unusable.

For the second copy, I've considered using SnapRAID, as it would work acceptably with my irregularly synced, infrequently accessed second copy. No reason not to run it on CentOS too. This also solves the drive upgrade issue, as it doesn't require all drives to be the same size (parity drives need to be the largest, that's the only restriction). It would be possible to add a few cheap 10TB archive drives in as the parity drives and gain some capacity that way.
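As a quick sanity check on that sizing rule, here is a rough Python sketch. The drive sizes are hypothetical and the function is my own illustration, not part of SnapRAID. It only models the one restriction mentioned above, that every parity drive must be at least as big as the largest data drive, and treats usable space as the sum of the data drives.

```python
def snapraid_usable_tb(data_tb, parity_tb):
    """Check a proposed drive layout and return the usable capacity in TB.

    data_tb:   sizes of the data drives (mixed sizes are fine)
    parity_tb: sizes of the parity drives
    Only rule modelled: each parity drive >= the largest data drive.
    """
    if not data_tb or not parity_tb:
        raise ValueError("need at least one data drive and one parity drive")
    largest_data = max(data_tb)
    too_small = [p for p in parity_tb if p < largest_data]
    if too_small:
        raise ValueError(
            f"parity drives {too_small} are smaller than the largest "
            f"data drive ({largest_data} TB)"
        )
    return sum(data_tb)  # parity drives add protection, not capacity


# Hypothetical mix: old 4/6/8 TB drives as data, two 10 TB archive drives as parity.
print(snapraid_usable_tb([4, 4, 4, 6, 8], [10, 10]), "TB usable")
```

The point of the example is that the mismatched older drives all count towards capacity, and only the parity drives have to be bought at the top size.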

This is just part of the problem - it's a big part, but still just a part.

You're part of the problem



Copyright © 2001-2017 Robert Harrison. Powered by hampsters on a wheel. RSS.