with a side of crazy
Ever since I rewrote the code behind this site to be all static (oh, you didn't notice?), I'd been meaning to update the theme.
Part of the original ploy was to swap out the WordPress backend and replace it with my own, without altering the site's appearance at all. This meant slicing up the shocking HTML that came out of the theme I was using and adapting it to my code.
Fortunately I wasn't using many WordPress features, and ultimately it had become a maintenance burden. To reduce risk (from software issues leading to a compromise/defacement) and server-side resources (I was running MySQL only for WordPress), I wanted to produce all the site's HTML from simple text files offline and sync them to the web server, rather than generating on the server or on demand. Ultimately I settled on Markdown as the format for these files (with some minor metadata) and knocked up a tool in C# to iterate over my post files and produce the site (from a template not too unlike T4). This was then synced via rsync-over-ssh to the web server. I considered letting Dropbox sync it but decided against it in the end (yes, that was a referral link).
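The idea is simple enough to sketch. My tool was C#, but here's a minimal Python version of the same shape; the directory layout, the "Title:" metadata line and the template string are all assumptions for illustration, and a real version would run the body through a Markdown converter rather than passing it straight through:

```python
import pathlib

# Hypothetical layout: posts/*.md, each starting with a "Title: ..." metadata
# line followed by the body. The body is emitted untouched to keep the sketch
# dependency-free; a real tool would convert Markdown to HTML here.
TEMPLATE = "<html><head><title>{title}</title></head><body>{body}</body></html>"

def build_site(posts_dir, out_dir):
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post in sorted(pathlib.Path(posts_dir).glob("*.md")):
        lines = post.read_text().splitlines()
        header = lines[0]
        title = header[len("Title: "):] if header.startswith("Title: ") else post.stem
        body = "\n".join(lines[1:])
        (out / (post.stem + ".html")).write_text(
            TEMPLATE.format(title=title, body=body))

# The generated output directory can then be pushed with something like:
#   rsync -az --delete out/ user@host:/var/www/site/
```

The whole generate-offline-then-rsync approach means the web server only ever serves static files, which is the point of the exercise.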
Fast forward over 12 months, and I've finally knocked up a new Bootstrap-based theme. The final motivation was the file server rebuild post which will follow this one; in the previous theme it looked like crap (mainly bulleted-list issues). My only issue with building a theme from scratch again is colours, but after some messing around and studying other sites that work, I decided on these. Unless someone points out it's utter crap, I think I'll leave it as is. To me the colours aren't too strong or in-your-face, which is what I prefer, and they're not far off the default theme colours in Sublime Text, the editor I built it in. I have to admit, CSS has come a very long way since I last tried to build a clean site with content/presentation separation, and a framework like Bootstrap gives you an excellent, flexible foundation to build on. I'll be sure to use it in a future project (the NAS one).
So the ZFS experiment continues. Upon the release of b129 I set off into the unknown on a voyage of dedupe, which at first held the promise of lower disk usage, faster IO speeds and a warm fuzzy feeling deep down that you only get from awesome ideas becoming reality. Ahem.
Most sources say you need more RAM, and that is true; what they don't say is how much RAM for what size of data set, which would be far more useful to home users like me. My boxes have 2 GB of RAM each, and that is not enough for dedupe, nowhere near, not with 6 TB of random-ish data. I might retry when I get to 8 GB of RAM, but not before. You see, if it can't keep the whole of the dedupe table in RAM all the time, any write to a dedupe-enabled volume results in reads (or at least seeks) for the rest of the table. So what I saw was a gradual slowdown while writing to the volume. I was determined to let it finish, to see what savings I would make, and then scrap it due to performance, but after waiting 16 days for the copy, I cancelled it.
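A back-of-envelope estimate shows why 2 GB falls so far short. This is my arithmetic, not an official formula: the per-entry in-core cost is an assumption (figures from roughly 170 up to 320 bytes per DDT entry get quoted; 170 is roughly what zdb reports for my pool):

```python
# Rough dedupe-table (DDT) sizing. The in-core bytes-per-entry figure is an
# assumption; commonly quoted values range from about 170 to 320 bytes.
data_bytes = 6 * 2**40        # ~6 TB of random-ish data
recordsize = 128 * 2**10      # ZFS default 128 KB records
entry_bytes = 170             # assumed in-core bytes per DDT entry

entries = data_bytes // recordsize
ram_gib = entries * entry_bytes / 2**30
print(f"~{entries / 1e6:.0f} million DDT entries, ~{ram_gib:.1f} GiB of RAM")
# prints: ~50 million DDT entries, ~8.0 GiB of RAM
```

Around 8 GiB just to hold the table dwarfs a 2 GB box, and lines up with the gradual slowdown: once the DDT spills out of memory, every write turns into extra reads.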
The only way I've found to even see the contents/size of the dedupe table (DDT) is: zdb -DD
DDT-sha256-zap-duplicate: 416471 entries, size 402 on disk, 160 in core
DDT-sha256-zap-unique: 47986855 entries, size 388 on disk, 170 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    45.8M   5.69T   5.66T   5.66T    45.8M   5.69T   5.66T   5.66T
     2     394K   43.0G   40.3G   40.3G     821K   89.0G   83.0G   83.1G
     4    9.90K    527M    397M    402M    47.0K   2.35G   1.76G   1.79G
     8    2.06K    125M   82.4M   83.4M    21.1K   1.20G    795M    806M
    16      391   13.7M   8.54M   8.76M    7.26K    272M    162M    166M
    32       69   1.17M    776K    822K    3.08K   51.3M   32.7M   34.8M
    64       17    522K    355K    368K    1.43K   36.9M   25.1M   26.2M
   128        6    130K      7K   11.2K    1.07K   31.3M   1.50M   2.23M
   256        2      1K      1K   2.48K      833    416K    416K   1.01M
   512        4      2K      2K   4.47K    2.88K   1.44M   1.44M   3.32M
    2K        1     512     512   1.24K    2.79K   1.39M   1.39M   3.46M
 Total    46.2M   5.73T   5.70T   5.70T    46.7M   5.78T   5.74T   5.74T

dedup = 1.01, compress = 1.01, copies = 1.00, dedup * compress / copies = 1.01
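Pulling the headline numbers out of that output confirms the problem, using the per-entry "in core" sizes zdb itself reports:

```python
# In-core DDT size from the zdb -DD summary lines: (entries, bytes/entry in core).
duplicate = (416471, 160)
unique    = (47986855, 170)

total_entries = duplicate[0] + unique[0]
in_core_gib = (duplicate[0] * duplicate[1] + unique[0] * unique[1]) / 2**30
print(f"{total_entries} entries, ~{in_core_gib:.1f} GiB of DDT to keep in core")
# prints: 48403326 entries, ~7.7 GiB of DDT to keep in core
```

Nearly 8 GiB of table to keep hot, against 2 GB of total RAM, is the 16-day copy explained.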
Savings of around 80 GB with dedupe and compression (this is a backup box, so no real-world performance requirement) are just not worth needing three or more times the RAM, plus possibly an SSD for the L2ARC cache to speed things up. Yep, the suggestion, and the observed behaviour, was to hook up a cheap small (30 GB) SSD as cache to accelerate it. I don't mind that so much for a primary box, but this is my backup/second-copy box, so it's not really ideal. Certainly not for 80 GB of savings, or at current prices around $5 of disk.
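For what it's worth, the dedupe half of that savings figure falls straight out of the histogram totals above (referenced DSIZE minus allocated DSIZE):

```python
# Dedupe savings from the zdb histogram totals: referenced minus allocated DSIZE.
TIB = 2**40
referenced = 5.74 * TIB   # total DSIZE referenced
allocated  = 5.70 * TIB   # total DSIZE actually allocated on disk

saved_gib = (referenced - allocated) / 2**30
print(f"~{saved_gib:.0f} GiB saved by dedupe")
# prints: ~41 GiB saved by dedupe
```

Dedupe alone accounts for roughly 41 GiB; compression at its own 1.01x ratio contributes a similar-sized chunk, which is where the ~80 GB combined figure comes from.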
My second attempt is now underway. This time I've sliced my data sets into more volumes, and by more I mean smaller average size, so around 2 TB max per volume, which from experience at work I've learned is a good rule of thumb. Now I can enable compression and dedupe on only specific bits, hopefully where the most savings are to be made, and just store the rest raw. This way the savings might be similar, but without the major write-speed penalty. I've also realised that if I want screaming performance from the production box, I'll throw an SSD on there, but that means more SATA ports, which means a major change. I need to work on power management too.
One thing that has gone right this time: I'm now using CF-to-IDE adaptors and booting off those. This way the OS thinks it's on a 2 GB HDD, so booting avoids the complexity of USB boot, uses less power and doesn't take up a SATA port. Of course new boards don't have PATA any more, so I might need a CF-to-SATA adaptor in future.
Another thing that must be said: Solaris's CIFS server is fast.