random complexity

with a side of crazy

Site theme update

Ever since I rewrote the code behind this site to be all static (oh, you didn't notice?), I'd been meaning to update the theme.

[Image: "Oh You!" meme]

Part of the original ploy was to swap out the WordPress backend and replace it with my own, without altering the site's appearance at all. This meant slicing up the shocking HTML that came out of the theme I was using and adapting it to my code.

Fortunately I wasn't using many WordPress features, and ultimately it had become a maintenance burden. To reduce risk (from software issues leading to a compromise or defacing) and server-side resources (I was running MySQL solely for WordPress), I wanted to produce all of the site's HTML from simple text files offline and sync them to the web server, rather than generating pages on the server or on demand. Ultimately I selected Markdown as the format for these files (with some minor metadata) and knocked up a tool in C# to iterate over my post files and produce the site (from a template not too unlike T4). This was then synced via rsync-over-ssh to the web server. I considered letting Dropbox sync it but didn't in the end (yes, that was a referral link).
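
The sync step itself is nothing fancy. Something along these lines does the job (the output directory, user and hostname here are made up for illustration, not my actual setup):

    # push the generated HTML to the web server over ssh, removing anything stale
    rsync -avz --delete -e ssh ./output/ user@webhost:/var/www/site/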

Fast forward over 12 months, and I finally knocked up a new Bootstrap-based theme. The final motivation was the file server rebuild post which will follow this one; in the previous theme it looked like crap (mainly bulleted list issues). My only issue with building a theme from scratch again is colours. However, after some messing around and studying other sites that work, I decided on these. Unless someone points out it's utter crap I think I'll leave it as is. To me the colours aren't too strong or in your face, which is what I prefer, and they're not far off the default theme colours in Sublime Text, which is the editor I built it in. I have to admit, CSS has come a very long way since I last tried to build a clean site with content/presentation separation, and a framework like Bootstrap gives you an excellent, flexible foundation to build on. I'll be sure to use it in a future project (the NAS one).

[Image: "I see what you did there" meme]

Project week one wrap up

So recently I took a whole week off to work on personal projects. I started by creating a list of things I wanted to achieve in that week, mainly things I'd not quite got to in my normal downtime, and expanded from there. Despite trying not to let this list grow too big - for fear of building an insurmountable task list - I managed to keep it to two main projects and a handful of smaller separate tasks.

The two main projects were deliberately unrelated, the idea being that I could spend alternate days on each, or swap over when I got stuck or simply needed a change.

Unfortunately I never quite got started on those two projects, but I did make major progress on nearly everything else: rebuilding my file server, regular exercise, rebuilding my desktop PC, catching up on recent TV and a laundry list of random odd jobs around the house.

The file server rebuild deserves a whole post so I'll keep that separate.

The desktop PC was simply overdue. My previous main desktop died (northbridge failure) probably about 2 years ago. Since then I'd used an ASRock Atom330 as a desktop (which later became my media PC), and then an AMD-based desktop (which began life as a Bitcoin mining box). The AMD machine had to run Windows because the Linux ATI drivers were an epic pile of crap.

So to cut to the chase, I replaced the whole machine.

I think the thing I was most impressed with was that at idle the machine uses 40W, and at 100% CPU and 100% GPU (CUDA) it pulls 220W. Oh, and I'm back on Linux again too, which is nice.

The graphics card barely fit in too, which was funny.

[Photo: Gigabyte GTX670 OC]

That'll do for now. The file server rebuild post is going to be huge!

Some quick Amazon Glacier numbers

Amazon Glacier. A very nice idea, with a nice storage price (1 cent per GB per month). Pricing restores is a bit more complicated, but it seems to work OK for large backups with small, infrequent restores. Using it as a full DR target is perhaps a bit of a stretch due to the huge cost of restoring everything quickly.

As great as it sounds, I needed to get my head out of the cloud. So to bring it back to earth, how would it apply to me?

Hypothetically if you stored 10TB in it, that would be roughly $100/month in storage.

5% of that could be restored for free in any month, so 500GB. But that's prorated daily, so 16.6GB/day over 30 days. If you exceed this, you pay based on the peak hourly retrieval volume you hit, less the hourly prorated free allowance (0.694GB in this example), multiplied by the number of hours in the month (720 for 30 days), multiplied by the excess fee of $0.01/GB. Gulp.
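
To put some rough numbers on that, here's a back-of-the-envelope check, taking 10TB as 10,000GB to match the figures above; the 4-day scenario is my own addition, to show why restoring everything quickly hurts:

    # Full 10TB restore, spread evenly over the 30-day month:
    echo "scale=2; ((10000/720) - 0.694) * 720 * 0.01" | bc    # roughly $95 in retrieval fees
    # The same 10TB pulled down flat-out over 4 days (96 hours):
    echo "scale=2; ((10000/96) - 0.694) * 720 * 0.01" | bc     # roughly $745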

Also, each restore job is only available for 24 hours, so that would be 30 restore jobs, not just one. I didn't see anything about a limit on how many jobs can be run in a month, or about a per-restore fee.

If you delete something within 3 months of uploading, you pay a prorated fee per GB for deleting it. You're best off leaving it there for the full 3 months, because it'll cost you the same either way.

Now for some more relevant, real-world figures. Australian ADSL2 is marketed as up to 24Mbps downstream (and 1Mbps upstream, unless you pay for Annex-M, which gives about 2Mbps upstream).

Glacier uploads are (nearly) free, and multipart uploads can be consolidated into a single archive. Nearly free, because inbound transfer is free but requests cost $0.05 per 1,000.

With 1Mbps upload, say you can sustain 110KB/s (90% of the theoretical max) for a whole 30 days: that's nearly 272GB uploaded, which would cost $2.72/month to store. So to perform an initial seeding of an off-site 10TB replica into Glacier, it would take my DSL connection 36 months of continuous uploading. Because each month adds another $2.72/month to the storage bill, the total is an arithmetic series - a * x * (x + 1) / 2, where a is the monthly increment and x the number of months - which works out to about $1811 in storage fees along the way.

On 2Mbps the upload duration would halve (18 months), so the running costs don't accrue as high; they come to about $930.

Now consider the NBN with 40Mbit upload. Assuming the same 90% utilisation, it should be good for 4394KB/s, or 362GB per day - assuming Amazon can sustain that from an Aussie source IP. With the same 30-day months, that's 10860GB in the first month, which would cost about $108/month to store. That is a realistic baseline seeding duration.

However, internet quotas would still come into play. With 1TB plans available it would still take 10 months. Ten months of uploading, adding $10/month to the storage bill each month, comes to $550 in storage fees for the initial seed.
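
All three of those totals fall out of the same arithmetic series, so here's a quick sanity check of the figures above (the $5.44 for the 2Mbps case is just double the 1Mbps monthly increment):

    # Cumulative storage cost while seeding: $a added to the monthly bill, each month, for x months.
    seed_cost() { echo "scale=2; $1 * $2 * ($2 + 1) / 2" | bc; }
    seed_cost 2.72 36    # 1Mbps ADSL upload:           ~$1811
    seed_cost 5.44 18    # 2Mbps Annex-M:               ~$930
    seed_cost 10.00 10   # NBN, capped by a 1TB quota:  ~$550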

So, bottom line: even if the NBN comes to my house, I wouldn't be able to back up to Glacier unless quotas increased dramatically (even temporarily).

[Image: double facepalm meme]

Disclaimer: my numbers might be off, probably because this whole post was knocked together in about 45 minutes. Record timing!

Migrating to Git

I've been meaning to move from Subversion to Git for a while. Most people agree that Git is superior, and there are now few, if any, Windows-related compatibility issues. Windows compatibility is a must-have for me because a large part of the code I write is developed on, and targeted at, Windows.

Historically I've used SVN with TortoiseSVN as the interface on Windows, and the plain old svn command line on Linux. My single repository contained all of my projects, split out at the root by language: csharp, cpp, python, web and, most recently, solaris. This structure served me well, as it let me keep shared libraries in a common location outside any one project directory. I also had some binaries in there for third-party shared libraries - log4net, the MySQL connector, NHibernate - and any project that needed binary-only files (icons, images and so on) had those too. Apart from these few exceptions I wasn't keeping binaries in the repository at all; I'd learned that lesson from my CVS days.
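
For reference, the old layout looked roughly like this (the top-level language split is real; the names underneath are invented for illustration):

    svnrepo/
        csharp/     # e.g. csharp/project-a, csharp/SharedLib
        cpp/
        python/
        web/
        solaris/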

I was running SVN on a remote VPS and working directly against that. This gave me a common way to work on my own code from various locations, since I could always reach the repository. It was secured with username/password authentication, ran over SSL, and was backed up regularly to my main file server. So I had an off-site backup of the master repository, which was itself off site from wherever I was working.

When looking at GitHub it became clear that I needed all of my projects in separate repositories. This makes sense in the Git world because you can't do a partial checkout. Under SVN it's possible to check out a subdirectory of the repository and work on it as if it were self-contained; I used this when working on the solaris code so I didn't need the whole repository checked out on that VM. It works fine. Under Git, however, it won't.
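
For what it's worth, the partial checkout I relied on was just this (the hostnames are stand-ins):

    # SVN happily checks out a single subtree and treats it as self-contained:
    svn checkout http://svnhost/svnrepo/solaris solaris
    # A plain git clone, by contrast, always brings down the whole repository:
    git clone ssh://remotehost/data/gits/somerepo.git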

Another obvious reason for reorganising my repository was the top-level language directories. They made sense when most of my code was C++ or C# only, as projects there weren't likely to use each other. I later added web as a catch-all for what remained of my web projects (primarily PHP4-based), and much later solaris was added to contain my NAS project (written in bash and Python); it was kept separate to allow partial checkout. As time went on I also had projects that broke this clear separation - my static blog code was written in C# but was ultimately a web project. Clearly the structure had outgrown its purpose.

The end goal of the migration was to bring my entire version history across into multiple per-project repositories. This way I could selectively publish a repository to the public if and when I wanted to. So the first step was to migrate into Git, then split each project out into its own repository, with full history.

While looking for migration tools, I found many projects (most on GitHub) named svn2git. Some were forks of one another and others were totally different. Several I tried didn't even compile, so they were quickly discarded. I ended up settling on a (sigh) Ruby-based one: svn2git.

After some initial failed attempts to get it running against my VPS directly (private SSL cert issues and user auth), I spun up a CentOS 6 VM at home, copied my SVN repo onto it and configured mod_dav_svn there. From here I was able to work in isolation and fix up any issues before doing the real migration.

Ultimately the readme was correct: from a minimal CentOS 6.2 installation with RPMForge configured, I ran approximately the following. It's highly likely I did extra things not listed here.

    yum install git git-svn subversion mod_dav_svn ruby rubygems httpd elinks

I copied my repository from a recent backup into /data/svnrepo, then configured mod_dav_svn by adding the following to /etc/httpd/conf.d/subversion.conf

    <Location /svnrepo>
       DAV svn
       SVNPath /data/svnrepo
    </Location>

I started Apache and browsed to the location to verify it was working.

    service httpd start
    elinks http://127.0.0.1/svnrepo/

Don't forget to set up SSH keys for authentication with your remote site, so Git won't prompt for a password when connecting to it.

    ssh-keygen
    ssh-copy-id -i ~/.ssh/id_rsa.pub remotehost

With the SVN side confirmed working and passwordless login to the remote site in place, I could proceed with the Git side of things.

    gem install svn2git

I created the authors.txt file as suggested in the guide, in its default location ~/.svn2git/authors

    robert = robert <email@here>

Then after some trial and error to get the settings working just right;

    mkdir ~/stagingrepo
    cd ~/stagingrepo
    svn2git http://127.0.0.1/svnrepo/ --rootistrunk -v

This ran through and imported all of my revisions into a new Git repository. From here I needed to split it out into new per-project repositories.

    mkdir ~/gits

To do this, I had to clone the repo and then use the filter-branch command to throw out everything except a chosen subdirectory, in this case a specific project. This will cause issues for projects that reference items outside their own repository, so I need to be aware of that later.

    mkdir ~/gits/project-x
    cd ~/gits/project-x
    git clone ~/stagingrepo .
    git filter-branch --subdirectory-filter csharp/project-x HEAD

And now the new remote server repository;

    remote$ mkdir /data/gits/project-x
    cd /data/gits/project-x
    git init --bare

    local$ git remote rm origin
    git remote add origin ssh://remotehost/data/gits/project-x
    git push --all

Now, from my normal workstation, I can simply clone the repo and continue working, pushing to the remote site when ready.

    git clone ssh://remotehost/data/gits/project-x

The temporary repositories can then be deleted as they're no longer needed - that is, the stagingrepo and the per-project ones created to push up to the remote site. After the filter-branch step, each single-project Git repository is still as large as the original clone. I incorrectly thought running git gc on it would prune out the no-longer-referenced files, but this is not the case. However, the data pushed up to the remote bare repository only contains the files and history that are actually referenced. This is important to me, because when I open source a project I don't want my other projects leaking out.
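
If you did want the split-out local repository to shrink as well, filter-branch keeps backup refs under refs/original/ and the reflog keeps the old objects reachable, so something like this should do it (I didn't need it, so treat it as an untested aside):

    # drop the refs/original/ backups that filter-branch leaves behind
    git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
    # expire the reflog and prune the now-unreachable objects
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive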

Once I'd figured out what I needed to do to achieve this, I was able to script up my list of projects to automate the migration.

Remote site script (to run in the directory hosting the git repositories, /data/gits/)

    #!/bin/bash
    GITS="csharp/project-a csharp/project-b solaris/project-c"

    for repopath in ${GITS}
    do
        repo=${repopath##*/}.git
        mkdir -p ${repo}
        cd ${repo}
        git --bare init
        cd ..
    done

Local script for migration (to run in the directory hosting the temporary Git repositories);

    #!/bin/bash
    GITS="csharp/project-a csharp/project-b solaris/project-c"
    SVNBASE="http://127.0.0.1/svnrepo"
    GITBASE="ssh://remotehost/data/gits"

    echo Performing first migration
    mkdir stagingrepo
    cd stagingrepo
    svn2git ${SVNBASE} --rootistrunk
    cd ..

    echo Now each project
    for repopath in ${GITS}
    do
        repo=${repopath##*/}
        echo "- ${repo}"
        mkdir ${repo}
        cd ${repo}

        git clone ../stagingrepo .
        git filter-branch --subdirectory-filter ${repopath} HEAD
        git remote rm origin
        git remote add origin ${GITBASE}/${repo}.git
        git push --all
        cd ..
        #rm -rf ${repo}
    done
    #rm -rf stagingrepo

Then finally the script to clone these back down to the workstation

    #!/bin/bash
    GITS="csharp/project-a csharp/project-b solaris/project-c"
    GITBASE="ssh://remotehost/data/gits"

    for repopath in ${GITS}
    do
        repo=${repopath##*/}
        git clone ${GITBASE}/${repo}.git ${repo}
    done

So there you have it. That's how I migrated from a single combined SVN repository into 28 new Git repositories. As an added bonus, Git for Windows comes with a Bash shell, so I was able to use the same script to clone everything down onto my Windows PC.

