Some quick Amazon Glacier numbers

Amazon Glacier. A very nice idea, with a nice storage price (1 cent per GB-month). Pricing of restores is a bit more complicated, but seems to work OK for large backups with small, infrequent restores. Using it as a full DR target is perhaps a bit of a stretch though, due to the huge cost of restoring everything quickly.

As great as it sounds, I needed to get my head out of the cloud. So to bring it back down to earth, how would it apply to me?

Hypothetically, if you stored 10TB in it, that would be roughly $100/month in storage.

5% of that could be restored for free in any month, so 500GB. But that’s daily prorated, so 16.6GB/day for 30 days. If you exceed this, you pay based on the maximum hourly transfer volume you achieved (less the hourly prorated free allowance, 0.694GB in this example), multiplied by the number of hours in the month (720 for 30 days), multiplied by the excess fee of $0.01/GB. Gulp.
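To make that concrete, here’s a rough sketch of the fee calculation as I read it, applied to the hypothetical 10TB example with the whole archive restored perfectly evenly over the month:

# Sketch of the retrieval fee as I read it - hypothetical 10TB example,
# restoring everything spread perfectly evenly over a 30 day month.
peak_gb_hr=13.89    # 10000GB / 720 hours
free_gb_hr=0.694    # 500GB free allowance / 720 hours
awk -v p=${peak_gb_hr} -v f=${free_gb_hr} \
    'BEGIN { printf "Retrieval fee: $%.2f\n", (p - f) * 720 * 0.01 }'

That prints about $95, so even a perfectly smooth full restore costs roughly $95 in retrieval fees, and any burstiness pushes the peak hourly rate, and the bill, up from there.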

Also, each restore job is only available for 24 hours, so that would be 30 restore jobs, not just one. I didn’t see anything about a limit on how many jobs can be run in a month, or about a per-restore fee.

If you delete something within 3 months of uploading, you pay a prorated fee per GB for deleting it. You’re best off leaving it there for the full 3 months, because it’ll cost you the same either way (deleting a GB after one month costs the two remaining months of storage, $0.02, exactly what keeping it would have).

Now for some more relevant, real-world figures. Australian ADSL2 is marketed as up to 24Mbps downstream (and 1Mbps upstream, unless you pay for Annex-M, which gives about 2Mbps upstream).

Glacier uploads are (nearly) free, and multipart uploads can be consolidated into one archive. Nearly free, because transfer is free but requests cost $0.05 per 1,000 - uploading 10TB in 100MB parts is about 100,000 requests, or roughly $5.

With 1Mbps upload, say you can sustain 110KB/s (90% of the theoretical max) for a whole 30 days; that would be nearly 272GB uploaded, which would cost $2.72/month to store. So to perform an initial seeding of an off-site 10TB replica into Glacier, it would take my DSL connection 36 months of continuous uploading. Along the way it would run up a cumulative Glacier storage bill of $1811 (y = 0.5ax^2 + 0.5ax, where a is the storage charge added each month and x is the number of months).

On 2Mbps the upload duration would halve (18 months), so the running costs don’t accrue quite as high: it comes to $930.

Now consider the NBN with 40Mbps upload. Assuming the same 90% utilisation, it should be good for 4394KB/s, or 362GB per day (assuming Amazon can sustain that from an Aussie source IP). With the same 30-day months, that’s 10,860GB in the first month, which would cost $108/month to store. Now that is a realistic seeding duration.

However, internet quotas would still come into play. With 1TB plans available, it would still take 10 months. Ten months of uploading, adding $10/month in storage charges each month, comes to $550 in storage fees for the initial seed.
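To sanity-check those three figures, here’s the same y = 0.5ax^2 + 0.5ax formula in script form (my numbers, not Amazon’s):

#!/bin/bash
# Cumulative storage cost while seeding: y = 0.5*a*x^2 + 0.5*a*x
# a = storage cost added each month ($), x = months spent uploading
seed_cost() {
    awk -v a="$1" -v x="$2" 'BEGIN { printf "$%.2f\n", 0.5*a*x*x + 0.5*a*x }'
}
seed_cost 2.72 36    # ADSL2 @ 1Mbps upstream:  $1811.52
seed_cost 5.44 18    # Annex-M @ 2Mbps:         $930.24
seed_cost 10.00 10   # NBN @ 40Mbps, 1TB quota: $550.00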

So bottom line, even if the NBN comes to my house, I wouldn’t be able to back up to Glacier unless quotas increased dramatically (even temporarily).

Double facepalm

Disclaimer: my numbers might be off, probably because this whole post was knocked together in about 45 minutes. Record timing!


Migrating to Git

I’ve been meaning to move from Subversion to Git for a while. Most people agree that Git is superior, and there are now few, if any, Windows-related compatibility issues. Windows compatibility is a must-have for me, because a large part of the code I write is developed on and targeted at Windows.

Historically I’ve used SVN with TortoiseSVN as the interface on Windows, and the plain old svn command line on Linux. My single repository contained all of my projects, split out at the root by language: csharp, cpp, python, web and, most recently, solaris. This structure served me well, as it allowed me to keep shared libraries in a common location outside any one project directory. I also had some binaries in there for shared libraries of third-party origin - log4net, the MySQL connector, NHibernate. Files that only exist in binary form - icons, images and so on - were also present. Apart from these few exceptions, I was not keeping binaries in the repository at all. I’d learned that from my CVS days.

I was running SVN on a remote VPS and working directly against it. This gave me a common way to work on my own code from various locations, as I could work directly with my repository. It was secured with username/password authentication, ran over SSL, and was backed up regularly to my main file server. So I had an off-site backup of the master repository, which was itself off-site from wherever I was working.

When looking at GitHub it became clear that I needed to have all of my projects in separate repositories. This makes sense in the Git world because you can’t do a partial checkout. Under SVN it’s possible to check out a subdirectory of the repository and work on it as if it were self-contained - I used this with the solaris code so I didn’t need the whole repository checked out on that VM, and it works fine. Under Git it won’t.
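For example (hypothetical server URL), grabbing just the solaris tree under SVN is a one-liner, and the resulting working copy behaves like any other:

svn checkout https://svnhost/svnrepo/solaris solaris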

Another obvious reason for reorganising my repository was the top-level language directories. They made sense when most of my code was C++ or C# only, as projects in one language wouldn’t likely use those of the other. I later added web as a catch-all for what remained of my web projects (primarily PHP4-based), and much later solaris was added to contain my NAS project (written in bash and Python); it was kept separate to allow partial checkout. As time went on, other projects broke this clear separation - my static blog code was written in C# but was ultimately a web project. Clearly the structure had outgrown its purpose.

The end goal of the migration was to bring my entire version history across into multiple per-project repositories. This way I could selectively publish a repository to the public if/when I wanted to. So the first step was to migrate into Git, and then split out each project into its own repository, with full history.

While looking for migration tools, I found many projects (most on GitHub) named svn2git. Some were forks of one another, others were totally different. Several I tried didn’t even compile, so were quickly discarded. I ended up settling on a (sigh) Ruby-based one: svn2git.

After some initial failed attempts to get it to run against my VPS directly (private SSL cert issues and user auth), I spun up a CentOS 6 VM at home, copied my SVN repo onto it and configured mod_dav_svn there. From here I was able to work in isolation and fix up any issues before doing the real migration.

Ultimately the readme was correct. From a minimal CentOS 6.2 installation with RPMForge configured, I ran approximately the following (it’s highly likely I did extra things not listed here):

yum install git git-svn subversion mod_dav_svn ruby rubygems httpd elinks

I copied my repository in from a recent backup, into /data/svnrepo, and then configured mod_dav_svn by adding the following to /etc/httpd/conf.d/subversion.conf:

<Location /svnrepo>
   DAV svn
   SVNPath /data/svnrepo
</Location>

I started Apache and browsed to the location to verify it was working:

service httpd start
elinks http://127.0.0.1/svnrepo/

Don’t forget to set up SSH keys for authentication with your remote site, so Git won’t prompt for a password when connecting:

ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub remotehost

With the SVN side working and passwordless login to the remote site confirmed, I could proceed with the Git side:

gem install svn2git

I created the authors file as suggested in the guide, in its default location, ~/.svn2git/authors:

robert = robert <email@here>

Then, after some trial and error to get the settings working just right:

mkdir ~/stagingrepo
cd ~/stagingrepo
svn2git http://127.0.0.1/svnrepo/ --rootistrunk -v

This ran through and imported all of my revisions into a new Git repository. From here I needed to split it out into new per-project repositories.

mkdir ~/gits

To do this, I had to clone the repo and then use the filter-branch command to throw out everything except a chosen subdirectory, in this case a specific project. This will cause issues for projects that reference items outside their own directory, so I have to be aware of that later.

mkdir ~/gits/project-x
cd ~/gits/project-x
git clone ~/stagingrepo .
git filter-branch --subdirectory-filter csharp/project-x HEAD

And now the new remote server repository:

remote$ mkdir /data/gits/project-x
cd /data/gits/project-x
git init --bare

local$ git remote rm origin
git remote add origin ssh://remotehost/data/gits/project-x
git push --all

Now from my normal workstation I can simply clone the repo and continue working, pushing to the remote site when ready:

git clone ssh://remotehost/data/gits/project-x

The temporary repositories can then be deleted, as they aren’t needed - that is, the stagingrepo and the per-project ones created to push up to the remote site. After the filter-branch step, the single-project Git repository is still as large as the original cloned one. I incorrectly thought running git gc on it would prune out the no-longer-referenced files, but this is not the case. However, the data pushed up to the remote bare repository only contains the files and history actually referenced. This is important to me, because when I open source a project I don’t want my other projects leaking out.
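For what it’s worth, my understanding is that the local copy stays large because filter-branch leaves backup refs under refs/original/ (plus reflog entries) that still point at the old objects. A sketch of what I believe would shrink it, had I needed to:

# delete filter-branch's backup refs, expire the reflog, then gc can prune
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now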

Once I’d figured out what I needed to do to achieve this, I was able to script up my list of projects to automate the migration.

Remote site script (to run in the directory hosting the Git repositories, /data/gits/):

#!/bin/bash
GITS="csharp/project-a csharp/project-b solaris/project-c"

for repopath in ${GITS}
do
    # strip the language directory, keeping just the project name, and append .git
    repo=${repopath##*/}.git
    mkdir -p ${repo}
    cd ${repo}
    git --bare init
    cd ..
done

Local migration script (to run in the directory hosting the temporary Git repositories):

#!/bin/bash
GITS="csharp/project-a csharp/project-b solaris/project-c"
SVNBASE="http://127.0.0.1/svnrepo"
GITBASE="ssh://remotehost/data/gits"

echo Performing first migration
mkdir stagingrepo
cd stagingrepo
svn2git ${SVNBASE} --rootistrunk
cd ..

echo Now each project
for repopath in ${GITS}
do
    # strip the language directory, keeping just the project name
    repo=${repopath##*/}
    echo "- ${repo}"
    mkdir ${repo}
    cd ${repo}

    git clone ../stagingrepo .
    git filter-branch --subdirectory-filter ${repopath} HEAD
    git remote rm origin
    git remote add origin ${GITBASE}/${repo}.git
    git push --all
    cd ..
    #rm -rf ${repo}
done
#rm -rf stagingrepo

Then finally, the script to clone these back down to the workstation:

#!/bin/bash
GITS="csharp/project-a csharp/project-b solaris/project-c"
GITBASE="ssh://remotehost/data/gits"

for repopath in ${GITS}
do
    repo=${repopath##*/}
    git clone ${GITBASE}/${repo}.git ${repo}
done

So there you have it. That’s how I migrated from a combined SVN repository into 28 new Git repositories. As an added bonus, Git for Windows comes with a bash shell, so I was able to use that last script to clone everything down onto my Windows PC.
