random complexity

with a side of crazy

Migrating to Git

I've been meaning to move from Subversion to Git for a while. Most people agree that Git is superior, and there are now few, if any, Windows-related compatibility issues. Windows compatibility is a must-have for me, because a large part of the code I write is developed on and targeted at Windows.

Historically I've used SVN, with TortoiseSVN as the interface on Windows and the plain old svn command-line client on Linux. My single repository contained all of my projects, split out at the root by language: csharp, cpp, python, web and, most recently, solaris. This structure served me well, as it allowed me to keep shared libraries in a common location outside any one project directory. I also had some binaries in there for third-party shared libraries (log4net, MySQL Connector, NHibernate), along with any binary-only files that projects needed, such as icons and images. Apart from these few exceptions, I kept no binaries in the repository at all; I'd learned that from my CVS days.

I was running SVN on a remote VPS and working directly against it. This gave me a common way to work on my own code from various locations, since I could always work directly with my repository. It was secured with username/password authentication, ran over SSL, and was backed up regularly to my main file server. So I had an off-site backup of the master repository, which itself was off-site from wherever I was working.

When looking at GitHub, it became clear that I needed each of my projects in a separate repository. This makes sense in the Git world because, as far as I can tell, you can't do a partial checkout. Under SVN it's possible to check out a subdirectory of the repository and work on it as if it were self-contained. I used this when working on the solaris code, so I didn't need the whole repository checked out on that VM. That works fine under SVN, but not under Git.

Another obvious reason for reorganising my repository was the top-level language directories. They made sense when most of my code was C++ or C#, as projects in different languages were unlikely to use each other. I later added web as a catch-all for what remained of my web projects (primarily PHP 4 based), and much later solaris was added to contain my NAS project (which is written in bash and Python); it was kept separate to allow a partial checkout. Over time I also had other projects that broke this clean separation: my static blog code was written in C# but was ultimately a web project. Clearly the structure had outgrown its purpose.

The end goal of the migration was to bring my entire version history across into multiple per-project repositories. That way I could selectively publish a repository to the public if and when I wanted to. So the first step was to migrate into Git, and then to split each project out into its own repository, with full history.

While looking for migration tools, I found many projects (most on GitHub) named svn2git. Some were forks of one another and others were totally different. Several I tried didn't even compile, so they were quickly discarded. I ended up settling on a (sigh) Ruby-based one: svn2git.

After some initial failed attempts to get it running directly on my VPS (private SSL certificate issues and user authentication), I spun up a CentOS 6 VM at home, copied my SVN repository onto it and configured mod_dav_svn there. This let me work in isolation and fix any issues before doing the real migration.

Ultimately the readme was correct. Starting from a minimal CentOS 6.2 installation with RPMForge configured, I ran approximately the following (it's highly likely I did extra things not listed here):

    yum install git git-svn subversion mod_dav_svn ruby rubygems httpd elinks

I copied my repository onto the VM from a recent backup, into /data/svnrepo, and then configured mod_dav_svn by adding the following to /etc/httpd/conf.d/subversion.conf:

    <Location /svnrepo>
       DAV svn
       SVNPath /data/svnrepo
    </Location>

Then I started Apache and browsed to the location to verify that it was working:

    service httpd start

Don't forget to set up SSH keys for authentication with your remote site, so that git won't prompt for a password every time it connects to it.

    ssh-copy-id -i ~/.ssh/id_rsa.pub remotehost
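
Before moving on, it can be worth confirming that key-based login really works without any prompt. A small sketch, assuming OpenSSH; `check_key_login` is a hypothetical helper name. `BatchMode=yes` makes ssh fail instead of asking for a password (or for confirmation of an unknown host key, so connect once interactively first):

```shell
# Hypothetical helper: succeeds only if ssh can log in to the given host
# and run "true" without any interactive prompt (password, host key, ...).
check_key_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, `check_key_login remotehost && echo "key login OK"`.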

At this point I was confident the SVN side was working, and I could log into the remote site without a password, so I could proceed with the Git side:

    gem install svn2git

I created an authors file, as suggested in the guide, at its default location ~/.svn2git/authors:

    robert = robert <email@here>
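
With only one committer, writing this file by hand is easy. For a repository with more contributors, a sketch like the following could generate a starting file from `svn log --quiet`, whose revision lines look like `r123 | robert | date`. The `make_authors` name and the `example.com` email domain are placeholders, not something svn2git provides, and each generated line still needs a real name and email:

```shell
# Hypothetical helper: reads "svn log --quiet" output on stdin and prints
# one svn2git authors line per unique committer. The email domain is a
# placeholder (default example.com); edit names and emails afterwards.
make_authors() {
    local domain=${1:-example.com}
    # Revision lines look like "r123 | robert | 2012-03-01 ...";
    # field 2 (splitting on " | ") is the committer.
    awk -F ' [|] ' '/^r[0-9]+ [|]/ { print $2 }' |
    sort -u |
    while read -r user; do
        printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
    done
}
```

Assuming the mod_dav_svn location configured above, usage would be along the lines of `mkdir -p ~/.svn2git && svn log --quiet http://localhost/svnrepo | make_authors example.com > ~/.svn2git/authors`.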

Then, after some trial and error to get the settings just right:

    mkdir ~/stagingrepo
    cd ~/stagingrepo
    svn2git http://localhost/svnrepo --rootistrunk -v

This ran through and imported all of my revisions into a new Git repository. From here I needed to split it out into new per-project repositories:

    mkdir ~/gits

For each project, I cloned the staging repository and then used the filter-branch command to throw out everything except the subdirectory I chose, in this case a specific project. This will cause issues for projects that reference items outside their own directory, so I have to be aware of that for later.

    mkdir ~/gits/project-x
    cd ~/gits/project-x
    git clone ~/stagingrepo .
    git filter-branch --subdirectory-filter csharp/project-x HEAD
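
Because the filter keeps only the chosen directory, anything a project pulls in from a shared location outside it (the shared third-party libraries, for instance) is left behind. As a rough check before splitting, here is a hypothetical `find_outside_refs` helper for the C# projects, assuming Visual Studio `.csproj` files that point at DLLs through `<HintPath>` elements. It simply flags any HintPath containing `..`, so it may also report a path that climbs out of a subfolder but stays inside the project:

```shell
# Hypothetical helper: list <HintPath> values containing ".." in every
# .csproj under the given directory. A heuristic, not a path resolver.
find_outside_refs() {
    find "$1" -name '*.csproj' -type f |
    while read -r proj; do
        # Pull out each matching element, then strip the tags.
        grep -o '<HintPath>[^<]*\.\.[^<]*</HintPath>' "$proj" |
        sed -e 's/<HintPath>//' -e 's/<\/HintPath>//' |
        while read -r ref; do
            printf '%s: %s\n' "$proj" "$ref"
        done
    done
}
```

For example, `find_outside_refs ~/stagingrepo/csharp/project-x` before running the filter.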

Next, create the new repository on the remote server and push to it:

    # On the remote server:
    mkdir /data/gits/project-x
    cd /data/gits/project-x
    git init --bare

    # Back in the local ~/gits/project-x clone:
    git remote rm origin
    git remote add origin ssh://remotehost/data/gits/project-x
    git push --all

Now, from my normal workstation, I can simply clone the repository and continue working, pushing to the remote site when ready.

    git clone ssh://remotehost/data/gits/project-x

The temporary repositories can be deleted, as they aren't needed: that is, stagingrepo and the per-project ones created to push up to the remote site. After the filter-branch step, a single-project repository is still as large as the original clone. I incorrectly thought running git gc on it would prune out the files that are no longer referenced, but this is not the case, because filter-branch keeps the old history reachable through its backup refs and the reflog. However, the data pushed up to the remote bare repository only contains the files and history that are still referenced. This is important to me, because when I open source a project I don't want my other projects leaking out.
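
If you do want the local single-project clone to shrink as well, the git-filter-branch documentation lists the steps: remove the backup refs filter-branch saves under `refs/original/`, expire the reflog, then garbage-collect with immediate pruning. A minimal sketch, wrapped in a hypothetical `shrink_after_filter` function and assuming it is run inside the clone after `git remote rm origin` (the remote-tracking refs would otherwise keep the old history reachable):

```shell
# Hypothetical helper: run inside a filter-branch'd clone whose origin
# remote has already been removed.
shrink_after_filter() {
    # 1. Delete the backup refs that filter-branch saved under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # 2. Expire every reflog entry right away.
    git reflog expire --expire=now --all
    # 3. Repack and delete objects that are now unreachable.
    git gc --prune=now
}
```

Running `du -sh .git` before and after should show the difference.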

Once I'd figured out what I needed to do, I was able to script the migration for my whole list of projects.

Remote site script (to run in the directory hosting the git repositories, /data/gits/)

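    # SVN paths of the projects; each becomes a bare repository named
    # after its last path component, e.g. project-a.git
    GITS="csharp/project-a csharp/project-b solaris/project-c"

    for repopath in ${GITS}; do
        repo=$(basename "${repopath}")
        mkdir -p "${repo}.git"
        cd "${repo}.git"
        git init --bare
        cd ..
    done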

Local script for migration (to run in the directory that will hold the temporary git repositories):

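    # Example locations from earlier in this post; adjust to your setup.
    SVNBASE=http://localhost/svnrepo
    GITBASE=ssh://remotehost/data/gits
    GITS="csharp/project-a csharp/project-b solaris/project-c"

    echo Performing first migration
    mkdir stagingrepo
    cd stagingrepo
    svn2git "${SVNBASE}" --rootistrunk
    cd ..

    echo Now each project
    for repopath in ${GITS}; do
        repo=$(basename "${repopath}")
        echo "- ${repo}"
        mkdir "${repo}"
        cd "${repo}"

        # Keep only this project's subdirectory, then push to its new home
        git clone ../stagingrepo .
        git filter-branch --subdirectory-filter "${repopath}" HEAD
        git remote rm origin
        git remote add origin "${GITBASE}/${repo}.git"
        git push --all
        cd ..
        #rm -rf "${repo}"
    done
    #rm -rf stagingrepo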

And finally, the script to clone these back down to the workstation:

    GITBASE=ssh://remotehost/data/gits
    GITS="csharp/project-a csharp/project-b solaris/project-c"

    for repopath in ${GITS}; do
        repo=$(basename "${repopath}")
        git clone "${GITBASE}/${repo}.git" "${repo}"
    done

So there you have it: that's how I migrated from a combined SVN repository into 28 new Git repositories. As an added bonus, Git for Windows comes with a bash shell, so I was able to use that script to clone everything down onto my Windows PC.



Copyright © 2001-2016 Robert Harrison. Powered by hamsters on a wheel.