migrating coderanch from svn to git

CodeRanch has been using SVN for a long time for the forum software. It’s high time to switch. We have just over 20 projects in our one SVN repository. Most are small/inactive so it wouldn’t be terrible to have the history in only the biggest project. However, I wanted to try to do it “right” and migrate the history of each project into a separate git repository.

Choosing GitLab as a provider

We chose to use GitLab instead of GitHub for a few reasons

  1. GitLab has free organizations (security grouping) for private repositories. It also allows multiple admins, which you can’t do in a free GitHub repo.
  2. GitLab has a built-in continuous integration tool. (We aren’t using that yet, but want to leave the option open.)
  3. A couple moderators use GitLab professionally and have had good experiences with it.


Getting a local dump of the SVN database

Migrating a remote repository involves a large number of network roundtrips. It’s far faster to export the SVN repository/database to a local dump file. Starting with SVN 1.7, this is a completely client side operation.

However, I’ve been using SVN through Eclipse so I didn’t actually have an SVN 1.7 command line client installed. The first thing I did was install one:

brew install subversion

Then I got a dump of the whole repository. I used svnrdump so I didn’t need to sign on to the machine with the repo. I ran:

svnrdump dump '<url>' > full.dmp

We have just under ten thousand revisions so the dump took about 20 minutes. The full dump was just under 800MB. (A lot of this is duplication in tags. The GitLab repository is about a quarter the size.)
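A quick sanity check on the dump is to count the revision records in it. This is a minimal sketch, assuming the dump file is named full.dmp as above; count_dump_revisions is a hypothetical helper name. It counts the "Revision-number:" headers of the dump format, so a file in the repository that happens to contain such a line would inflate the count.

```shell
# Hypothetical helper: count revision records in an SVN dump file.
# -a makes grep treat the (partly binary) dump as text.
count_dump_revisions() {
  grep -a -c '^Revision-number: ' "$1"
}

# Usage: count_dump_revisions full.dmp
```

A full dump typically includes revision 0, so the number should be about one more than the latest revision in the repository.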

Create authors.txt

I decided to build authors.txt by hand. I looked at the conf/htaccess-projects.acl SVN file since I have admin access to the server. There were 90 users. I wound up migrating them all, which was a poor decision.

This hit or miss approach works (slowly) because “git svn clone” complains if it encounters an author that isn’t defined. This let me go back and add that person without having to manually do a lot of analysis. Luckily, “git svn clone” does let you resume from where you left off.

Author: xxx not defined in authors.txt file
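For reference, each line of authors.txt maps an SVN username to a Git identity in the form "svnuser = Full Name <email>". I built mine by hand, but a skeleton can also be generated from the commit log. This is a minimal sketch, not what I ran: svn_log_to_authors is a hypothetical helper, and the example.com addresses are placeholders to replace with real names and emails.

```shell
# Hypothetical helper: turn `svn log -q` output (on stdin) into
# skeleton authors.txt lines, one per distinct committer.
svn_log_to_authors() {
  awk -F '|' '/^r[0-9]+ / {
    gsub(/^ +| +$/, "", $2)          # trim spaces around the username
    print $2 " = " $2 " <" $2 "@example.com>"
  }' | sort -u
}

# Usage: svn log -q '<url>' | svn_log_to_authors > authors.txt
```

Because this only lists people who actually committed, it avoids carrying over users who have access but never checked anything in.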

Use git-svn bridge

I used the git-svn bridge for the actual migration. This was only a few steps for the small projects that didn’t have branches or tags:

  1. Use the git-svn bridge to clone just the part of the repo we want: git svn clone <url>/svn/project/ -A authors.txt new-git-repo
  2. Add an origin on my machine: git remote add origin git@gitlab.com:coderanch/new-repo.git (fun fact, you don’t need to create the repository on GitLab. When you push, it gets created for you.)
  3. Push all: git push -u origin --all
  4. Push tags (we don’t have any for most projects so skipped this): git push --tags
  5. Repeat for all our projects. (We have about 20)
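The steps above can be sketched as a small script that takes project names as arguments. This is an illustration under assumptions rather than the exact commands I ran: run and migrate_project are hypothetical helpers, SVN_BASE and GITLAB_GROUP are placeholders, and each Git repository is assumed to take the same name as its SVN project. Setting DRY_RUN=1 prints the commands without running them.

```shell
# Placeholders (assumptions): base SVN URL and GitLab group.
SVN_BASE=${SVN_BASE:-'<url>/svn'}
GITLAB_GROUP=${GITLAB_GROUP:-coderanch}

# Hypothetical helper: print each command, and run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Clone one SVN project and push it to a GitLab repo of the same name.
migrate_project() {
  project=$1
  run git svn clone "$SVN_BASE/$project/" -A authors.txt "$project" || return 1
  run git -C "$project" remote add origin "git@gitlab.com:$GITLAB_GROUP/$project.git" || return 1
  run git -C "$project" push -u origin --all || return 1
  run git -C "$project" push origin --tags
}

# Migrate every project named on the command line (does nothing with no arguments).
for p in "$@"; do
  migrate_project "$p" || echo "failed: $p" >&2
done
```

Running it first with DRY_RUN=1 is a cheap way to check the URLs before starting a long clone.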

It turned out some of the projects in our repo have trunk/branches/tags inside. So I needed the command to use the standard layout (the -s flag):

git svn clone -s https://svn.javaranch.com/svn/project -A authors.txt new-repo
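The -s flag is shorthand for --trunk=trunk --branches=branches --tags=tags. To find out which projects need it, one option is to look at the top level directory listing of each project. A minimal sketch, assuming the listing comes from svn ls (which prints directories with a trailing slash); has_std_layout is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the listing on stdin contains
# all three standard-layout directories.
has_std_layout() {
  listing=$(cat)
  for dir in trunk/ branches/ tags/; do
    printf '%s\n' "$listing" | grep -qxF "$dir" || return 1
  done
}

# Usage: svn ls '<url>/svn/project' | has_std_layout && echo "clone with -s"
```

Projects with only some of these directories need the explicit --trunk, --branches, or --tags options instead.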

For branches

We hardly have any branches so I didn’t search for the optimal way. I used a three step procedure.

  1. I used the UNIX script from sailmaker to convert the SVN branches to local Git branches.
    for branch in `git branch -r | grep "branches/" | sed 's/ branches\///'`; 
    do   
       git branch $branch refs/remotes/$branch 
    done
  2. Checked out each branch
    git checkout -b <branch_name>
  3. Sent it on to the remote repo: git push --all

This approach leaves stray remote branches. They are visible using git branch -a, but not in the GitLab UI. They seem to go away if you clone the repository to another directory so I’m thinking they are local branches and were never pushed to GitLab.

For the big repo with tags

We have 396 tags for the forum software and zero tags for all our other projects. I tried migrating our entire repository including tags using “git svn clone.” It took too long. Migrating 9K+ commits, 800MB, and 300+ tags is really slow. After running (against my local SVN) for over 24 hours, it was less than half done.

I realized the bulk of the time was going to migrating tags. So I did the “git svn clone” migration for just the trunk and branches. Then I dealt with the tags using a script.

First I created a local repo since this was the big one. This took about 20 minutes to import:

  • svnadmin create jforum-local-svn
  • svnadmin load jforum-local-svn < full.dmp

Then I did the clone:

git svn clone file:///<local repo> -A authors.txt \
  --trunk=JForum --branches=branches \
  jforum-from-local-dump-with-trunk-and-branches

Finally, I was ready to add GitLab as a remote. Again, I migrated the branches manually since there weren’t a lot.

Then I dealt with the tags. But that warrants a separate blog post.

Another possibility

There are a number of tools called svn2git. The most promising one looks like this one per this post.

I didn’t try it because I was almost done at that point. Also, it requires you to build from source. Wasn’t worth the effort.

References

  • https://john.albin.net/git/convert-subversion-to-git
  • https://www.mugo.ca/Blog/Splitting-a-Subversion-repository-into-multiple-repositories
  • https://daneomatic.com/2010/11/01/svn-to-multiple-git-repos/
  • https://git-scm.com/docs/git-svn
  • https://www.getdonedone.com/converting-5-year-old-repository-subversion-git/ – uses master/trunk/branches structure – why want that?
  • https://github.com/nirvdrum/svn2git – uses git-svn bridge and does cleanup after. so presumably the same performance issue
