switching the coderanch build to gitlab

After having some “trouble” upgrading Jenkins on the CodeRanch server, we concluded it would be easier to switch to GitLab for the build than fix it. After all, we are already using GitLab SaaS (software as a service) for source control. While I’ve done GitLab pipelines before, this was my first time using Ant in one so it was interesting. Which means a blog post.

Why we use Ant and our custom deployment model

We have a few CodeRanch moderators who work on the forum software. (Fewer than 5, which is convenient as that’s how many people can be in a GitLab org for free.) One of those moderators lives in a country with less than reliable internet. This means using Maven (or even Ant with Ivy) is a problem because it expects more internet than he may have available at a given moment.

Additionally, uploading large files is sometimes a problem so we don’t deploy a .war file. We instead deploy a loosefiles.zip file which contains the code but not all the dependencies. The dependencies are uploaded only on change.

I don’t recommend any company operate like this but it meets our needs. And since it is a hobby, it also gives us fun technical challenges.

Fun fact: when I started working on the forum software (17 years ago) I had dialup internet. It was reliable, but I personally benefited from not having to upload a war-file-sized artifact.

The main build part of the pipeline

Ant isn’t supported for Auto DevOps so I didn’t consider that approach. The main part of the build was fairly straightforward:

image: eclipse-temurin:21

variables:
  FF_TIMESTAMPS: 1

ant-dist:
  stage: build
  before_script:
    - apt-get update && apt-get install -y ant
  script:
    - ant dist
  artifacts:
    paths:
      - qa/
      - dist/
    reports:
      junit: qa/reports/*.xml
    expire_in: 1 week
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH == "master"'

The pipeline uses a Java 21 image (last LTS as of when set up). I added FF_TIMESTAMPS so the log output tells me how long everything takes. (The free plan gives you a certain number of build minutes per month so this is important.) Using this information, we decided not to have the build create the deployment artifact (which minifies files and zips them up) as that took a bunch of time and the people who deploy always run that locally anyway.

The apt-get takes about 15 seconds to install Ant (I checked this because I would have included Ant in the repo if it was slow). Next comes actually running the dist target of the build which compiles, runs the JUnit tests and PMD for static analysis.

Next the pipeline makes the qa (build reports) and dist (binaries) available for browsing/downloading. It also publishes the JUnit output which allows the merge request and pipeline to conveniently show test data.

Finally, the triggers are merge requests and master.

Setting up semgrep

Since SAST is free on GitLab I set that up as well. The remainder of the pipeline is:

# based on https://semgrep.dev/docs/semgrep-ci/sample-ci-configs#sample-gitlab-cicd-configuration-snippet
semgrep:
  # A Docker image with Semgrep installed.
  image: semgrep/semgrep
  # Run the "semgrep ci" command on the command line of the docker image.
  # The trailing "|| true" keeps findings from failing the job.
  script: semgrep ci --config auto --include src --gitlab-sast --output=gl-sast-report.json --text-output=semgrep.txt --json-output=semgrep.json --sarif-output=semgrep.sarif || true
 
  variables:
    # Upload findings to GitLab SAST Dashboard:
    SEMGREP_GITLAB_JSON: "1"
 
  artifacts:
    paths:
        - semgrep.txt
        - semgrep.json
        - semgrep.sarif
    reports:
      sast: gl-sast-report.json
    expire_in: 1 week

  rules:
  # Scan changed files in MRs, (diff-aware scanning):
  - if: $CI_MERGE_REQUEST_IID

  # Scan mainline (default) branches and report all findings.
  - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  

While most of this code came from the sample, semgrep was far more interesting. You can publish to semgrep.dev and see the results in a nice UI. It says that is free for up to 10 committers. Cool, we have fewer than that. However, when the project comes from GitLab, it requires a GitLab group token with admin access. I was less enthusiastic about that. But even then, I still couldn’t use it because GitLab’s free product doesn’t allow you to set up group access tokens.

You might be wondering why there are so many output formats. Free GitLab basically tells you if there are new findings, but not a visual display of the full report. And I’m not sure what the other developers will want to use so I provided everything. I plan to use SARIF. There are two free visualizers:

migrating coderanch from svn to git

CodeRanch has been using SVN for a long time for the forum software. It’s high time to switch. We have just over 20 projects in our one SVN repository. Most are small/inactive so it wouldn’t be terrible to have the history in only the biggest project. However, I wanted to try to do it “right” and migrate the history of each project into a separate git repository.

Choosing GitLab as a provider

We chose to use GitLab instead of GitHub for a few reasons:

  1. GitLab has free organizations (security grouping) for private repositories. It also allows multiple admins (which you can’t do in a free GitHub repo).
  2. GitLab has a built-in continuous integration tool. (We aren’t using that yet, but want to leave the option open.)
  3. A couple moderators use GitLab professionally and have had good experiences with it.


Getting a local dump of the SVN database

Migrating a remote repository involves a large number of network roundtrips. It’s far faster to export the SVN repository/database to a local dump file. Starting with SVN 1.7, this is a completely client side operation.

However, I’ve been using SVN through Eclipse so I didn’t actually have an SVN 1.7 command line installed. The first thing I did was install a command line of SVN 1.7:

brew install subversion

Then I got a dump of the whole repository. I used svnrdump so I didn’t need to sign on to the machine with the repo. I ran:

svnrdump dump '<url>' > full.dmp

We have just under ten thousand revisions so the dump took about 20 minutes. The full dump was just under 800MB. (A lot of this is duplication in tags. The GitLab repository is about a quarter the size.)

Create authors.txt

I decided to do it by hand. I looked at the conf/htaccess-projects.acl SVN file since I have admin access to the server. There were 90 users. I wound up migrating them all (which was a poor decision.)

This hit or miss approach works (slowly) because “git svn clone” complains if it encounters an author that isn’t defined. This let me go back and add that person without having to manually do a lot of analysis. Luckily, “git svn clone” does let you resume from where you left off.

Author: xxx not defined in authors.txt file
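
For reference, git-svn expects each line of authors.txt to look like "svnuser = Full Name <email>". Rather than discovering authors one error at a time, a first draft of the file can be generated from the SVN log. This is a minimal sketch, not what I ran: the function name authors_skeleton and the example.com addresses are placeholders, and it assumes the usual "r123 | user | date" lines printed by svn log -q.

```shell
#!/bin/sh
# Hypothetical helper: reads `svn log -q` output on stdin and prints one
# "svnuser = svnuser <svnuser@example.com>" line per distinct author.
# The name and email parts are placeholders to edit by hand afterwards.
authors_skeleton() {
  awk -F'|' '/^r[0-9]+ \|/ {
      user = $2
      gsub(/^ +| +$/, "", user)   # trim the spaces around the author field
      if (user != "") print user " = " user " <" user "@example.com>"
  }' | sort -u
}

# Usage (assumes the svn command line client and a reachable repository):
#   svn log -q '<url>' | authors_skeleton > authors.txt
```

The generated names and addresses still need editing by hand, but every author is listed up front so the clone should not stop partway through.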

Use git-svn bridge

I used the git-svn bridge for the actual migration. This was only a few steps for the small projects that didn’t have branches or tags:

  1. Use the git-svn bridge to clone just the part of the repo we want: git svn clone <url>/svn/project/ -A authors.txt new-git-repo
  2. Add an origin on my machine: git remote add origin git@gitlab.com:coderanch/new-repo.git (fun fact: you don’t need to create the repository on GitLab. When you push, it gets created for you.)
  3. Push all: git push -u origin --all
  4. Push tags (we don’t have any for most projects so skipped this): git push --tags
  5. Repeat for all our projects. (We have about 20)
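
Repeating those steps for about 20 projects lends itself to a loop. A minimal sketch, assuming a hypothetical projects.txt with one SVN project name per line and that each GitLab repository is named after its project. It only prints the commands so they can be reviewed before anything runs. The function name and both parameters are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the clone/push commands for each project name
# read from stdin. $1 = base SVN URL, $2 = GitLab group (both assumptions).
print_migration_commands() {
  svn_base="$1"
  gitlab_group="$2"
  while IFS= read -r project; do
    [ -z "$project" ] && continue   # skip blank lines
    echo "git svn clone $svn_base/$project/ -A authors.txt $project"
    echo "(cd $project && git remote add origin git@gitlab.com:$gitlab_group/$project.git && git push -u origin --all)"
  done
}

# Usage:
#   print_migration_commands '<url>/svn' coderanch < projects.txt > migrate.sh
```

Printing rather than executing keeps the slow "git svn clone" step under manual control, which matters if a clone stops on a missing author.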

It turned out some of the projects in our repo have trunk/branches/tags inside. So I needed the command to use the standard layout (the -s flag):

git svn clone -s https://svn.javaranch.com/svn/project -A authors.txt new-repo

For branches

We hardly have any branches so I didn’t search for the optimal way. I used a three step procedure.

  1. I used the UNIX script from sailmaker to turn the SVN remote branches into local Git branches.
    for branch in `git branch -r | grep "branches/" | sed 's/ branches\///'`; 
    do   
       git branch $branch refs/remotes/$branch 
    done
  2. Checked out each branch
    git checkout -b <branch_name>
  3. Sent it on to the remote repo: git push --all

This approach leaves stray remote branches. They are visible using git branch -a, but not in the GitLab UI. They seem to go away if you clone the repository to another directory so I’m thinking they are local branches and were never pushed to GitLab.

For the big repo with tags

We have 396 tags for the forum software and zero tags for all our other projects. I tried migrating our entire repository including tags using “git svn clone.” It took too long. Migrating 9K+ commits, 800MB and 300+ tags is really slow. After running (against my local SVN) for over 24 hours, it was less than half done.

I realized the bulk of the time was going to migrating tags. So I did the “git svn clone” migration for just the trunk and branches. Then I dealt with the tags using a script.

First I created a local repo since this was the big one. This took about 20 minutes to import:

  • svnadmin create jforum-local-svn
  • svnadmin load jforum-local-svn < full.dmp

Then I did the clone:

git svn clone file:///<local repo> -A authors.txt \
  --trunk=JForum --branches=branches \
  jforum-from-local-dump-with-trunk-and-branches

Finally, I was ready to add gitlab as a remote. Again, I migrated the branches manually since there weren’t a lot.

Then I dealt with the tags. But that warrants a separate blog post.

Another possibility

There are a number of tools called svn2git. The most promising one looks like this one per this post.

I didn’t try it because I was almost done at that point. Also, it requires you to build from source. Wasn’t worth the effort.

References

  • https://john.albin.net/git/convert-subversion-to-git
  • https://www.mugo.ca/Blog/Splitting-a-Subversion-repository-into-multiple-repositories
  • https://daneomatic.com/2010/11/01/svn-to-multiple-git-repos/
  • https://git-scm.com/docs/git-svn
  • https://www.getdonedone.com/converting-5-year-old-repository-subversion-git/ – uses master/trunk/branches structure – why want that?
  • https://github.com/nirvdrum/svn2git – uses git-svn bridge and does cleanup after. so presumably the same performance issue

migrating tags from a large coderanch repository from svn to git

To review, this repository has just under ten thousand commits and just under 400 tags. Migrating with “git svn clone” would have taken over 48 hours. Since the majority of the time was going to migrating the tags, I decided to migrate just the trunk/branches and then re-create the tags. After all they are just labels. This post describes the procedure.

Looking at the commit comments

Each migrated commit in git has a comment like:

git-svn-id: https://<url>/svn/projectName@8848 9a30da7b-550c-0410-b2c9-c7485123b453

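This is good. It lets me map SVN revisions (ex: 8848) to Git commits, because searching the Git log for “@8848 ” finds the commit that came from that revision. (The value after the revision number is the SVN repository’s UUID, not a Git hash.) This means it is possible to create the tags with a script.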
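
To make that mapping concrete, the revision number can be pulled out of a git-svn-id line with sed. A minimal sketch: the function name is made up for illustration, and it assumes the line has the "URL@revision UUID" shape shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints the SVN revision embedded in a git-svn-id line.
# Prints nothing if the line does not look like a git-svn-id trailer.
svn_revision_from_trailer() {
  printf '%s\n' "$1" | sed -n 's/^ *git-svn-id: .*@\([0-9][0-9]*\) [^ ]*$/\1/p'
}

# Usage inside a git-svn clone (assumes git is installed):
#   svn_revision_from_trailer "$(git log -1 --format=%b | grep git-svn-id)"
```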

Some “light googling” didn’t find such a script so I wrote my own.

Step 1 – Create a file containing all the SVN tag names to migrate

This is easy. Go to https://<svn url>/svn/tags/ in a browser and it gives you such a list. Just copy this to a file. My file had 396 lines in it. But copy/paste doesn’t charge per line so this was ok!
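
If copying from the browser is inconvenient, "svn ls" prints the same listing. Either way, the names may carry trailing slashes or Windows carriage returns. A minimal sketch of a cleanup filter follows. The function name and the tags.txt file name are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: normalizes a list of tag names read from stdin by
# removing carriage returns, surrounding whitespace, trailing slashes
# and blank lines.
clean_tag_names() {
  tr -d '\r' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's|/$||' -e '/^$/d'
}

# Usage (assumes the svn command line client):
#   svn ls '<svn url>/svn/tags/' | clean_tag_names > tags.txt
```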

Step 2 – Map SVN tag names to SVN commit numbers and Git commit hashes

I wrote a script to create a CSV of these pieces of information. I didn’t include creating the tags in that script so I could review the output in between. I spot checked and it looked reasonable. (The script took about 10 minutes to run against a remote SVN. But I had almost 400 tags so that’s not bad.)

The script is long so I included it at the bottom of the post. I put it in a file named generateSvnGitMapping.sh.

Step 3 – Actually create the Git tags

I ran another script I wrote to actually create the Git tags. We had one SVN tag with a single quote in it. This tag was created in error. Since it was recreated with the proper name, I chose not to migrate it. I put this script in a file named generateTags.sh:

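# Read the CSV mapping (tag name,SVN revision,Git hash) one line at a time
while IFS= read -r line
do
  # progress goes to stderr so it does not end up in the generated script
  echo "$line" >&2
  echo "$line" | awk -F',' '{ print "git tag -a " $1 " -m \"re-creating tag for SVN @" $2 " commit\" " $3 }'
done < ../tagMapping-oneLine.txt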

Then I

  1. Ran generateTags.sh > tags.sh
  2. Placed the script in the git repo directory: mv tags.sh new-git-repo
  3. cd new-git-repo
  4. Ran the script: ./tags.sh
  5. Deleted the script: rm tags.sh
  6. Listed the tags to ensure they seemed reasonable: git tag -l

Step 4 – Push tags to GitLab (or GitHub)

git push --tags
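
As a final sanity check, the tag names that were supposed to be migrated can be compared with what the repository now contains. A minimal sketch, assuming two hypothetical files with one tag name per line: the input list without trailing slashes, and the output of git tag -l.

```shell
#!/bin/sh
# Hypothetical helper: prints tag names that appear in the expected list ($1)
# but not in the actual list ($2). Prints nothing when every tag was created.
missing_tags() {
  expected_sorted=$(mktemp)
  actual_sorted=$(mktemp)
  sort -u "$1" > "$expected_sorted"
  sort -u "$2" > "$actual_sorted"
  comm -23 "$expected_sorted" "$actual_sorted"   # lines only in the expected list
  rm -f "$expected_sorted" "$actual_sorted"
}

# Usage:
#   git tag -l > actual-tags.txt
#   missing_tags expected-tags.txt actual-tags.txt
```

An empty result means every requested tag exists. Any name printed is a tag to investigate, such as the one with the single quote that was skipped on purpose.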

The script for step 2

#!/bin/bash

# To run, pass three parameters:
# 1) The location of the local git repository
# 2) The file containing the tag names you'd like to check
# 3) The URL of the base SVN (not including /tags)
# ------------------------------------------------

# gets svn revision number for a svn tag
# $1 = svn url (ex: https://svn.com/svn)
# $2 = tag (ex: my_tag)
# sets variable svn_revision
set_svn_revision() {

  # On Windows, this fails with E720232 write error: The pipe is being closed.
  #svn_revision=`svn log "$1/tags/$2" --limit 1 |
  #     head -2 | tail -1  | awk '{print $1}'`

  # On Windows, works if don't use head in pipe
  svn_log_output=`svn log "$1/tags/$2" --limit 1`
  svn_revision=`echo "$svn_log_output" | head -2 | tail -1 | awk '{print $1}'`

   # remove leading "r" so just returns the number
   svn_revision=${svn_revision#r}
}

# ------------------------------------------------

# sets the git commit number for a svn revision
# if no commits are found (or more than one comment has this tag), exits the program
# $1 = svn revision number without the leading "r" (ex: 1234)
# $2 = tag name (ex: my_tag)
# sets variable git_commit
set_git_commit() {

  # since didn't migrate tags, need commit right before the tag
  # this might be a few commits back in some cases so check 5
  max_to_try=5
  num_attempts=0
  commit_right_before_tag=""
  previous_commit=$1

  while [ "$commit_right_before_tag" == "" ];
  do
    # add character before/after in git-svn log message to avoid ambiguity
    candidate_commit="@$previous_commit "

    num_lines=`git log --grep "$candidate_commit" | grep commit | wc -l`

    # if found a commit, use it. otherwise try some more
    if [ "$num_lines" -ne "0" ]; then
      commit_right_before_tag=$candidate_commit
    fi 

    num_attempts=$(($num_attempts+1))
    previous_commit=$(($previous_commit-1))

    # if  tried too many times, give up
    if [ "$num_attempts" -gt "$max_to_try" ]; then
      echo "commit ($1) for tag name ($2) not found. Aborting. Please check git repo or remove this tag from the input"
      exit 1
    fi

  done

  git_commit=`git log --grep "$commit_right_before_tag" | head -1 | awk '{print $2}'`
}

# ------------------------------------------------

if [[ $# -ne 3 ]]
then
    echo "Requires three parameters:"
    echo "1) Path to git repo"
    echo "2) name of file containing tags"
    echo "3) url to svn"
    exit
fi

git_dir="$1"
tag_names_file="$2"
svn_url="$3"

all_tag_names=`cat $tag_names_file`

current_dir=`pwd`
cd "$git_dir"

for tag_name in `echo $all_tag_names`
do
    # On Windows, need to strip carriage returns. (Not needed on Mac, but no harm)
    tag_name=`echo $tag_name | tr -d '\r'`
    tag_name_without_trailing_slash=${tag_name%/}
    set_svn_revision "$svn_url" "$tag_name"
    set_git_commit "$svn_revision" "$tag_name"
    echo "git tag -a "$tag_name_without_trailing_slash" -m \"re-creating tag for SVN @$svn_revision commit\" $git_commit"
done

cd $current_dir