[2023 kcdc] DRYing out your GitLab Pipeline

Speaker: Lynn Owens

For more, see the table of contents.


Intro/Problem

  • Every GitLab project has its own .gitlab-ci.yml file. Great for getting started
  • Quickly have hundreds of projects
  • Goal is to eliminate copy/paste by centralizing in a few projects

What NAIC has

  • 200+ projects maintained by 11 teams in 2 dev orgs
  • Pipeline is inner source
  • Version 6 of pipeline; working on version 7
  • Reduced maintenance burden by making change once and not in each project
  • Hosted directly on gitlab.com

Milestone 1 – Hidden jobs for pipeline project

  • GitLab has “hidden” jobs
  • Start with a period
  • Don’t appear in any pipeline; just for the common code
  • The “pipeline” project has a .gitlab-ci-base.yml which contains common code
  • Common code makes no assumptions about teams and is configurable for all known use cases
  • v1 was about two dozen lines of common code
  • The client projects include the pipeline code (an include can point at any project on GitLab, so it doesn’t need to be one you own)
include:
  - project: 'NAIC/pipeline'
    file: '/.gitlab-ci-base.yml'
  • Then added jobs that extend the hidden jobs to reuse the base code. Here .deploy_s3 is a hidden job defined in the base code
deploy_foo:
  stage: deploy
  extends: .deploy_s3
  variables:
    ...

Suggested practices

  • Advises against pinning the pipeline to a tag because projects then don’t get bug fixes and everyone has to upgrade manually
  • Don’t define stages in the shared pipeline as it forces one opinion on everyone. Many groups had already written a pipeline for their own use case and they were not all the same.

Milestone 2 – Profiles

  • Found a half dozen use cases. ex: Maven for Java, NPM building Angular etc.
  • Each .gitlab-ci.yml was a copy/paste of the others in the same use case.
  • Made profiles/maven-java.yml and the like in the common pipeline project
  • Profiles are not one size fits all: there are a bunch of different ones, and projects can still use the Milestone 1 approach.

Milestone 3 – Pipeline scripts

  • Common code like logging, calling rest apis, etc
  • Switched from bash scripting to python so had common code in modules and could unit test the modules

Options to get scripts

  • Could have the pipeline create a tar or zip archive and upload it to a repo. This is a little slow
  • Could have a global before_script that does a git clone of pipeline-scripts. Uses a network connection
  • Could bake the scripts into an image. Requires a pipeline to build the image
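
As a rough illustration of the git clone option, here is a shell sketch of what a global before_script could run. The talk did not show the actual code, so the PIPELINE_SCRIPTS_URL variable, the placeholder URL and the function name are all assumptions.

```shell
# Sketch of the "git clone in before_script" option.
# PIPELINE_SCRIPTS_URL is a hypothetical variable; the default URL is a placeholder.
PIPELINE_SCRIPTS_URL="${PIPELINE_SCRIPTS_URL:-https://gitlab.com/example-group/pipeline-scripts.git}"

fetch_pipeline_scripts() {
  dest="${1:-.pipeline-scripts}"
  # Start clean so a reused runner workspace never mixes script versions.
  rm -rf "$dest"
  # --depth 1 fetches only the latest commit, which keeps the network cost low.
  git clone --quiet --depth 1 "$PIPELINE_SCRIPTS_URL" "$dest"
}
```

A job would call fetch_pipeline_scripts once and then run the scripts from the checkout directory. The network dependency mentioned in the bullet is still there; the shallow clone only makes it smaller.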

If he were doing it again, he wouldn’t create a separate pipeline-scripts project because the scripts are tightly coupled to the pipeline. That wouldn’t change the problem of how to get the scripts, though.

Testing

  • If client projects are all using the default branch, small changes will affect them all.
  • Use a testing framework for script code (ex: python/go)
  • Follow development practices
  • Write a sample app for each profile. Have the common pipeline trigger a downstream pipeline on this project. For any merge to master, the downstream jobs must pass.
  • Before major refactors, inventory profile jobs and audit afterwards.

Milestone 4 – Profile Fragments

  • Had about 24 profiles (ex: maven-java-jar, maven-java-pom, maven-java-k8s, etc)
  • Typically three components – build tool, language, deployment method
  • These profiles had a lot of copy/paste
  • Decomposed into fragments (ex: maven, npm, java, angular, k8s, s3)

Selling the idea

  • Needed to convince people to use this pipeline instead of writing their own or using another team’s.
  • Offer flexibility
  • Show value
  • Follow semantic versioning to the T. He tags every merge to master of the pipeline even though he encourages use of the default branch; the tags are good rollback points and help if a project needs something older
  • Changelog everything
  • Document well
  • Train and evangelize
  • Record training sessions to build up a library

My take

This was a good case study and useful to see concrete examples and techniques. I wish we could see the code, but I understand that belongs to their org.

how not to migrate from subversion to git

You know how you typically read blog posts about what works, and not about all the things people tried that didn’t work? This post is dedicated to what didn’t work.

Don’t do this #1 – Migrate from a remote repository

Migrating from SVN to Git requires a large number of network roundtrips (for a large repository.) This slows things down greatly. It’s better to export/dump the repository and run everything locally.

See the main blog post for how to create a local dump/repository.
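
For orientation, here is a minimal sketch of creating a local copy with the standard Subversion tools svnrdump and svnadmin. It is not taken from the main post; the function name and the URL and paths passed to it are placeholders.

```shell
# Sketch: copy a remote SVN repository to local disk so later steps avoid the network.
# The function name and all arguments are illustrative, not values from the post.
make_local_svn_repo() {
  remote_url="$1"   # e.g. https://svn.example.com/repo (hypothetical)
  dump_file="$2"    # e.g. full.dmp
  local_repo="$3"   # directory that will hold the local repository

  # Dump once over the network, then create and load a repository locally.
  svnrdump dump "$remote_url" > "$dump_file" &&
    svnadmin create "$local_repo" &&
    svnadmin load --quiet "$local_repo" < "$dump_file"
}
```

After this, every later command can use a file:// URL for the local repository, so the network round trips are paid only once during the dump.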

Don’t do this #2 – Split the dump by project

I had the idea to split the SVN full dump file into smaller SVN dump files by project. I chose to preserve revision numbers and not use “renumber-revs”. We used the revision numbers in our release notes. Here’s a sample command:

svndumpfilter include "IntegrationTests" --drop-empty-revs < full.dmp \
  > project_IntegrationTests.dmp

We had one project that consists of the majority of the SVN code base (the forum software.) All of the tags were for this project. I thought to import this one as “full.dmp” and just delete the “trunk” projects afterwards for this one. That way I’d only be filtering the smaller/safer ones.

None of this was necessary! You can just point the migration at the same full SVN dump with different paths to migrate projects into their own repositories.
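
A minimal sketch of that idea, assuming the full dump has already been loaded into one local repository: each project is cloned from its own path under the same file:// URL. The function name and arguments are illustrative, and authors.txt is the SVN-to-Git user mapping file that git svn accepts.

```shell
# Sketch: migrate one project out of the shared local SVN repository.
# Arguments are placeholders; local_repo must be an absolute path.
clone_project() {
  local_repo="$1"   # local repository loaded from the full dump
  project="$2"      # top-level project directory inside it
  authors="$3"      # authors.txt mapping SVN users to Git identities
  git svn clone --authors-file="$authors" \
    "file://$local_repo/$project" "$project"
}
```

Calling clone_project once per project name (for example IntegrationTests) produces one Git repository per project without ever filtering the dump.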

Don’t do this #3 – Check out the entire repository including tags

Migrating using “git svn clone” requires an authors.txt to map SVN users to GitHub names/emails. I had the idea to check out the entire repository, including tags, and run svn log on it to get the committers. After 90 minutes, I gave up on this idea.

Don’t do this #4 – Assume that all authors/committers are people

There were a couple commits from Jenkins which seems reasonable. There were also a couple commits as “root”, “test” and other random users. Looking at the readme.txt from one of those commits, it looks like a command line import.

Don’t do this #5 – Guess at what should be in the authors.txt file

We have about 90 users in our authors.txt file. I thought I would save time by only putting the people I thought were committers in the authors.txt. This was a problem for a few reasons:

  • About 30 people committed to the main project
  • A few people committed who no longer have access to the code base.
  • We had some “funky” committers including “root” and “test”

This meant I kept running the “git svn clone” command, having it fail on missing users, adding them to authors.txt and resuming the run (re-running automatically resumes).

It would have been better to use svn log on trunk to get all the authors, or the --authors-prog flag to specify a command that fills in defaults. This would have let me write “Unknown” for the funky ones and be done with it.
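
Both ideas can be sketched in a few lines of shell. This is an illustration rather than the commands I ran: the function names are made up, and the example.com addresses are placeholders that still need replacing with real names and emails.

```shell
# Build authors.txt lines from `svn log --quiet` output read on stdin.
# The example.com address is a placeholder, not a real mapping.
authors_from_log() {
  # Revision lines look like: r42 | username | 2018-01-01 10:00:00 ...
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
    sort -u |
    while IFS= read -r user; do
      printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}

# Fallback for --authors-prog: git svn runs the program with the SVN username
# as its first argument and expects a single "Name <email>" line back.
unknown_author() {
  printf 'Unknown <%s@example.com>\n' "$1"
}
```

Typical use would be to pipe svn log --quiet for trunk into authors_from_log and save the result as authors.txt. Since --authors-prog takes a file name, the printf line from unknown_author would have to be saved as a small executable script and passed by path.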

Don’t do this #6 – Make assumptions about project structure

At the top level, the repository had:

  • about 20 projects (directly at the root level, not under trunk)
  • a branches directory
  • a tags directory

I foolishly assumed that meant that the 20 projects had the code directly inside them. And sometimes that was true. However, for about 5 projects, there was a nested trunk/branches/tags structure under that project.
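
A quick check per project would have caught this. The sketch below reads an svn ls listing of one project directory and decides whether git svn should be told to use the standard trunk/branches/tags layout; the function name is made up for illustration.

```shell
# Decide git svn layout flags from an `svn ls` listing read on stdin.
# svn ls prints directories with a trailing slash, e.g. "trunk/".
layout_flags() {
  if grep -q '^trunk/$'; then
    # Nested trunk/branches/tags under the project: use the standard layout.
    echo "--stdlayout"
  else
    # Code sits directly in the project directory: no layout flag needed.
    echo ""
  fi
}
```

Running svn ls on each of the 20 project paths and piping the output through layout_flags shows up front which projects need --stdlayout on git svn clone.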

We all know that thing about standards. There are so many….

Don’t do this #7 – Migrate 300 large tags

This project uses Ant (and not Ivy) so there are a lot of jar files in the repository. This means tags are large. With just under ten thousand commits and just under 400 tags, this proved to be just too much.

Watching the “git svn clone” procedure, it works through the commits in order from 1 to n. This means the later commits/tags need to go through a large amount of work to make progress. Despite that, it was surprisingly linear.

After 12 hours, it had migrated 2700 commits and after 26 hours, it was up to commit 5446. At the 18 hour mark, it was up to commit 6926. (At the 24 hour mark, I decided to abandon this approach. I let it run until I needed to shut down my computer to see what would happen.)

Most of the wasted time was for the tags. In SVN, a tag is a copy; in Git, it is just a label, so this is a lot of unnecessary duplication in a migration.

See another approach for migrating tags

getting started with gitlab

CodeRanch is talking about moving to Git for our source code. Some of the moderators expressed a preference for GitLab. I’ve used GitHub, but not GitLab, so I decided to try it out with one of my personal projects. GitLab has built-in CI, so I want to see if I can switch to that and get off Jenkins for my pet project.

Signing up for GitLab

You can register with a new account for GitLab or use your credentials for other services including Google, Twitter, GitHub and BitBucket. It feels weird to me to sign in for a version control system with the credentials for another version control system so I created a new account.

While signing up, it automatically imported my Gravatar. I had an old picture on there so I fixed that. I then added some basic information to my GitLab profile.

I set up an SSH key to make it easy to commit from my home computer:

  1. Settings
  2. SSH Keys
  3. Paste in key

I also set up two factor:

  1. Settings
  2. Account
  3. The very first option is “Enable two factor authentication”
  4. It uses Google Authenticator which is my first choice of two factor. They also supply backup codes.

Migrating a repository from GitHub

GitLab has a page about migrating from GitHub. The most important pre-requisite is to make sure each committer to the GitHub project has an account on GitLab. Conveniently, I am the only person who has ever committed to this project!

I then migrated in:

  1. Create a personal access token on GitHub just for this migration
  2. Click “Create a project”
  3. Choose “Import a project” tab
  4. Enter the personal access token for GitHub and choose “List your repositories”. Note that this lists both your personal repositories and all of those for GitHub organizations you have access to.
  5. Click “Import” on the row next to the repository to migrate. (There’s also an “Import all” on top. I’m not looking to migrate all my repositories though!) Nothing appeared to happen for a minute. I must have missed a status warning. But then the page refreshed and had a “done” checkbox.
  6. Delete the personal access token from GitHub. I don’t like to leave extra access lying around enabled.

I confirmed the same number of commits (including the latest), branches and tags are all there.
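
One way to make that comparison repeatable is a small shell function that summarizes a local clone. This is a sketch, not what I ran: the function name is made up, and it assumes a clone of each side exists locally (for example made with git clone --mirror).

```shell
# Summarize a local clone so the GitHub and GitLab copies can be compared.
repo_summary() {
  repo="$1"
  # Count commits reachable from branches and tags only, ignoring other refs.
  commits=$(git -C "$repo" rev-list --count --branches --tags)
  # tr strips the padding some wc implementations add.
  branches=$(git -C "$repo" for-each-ref refs/heads | wc -l | tr -d ' ')
  tags=$(git -C "$repo" for-each-ref refs/tags | wc -l | tr -d ' ')
  echo "commits=$commits branches=$branches tags=$tags"
}
```

If repo_summary prints the same line for both clones, the counts match. Comparing the output of git rev-parse HEAD in each clone additionally confirms the latest commit is the same.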

Jenkins integration

I’ve been running this job on Jenkins each night to check for changes. Since this is a public repository, accessing it for polling is easy and worked on the first shot. While I’d like to switch to GitLab CI, I’m going with incremental progress. GitLab has a good page on interacting with Jenkins.

I temporarily made this a private project to re-test. I confirmed that I could commit and that Jenkins failed to pull. Then I tried to set up Jenkins to be able to interact with the project.

When using my own account, I can set up a token with read access to all my projects, but not a specific one. I think I’d have to create an extra account on GitLab for Jenkins if I wanted it to have only access to specific projects. Since this is just an experiment, I’ll use my own token for now.

Failing with the GitLab plugin

  1. Installed the GitLab plugin on Jenkins
  2. In GitLab, went to Settings > Access Tokens
  3. Created a token with read_repository permissions
  4. In Jenkins, Manage Jenkins > Configure System
  5. Add a GitLab Connection. I like that it uses Jenkins Credentials for securing the token
  6. Click “Test Connection”
  7. Using https://gitlab.com/boyarsky gives Client error: HTTP 401 Unauthorized
  8. Using https://gitlab.com/ gives Client error: HTTP 403 Forbidden

I noticed that the Jenkins GitLab plugin is not well supported. The primary committer wrote that he doesn’t use GitLab daily anymore and this affects his time spent on this project.

At this point, I gave up and just set up polling. I created a credential with the username boyarsky and, as the password, the personal access token that I set up while attempting to get the GitLab plugin working. This worked on the first shot.

Now time to start looking at GitLab CI…