how not to migrate from subversion to git

You know how you typically read blog posts of what to do that works. And not all the things people tried that didn’t work. This post is dedicated to what didn’t work.

Also see:

Don’t do this #1 – Migrate from a remote repository

Migrating from SVN to Git requires a large number of network roundtrips (for a large repository.) This slows things down greatly. It’s better to export/dump the repository and run everything locally.

See the main blog post for how to create a local dump/rep

Don’t do this #2 – Split the dump by project

I had the idea to split the SVN full dump file into smaller SVN dump files by project. I chose to preserve revision numbers and not use “renumber-revs”. We used the revision numbers in our release notes. Here’s a sample command:

svndumpfilter include "IntegrationTests" --drop-empty-revs < full.dmp 
  > project_IntegrationTests.dmp

We had one project that consists of the majority of the SVN code base (the forum software.) All of the tags were for this project. I thought to import this one as “full.dmp” and just delete the “trunk” projects afterwards for this one. That way I’d only be filtering the smaller/safer ones.

None of this was necessary! You can just point migration at the same full SNV dump with different paths to migrate projects into their own repositories.

Don’t do this #3 – Check out the entire repository including tags

Migrating using “git svn clone” requires an authors.txt to map SVN users to GitHub names/emails. I had the idea to check out the entire repository including tags and running svn log on it to get the committers. After 90 minutes, I gave up on this idea.

Don’t do this #4 – Assume that all authors/committers are people

There were a couple commits from Jenkins which seems reasonable. There were also a couple commits as “root”, “test” and other random users. Looking at the readme.txt from one of those commits, it looks like a command line import.

Don’t do this #5 – Guess at what should be in the authors.txt file

We have about 90 users in our authors.txt file. I thought I would save time by only putting the people I thought were committers in the authors.txt. This was a problem for a few reasons:

  • About 30 people committed to the main project
  • A few people committed who no longer have access to the code base.
  • We had some “funky” committers including “root” and “test”

This meant I kept running the “git svn clone” command, having it fail on missing users, adding them to authors.txt and resuming the run (re-running automatically resumes).

It would have better to us svn log on trunk to get all the authors or the –authors-prog flag to specify a command to fill in any defaults. This would have let me write “Unknown” for the funky ones and be done with it.

Don’t do this #6 – Make assumptions about project structure

At the top level, the repository had:

  • about 20 projects (directly at the root level, not under trunk)
  • a branches directory
  • a tags directory

I foolishly assumed that meant that the 20 projects had the code directly inside them. And sometimes that was true. However, for about 5 projects, there was a nested trunk/branches/tags structure under that project.

We all know that thing about standards. There are so many….

Don’t do this #7 – Migrate 300 large tags

This project uses Ant (and not Ivy) so there are a lot of jar files in the repository. This means tags are large. With just under ten thousand commits and just under 400 tags, this proved to be just too much.

Watching the “git svn clone” procedure, it goes through commit 1-n as it goes. This means the later commits/tags need to go through a large amount of work to make progress. Despite that, it was surprisingly linear.

After 12 hours, it had migrated 2700 commits and after 26 hours, it was up to commit 5446. At the 18 hour mark, it was up to commit 6926. (At the 24 hour mark, I decided to abandon this approach. I let it run until I needed to shut down my computer to see what would happen.)

Most of the wasted time was for the tags. Which in SVN are a copy. In Git, they are just a label so this is a lot of unnecessary duplication in a migration.

See another approach for migrating tags

finding out when Oracle changes the certification objectives

As Scott and I noted in the introduction of our book, Oracle tends to fiddle with the duration, number of questions and passing score of their certification exams. They also fiddle with the exam objectives themselves on occasion. And as you might imagine, these aren’t well publicized.

First attempt

I originally thought that I would use the ChangeDetection service to subscribe to receive an email when the relevant certification pages changed. This turned out not to work as the HTML is always the same. Oracle uses AJAX to fill in the data. For example, the objectives page for the OCA 8 is rendered using the XML on this page. My next thought was that I’d use the ChangeDetection site to monitor the XML page. No such luck, they don’t support XML.

Writing something myself

A differences program isn’t hard to write, so I created my own in a public github repository. It stores a copy of the “current” data for each exam it follows and checks for differences. (I have it running as a Jenkins periodic build so it checks for updates once a day.) I considered using Travis CI since it supports Java/Maven. However, Travis doesn’t yet support periodic builds. There is a third party site that can trigger your builds for you, but CodeRanch has a perfectly good Jenkins server. And since one of the goals of this project is to be able to announce certification changes on a timely basis in the forums, it seems reasonable to run it there.

How do you find out if there is a change?

For significant changes, we tend to mention them in the relevant certification forum. For the OCA 8 exam, we will also list objective changes on the book page. You can also look at the text files for each exam on github. The last modified date shows the last change. You can also click on the file to see the history/diffs to see what changed and approximately cool. You could even use the ChangeDetection service on the github pages since they aren’t XML. For example, this page is for the OCA 8.

If there is an exam you’d find useful to be tracked and isn’t already, please add a comment on this blog post.

Technologies used

  1. Maven
  2. Selenium (a tiny bit)
  3. HTML Parser
  4. JUnit/Parameterized test

contrast security plugin for eclipse

I recently learned that Contrast Security has a free plugin that tests your application against the OWASP Top 10.  We’ve tried to fix these already. You can read about how we fixed Clickjacking, CSRF and XSS in JForum.

Installing

I started out by installing the Contrast plugin from the Eclipse Marketplace. After restarting Eclipse, a Contrast view automatically opens with instructions. It says to right click your server and choose “Start with Contrast.” Easy enough. I usually use Sysdeo so I can start the server in one click, but this is hardly onerous.

A Diversion: Fixing Tomcat Configuration

I got an error on startup. I then tried to start the server using the server view (without Contrast) and got the same NoSuchMethodError:

java.lang.NoSuchMethodError: sun.security.ec.NamedCurve.<init>(Ljava/lang/String;Ljava/lang/String;Ljava/security/spec/EllipticCurve;Ljava/security/spec/ECPoint;Ljava/math/BigInteger;I)V

I fixed this by switching Tomcat 7 to use Java 7 instead of Java 8. (We aren’t using Java 8 yet for CodeRanch’s JForum software so this is fine.)

  • Workspace preference
  • Server
  • Runtime Environments
  • Click Tomcat and edit
  • Choose Java 7 as JRE

This had nothing to do with Contrast. I hadn’t encountered it because I was using Sysdeo to start Tomcat before this.

Actually testing

Now that the server starts up, I stopped it and restarted with Contrast. Then I clicked around the app a bit. (You can use Selenium tests or any other testing tool to automate this part.) The Contrast view starts to populate with its findings. I clicked around until I had about a dozen findings. They were:

Category Issue # Instances Details My analysis
Orange Insecure hash algorithms in XXX 3 Provides an explanation of what the problem is, why it might/might not be a problem along with the stack trace (showing how it is used) and the HTTP request/headers for the request(s) that triggered it. Two of the three findings refer to the exact same line of code. (Which was run on two different screens). The other appears to be in Tomcat itself. My configuration isn’t the same as the real server here. [The other two I need to look into further]
Yellow Anti-Caching Controls Missing in XXXX 6 Provides the HTTP request/headers, suggested remediation It’s annoying to have this reported on every page. Glad there is an :ignore this rule” option. We run a public website and want things to be cached. Client side caching makes the site faster for users and doesn’t leak information since 90% of our information is public to begin with. The only risk is if a moderator access the private forum on a public computer. We are technical users and know to clear data if this happens.
Yellow Forms without autocomplete prevention 3 Provides the HTTP request/headers, suggested remediation Again, we are a public site so not a big deal for browsers to retain information.
Warning CVE(s) in commons-httpclient-3-1.jar 1 Provides links to the two CVEs along with the manifest of the vulnerable library. I knew this from running Sonatype CLM Insight. The two CVEs are in functionality in the library that we don’t use. Still it is sweet to have this information available for free and with almost no effort. (Insight is a commercial project. We saw a one time result from the report.) I was concerned that information about the jars was being sent over the internet so I asked on Twitter. Jeff Williams replied that the CVE information is in a built in database updated via Eclipse Marketplace. Neat!

What to do with the results

When right clicking on any finding, you have four options:

  • Mark Resolved
  • Delete
  • Ignore (this instance) – useful for a false positive
  • Ignore rule – useful for a rule that doesn’t apply

My thoughts on the Contrast plugin

  • I like that the stack trace is included because it is easy to see context. I also like that lines belonging to the app is in blue in the stack trace.
  • It was very easy to use. And free. Which makes using it a no brainer.
  • While there aren’t false positives from unused code, there are false positives from context (which a tool can’t know).
  • Two of the rules triggered on a number of pages. (and would have triggered on a lot if I tested more)
  • While I don’t have a long list of things to follow up, it was a good thought exercise. And the reason I don’t have a long list is because we manually went through the OWASP top 10 in preparation for the “Iron Clad Java” promo recently. (so as not to have embarrassing issues pointed out)