clone a postgresql database for testing cleanly

I’m looking at writing integration tests for the back end of JavaRanch‘s JForum install.

A few “pesky” requirements/constraints

  • Multiple developers all over the word have their own local test databases filled with data in different states.  The tests must work for everyone.  Ideally they won’t leave data floating around either.
  • The tests must use PostgreSQL.  While the original JForum supported multiple databases, the JavaRanch version has been scaled down to just run with the one we need.  We do have some PostgreSQL specific SQL which rules out using an embedded database like HSQLDB or Derby.
  • Developers are using both Eclipse and IntelliJ.  Tests should care about the IDE anyway, so this isn’t a big constraint.
  • Developers are using a variety of operating systems and languages on their operating systems.  While code is in English, there can’t be assumptions as to the OS state.

Strategy

I think the best strategy is to create a second database just for testing.  The JForum database would remain untouched and a jforum_integration_test database can be created for the tests.  dbUnit can control the state of that special database.

The problem

Before I even start thinking about dbUnit, I did a proof of concept to ensure I could create a new database from scratch using the command line.  Creating a database is the easy part.  The “hard” part is that JForum doesn’t come with a schema.  It comes with an installation servlet that creates the schema.  While few people will be creating a schema for JForum, the technique I used applies elsewhere.

The procedure “before”

  1. Start up the JForum war
  2. Go to the JForum install URL and enter some information which creates the tables
  3. Run the JavaRanch customizations.

How to clone a database for which you only have a partial script

  1. Create an empty database
    createdb jforum_integration_test
  2. Arrive at the base schema
    1. Go the JForum installation URL
    2. Enter the information to create the tables
  3. Export the schema thus far
    pg_dump -U postgres jforum_integration_test > c:\temp\postgres.sql
  4. Provide instructions for the rest of the sql which were created by our developers.

How to import

Now for the easy part!

Importing this dump is a matter of a single command:

psql -U postgres jforum < "pathToWorkspace\JForum\javaranch-docs\deployment\file.ddl"

Lessons learned after

The next day I learned that this wasn’t enough.  We also needed some test data from the server.  I ran this a few times to get the relevant test data.

pg_dump --data-only --inserts -U user -W database --file roles  --table tableName

Conclusion

My next step will be to actually configure dbUnit against this new database and start writing tests.

javaranch & jforum – how we solved a threading issue

Don’t you just hate threading errors?  They are hard or impossible to reproduce.  It’s hard to figure out exactly what’s going on.  It’s hard to tell if you fixed it.

The problem

We’ve had a problem at JavaRanch for about a month where some posts mysteriously didn’t appear.  The problem was at its worst early in the business day in India pretty regularly.  When unfortunately pretty much everyone who worked on the system is sound asleep.  It did occur at other times as well.  We did come up with a workaround pretty quickly – reboot the server.  I know – not much of a solution, but it let operations continue in some fashion.

We had a few symptoms:

  • It often took 1-4 minutes to post.
  • The page refreshed without actually storing the post although the index page indicated the post was there.
  • Our e-mail server was acting up causing it to take hours to send out certain types of e-mails even after many retries.  (We didn’t realize this was a symptom until the problem was solved.  At the time, it was just another incredibly time consuming problem that needed addressing.)

What we did

We added metrics.  We monitored.  We did code reviews.

Attempt #1 – A week after the problem started occurring one moderator thought he found it.  There was an Executor class in JForum that added e-mails to send out to a queue.  If the queue was full it blocked causing the user to have to wait minutes for a post.  Which in turn tied up all the resources.  Eventually the database transaction would timeout waiting and give up.  Since the cache was already updated, it looked like the post partially went through and the database didn’t know about it.  So he added a call to Doug Lei’s concurrency library and deployed executor.discardOldestWhenBlocked();

The problem didn’t go away!  But we were so sure it was that.  Oh, no!  Now we have to find another root cause.  More metrics, more monitoring.  More trying to figure out what was going on with the mail server.  Another moderator started work on moving to a different mail server.

Attempt #2a -We moved to a different mail server.  It’s slower than the old one so still possible to back up the outgoing queue.  Just less likely.

Attempt #2b – A third moderator (me) wrote a standalone test against the executor to see if we could reproduce the problem.  We replaced the e-mail logic with Thread.sleep() for 10 seconds and then threw a bunch of threads at it.  Lo and behold the later ones all blocked.  At this point there were two possible solutions – upgrade to Java’s built in concurrency package  which doesn’t have this race condition – or call executor.discardWhenBlocked().  After consulting with Henry Wong of “Java Threads“, we decided to make the smallest possible change and call a different executor method.

Both attempts 2a and 2b were deployed over the same weekend so we don’t know for sure what fixed the problem.  We do know that 2a helped and 2b worked in simulated tests.  And most importantly, we know JavaRanch hasn’t lost posts or had long posting delays in a week.  That’s the important thing!

Lessons learned

  • Listing all the symptoms in one place helps draw connections.
  • Test a potential fix works in an isolated test case.  We were *so* close three weeks ago.
  • Having a threading expert on hand saves a lot of time in making sure a fix doesn’t have negative side effects.  Just in explaining to Henry we thought about the problem on a deeper level.

JForum 3 (in development) does not have this problem as it does not use Doug Lei’s concurrency library.  I will report the problem for JFOrum 2.1.8.  While it’s only a problem for large installation with e-mail or resource problems, it should be in their forums too.