process builder to run multi-platform postgres import

We saw how to clone a postgresql database for testing cleanly in an earlier blog post. The next step is to do so from Java so it can be part of an integration test.

How to run each of 4 DDLs:
Running a DDL file is easy which is good since we have four of them.

importDdlOrSqlFile("initialJForumSchema.ddl");
importDdlOrSqlFile("javaranchCustom.ddl");
importDdlOrSqlFile("users.ddl");
importDdlOrSqlFile("databaseChangesAfterMostRecentInstall.txt");

Ok. So maybe it is easy because all the functional code is in another method.

What is Process Builder?

Process Builder was introduced in Java 5.  The Runtime.exec JavaDoc says

ProcessBuilder.start() is now the preferred way to start a process with a modified environment.

Since I need to set two system properties, this is perfect.

How to create the Process Builder

On Windows, you just pass the command to run at the DOS prompt.  The actual command was shown in the closing a postgresql database blog post.  On UNIX/Mac/etc, you need to create a new shell and pass the command that way.  Luckily, it is easy to check for a Windows operating system:


private ProcessBuilder createProcessBuilder(String command) {
String os = System.getProperty("os.name").toLowerCase();
if (os.contains("win")) {
return new ProcessBuilder(command);
}
return new ProcessBuilder("sh", "-c", command);
}

How to set system properties

I need to set the user id and password as operating system properties so postgresql doesn’t prompt for them interactively.  Using process builder, I can call environment() to get a map of environmental properties for my process.  Then setting them is simply a matter of setting them in the map.


private void importDdlOrSqlFile(String fileName) throws Exception {
String command = importToPostgresCommand(fileName);
System.out.println(command);
ProcessBuilder processBuilder = createProcessBuilder(command);
Map<String, String> env = processBuilder.environment();
env.put("PGUSER",   SystemGlobals.getValue(DATABASE_CONNECTION_USERNAME));
env.put("PGPASSWORD",  SystemGlobals.getValue(DATABASE_CONNECTION_PASSWORD));
processBuilder.redirectErrorStream(true);
runProcess(fileName, processBuilder);
}

How to run the Process Builder

All ready to run it!   start() kicks it off.  I think read all the output from the commands via the process builder’s input stream and output it to the console.  (I do this since many commands are run and the developer can see what it is up to.)  Finally, I check the error code.  After all, if the database didn’t get created properly, there isn’t much point in having a whole pile of failing tests.


private void runProcess(String fileName, ProcessBuilder processBuilder) throws IOException, InterruptedException {
Process proc = processBuilder.start();
Scanner scanner = new Scanner(proc.getInputStream());
while (scanner.hasNext()) {
String line = scanner.nextLine();
System.out.println("importing data: " + line);
}
int exitVal = proc.waitFor();
System.out.println("Completed loading " + fileName + " Exit value: " + exitVal);
if (exitVal != 0) {
System.out.println("Please fix error in database script and re-run.");
System.exit(1);
}

}

Testing

This code was tested on multiple developer’s machines including Windows and Mac.

clone a postgresql database for testing cleanly

I’m looking at writing integration tests for the back end of JavaRanch‘s JForum install.

A few “pesky” requirements/constraints

  • Multiple developers all over the word have their own local test databases filled with data in different states.  The tests must work for everyone.  Ideally they won’t leave data floating around either.
  • The tests must use PostgreSQL.  While the original JForum supported multiple databases, the JavaRanch version has been scaled down to just run with the one we need.  We do have some PostgreSQL specific SQL which rules out using an embedded database like HSQLDB or Derby.
  • Developers are using both Eclipse and IntelliJ.  Tests should care about the IDE anyway, so this isn’t a big constraint.
  • Developers are using a variety of operating systems and languages on their operating systems.  While code is in English, there can’t be assumptions as to the OS state.

Strategy

I think the best strategy is to create a second database just for testing.  The JForum database would remain untouched and a jforum_integration_test database can be created for the tests.  dbUnit can control the state of that special database.

The problem

Before I even start thinking about dbUnit, I did a proof of concept to ensure I could create a new database from scratch using the command line.  Creating a database is the easy part.  The “hard” part is that JForum doesn’t come with a schema.  It comes with an installation servlet that creates the schema.  While few people will be creating a schema for JForum, the technique I used applies elsewhere.

The procedure “before”

  1. Start up the JForum war
  2. Go to the JForum install URL and enter some information which creates the tables
  3. Run the JavaRanch customizations.

How to clone a database for which you only have a partial script

  1. Create an empty database
    createdb jforum_integration_test
  2. Arrive at the base schema
    1. Go the JForum installation URL
    2. Enter the information to create the tables
  3. Export the schema thus far
    pg_dump -U postgres jforum_integration_test > c:\temp\postgres.sql
  4. Provide instructions for the rest of the sql which were created by our developers.

How to import

Now for the easy part!

Importing this dump is a matter of a single command:

psql -U postgres jforum < "pathToWorkspace\JForum\javaranch-docs\deployment\file.ddl"

Lessons learned after

The next day I learned that this wasn’t enough.  We also needed some test data from the server.  I ran this a few times to get the relevant test data.

pg_dump --data-only --inserts -U user -W database --file roles  --table tableName

Conclusion

My next step will be to actually configure dbUnit against this new database and start writing tests.

postgresql explain

I had an opportunity to do some tuning on postgresql and was pleasantly surprised at how smoothly it went.

The first thing I did was try to run an “explain” on the query under discussion.  (Explain is a tabular or graphical view of the detailed steps the database uses to execute your query.  By knowing what path it will take and what tables/indexes it will look at, you can tune your query appropriately.) Knowing this works differently in different databases, I looked up what to do.  Here are the steps:

To run explain at the command line:
1) Type “explain” followed by your query.  For example “explain select * from table”.

That’s right – one step!

To run explain graphically:
1) Install pgadmin if you haven’t already
2) Type query into editor
3) Choose query –> explain
This shows the graphical view of the query.  Clicking on the data output tab shws the text view generated by the command line.

Now it may have changed since then, but I needed to create a separate table the last time I ran an explain in Oracle.  This was extra steps that I have to look up each time.  db2 had a good graphical explain built into the tool.

What surprised me here was that I figured out postgresql’s explain much faster than Oracle’s.  Namely because it was so simple!  For the command line version, there is only one step – and it’s not one I am likely to forget.

It’s always nice when software works in such an intutive manner.