creating a groovy project with gradle in eclipse

Last month, I went to a talk on gradle.  Today I decided to give it a shot.  My goal was to create a simple groovy project with gradle.  I did it in less than 30 minutes so getting started was fast.

Setup

I already had the Groovy Eclipse plugin.  I then installed the Gradle plugin from the Eclipse marketplace.  Yes, this could be done at the command line.  I’m used to M2Eclipse IDE integration so I wanted the same for Gradle.  This step went as smoothly as any other plugin install.

Creating a new gradle project

Just like Maven, the first step is to create a new Gradle project.  Since Groovy Quickstart wasn’t in the list, I chose Java quickstart.  The create request appeared to hang, sitting at 0% for over five minutes.  This was the first (and really only) problem.  I killed Eclipse and started over.  There was no point in doing that.  It just takes a long time.  Apparently, this is a known issue.  I tried again and after 5 minutes Gradle did download the dependencies from the Maven repository.

Making the Java project a Groovy project

Java quickstart does exactly what it sounds like.  It creates a project using the “Maven way” directory structure for Java.  To adapt this to a Groovy project, I:

  1. hand edited build.gradle to add
    apply plugin: 'groovy'
  2. hand edited build.gradle dependencies section to add
    groovy group: 'org.codehaus.groovy', name: 'groovy', version: '1.7.10'

    (I actually missed this step on the first try and got the error “you must assign a Groovy library to the ‘groovy’ configuration”.  The code was documented here.)

  3. created src/main/groovy and src/test/groovy directories
  4. gradle > refresh source folders.  This is like Maven where you need to refresh dependencies and the like to sync the Eclipse workspace.
  5. gradle > build > click build (compile and test)

Impressions of the gradle plugin

  1. I’ve mentioned a few times that it is very similar to the Maven plugin.  This is great as the motions feel very familiar and the only part that is new is gradle itself.  (Well that and refreshing my groovy knowledge – it’s been a while.)
  2. You can run your GroovyTestCase classes through Eclipse without Gradle (via run as junit test)
  3. My first build (with one class and one test class), including some “downloading the internet”, took 1 minute and 2 seconds.  My second build took literally two seconds.
  4. I like the “up to date check” so only some targets get run.
  5. I like that you get an Eclipse pop-up if any unit tests fail.
 
This blog post also motivated me to start using my github account to make it easy to show the code.  In particular, the build.gradle file or the whole project. (This class doesn't require any programming so I think it is ok to put this online.  If Coursera complains, I will take it down.)

production problems across time zones

A couple days ago, I blogged about the technical details of a production problem (not caused by me) at coderanch.  Now that the problem is resolved, it is an interesting time to reflect on how time zones helped us.

Peak volume at the ranch

While we have users from 219 countries, roughly half our volume is from the US and India combined.  (Source: Google Analytics.)  I also learned that our “peak time” is midnight to 6am Mountain Standard Time followed by 6am to 3pm.  This would be business hours in Asia and Europe followed by Europe and North America.  Peak time is misleading because bots count as users for hits.

As an added bonus, peak time for search engines/bots is 5am to 7am Mountain Standard Time.  Yes, these overlap.

When the problem occurred

Lucky for us, we have a moderator in India (Jaikiran Pai) who was able to investigate the problem in real time.  Which meant those of us in the United States woke up to an almost daily email saying that the site went down and describing an attempted fix.

Fixes for other problems

It turned out there were a couple resource leaks in the code that Jaikiran found/fixed.  One had been in the code for over a year.  One was new (due to an API being converted to JPA and the caller not adapting the open session filter.)  One was a less than desirable transaction setting.  All of these manifested because of the new, bigger problem – but were not the cause.  This is a common problem in software – finding the RIGHT problem.

Converging on the right problem

Another advantage of having someone who could look at the problem real time was that he was able to capture the database logs real time.  Right before going to sleep, Jaikiran found two queries taking a long time to run.  And by a long time, I mean one was taking OVER A MINUTE under load.  Which he found by running:

select current_query,now() - pg_stat_activity.query_start as duration from pg_stat_activity order by duration desc

He posted the two queries.  One took 200K explain plan units.  At this point, we had something that could be fixed without witnessing the problem firsthand, and the SQL tuning work moved back to the United States.  One thing the *right* solution had that the others didn’t was that it explained everything.  All the other fixes made sense, but relied on a “magic” step to get from the problem to the solution.

Tuning the hack

I created a hack that would limit the # threads shown in a forum to get us through another day or two until the weekend.  It required tuning during the production problem time.  Back to India.

Conclusion

Communication across time zones only worked because of email.  (Normally, we’d have used the forums.  But the forums weren’t a very reliable place given that the problem was the forums going down.)  I’ve never been on a team at work more than 3 time zones away.  It was a great experience working with a strong developer half the world away.  And while we’ve been developing features together, it is what you do in times of difficulty that shows your process.  It was wonderful to see ours working.

And finally: GREAT JOB JAIKIRAN!!!

postgres tuning – fixing a production problem

After a new feature was implemented (not by me), coderanch started crashing almost every day in the middle of the night.  In a few days, I’ll be blogging about the troubleshooting process and how timezones helped.  This post focuses on the end game – once we knew the problem was being caused by a large query – which is when I started being involved.

The tuning process

Like all good SQL tuning, I ran explain and iterated.  As background, the new feature was a join table with half a million rows.

Explain cost | What changed | Observations
210,184 | n/a | No wonder the site is crashing.  For a web page (the forum list), this is forever!  While the query plan is using an index, it is using the index to join a table with half a million rows to a table with millions of rows.
40,590 | Removed an unnecessary subquery.  (It was unnecessary because the column it populates isn’t used.) | The problem is that the query isn’t using the index for a where clause.  Which is causing joins on very large tables to get a small amount of data.  Another problem is that the query limits the # rows returned to one page worth but does it at the end, prohibiting the database from saving work.
1,807 | Hack – we really want to query the post time from the join table.  Since it wasn’t on there and it was too much work to add during the week, I introduced a hack.  I sorted by post creation (post id) and limited the query to sorting the most recent 100 records for the rest of the query. | While this is much faster, it is functionally incorrect.  If an older post is still under discussion, it didn’t appear in the post list.  So broken, but fast enough to get us to the weekend.
288 | Added the latest post time as a stored field on the join table. | Ah.  Done
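
To illustrate the idea behind the 1,807 hack, here is a rough sketch of pushing the limit into a subquery so the expensive join only sees 100 rows.  This is not the actual coderanch query.  Only jforum_topics_forums comes from this post; the jforum_topics table and every column name below are assumptions made up for the example.

```sql
-- Sketch only: jforum_topics, forum_id, topic_id, last_post_id and
-- topic_title are hypothetical names, not the real schema.
select t.topic_id, t.topic_title
from (
    -- Cut the join table down to the 100 newest rows first,
    -- using the post id as a stand-in for the post time.
    select tf.topic_id, tf.last_post_id
    from jforum_topics_forums tf
    where tf.forum_id = 1
    order by tf.last_post_id desc
    limit 100
) recent
join jforum_topics t on t.topic_id = recent.topic_id
order by recent.last_post_id desc
limit 25;  -- one page worth of topics
```

The inner limit is what makes this fast and also what makes it wrong: an older topic that falls outside the newest 100 rows never reaches the outer query.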

Learnings about postgres – locks

I ran a really simple statement to add a column to a table:

alter table jforum_topics_forums add column last_post_time TIMESTAMP without time zone not null default now();

Luckily I was on the test server because I had to kill it after 5 minutes.  At first, I thought the problem was setting the field to a value since it had to go through all the records.  That wasn’t the problem though.  The problem was that postgres was waiting on a lock.

SELECT * FROM pg_stat_activity;

select * from pg_locks where pid= 4503 and granted='f';

Running the above SQL showed me postgres was waiting on an exclusive lock.  After I shut down the forum, the alter statement ran almost instantaneously.  The actual stored procedure to populate the new field (based on another table) took a few minutes.  But that makes sense as it was a stored procedure doing half a million queries.
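
The two lookups can also be combined so each waiting lock request shows up next to the statement asking for it, without copying a pid by hand.  This is a sketch that assumes the older catalog column names (procpid, current_query) from postgres releases before 9.2; on 9.2 and later those columns are called pid and query.

```sql
-- List every lock request that has not been granted yet,
-- together with the statement that is waiting for it.
-- Assumes pre-9.2 column names; on 9.2+ use a.pid and a.query.
select l.pid,
       l.locktype,
       l.mode,
       l.relation::regclass as locked_table,  -- resolves names in the current database only
       a.current_query,
       now() - a.query_start as running_for
from pg_locks l
join pg_stat_activity a on a.procpid = l.pid
where not l.granted
order by running_for desc;
```

If the alter statement appears here with granted false, something else is still holding a conflicting lock on the table.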

Testing with lots of data

Everything went fine on my machine. On the test server (which does have lots of data), I realized that I forgot to add the index that uses the new last post time column.  That was the entire point of this exercise!  And it goes to show how important it is to have production volumes of test data.
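
For reference, the missing piece could look something like the sketch below.  The table and the last_post_time column are the ones from the alter statement above; the index name, the forum_id and topic_id columns, and the page size are assumptions for illustration.

```sql
-- Hypothetical index: the name and the forum_id column are assumptions.
create index idx_topics_forums_last_post
    on jforum_topics_forums (forum_id, last_post_time desc);

-- Re-run explain afterwards to confirm the planner actually uses it.
explain
select topic_id
from jforum_topics_forums
where forum_id = 1
order by last_post_time desc
limit 25;
```

On a small development database the planner may pick a sequential scan either way, which is one more reason to check the plan against production volumes of data.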