Live from JavaOne: JDBC Spec Enhancements

Live from the JavaOne conference again, this time attending a session about enhancements to version 4.1 of the JDBC specification. Four members of the JDBC specification team presented a number of useful enhancements due in the upcoming version. Some of these features may not make it into 4.1, but they will appear in future versions of JDBC down the road.

1. Named Prepared Statements
As many of you may already know, PreparedStatement parameters are referenced by numeric index, such as ps.setString(1, "val1"), where 1 corresponds to the first "?" in the SQL string. Perhaps rightfully so, many developers new to JDBC complain that it should support named parameters in addition to numeric indexes. Well, those developers are in luck: future versions of JDBC may support syntax such as the following:

NamedPStatement nstmt = con.createNPStatement("SELECT f1, f2, f3 FROM foo "
     + "WHERE f1 = :f1 AND f2 < :f2");
nstmt.setParameter("f1", "val1");
nstmt.setParameter("f2", "val2");
ResultSet rs = nstmt.executeQuery();

2. Chaining Commands
Using the previous example, the team intends to support chaining commands such that setParameter() returns a reference to the statement, allowing single-line calls such as the following.

ResultSet rs = nstmt.setParameter("f1", "val1").setParameter("f2", "val2").executeQuery();
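
There is no magic to chaining: each setter simply returns the statement it was called on. As a minimal sketch of the idea, here is a hypothetical fluent wrapper around today's PreparedStatement (all class and method names below are mine, not from the spec):

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

// Hypothetical fluent wrapper: each setter returns "this", which is all
// that is needed for calls to be chained on a single line.
public class FluentStatement {
    private final PreparedStatement delegate;
    private final Map<String, Integer> indexByName; // maps :name to its "?" index

    public FluentStatement(PreparedStatement delegate, Map<String, Integer> indexByName) {
        this.delegate = delegate;
        this.indexByName = indexByName;
    }

    public FluentStatement setParameter(String name, String value) throws SQLException {
        delegate.setString(indexByName.get(name), value);
        return this; // returning the receiver enables chaining
    }

    public ResultSet executeQuery() throws SQLException {
        return delegate.executeQuery();
    }
}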

3. Changing schemas on the fly
For databases that support the notion of database schemas, you will be able to change your current schema dynamically after the connection is established, using the following command:

Connection connection = ...
connection.setSchema("mySchema");

For those databases that do not support such commands, this would likely be a no-op.

4. Timezone support
Better support for time zones is due in more distant versions, after Java 1.7 adds a number of new features related to dates and times.

Why JDBC + JSP = Bad

Over years of moderating at JavaRanch, I’ve seen one type of question spring up on a weekly basis: the one asked by people who need help with JDBC code inside of JavaServer Pages (JSPs). As much as we may want to help them fix their particular problem, the overriding thought of “STOP WHAT YOU’RE DOING” often prevents us from doing so. The purpose of this post is to explain why putting JDBC code inside a JSP file is akin to shooting yourself in the foot. With a shotgun. While not wearing shoes.

Don't use JDBC inside of JSP pages

1. You cannot reuse the code
First and foremost is the issue of code reusability. While importing Java classes is quite common, importing code from a JSP is not. You can write JSP functions (although I never recommend doing so, for reasons I won’t get into now), but you’re basically writing code that cannot be used anywhere else, particularly in non-JSP Java classes. The most common counter-response is “Well, I don’t need to use it anywhere else”. Yes, you do. Whether it’s the code for making the connection to the database or the code for performing a query and reading the results, it will be used again at some point in the future, in a way you have not thought of yet. Unless you are being paid by the line and prefer this sort of thing, it’s a bad move, and I guarantee your code base will be much larger than that of someone who put all their JDBC code into normal Java classes. A larger code base means more difficulty maintaining it and more headaches for you down the road.
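
To make that concrete, here is a minimal sketch of the kind of plain Java class I mean; the class name and connection details are illustrative, not from any particular codebase:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative helper: every servlet, JSP, or batch job can call this one
// method instead of repeating connection code in each page.
public class ConnectionFactory {
    private static final String URL = "jdbc:postgresql://localhost/mydb"; // assumed URL

    public static Connection open() throws SQLException {
        return DriverManager.getConnection(URL, "dbuser", "dbpassword");
    }
}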

2. You are mixing business logic with the presentation layer
Probably the most overlooked issue for inexperienced developers is the fact that you’re mixing the business/data layers with the presentation layer. Put another way: if your boss comes in one morning and says you’re throwing out the JSP front end and replacing it with a web service, Java Swing, Flash, or some other interface, there is virtually no way for you to reuse the database code without going through every line of every JSP file by hand. If the database code had been placed in plain Java files, you would have a path for packaging the JDBC code into a single JAR and making it available as a service to a different front-end client such as a web service, Flash, etc.
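
As a sketch of what that path looks like, put the queries behind a plain interface so every front end depends on the abstraction rather than on JDBC (the names below are hypothetical):

import java.util.List;

// Hypothetical data-access interface: a JSP, a Swing client, and a web
// service can all call this same contract, so swapping the front end
// never touches the database code behind it.
public interface UserDao {
    String findUserName(int id);
    List<String> findAllUserNames();
}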

In enterprise development, the presentation (JSP) layer and the database are often separated by multiple layers of indirection, as described by the commonly used three-tier architecture. Those who are just starting out often do not know why mixing these layers is bad, but I promise that if you stay with software development, you’ll understand one day.

3. But it’s just this once!
Oftentimes, JDBC code enters JSPs by way of developers lying to themselves: “Well, it’s just this once” or “I just need to test something”. Instead of being removed when the developer is done ‘testing’, the code often persists for a long time afterward. Furthermore, putting your JDBC code inside of reusable Java classes makes the testing itself go faster! Spending 10 minutes setting up a single reusable Java JDBC class will save you hours down the road, and if you later want to test more than one JSP page with JDBC logic, you already have your Java class to start with. Proponents of test-driven development tend to understand this better than anyone.
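
For example, once the JDBC code lives in a plain class like the UserDao sketched above, a throwaway smoke test is a few lines run straight from the command line, with no servlet container or redeploy in the loop (JdbcUserDao here is a hypothetical implementation):

// Illustrative smoke test: exercises the data-access class directly.
public class UserDaoSmokeTest {
    public static void main(String[] args) {
        UserDao dao = new JdbcUserDao(); // hypothetical JDBC-backed implementation
        System.out.println(dao.findUserName(1));
    }
}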

4. It’s really hard to maintain
Code maintenance is another topic that new developers do not fully appreciate, since they have not spent years maintaining the same code base. Unless you write the most beautiful JDBC code imaginable, it’s very difficult to read through huge JSP files looking for bugs or making enhancements. It’s a lot easier if all the JDBC access is restricted to a set of files much smaller in size than the JSP code base.

5. It’s a really bad practice
If after reading this article you still do not fully understand why you should not put JDBC code inside of JSPs, let me simplify the issue by saying “Just Don’t Do It”. Whether or not developers understand the reasons against doing so is not as important as stopping them from doing so in the first place. In short, you create code someone else (possibly yourself) will have the misfortune of maintaining down the road.

Production is Slow – Tuning JForum for JavaRanch – Part 2

A moderator reported that moving topics took a long time in production and in our sandbox. I just tried it and got 70 seconds through the GUI! It didn’t use to take this long. I’m comparing our development sandbox a few months ago to our development sandbox today. Clearly something has changed. The question is what!

I heard a couple of people liked reading about the thought process for troubleshooting slow queries. If you aren’t familiar with what a database explain is used for, see that entry before reading this one.
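
As a quick refresher, an explain is just your query with EXPLAIN prefixed. Here is a minimal sketch of running one through JDBC against PostgreSQL (the method and variable names are mine):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: prefix any query with EXPLAIN and print the plan rows
// PostgreSQL returns, one line of the plan per row.
public static void explain(Connection connection, String sql) throws Exception {
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery("EXPLAIN " + sql);
    while (rs.next()) {
        System.out.println(rs.getString(1));
    }
    rs.close();
    stmt.close();
}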

This time the path to success was a little more meandering. I’ve left the dead ends in here so you get a feel for the actual thought process and not a sanitized version. If you are a student, this is a decent feel for what problems look like in the “real” world.

  1. First step – find the piece of code in JForum that handles moving threads.
  2. It’s GenericForumDAO.moveTopics. I see this method was modified by us (JavaRanch) to support leaving a copy of the thread in the original forum, or creating a new copy and locking the original (for use in moving to our private “Trash Can” forum). We made that change a long time ago, though, and it wasn’t slow then.
  3. Looking at the method, I see a number of database queries. I’m adding logging (nothing fancy – just System.out’s; see the sketch after this list) to see which query is using up so much time.
  4. The method I added logging to takes 5-8 milliseconds (across two trials). Yet the browser plugs away for almost a full minute after that.
  5. Ah! The web/action layer class ModerationHelper makes a bunch of calls to clear/reload the cache. I bet that’s where all the time is going. Adding logging at that level and redeploying.
  6. Now we are getting somewhere. Everything goes very quickly until the part where the hottest topics get reloaded, and that part takes almost a minute.
  7. Yuck. Table scans left and right. No wonder it takes so long. Three table scans, joining on the result and THEN discarding almost all the records. (See “original query” and “original explain” in the supplemental materials section below.)
  8. If I were doing this by brute force, I would add an index on topic_views and refactor the query to search on the topics table only, joining with the others on the result. This is similar to the last post, where I walked through an optimization. However, I’m not sure that makes sense here.
  9. This definition of hottest topics uses the most page views, accumulated forever. That is ok for now because all the migrated posts started with zero views against them, making the hottest threads really “the hottest threads since our Saturday, January 3rd migration”. But what happens down the road? When a thread stops being viewed altogether, it’s not hot anymore. I’m thinking it would be better to put a time-based constraint on things.
  10. I’ll post in our Moderators Only forum to get some feedback on my proposed definition change and go have dinner.
  11. Unanimous agreement on my proposed definition from everyone who saw the question while I was eating, including the site owner. Good news.
  12. Modify the query to add a where clause on the thread’s last-modified date.
  13. This modified query takes 80 thousand cost units instead of 253 thousand. Better, but still way too high.
  14. Next I’ll try adding an index on topic_last_post_id, topic_views.
  15. Running explain again gives the exact same output.  It isn’t using the index.
  16. Dropping the index since it didn’t work.  I may re-add it later but I like to test only one variable at a time.
  17. The database seems to be favoring the table scan of the forum table first, probably because this table is orders of magnitude smaller than the other two in the join.
  18. Now I’ll try removing the forum table from the query and run an explain to see what happens.  It’s still 78 thousand units, but at least it is using an index (post_time) as the first step.
  19. Time to try to add the index and run explain again.  Nothing.  Still 78 thousand units.
  20. All right, this tweaking thing isn’t getting me anywhere. New approach: how would I build this query if I were writing it from scratch?
  21. The base query that I want to happen first is explain select p.post_id from jforum_posts p WHERE p.post_time >= '2008-01-01'. Running an explain on just this says it takes 1 thousand cost units. That sounds high – I’ll query some details. Over 100 thousand posts in a few days? Oh yeah! It’s 2009 – a new year, so that date covers a whole year of posts, not a few days. I should know that – I celebrated recently.
  22. Re-running the original query with the post_time clause gives me under two thousand cost units.  Whew.  That’s much better.
  23. Now I’m uploading the changed query to our sandbox environment to see how the move performs.  Three to five seconds.  Good.
  24. Another moderator suggested using replies rather than views. This is a good idea, and since it has no effect on performance, I switched to it. (See “Final query” in supplemental materials.)
  25. And a good techie verifies everything, so I ran explain one last time to confirm this last change didn’t affect performance. It didn’t; performance is still ok. (See “Final explain” in supplemental materials.)
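
For reference, the logging mentioned in step 3 was nothing fancier than timestamps around each suspect call, along these lines (the variable names are illustrative):

// Illustrative timing log: bracket the suspect query with timestamps to
// see where the minute is actually going.
long start = System.currentTimeMillis();
ResultSet rs = stmt.executeQuery(sql); // the query under suspicion
System.out.println("query took " + (System.currentTimeMillis() - start) + " ms");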

Summary

Two orders of magnitude performance improvement on the database query and a web action that isn’t painful to wait for.  I’m happy with that.  Not only is it faster, but we uncovered a better business definition for the query.

I also learned that the last time we tested a move on our sandbox server was before we migrated large quantities of data, making it an invalid test.

Supplemental materials

Original query

SELECT t.*, p.user_id AS last_user_id, p.post_time, p.attach AS attach, f.forum_name \
FROM jforum_topics t, jforum_posts p, jforum_forums f \
WHERE p.post_id = t.topic_last_post_id \
AND p.need_moderate = 0 \
and p.forum_id = f.forum_id \
ORDER BY topic_views DESC \
LIMIT ?

Original explain

Row #         QUERY PLAN
1       Limit (cost=253457.29..253457.39 rows=40 width=132)
2       -> Sort (cost=253457.29..254467.05 rows=403905 width=132)
3       Sort Key: t.topic_views
4       -> Hash Join (cost=26807.80..136006.35 rows=403905 width=132)
5       Hash Cond: ("outer".forum_id = "inner".forum_id)
6       -> Hash Join (cost=26707.81..129847.79 rows=403905 width=113)
7       Hash Cond: ("outer".post_id = "inner".topic_last_post_id)
8       -> Seq Scan on jforum_posts p (cost=0.00..50770.74 rows=1775019 width=24)
9       Filter: (need_moderate = 0)
10       -> Hash (cost=19386.05..19386.05 rows=403905 width=93)
11       -> Seq Scan on jforum_topics t (cost=0.00..19386.05 rows=403905 width=93)
12       -> Hash (cost=99.79..99.79 rows=79 width=27)
13       -> Seq Scan on jforum_forums f (cost=0.00..99.79 rows=79 width=27)

Final query

SELECT t.*, p.user_id AS last_user_id, p.post_time, p.attach AS attach, f.forum_name \
FROM jforum_topics t, jforum_posts p, jforum_forums f \
WHERE p.post_time >= ? \
and p.post_id = t.topic_last_post_id \
AND p.need_moderate = 0 \
and p.forum_id = f.forum_id \
ORDER BY topic_replies DESC \
LIMIT ?

Final explain

Row #         QUERY PLAN
1       Limit (cost=1774.18..1774.23 rows=20 width=132)
2       -> Sort (cost=1774.18..1774.28 rows=40 width=132)
3       Sort Key: t.topic_replies
4       -> Hash Join (cost=99.99..1773.12 rows=40 width=132)
5       Hash Cond: ("outer".forum_id = "inner".forum_id)
6       -> Nested Loop (cost=0.00..1672.53 rows=40 width=113)
7       -> Index Scan using idx_posts_time on jforum_posts p (cost=0.00..678.22 rows=177 width=24)
8       Index Cond: (post_time >= '2009-01-05 00:00:00'::timestamp without time zone)
9       Filter: (need_moderate = 0)
10       -> Index Scan using idx_topics_lp on jforum_topics t (cost=0.00..5.61 rows=1 width=93)
11       Index Cond: ("outer".post_id = t.topic_last_post_id)
12       -> Hash (cost=99.79..99.79 rows=79 width=27)
13       -> Seq Scan on jforum_forums f (cost=0.00..99.79 rows=79 width=27)