tuning jforum for javaranch

Main menu:

Topics

Recent Posts

Blog

December 2008
M T W T F S S
« Nov   Jan »
1234567
891011121314
15161718192021
22232425262728
293031  

Past Posts

Java/Java EE

JDBC

Other

tuning jforum for javaranch

December 23rd, 2008 by Jeanne Boyarsky

JavaRanch has been updating our forum software to an extension of JForum.  While testing, Ulf Dittmer noticed that the RSS feeds were incredibly slow – to the point of timing out.  He found that the database query itself was taking so long that the browser gave up.

JavaRanch Moose
Some of the queries built into JForum are written in a way that is fine for forums with a relatively small number of posts – even half a million posts total.  JavaRanch is much larger than that.  We imagine that we are the largest forum to use JForum.  (We have over 400 thousand threads/topics and over 1.7 million posts.)

I enjoy tuning queries, so I started looking at it.  When someone asked me about what I did, I thought it would be nice to blog about my thought process.  I’ve included the before/after queries and execution plans at the bottom so as not to break up the thought process.

1. Run original query at command line in our test environment (which is slightly smaller than the real data set, but close to realistic.)
2. Give up waiting after 5 minutes.
3. Run explain on the original query.
4. Cringe at the heap scan with a relative cost of 16628 – and worse within a nested loop.
5. Wonder why the query needs to do so much processing to get a limited number of rows.  The execution plan shows all that joining occurring on the entire table rather than just the 50 or so rows we care about (via the limit.)
6. Refactor the query to use a subquery to avoid this.  I had used this technique on another similar query the day before and gotten the query down to under 1000 cost units.
7. Run explain again.
8. See cost is 3807.  This is better, but there’s no logical reason it should be four times the cost of the query from yesterday.  Better keep tuning.
9. Hmm.  It’s using the index idx_topics_fp which is on topic_first_post_id.  We have almost 100 forums.  Wouldn’t it be more efficient to include the forum_id in the index.
10. Added index idx_topics_fp_forum on both topic_first_post_id and forum_id.
11. Run explain again.
12. See cost of 978.  Good.  This is about what I got on yesterday’s query.  I guess the built in indexes better matched what I needed on yesterday’s query.
13. Go back to GUI and try URL.  Takes a couple seconds.
14. Commit my changes.  I got a two order of magnitude increase on the query.  Very satisfying.


Original query:

PostModel.selectLatestByForumForRSS = SELECT p.topic_id, p.topic_id, p.post_id, p.forum_id, pt.post_subject AS subject, pt.post_text, p.post_time, p.user_id, u.username, u.user_first_name, u.user_last_name
FROM  jforum_topics t, jforum_posts p, jforum_posts_text pt, jforum_users u
WHERE p.post_id = t.topic_first_post_id
AND p.topic_id = t.topic_id
AND p.user_id = u.user_id
AND p.post_id = pt.post_id
AND p.need_moderate = 0
ORDER BY t.topic_first_post_id DESC

Modified query

PostModel.selectLatestByForumForRSS = SELECT p.topic_id, p.topic_id, p.post_id, p.forum_id, pt.post_subject AS subject, pt.post_text, p.post_time, p.user_id, u.username, u.user_first_name, u.user_last_name
FROM (select topic_first_post_id from jforum_topics where forum_id = ? order by topic_first_post_id desc limit ?) as nested,
jforum_topics t, jforum_posts p, jforum_posts_text pt, jforum_users u
WHERE p.post_id = nested.topic_first_post_id
AND p.topic_id = t.topic_id
AND p.user_id = u.user_id
AND p.post_id = pt.post_id
AND p.need_moderate = 0
ORDER BY t.topic_first_post_id DESC

Original explain

Row #         QUERY PLAN
1       Limit (cost=97566.69..97566.70 rows=1 width=504)
2       -> Sort (cost=97566.69..97566.70 rows=1 width=504)
3       Sort Key: t.topic_id
4       -> Nested Loop (cost=275.61..97566.68 rows=1 width=504)
5       -> Nested Loop (cost=275.61..97560.76 rows=1 width=473)
6       -> Nested Loop (cost=275.61..97556.05 rows=1 width=32)
7       Join Filter: (“inner”.topic_id = “outer”.topic_id)
8       -> Bitmap Heap Scan on jforum_topics t (cost=275.61..16628.72 rows=18174 width=8)
9       Recheck Cond: (forum_id = 1)
10       -> Bitmap Index Scan on idx_topics_forum (cost=0.00..275.61 rows=18174 width=0)
11       Index Cond: (forum_id = 1)
12       -> Index Scan using jforum_posts_pkey on jforum_posts p (cost=0.00..4.44 rows=1 width=24)
13       Index Cond: (p.post_id = “outer”.topic_first_post_id)
14       Filter: (need_moderate = 0)
15       -> Index Scan using jforum_posts_text_pkey on jforum_posts_text pt (cost=0.00..4.70 rows=1 width=449)
16       Index Cond: (“outer”.post_id = pt.post_id)
17       -> Index Scan using jforum_users_pkey on jforum_users u (cost=0.00..5.91 rows=1 width=35)
18       Index Cond: (“outer”.user_id = u.user_id)

Explain on modified query
Row #         QUERY PLAN
1       Sort (cost=3807.84..3807.94 rows=40 width=504)
2       Sort Key: t.topic_id
3       -> Nested Loop (cost=0.00..3806.78 rows=40 width=504)
4       -> Nested Loop (cost=0.00..3579.24 rows=40 width=500)
5       -> Nested Loop (cost=0.00..3342.41 rows=40 width=469)
6       -> Nested Loop (cost=0.00..3164.39 rows=40 width=453)
7       -> Limit (cost=0.00..2975.39 rows=40 width=4)
8       -> Index Scan Backward using idx_topics_fp on jforum_topics (cost=0.00..1351866.51 rows=18174 width=4)
9       Filter: (forum_id = 1)
10       -> Index Scan using jforum_posts_text_pkey on jforum_posts_text pt (cost=0.00..4.70 rows=1 width=449)
11       Index Cond: (pt.post_id = “outer”.topic_first_post_id)
12       -> Index Scan using jforum_posts_pkey on jforum_posts p (cost=0.00..4.44 rows=1 width=24)
13       Index Cond: (p.post_id = “outer”.topic_first_post_id)
14       Filter: (need_moderate = 0)
15       -> Index Scan using jforum_users_pkey on jforum_users u (cost=0.00..5.91 rows=1 width=35)
16       Index Cond: (“outer”.user_id = u.user_id)
17       -> Index Scan using jforum_topics_pkey on jforum_topics t (cost=0.00..5.68 rows=1 width=4)
18       Index Cond: (“outer”.topic_id = t.topic_id)

Explain after index
Row #         QUERY PLAN
1       Sort (cost=978.89..978.99 rows=40 width=504)
2       Sort Key: t.topic_id
3       -> Nested Loop (cost=0.00..977.83 rows=40 width=504)
4       -> Nested Loop (cost=0.00..741.01 rows=40 width=473)
5       -> Nested Loop (cost=0.00..513.47 rows=40 width=469)
6       -> Nested Loop (cost=0.00..335.45 rows=40 width=453)
7       -> Limit (cost=0.00..146.44 rows=40 width=4)
8       -> Index Scan Backward using idx_topics_fp_forum on jforum_topics (cost=0.00..66535.36 rows=18174 width=4)
9       Index Cond: (forum_id = 1)
10       -> Index Scan using jforum_posts_text_pkey on jforum_posts_text pt (cost=0.00..4.70 rows=1 width=449)
11       Index Cond: (pt.post_id = “outer”.topic_first_post_id)
12       -> Index Scan using jforum_posts_pkey on jforum_posts p (cost=0.00..4.44 rows=1 width=24)
13       Index Cond: (p.post_id = “outer”.topic_first_post_id)
14       Filter: (need_moderate = 0)
15       -> Index Scan using jforum_topics_pkey on jforum_topics t (cost=0.00..5.68 rows=1 width=4)
16       Index Cond: (“outer”.topic_id = t.topic_id)
17       -> Index Scan using jforum_users_pkey on jforum_users u (cost=0.00..5.91 rows=1 width=35)
18       Index Cond: (“outer”.user_id = u.user_id)

Comments

Comment from Ranch Hand
Posted: December 24, 2008 at 4:52 am

Huh! Very easy to understand!

Comment from Anarbendaccek
Posted: February 1, 2009 at 3:22 am

http://www.selikoff.net – cool sitename man)))

Write a comment