postgresql – selecting maximum for each group

I had a requirement to return the thread with the most replies in each forum at JavaRanch‘s Coderanch forums.  In Postgresql 8.4, this would be very easy – just use the window functions.  Unfortunately, we aren’t on Postgresql 8.4 yet.  The other common pattern is something like

select stuff
from mytable t1
where date = (select max(date)
from mytable t2
where t1.key= t2.key)

This doesn’t work well for me either because the value I am comparing (number of posts is dynamic.)  I decided to use a stored procedure to “simplify” things.  I’ve written some stored procedures in Postgresql to do updates before and stored procedures in other languages to do queries so this didn’t seem like a huge task.

Postgresql calls them stored functions, so let’s proceed.  First you need to create a type to represent a row that gets returned by the stored function.

CREATE TYPE highlighted_topic_per_forum_holder AS
(last_user_id integer,
post_time timestamp without time zone,
<other columns go here>);

Then you create the stored procedure.  The outer loop goes through each forum.  The inner loop is the SQL that finds the post with the most replies that was posted to in some time period.  It uses a nested query with a limit to return only 1 thread per forum.  The rest of the SQL adds the relevant data.  See postgresql and JDBC for why it takes varchar rather than timestamp.

CREATE or replace FUNCTION highlighted_topic_per_forum(character varying)
RETURNS SETOF highlighted_topic_per_forum_holder AS
$BODY$
DECLARE
forum RECORD;
r highlighted_topic_per_forum_holder%rowtype;
BEGIN
for forum in EXECUTE 'select forum_id from jforum_forums where categories_id in (1,7)' loop
for r in EXECUTE 'select p.user_id AS last_user_id, p.post_time, p.attach AS attach, t.* '
|| 'from jforum_topics t, jforum_posts p, '
|| '(select topic_id, count(*) from jforum_posts '
|| ' where post_time >= date '' ' || $1 || ' '' '
|| ' and forum_id = ' || forum.forum_id
|| ' AND need_moderate = 0 '
|| ' group by topic_id order by 2 desc limit 1) nested '
|| ' where p.topic_id = nested.topic_id '
|| ' and p.post_id = t.topic_last_post_id '
|| ' order by post_time desc' loop
return next r;
end loop;
end loop;
return;
END;
$BODY$
LANGUAGE 'plpgsql';

3 thoughts on “postgresql – selecting maximum for each group

  1. Pingback: postgresql and jdbc | Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

  2. Pingback: upgrading from postgres 8.3 to 8.4 | Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Leave a Reply

Your email address will not be published. Required fields are marked *