3

I have following working query:

select children.autor as child,  parents.autor as parent, count(*) from comments children
left join comments parents on (children.parentid = parents.commentid)
group by child, parent
order by count(*) desc
limit 4;

which produces following output:

    child           |       parent        | count
peter               | max                 |   154
alex                | peter               |   122
peter               | kARL                |    82
stephen             | alex                |    50

Now the comments table also has a column 'bodytext' that is the actual comment and I want to include the last comment in the selection for each pair of child and parent.

So in the first line I want the last comment that peter wrote in reply to max. So far I don't even know how to approach this. A subquery? Some sort of window function?

If I use (max)bodytext, it almost does exactly what I want.. just the longest comment which is not really want I want.

2
  • Do each of the comments have any incremental id field? If so, you might just select the comment body with the max id out of all the comments that peter wrote max. I'll give it a shot later if you're still having issues as I'm a bit short on time at the moment.
    – filpa
    Commented Apr 1, 2018 at 22:15
  • Yes each comment has an incremental id.
    – Alexander
    Commented Apr 2, 2018 at 7:44

5 Answers 5

12

Add just this 1 line to your select:

(array_agg(children.bodytext ORDER BY children.commentid DESC))[1] AS last_comment

It will create an array of all the comments for every group, comments will be ordered as specified (by children.commentid DESC), and then you take just the 1st element of the array = the last comment.

Whole code:

SELECT
  children.autor AS child,
  parents.autor AS parent,
  count(*),
  (array_agg(children.bodytext ORDER BY children.commentid DESC))[1] AS last_comment
FROM
  comments AS children
  LEFT JOIN comments AS parents
    ON (children.parentid = parents.commentid)
GROUP BY child, parent
ORDER BY count(*) DESC;
5
  • This works as intended. Can I understand array_agg as way to create your "custom" aggregate function? Is the [1] at the end limit 1? Thanks!
    – Alexander
    Commented Apr 2, 2018 at 7:58
  • @Alex array_agg is an aggregate function that creates an array from a specific field of all the records selected. [1] is array notation to get the first item in the array (PG array indexing starts at 1 rather than 0). Since you can put a sort order in array_agg, what the above does is create an array of bodytext ordered by commentid DESC, then gets the first item in that array.
    – 404
    Commented Apr 2, 2018 at 11:59
  • I found a lot of complexity in other solutions (not just on this question). This was the answer I was searching for, ans should be the correct one. Thanks man!
    – llobet
    Commented Oct 29, 2020 at 16:30
  • The only drawback of this answer is that it passes over the table more than once, so performance-wise it's not exactly ideal. But it's also much more readable! Commented Apr 19, 2021 at 12:50
  • Hey, thanks a lot for that one liner with explanation. Used it in quite a different and much more complex scenario, but after disecting it the proper place to put it - it worked like a charm! Commented Aug 5, 2022 at 17:12
1

There are two approaches to this. One uses window/analytic functions, particularly first_value(). The other uses arrays.

An example of the first approach:

select distinct c.autor as child,  cp.autor as parent,
       count(*) over (partition by c.autor, cp.autor) as cnt,
       first_value(c.bodytext) over (partition by c.autor, cp.auto order by ? desc)
from comments c left join
     comments cp
     on c.parentid = cp.commentid
order by cnt desc
limit 4;

The ? is for the column that specifies the ordering.

2
  • This also works as intended. I guess "partition by partition by" is simply a typo. I remove the second partition by. Either my dataset (400.000) is too small or the solution is about as fast as the one by Joe Samanek. Thanks!
    – Alexander
    Commented Apr 2, 2018 at 8:07
  • This is probably more efficient than array_agg because there could be a lot of text to agg
    – 404
    Commented Apr 2, 2018 at 12:03
0

This should do it if commentid is in chronological order. Your query, with max(c.commentid) added to it, is called query1 here and then used in an outer query where the last bodytext is joined in:

with query1 as (
  select c.autor child, p.autor parent, count(*) cnt, max(c.commentid) id_last
  from       comments c
  left join  comments p on p.commentid=c.commentid
  group by c.autor, p.autor
)
select q.child, q.parent, q.cnt, t.bodytext comment_last
from      query1   q
left join comments t on t.commentid = q.id_last
order by t.cnt desc
limit 4;

Not sure if the lefts are needed. Depends on your data and what you want. (And btw, autor should perhaps be written author)

0

Assume the latest bodytext has the highest commentid. Use window function FIRST_VALUE to get the latest bodytext and COUNT to get the count for within each group of child/parent. Use DISTINCT to eliminate the duplicates.

select distinct child, parent, 
first_value(bodytext) over (partition by child, parent order by commentid desc) bodytext,
count(*) over (partition by child, parent ) count
from
(select children.autor as child,  parents.autor as parent, children.bodytext, children.commentid 
from comments children
left join comments parents on (children.parentid = parents.commentid )) t
order by count desc
limit 4;
0

Since you're only retrieving 4 rows, a subquery should do fine here, as it should only be evaluated for the rows fetched.

select children.autor as child,  parents.autor as parent, count(*),
  (select bodytext
   from comments
   where commentid = max(child.commentid)) as last_comment
from comments children
left join comments parents on (children.parentid = parents.commentid)
group by child, parent
order by count(*) desc
limit 4;

Not the answer you're looking for? Browse other questions tagged or ask your own question.