The question is old, but this solution is simpler and faster than what's been posted so far:
SELECT b.machine_id
, batch
, timestamp_sta
, timestamp_stp
, min(timestamp_sta) OVER w AS batch_start
, max(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp a
JOIN db_data.ll_lu b ON a.ll_lu_id = b.id
WINDOW w AS (PARTITION BY batch, b.machine_id) -- No ORDER BY !
ORDER BY timestamp_sta, batch, machine_id; -- why this ORDER BY?
If you add ORDER BY
to the window frame definition, each next row with a greater ORDER BY
expression has a later frame start. Neither min()
nor first_value()
can return the "first" timestamp for the whole partition then. Without ORDER BY
all rows of the same partition are peers and you get your desired result.
Your added ORDER BY
works (not the one in the window frame definition, the outer one), but doesn't seem to make sense and makes the query more expensive. You should probably use an ORDER BY
clause that agrees with your window frame definition to avoid additional sort cost:
...
ORDER BY batch, b.machine_id, timestamp_sta, timestamp_stp;
I don't see the need for DISTINCT
in this query. You could just add it if you actually need it. Or DISTINCT ON ()
. But then the ORDER BY
clause becomes even more relevant. See:
If you need some other column(s) from the same row (while still sorting by timestamps), your idea with FIRST_VALUE()
and LAST_VALUE()
might be the way to go. You'd probably need to append this to the window frame definition then:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
See: