5

On PostgreSQL 9.4, I'm trying to achieve something I'll call "aggregate function" on some contiguous rows. Example:

Input data:

recipe  prod1   prod2   timestamp
0       5       4       2015-07-02 08:10:34.357
0       2       7       2015-07-02 08:13:45.352
0       7       0       2015-07-02 08:16:22.098
1       3       2       2015-07-02 08:22:14.678
1       9       4       2015-07-02 08:22:56.123
2       2       6       2015-07-02 08:26:37.564
2       1       7       2015-07-02 08:27:33.109
2       0       8       2015-07-02 08:31:11.687
0       3       5       2015-07-02 08:40:01.345
1       4       2       2015-07-02 08:42:23.210

Desired output:

recipe  prod1_sum   prod2_avg   timestamp_first             timestamp_last
0       14          3.6666      2015-07-02 08:10:34.357     2015-07-02 08:16:22.098
1       12          3           2015-07-02 08:22:14.678     2015-07-02 08:22:56.123
2       3           7           2015-07-02 08:26:37.564     2015-07-02 08:31:11.687
0       3           5           2015-07-02 08:40:01.345     2015-07-02 08:40:01.345
1       4           2           2015-07-02 08:42:23.210     2015-07-02 08:42:23.210

Basically, one output line for each "group" of contiguous rows (when table is sorted on timestamp column) with the same "recipe" value. In the output, prod1_sum is the sum of prod1 in the "group", prod2_avg is average of prod2 in the same "group", and the 2 last columns are respectively the first and the last timestamps in the group. Obviously there are several distinct groups with the same "recipe" value, and I do want an output row for each of them.

At the moment, I've an ugly way of obtaining this based on several requests and a lot of data processing outside of the DB, which I really want to avoid, and which is not worth showing.

My problem really is the "grouping" of rows. I know how to create an aggregate function which would do what I want, if I could apply it to each group individually. I have looked into windows functions, but it seems this would group all values by recipe, not conforming with the "contiguous rows" principle I need to respect.

1 Answer 1

8

You can use the following query:

SELECT recipe, SUM(prod1) AS prod1_sum,
       AVG(prod2) AS prod2_avg, 
       MIN(timestamp) AS timestamp_first, MAX(timestamp) AS timestamp_last
FROM (       
   SELECT recipe, prod1, prod2, timestamp,
          ROW_NUMBER() OVER (ORDER BY timestamp) 
          - 
          ROW_NUMBER() OVER (PARTITION BY recipe 
                             ORDER BY timestamp) AS grp
   FROM mytable ) t
GROUP BY recipe, grp
ORDER BY timestamp_first

The trick here is the usage of ROW_NUMBER window function to identify islands of continuous recipe values: grp calculated field does exactly this.

Demo here

1
  • This is absolutely awesome and spot on! The way you wrote it make it crystal clear. I will dig in the documentation to better understand the details (ROW_NUMBER() OVER, the "t" after FROM mytable, ...), but I'm on the rails now. Thank you very much. Commented Jul 2, 2015 at 7:22

Not the answer you're looking for? Browse other questions tagged or ask your own question.