8

I have a little table to try to understand how the LAST_VALUE function works in PostgreSQL. It looks like this:

 id | value
----+--------
  0 | A
  1 | B
  2 | C
  3 | D
  4 | E
  5 | [null]
  6 | F

What I want to do is use LAST_VALUE to fill the NULL value with the preceding non-NULL value, so the result should be this:

 id | value
----+--------
  0 | A
  1 | B
  2 | C
  3 | D
  4 | E
  5 | E
  6 | F

The query I tried is:

SELECT LAST_VALUE(value)
OVER (PARTITION BY id ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC)
FROM test;

From what I understand of the LAST_VALUE function, it takes all the rows before the current one as a window, sorts them according to the ORDER BY clause and then returns the last row of the window. With my ORDER BY, all the rows containing a NULL should be put at the top of the window, so LAST_VALUE should return the last non-NULL value. But it doesn't.

I am clearly missing something. Please help.

2
  • 1
    I am not sure why you want to use last_value() here, but usually it does not work as expected, and first_value with descending order is what you want. See here: stackoverflow.com/questions/42299101/…
    – S-Man
    Commented Sep 25, 2019 at 11:21
  • Could you please explain why you chose the last_value function?
    – S-Man
    Commented Sep 25, 2019 at 11:23

3 Answers

10

I'm not sure last_value will do what you want. It would be better to use lag:

select id,
coalesce(value, lag(value) OVER (order by id))
FROM test;
 id | coalesce
----+----------
  0 | A
  1 | B
  2 | C
  3 | D
  4 | E
  5 | E
  6 | F
(7 rows)

last_value returns the last value in the current frame. Because you partitioned by id, each frame only ever contains one row. lag returns the value from the previous row (by default) in the partition, regardless of the frame, and that seems to be exactly what you want.
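Here is a minimal sketch of the lag approach that runs without a PostgreSQL server. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (the first release with window functions) as a stand-in for PostgreSQL. LAG and COALESCE behave the same way in both. The names conn and rows and the alias filled are illustrative only.

```python
import sqlite3

# Assumption: SQLite (through Python's sqlite3 module) stands in for PostgreSQL;
# LAG() and COALESCE() behave the same way in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(0, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E"), (5, None), (6, "F")],
)

# LAG(value) reads the value of the previous row in id order;
# COALESCE only falls back to it when the current row's value is NULL.
rows = conn.execute(
    """
    SELECT id, COALESCE(value, LAG(value) OVER (ORDER BY id)) AS filled
    FROM test
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Row 5 comes back as (5, 'E') and every other row keeps its own value.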

To expand on this answer a bit, you can use row_number() to get a good idea of the partition each row falls into. For your proposed solution, look at the row numbers for each row when you partition by id:

SELECT id, row_number() OVER (PARTITION BY id ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC)
FROM test;
 id | row_number
----+------------
  0 |          1
  1 |          1
  2 |          1
  3 |          1
  4 |          1
  5 |          1
  6 |          1
(7 rows)

Each row is in its own partition (and therefore its own frame), so you won't be able to get values from other rows.
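The same point can be checked directly. This sketch again assumes SQLite through Python's sqlite3 as a stand-in for PostgreSQL, and the aliases rn and last_val are illustrative. It runs the question's window (PARTITION BY id plus the CASE ordering) for both row_number() and last_value():

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the partitioning and default
# framing rules exercised here are the same in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(0, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E"), (5, None), (6, "F")],
)

# The question's window: PARTITION BY id gives every row a partition
# (and therefore a frame) that contains only that row.
rows = conn.execute(
    """
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY CASE WHEN value IS NULL THEN 0 ELSE 1 END) AS rn,
           LAST_VALUE(value) OVER (PARTITION BY id
                                   ORDER BY CASE WHEN value IS NULL THEN 0 ELSE 1 END) AS last_val
    FROM test
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Every rn is 1, and last_val just echoes each row's own value, so id 5 stays NULL (None in Python).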

If we don't partition by id, but still use your ordering, you can see why this still won't work for last_value:

SELECT id, row_number() OVER (ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC, id)
FROM test;
 id | row_number
----+------------
  5 |          1
  0 |          2
  1 |          3
  2 |          4
  3 |          5
  4 |          6
  6 |          7
(7 rows)

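In this case, the row that was NULL is first. By default, the frame for last_value runs from the start of the partition up to the current row (plus any rows that tie with it in the ORDER BY). No two rows tie here, so last_value simply returns each row's own value, and for id 5 that is NULL. You could include all rows in your frame: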

SELECT id,
  row_number() OVER (ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC, id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  last_value(value) OVER (ORDER BY case WHEN value IS NULL THEN 0 ELSE 1 END ASC, id
                          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM test;
 id | row_number | last_value
----+------------+------------
  5 |          1 | F
  0 |          2 | F
  1 |          3 | F
  2 |          4 | F
  3 |          5 | F
  4 |          6 | F
  6 |          7 | F
(7 rows)

But now the frame is the whole table, so last_value returns F (the value of the last row in that ordering) for every row, which is clearly not what you want. If you're looking for the previous row's value, choose lag().
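Here is a sketch that contrasts the default frame with the whole-partition frame for the question's ordering, with no PARTITION BY. It makes the same assumption as before: SQLite through Python's sqlite3 stands in for PostgreSQL. The aliases default_frame and whole_frame and the ORDER_KEY helper string are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; both default to
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(0, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E"), (5, None), (6, "F")],
)

# The question's ordering: the NULL row sorts first, then the rest by id.
ORDER_KEY = "CASE WHEN value IS NULL THEN 0 ELSE 1 END, id"

rows = conn.execute(
    f"""
    SELECT id,
           -- default frame: partition start up to the current row (no ties here)
           LAST_VALUE(value) OVER (ORDER BY {ORDER_KEY}) AS default_frame,
           -- explicit frame covering the whole partition
           LAST_VALUE(value) OVER (ORDER BY {ORDER_KEY}
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_frame
    FROM test
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

With the default frame each row gets its own value back (NULL for id 5). With the whole-partition frame every row gets F.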

4
  • Why not ORDER BY value NULLS FIRST instead of your CASE clause?
    – S-Man
    Commented Sep 25, 2019 at 12:46
  • Sure, that would give the same (incorrect) result. I was just trying to use the syntax that @thebuleon29 used.
    – Jeremy
    Commented Sep 25, 2019 at 12:51
  • Thanks a lot for your answer, but I am even more confused now. Is the ORDER BY applied to the result as well? I thought it was only for ordering the frame before LAST_VALUE picked its result.
    – Gaëtan
    Commented Sep 25, 2019 at 12:59
  • We didn't specify an order for the result; if we had, we could order it by whatever we wanted.
    – Jeremy
    Commented Sep 25, 2019 at 13:15
2

So, thanks to Jeremy's explanations and another post (PostgreSQL last_value ignore nulls), I finally figured it out:

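SELECT id, value,
       -- ORDER BY id makes "first" well defined: the row that starts each group, i.e. its non-NULL row
       first_value(value) OVER (PARTITION BY t.grp ORDER BY id) AS new_val
FROM (
    -- running count of non-NULL values; a NULL row keeps the previous count
    SELECT id, value, SUM(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY id) AS grp
    FROM test) t
ORDER BY id;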

This query returns the result I expected.
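Below is a sketch of this grouping approach. It assumes SQLite through Python's sqlite3 as a stand-in for PostgreSQL. The data is hypothetical: it has two consecutive NULLs instead of one, to show that a whole run of NULLs gets filled. The grouping column is called grp here. Note the ORDER BY id inside first_value(). Without it, the order of rows within a group is unspecified, so "first" is not guaranteed to be the non-NULL row.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL for this query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
# Hypothetical data with two NULLs in a row.
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(0, "A"), (1, "B"), (2, None), (3, None), (4, "E")],
)

rows = conn.execute(
    """
    SELECT id, value,
           -- within each group the non-NULL row has the smallest id
           FIRST_VALUE(value) OVER (PARTITION BY grp ORDER BY id) AS new_val
    FROM (
        -- running count of non-NULL values: a NULL row keeps the previous count,
        -- so it lands in the same group as the last non-NULL row before it
        SELECT id, value,
               SUM(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY id) AS grp
        FROM test
    ) t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Both NULL rows (ids 2 and 3) fall into the group started by id 1, so both get B.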

1

The trick here is to provide an explicit frame with ROWS BETWEEN, like this:

SELECT
  id,
  COALESCE(value, LAST_VALUE(value) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING))
FROM test;

The issue with your first attempt, aside from the partitioning, was that no frame was provided, so it assumed this one by default (with ORDER BY, that means from the start of the partition up to the current row and its peers):

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
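When ORDER BY is present and no frame is given, PostgreSQL (like SQLite and the SQL standard) uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This differs from the ROWS form only when rows tie on the sort key. The sketch below shows that difference on a hypothetical table t whose rows 1 and 2 tie on k. It assumes SQLite through Python's sqlite3 as a stand-in for PostgreSQL. COUNT(*) is used because its result does not depend on how the tied rows happen to be ordered. The aliases range_count and rows_count are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; both use RANGE ... CURRENT ROW
# as the default frame when the window has an ORDER BY.
conn = sqlite3.connect(":memory:")
# Hypothetical table: ids 1 and 2 tie on k, so they are peers under ORDER BY k.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, k INTEGER)")
conn.executemany("INSERT INTO t (id, k) VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

rows = conn.execute(
    """
    SELECT id,
           -- default frame (RANGE): includes every peer of the current row
           COUNT(*) OVER (ORDER BY k) AS range_count,
           -- explicit ROWS frame: stops at the current row itself
           COUNT(*) OVER (ORDER BY k ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_count
    FROM t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Under the default frame both tied rows count 2, because each one's frame includes its peer. Under ROWS, one of them counts 1 and the other 2, and which one gets which is not defined.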

Even more confusing is that some window functions, like RANK, ROW_NUMBER, NTILE, LAG and LEAD, ignore the frame entirely and always work over the whole partition, as if the frame were:

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
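Here is a quick check of that behaviour. It assumes SQLite through Python's sqlite3 as a stand-in for PostgreSQL; both ignore the frame clause for row_number(). The same one-row frame is given to row_number() and to the aggregate COUNT(*), which does respect the frame. The aliases rn and frame_size are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; in both, ROW_NUMBER() ignores
# the frame clause while aggregates such as COUNT(*) honour it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(0, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E"), (5, None), (6, "F")],
)

rows = conn.execute(
    """
    SELECT id,
           -- frame restricted to the current row: ROW_NUMBER() still numbers the partition
           ROW_NUMBER() OVER (ORDER BY id ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS rn,
           -- the aggregate sees only the one-row frame
           COUNT(*) OVER (ORDER BY id ROWS BETWEEN CURRENT ROW AND CURRENT ROW) AS frame_size
    FROM test
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```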

But your final solution is still more robust, since it handles consecutive NULL values. I just wanted to point out this default behavior, since I've seen many people trip over it.
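To make the difference concrete, here is a sketch on hypothetical data with two consecutive NULLs (ids 1 and 2). It again assumes SQLite through Python's sqlite3 as a stand-in for PostgreSQL. It runs both the lag() query and the LAST_VALUE query with a frame ending at 1 PRECEDING. The aliases via_lag and via_last_value are made up.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL for these window queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
# Hypothetical data with a run of two NULLs.
conn.executemany(
    "INSERT INTO test (id, value) VALUES (?, ?)",
    [(0, "A"), (1, None), (2, None), (3, "D")],
)

rows = conn.execute(
    """
    SELECT id,
           -- looks back exactly one row
           COALESCE(value, LAG(value) OVER (ORDER BY id)) AS via_lag,
           -- last row of a frame that ends one row back: also just the previous row
           COALESCE(value, LAST_VALUE(value) OVER (
               ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) AS via_last_value
    FROM test
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Both approaches fill id 1 but leave id 2 NULL. Each looks back exactly one row, and for id 2 that row is itself NULL.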
