
My Postgres table has a range column containing timestamps with time zone. I have created an index on the lower bound of the range, like so:

CREATE INDEX bdg_sys_period_start_idx ON building USING btree (lower(sys_period));

Now I am trying to run the following query:

select * from building where lower(sys_period) > '2024-05-12 10:31:14.481545+01'::timestamptz;

Here comes the interesting part. I run an ANALYZE on the table, then an EXPLAIN on the query. I get this:

[Image: EXPLAIN output showing an index scan using bdg_sys_period_start_idx]

Perfect, Postgres wants to use my new index!

Then I launch the query and it takes ages. I stop the query and run the EXPLAIN again. Surprise: the query planner now tells me it wants to use a seq scan.

[Image: EXPLAIN output showing a sequential scan on building]

I see that the estimated number of returned rows jumps from 97k to 1.6M, while the real number is 30 rows.

I have many questions regarding this situation:

  • Why is the query planner suddenly changing its mind?
  • Are statistics supposed to be collected for range columns? I have seen this discussion, but I am not sure this has been implemented.
  • I have tried to create a custom statistic on lower(sys_period) directly:
CREATE STATISTICS IF NOT EXISTS sys_period_start_range ON ( lower(sys_period) ) FROM building;

Is it supposed to be helpful?

  • I have tried to increase the size of the statistic on the sys_period column
ALTER TABLE building ALTER sys_period SET STATISTICS 1000;

Is it supposed to be helpful?

Thanks in advance for your help.

2 Answers

2

I finally understood, and I feel stupid. I am going to describe what happened in case someone else runs into the same problem. I am using DBeaver, and the connection I use for this database has autocommit disabled. When I run ANALYZE, it starts a transaction. When I then run EXPLAIN, I get the expected plan (index scan), after which the current transaction gets rolled back. When I run EXPLAIN again, the earlier ANALYZE has been rolled back along with that transaction, so the planner is back on the old statistics and the seq scan shows up again.
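
For anyone who wants to reproduce this outside DBeaver, here is a minimal sketch of the sequence, using the table and query from the question. It relies on the fact that ANALYZE is allowed inside a transaction block and that the column statistics it writes are discarded on rollback. The exact plans you see depend on your data.

```sql
-- With autocommit off, the client opens a transaction implicitly;
-- the explicit BEGIN here just makes that visible.
BEGIN;

-- Fresh statistics are written, but only inside this open transaction.
ANALYZE building;

-- Planned with the fresh statistics (in my case: index scan).
EXPLAIN
SELECT * FROM building
WHERE lower(sys_period) > '2024-05-12 10:31:14.481545+01'::timestamptz;

-- Rolling back also discards the statistics that ANALYZE wrote.
ROLLBACK;

-- Planned with the old statistics again (in my case: seq scan).
EXPLAIN
SELECT * FROM building
WHERE lower(sys_period) > '2024-05-12 10:31:14.481545+01'::timestamptz;

-- Fix: make sure the ANALYZE is committed, either by enabling
-- autocommit or by committing explicitly.
BEGIN;
ANALYZE building;
COMMIT;
```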

1

Are you sure that you ran the first ANALYZE after you created the index and before you ran the query? If you ran ANALYZE before you created the index, that would explain why the planner chooses a different plan. Once an expression index exists, PostgreSQL collects statistics for the indexed expression, but only from the next ANALYZE onward.
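
One way to check this, sketched with the names from the question: statistics for an indexed expression show up in pg_stats under the index name rather than the table name, and pg_stat_user_tables records when the table was last analyzed. If the first query returns no rows, no ANALYZE has (visibly) run since the index was created.

```sql
-- Statistics gathered for the indexed expression lower(sys_period).
-- For expression indexes, "tablename" in pg_stats holds the index name.
SELECT tablename, attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'bdg_sys_period_start_idx';

-- When was the table last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'building';
```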

The extended statistics are unnecessary; they just duplicate what ANALYZE already collects automatically for the indexed expression.

Increasing the statistics target for that one column can make a difference; you'd have to try it.
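
Note that the ALTER TABLE in the question sets the target for sys_period itself. As a sketch, assuming PostgreSQL 11 or later, the target for the indexed expression can be set on the index column instead. The value 1000 is simply the one from the question, and whether it improves the estimate is something you would have to test.

```sql
-- Raise the statistics target for the first (and only) column of the
-- expression index, i.e. for lower(sys_period).
ALTER INDEX bdg_sys_period_start_idx ALTER COLUMN 1 SET STATISTICS 1000;

-- The new target only takes effect at the next ANALYZE.
ANALYZE building;
```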

If everything is as you described it, the jump in the estimates is mysterious. With the results for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) for both cases, we might be able to tell you more.
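
For reference, that would look roughly like this with the query from the question. The SETTINGS option requires PostgreSQL 12 or later, and because EXPLAIN with ANALYZE actually executes the statement, the slow variant will take as long as the query itself.

```sql
-- Shows the chosen plan with estimated vs. actual row counts, buffer
-- usage, and any planner-related settings changed from their defaults.
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT * FROM building
WHERE lower(sys_period) > '2024-05-12 10:31:14.481545+01'::timestamptz;
```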
