
Problem Statement:

We have an enterprise product with a PostgreSQL database as the backend. One of the tables has five columns with the following types: (bigint time, bigint store_time, varchar category, varchar sub_category, integer count). In total it holds 15 million unique records.

Objective:

We are trying to fetch/select ~14 million records to show in the UI within a 1-2 second response time.

Current Observation and Analysis:

We designed the select statement as below to fetch the 14 million records:

query - "SELECT time, category, sub_category, count FROM table WHERE time BETWEEN 1541156340000 AND 1560509940000"

The time column is maintained as an epoch timestamp (in milliseconds).

The table has an index on the time, category, sub_category, and count columns. The store_time column is not used anywhere.
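For reference, our understanding is that a true index-only scan for this query would need a covering index holding every selected column. A sketch of what we mean (the INCLUDE syntax requires PostgreSQL 11+; the index name is just a placeholder, and "table" is quoted here only because it is a reserved word):

```sql
-- Sketch: a covering index so all selected columns can be served
-- from the index alone (index-only scan), assuming PostgreSQL 11+.
-- On older versions the extra columns would be appended as key columns.
CREATE INDEX idx_table_time_covering
    ON "table" (time)
    INCLUDE (category, sub_category, count);
```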

When running the above query in pgAdmin, or on the PostgreSQL command line on the same server box, it takes ~25 seconds to retrieve the ~14 million records.

We tried both a sequential scan and an index-only scan by modifying the parameters below in the postgresql.conf file. Both took the same 25 seconds (no difference).

Attempt 1:

enable_indexscan = on
enable_seqscan = on

Attempt 2:

enable_indexscan = on
enable_seqscan = off

While analyzing the query plan with "EXPLAIN ANALYZE SELECT time, category, sub_category, count FROM table WHERE time BETWEEN 1541156340000 AND 1560509940000", we see a total execution time of ~3-4 seconds, whereas running the plain select query alone (SELECT time, category, sub_category, count FROM table WHERE time BETWEEN 1541156340000 AND 1560509940000) takes 25 seconds. We are not sure what accounts for the difference between the raw select query's execution time and the EXPLAIN ANALYZE execution time; we assumed both should be the same. Please confirm if our understanding is wrong.
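For completeness, this is how the two timings compare in psql (assuming psql; to our understanding, \timing reports the full client round-trip, while EXPLAIN ANALYZE executes the query on the server without sending the result rows to the client):

```sql
-- In psql, enable client-side timing first:
--   \timing on

-- Raw query: server execution plus transfer of ~14M rows to the client
SELECT time, category, sub_category, count
FROM "table"
WHERE time BETWEEN 1541156340000 AND 1560509940000;

-- EXPLAIN ANALYZE: runs the same query on the server, but the result
-- rows are discarded rather than sent to the client (BUFFERS added
-- here to also show buffer hits/reads)
EXPLAIN (ANALYZE, BUFFERS)
SELECT time, category, sub_category, count
FROM "table"
WHERE time BETWEEN 1541156340000 AND 1560509940000;
```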

EXPLAIN ANALYZE output:


Seq Scan on table  (cost=0.00..393652.98 rows=13540999 width=52) (actual time=0.007..3346.241 rows=13545864 loops=1)
  Filter: ((time >= '1541156340000'::bigint) AND (time <= '1560509940000'::bigint))
Planning time: 0.192 ms
Execution time: 3733.535 ms
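As a rough sanity check on the data volume involved, the row count and average width from the plan above give the following estimate (arithmetic only; the actual wire size depends on the protocol and client):

```sql
-- Back-of-envelope from the plan: rows=13545864, avg width=52 bytes
SELECT 13545864::bigint * 52           AS approx_bytes,  -- ~704 MB
       13545864::bigint * 52 / 1000000 AS approx_mb;
```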

We also tried altering the following parameters in the PostgreSQL config file, given the machine capacity (24 cores with hyper-threading, 128 GB RAM, 4 TB disk), but still no luck. We observed the same performance numbers with the default parameter values as well.

Note:
Out of the 128 GB RAM, our application JVM is utilizing 24 GB. Both the application and the PostgreSQL DB run on the same machine. Resource utilization during query execution is normal.

Parameters tuned:


shared_buffers = 6GB
effective_cache_size = 18GB
maintenance_work_mem = 1536MB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1
autovacuum = on
seq_page_cost = 1.0
effective_io_concurrency = 200
work_mem = 62914kB
min_wal_size = 1GB
max_wal_size = 2GB

Note:
The 25 seconds is consumed by query execution alone; as soon as the query completes, we see the results in the UI. So there is no delay between getting the query result and showing it in the UI.

Expected Result:

We expect the above select query to return its results within 1-2 seconds.

We need some help tuning the query or the configuration parameters to boost performance.
