I recently upgraded a Postgres database from version 9.3 to 14.5 (yes, quite a jump). The upgrade seemed successful, but one particular query is giving me trouble (see the end of this post).
The query returns in about 3 seconds on the old database, but seems to hang indefinitely on the new version of Postgres. From pg_stat_activity I see that 3 processes are spawned for this query, of which 2 have wait_event MessageQueueSend. After about 10 minutes, one of the processes in MessageQueueSend is gone.
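For reference, this is roughly how I am inspecting the wait events (standard pg_stat_activity columns in 14.x; the filter on the table name is just to pick out this query):

```sql
-- Show the leader backend and its parallel workers for the query,
-- together with their current wait events
SELECT pid, leader_pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query LIKE '%ee_sms_position_limited%';
```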
I tried playing with the following settings, but without any improvements:
- dynamic_shared_memory_type
- shared_memory_type
- min_dynamic_shared_memory
- effective_io_concurrency
Any hints as to where to look would be much appreciated. I cannot find any other question that involves getting stuck at the MessageQueueSend wait_event, so I hope someone can help me here.
Edit:
On the suggestion of @jjanes, I set max_parallel_workers_per_gather to zero, but the query still got stuck (now without the two worker processes; just the master process, albeit in active state). After running strace on the pid of the master process, I found it was stuck doing lseek(41, 0, SEEK_END). File descriptor 41 corresponds to a table on which the user running the query has no permissions. Running the same query as an admin user completes immediately, but returns 0 rows (I was expecting more than that). This makes me wonder whether the upgrade corrupted some tables.
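In case anyone wants to reproduce the fd-to-table mapping: on Linux, ls -l /proc/&lt;pid&gt;/fd/41 shows the data file behind the descriptor, and the basename of that file is the relation's filenode, which can be mapped back to a table from SQL (123456 below is a placeholder for the actual filenode; 0 means the default tablespace):

```sql
-- Map a relfilenode back to the relation it belongs to.
-- First argument: tablespace OID (0 = database default tablespace);
-- second argument: the filenode taken from the file name under /proc/<pid>/fd.
SELECT pg_filenode_relation(0, 123456);
```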
I guess as a follow-up question: is there a way to validate the integrity of a database once it has been upgraded?
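A sketch of the kind of check I have in mind, assuming the amcheck contrib extension that ships with 14.x (the index name below is a placeholder; verify_heapam is new in version 14):

```sql
-- amcheck ships with PostgreSQL as a contrib extension
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Check a B-tree index for internal consistency
-- ('gps.some_index' is a placeholder for a real index name)
SELECT bt_index_check('gps.some_index'::regclass);

-- Check the heap (table) data itself; available from version 14
SELECT * FROM verify_heapam('gps.ee_sms_position_limited'::regclass);
```

As I understand it, version 14 also ships a pg_amcheck command-line wrapper that can run these checks over all relations in a database, but I have not tried it yet.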
EXPLAIN output of old database version (returns in 3 seconds): https://pastebin.com/ybXMW4b4
EXPLAIN (ANALYZE, BUFFERS) output of old database version (returns in 3 seconds): https://pastebin.com/Yh3BXked
EXPLAIN output of new database version (does not return): https://pastebin.com/K51sRifF
Query in question:
SELECT trackpoint.device_info_serial,
trackpoint.date_time,
message.battery_voltage,
message.memory_usage,
trackpoint.latitude,
trackpoint.longitude,
sessionbirdspecies.project_id,
sessionbirdspecies.track_session_id,
sessionbirdspecies.individual_id,
sessionbirdspecies.tracker_id
FROM
(SELECT tracksession.key_name,
tracksession.project_id,
tracksession.tracker_id,
tracksession.individual_id,
tracksession.track_session_id,
tracksession.start_date,
tracksession.end_date,
tracksession.device_info_serial,
bird.ring_number,
bird.sex,
bird_s.english_name,
bird_s.latin_name
FROM gps.ee_track_session_limited AS tracksession
LEFT JOIN gps.ee_individual_limited AS bird ON tracksession.individual_id = bird.individual_id
LEFT JOIN gps.ee_species_limited AS bird_s ON bird.species_latin_name=bird_s.latin_name) AS sessionbirdspecies
INNER JOIN gps.ee_sms_position_limited AS trackpoint ON sessionbirdspecies.device_info_serial = trackpoint.device_info_serial
LEFT JOIN gps.ee_sms_message_limited AS message ON trackpoint.id=message.id
WHERE 1=1
AND trackpoint.date_time >= '2022-10-01 00:00'
AND trackpoint.date_time < '2022-11-01 14:01'
ORDER BY trackpoint.date_time;