
I have deployed an instance of SQL Server 2019 on Kubernetes, and I've enabled Change Data Capture (CDC) on the SQL Server instance.

Periodically, I encounter a “Non-yielding Scheduler” error in the logs, after which the database begins to consume all allocated CPU resources and stops responding to queries.

There are no signs of resource shortage before the problem occurs, and the database is not under heavy load, since this is a development setup. Both Kubernetes and the database are stored on two SSDs in RAID, so I doubt that read/write limits are the cause.

Before enabling CDC, the database operated without issues.

Image: 2019-CU25-ubuntu-20.04 (env: PID - Developer, AGENT_ENABLED - true)

Logs:

 getspinlock pre-Sleep(): spid 0, 1790 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 1685 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 1467 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 3232 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 3124 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 2904 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 4531 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 4276 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 3827 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 692 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 691 yields on lock type "XDESMGR" (adr 00000010029316C0)
 2024-03-23 14:53:13.66 Server      Using 'dbghelp.dll' version '4.0.5' 
 getspinlock pre-Sleep(): spid 0, 667 yields on lock type "XDESMGR" (adr 00000010029316C0)
 2024-03-23 14:53:14.15 Server      ***Unable to get thread context for spid 0 
 2024-03-23 14:53:14.16 Server      * ******************************************************************************* 
 2024-03-23 14:53:14.16 Server      * 
 2024-03-23 14:53:14.16 Server      * BEGIN STACK DUMP: 
 2024-03-23 14:53:14.16 Server      *   03/23/24 14:53:14 spid 468 
 2024-03-23 14:53:14.16 Server      * 
 2024-03-23 14:53:14.16 Server      * Non-yielding Scheduler 
 2024-03-23 14:53:14.17 Server      * 

 2024-03-23 14:53:14.17 Server      * ******************************************************************************* 
 2024-03-23 14:53:14.17 Server      Stack Signature for the dump is 0x0000000000000014 
 getspinlock pre-Sleep(): spid 0, 5171 yields on lock type "XDESMGR" (adr 00000010029316C0)
 2024-03-23 14:53:20.87 Server      External dump process return code 0x20000001. 
 External dump process returned no errors.  
 2024-03-23 14:53:20.88 Server      Process 0:0:0 (0x570) Worker 0x000000101E398160 appears to be non-yielding on Scheduler 0. Thread creation time: 13355613019271. Approx Thread CPU Used: kernel 100 ms, user 38760 ms. Process Utilization 12%. System Idle 0%. Interval: 70011 ms. 
 getspinlock pre-Sleep(): spid 0, 4854 yields on lock type "XDESMGR" (adr 00000010029316C0)
 2024-03-23 14:53:25.96 Server      Process 0:0:0 (0x168) Worker 0x000000101A354160 appears to be non-yielding on Scheduler 15. Thread creation time: 13355563362779. Approx Thread CPU Used: kernel 0 ms, user 37360 ms. Process Utilization 12%. System Idle 0%. Interval: 77297 ms. 
 getspinlock pre-Sleep(): spid 0, 4403 yields on lock type "XDESMGR" (adr 00000010029316C0)
 2024-03-23 14:53:30.97 Server      Process 0:0:0 (0x88) Worker 0x00000010389FE160 appears to be non-yielding on Scheduler 9. Thread creation time: 13355657814870. Approx Thread CPU Used: kernel 70 ms, user 35910 ms. Process Utilization 12%. System Idle 0%. Interval: 77313 ms. 
 getspinlock pre-Sleep(): spid 0, 1243 yields on lock type "XDESMGR" (adr 00000010029316C0)
 getspinlock pre-Sleep(): spid 0, 1246 yields on lock type "XDESMGR" (adr 00000010029316C0)
 .....

How can I resolve or track down the cause of this issue?

What I tried: I attempted to resolve the “Non-yielding Scheduler” error by increasing the resource limits and setting the minimum and maximum amount of memory that SQL Server can consume. Despite these adjustments, the issue persisted without any improvement.

What I was expecting: I was expecting that adjusting the resource limits and memory consumption settings for SQL Server would alleviate the problem, allowing the database to operate without triggering the “Non-yielding Scheduler” error and the subsequent CPU resource exhaustion.

Microsoft SQL Server 2019 (RTM-CU25) (KB5033688) - 15.0.4355.3 (X64) Jan 30 2024 17:02:22


1 Answer


I applied the suggestion from AlwaysLearning to my SQL Server 2019 instance:

"Does enabling Forced Unit Access (FUA) help? See 'Log reader agent may raise a Non-yielding Scheduler dump in SQL Server running on Linux OS'."

Here’s what I did:

Created a mssql.conf file with the following contents:

[traceflag]
traceflag0 = 3979

[control]
writethrough = 1
alternatewritethrough = 0

Placed this file in the /var/opt/mssql/ directory as part of the deployment process.
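
If the file is generated or templated during deployment, it can be worth checking that the three settings actually ended up in it. The following is only a minimal sketch using Python's standard configparser; the function name check_mssql_conf and the SAMPLE text are hypothetical helpers for illustration, not part of SQL Server or its tooling.

```python
import configparser

# Settings from the mssql.conf shown above, keyed by (section, option).
EXPECTED = {
    ("traceflag", "traceflag0"): "3979",
    ("control", "writethrough"): "1",
    ("control", "alternatewritethrough"): "0",
}


def check_mssql_conf(text):
    """Return (section, option, found) tuples for settings that differ from EXPECTED.

    `found` is None when the section or option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for (section, option), wanted in EXPECTED.items():
        found = parser.get(section, option, fallback=None)
        if found != wanted:
            problems.append((section, option, found))
    return problems


# Hypothetical sample: the same contents as the file above.
SAMPLE = """\
[traceflag]
traceflag0 = 3979

[control]
writethrough = 1
alternatewritethrough = 0
"""

if __name__ == "__main__":
    # An empty list means every expected setting is present with the expected value.
    print(check_mssql_conf(SAMPLE))
```

In a real deployment the text would be read from /var/opt/mssql/mssql.conf inside the container rather than from a string.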

I monitored the system for almost a month and a half and the issue did not recur, so I believe this advice solves the problem.
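
For anyone who wants to automate that kind of monitoring, here is a rough sketch that scans error log text for the "appears to be non-yielding on Scheduler" messages quoted in the question. The regular expression is based only on the log lines shown there, so treat the pattern and the function name find_non_yielding as assumptions; other builds may word the message differently.

```python
import re

# Matches lines such as:
# "... Worker 0x000000101E398160 appears to be non-yielding on Scheduler 0. ... Interval: 70011 ms."
NON_YIELDING = re.compile(
    r"Worker (?P<worker>0x[0-9A-Fa-f]+) appears to be non-yielding on "
    r"Scheduler (?P<scheduler>\d+)\..*?Interval: (?P<interval_ms>\d+) ms"
)


def find_non_yielding(lines):
    """Return (scheduler_id, interval_ms, worker_address) for each matching log line."""
    events = []
    for line in lines:
        match = NON_YIELDING.search(line)
        if match:
            events.append(
                (
                    int(match["scheduler"]),
                    int(match["interval_ms"]),
                    match["worker"],
                )
            )
    return events


if __name__ == "__main__":
    import sys

    # Usage: pipe the container log into the script and print any events found.
    for scheduler, interval_ms, worker in find_non_yielding(sys.stdin):
        print(f"scheduler={scheduler} interval_ms={interval_ms} worker={worker}")
```

The log could be fed in from kubectl logs for the SQL Server pod; no output means no non-yielding messages were found in that input.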
