
We run a lot of change tracking in production; nearly all tables are change tracked. We use it to extract SQL Server data to Databricks. Other than the CPU overhead, change tracking rarely causes headaches. But when it does, it’s bad.
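For context, this is roughly how broad our CT footprint is. A sketch of a query against the catalog view `sys.change_tracking_tables` to count and list tracked tables (column names per the SQL Server catalog; adapt as needed):

```sql
-- List every change-tracked table in the current database,
-- to gauge how much of the workload CT covers.
SELECT t.name AS table_name,
       ctt.is_track_columns_updated_on,
       ctt.min_valid_version,
       ctt.cleanup_version
FROM sys.change_tracking_tables AS ctt
JOIN sys.tables AS t
  ON t.object_id = ctt.object_id
ORDER BY t.name;
```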

The problem manifests as SPIDs waiting on COMMIT_TABLE indefinitely. They hang and hang and cannot be killed.
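When it happens, the stuck sessions look something like this (a diagnostic sketch using `sys.dm_exec_requests`; the filter on the COMMIT_TABLE wait type is the key part):

```sql
-- Sessions currently stuck on COMMIT_TABLE, with wait duration
-- and any blocker, if one is reported.
SELECT r.session_id,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       r.command,
       r.blocking_session_id
FROM sys.dm_exec_requests AS r
WHERE r.wait_type = N'COMMIT_TABLE'
ORDER BY r.wait_time DESC;
```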

[Screenshot: sessions stuck waiting on COMMIT_TABLE]

Full backups of the affected database cannot complete while the server is in this state; they block on the same waits. The only remedy we’ve found is to fail the instance over to a secondary replica in the Availability Group and then restart the replica that was experiencing the COMMIT_TABLE storm.

The only answer I’ve found online for alleviating this wait type is to clean up the commit table manually with the sys.sp_flush_commit_table_on_demand proc. Guide here: Change Tracking Cleanup – Going Beyond Automatic – SirSQL.net. This has proven useless against the waits we’re experiencing. I’ve even tried running the proc in a job every minute overnight to constantly flush out the commit table, and we’re still hitting this problem.
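For reference, this is the shape of what we run. A sketch assuming the documented `@numberOfRowsToFlush` parameter; the count query uses the `sys.dm_tran_commit_table` DMV to see how big the internal commit table (syscommittab) has grown:

```sql
-- How many rows are currently in the internal commit table.
SELECT COUNT(*) AS commit_table_rows
FROM sys.dm_tran_commit_table;

-- Manually flush up to 10,000 eligible rows from syscommittab.
EXEC sys.sp_flush_commit_table_on_demand @numberOfRowsToFlush = 10000;
```

Note the proc only flushes rows that are older than the retention period and no longer needed, which may be why repeated runs make no visible dent.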

Has anyone else run into this? Is there a trick to using the flush-commit-table-on-demand proc? Is it possible we’re just running way too much change tracking and there’s no workaround here? Thanks!


2 Answers


"Is it possible we’re just running way too much change tracking and there’s no workaround here?"

Probably.

I've seen this in several systems, and the only way out was to switch from Change Tracking to Change Data Capture.

Change Tracking is synchronous and adds overhead to every commit, while Change Data Capture is asynchronous and reads changes from the transaction log after the fact.
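Switching a table over is straightforward. A minimal sketch using the documented CDC procs; the schema and table name (`dbo.Orders`) are hypothetical placeholders:

```sql
-- Enable CDC at the database level (requires sysadmin)...
EXEC sys.sp_cdc_enable_db;

-- ...then per table (requires db_owner).
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',  -- hypothetical table name
     @role_name     = NULL;       -- NULL = no gating role on the change data
```

Consumers then read from the generated `cdc.fn_cdc_get_all_changes_<capture_instance>` function instead of CHANGETABLE.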

It is possible for Change Data Capture to fall behind in processing, but there are a number of things you can tweak about the log reader jobs to help it out.
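The main knobs live on the capture job. A sketch using `sys.sp_cdc_change_job` (the values shown are illustrative, not recommendations; defaults are 500 transactions per scan, 10 scans per run, 5-second polling):

```sql
-- Let the CDC capture (log reader) job chew through more log per cycle.
EXEC sys.sp_cdc_change_job
     @job_type        = N'capture',
     @maxtrans        = 1000,  -- transactions processed per scan cycle
     @maxscans        = 100,   -- scan cycles per run
     @continuous      = 1,     -- run continuously rather than one-shot
     @pollinginterval = 5;     -- seconds between log scan cycles

-- Changes take effect after the job is restarted.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```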

In my experience, very few consumers of Change Tracking data truly need synchronous, up-to-the-second data changes.

  • Interesting. I will bring this up with the team involved. Thanks!
    – BBaggins
    Commented Jun 6 at 14:11

I have seen this issue with Change Tracking when an Availability Group is involved.

Microsoft Support was not able to offer any insights or any more detail on the causes of COMMIT_TABLE waits or how to respond to them in my case. I was unfortunately left with the sense that CT is not really supported or even understood by anyone in the product group. Perhaps that is not the case but it was my experience this year.

Some notes on what I've observed, in case it helps: I have only seen this issue when AGs are involved and there is a synchronous replica. I believe it has something to do with how CT's internal in-memory OLTP tables synchronize to disk-based tables under a synchronous AG. And since you can reproduce the issue, perhaps you would have better luck with a support case than I did, if you have the patience for it.

Another way to reduce the size of the CT tables is to shorten your retention period. If you have a regular daily peak, setting the retention period to an offset (say 12 or 36 hours rather than 24) so that cleanup is working on off-peak rows during peak time can spread the load out.
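The retention and cleanup settings are changed with ALTER DATABASE. A sketch with an illustrative 36-hour offset; `YourDatabase` is a placeholder:

```sql
-- Set CT retention to a 36-hour offset so cleanup of peak-time rows
-- lands in the quiet window. Values here are illustrative.
ALTER DATABASE YourDatabase
SET CHANGE_TRACKING (CHANGE_RETENTION = 36 HOURS, AUTO_CLEANUP = ON);
```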

If the issue happens around a predictable time, you could also try raising the retention period right before the problem window so that cleanup won't do anything at all then, just to see if it changes anything. Be aware that changing the retention period or disabling cleanup flushes the plan cache. - Kendra Little

  • Thanks for your feedback! Right now all databases are set to retain 7 days. What exactly do you mean by "offset"? Is it simply raising or lowering the retention period around peak times? And is running "sys.sp_flush_commit_table_on_demand" regularly worth pursuing? As I said, so far it hasn't helped.
    – BBaggins
    Commented Jun 6 at 14:13
