QUICK SUMMARY:
We are trying to set up a remote CD server near China using SQL Server Transactional Replication. It almost works, but when we update and publish an item, the remote CD always displays the previous version of that item.
DETAILS:
We are trying to set up Sitecore 10.3.1 in a test environment (before deploying this configuration to production) with the following VMs:
- US CM
- US CM database
- US CD
- US CD database
- US Solr
- China CD
- China CD database
- China Solr
The Web database is being replicated from the US CD database server to the China CD database server using SQL Server Transactional Replication.
Following the Sitecore docs, we have moved the EventQueue, Properties and Tasks tables from the Web database to a separate WebShared database and have configured the CM and both CDs to use it.
We also have US and CN Solr instances with Solr Replication keeping them in sync.
Sitecore documents we have consulted in setting this up:
- https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0610106
- https://doc.sitecore.com/xp/en/developers/103/platform-administration-and-architecture/configure-sitecore-roles-for-separate-eventqueue%2C-properties%2C-and-tasks-tables.html
Here are some reproducible steps that illustrate our problem:
- On US CM, update an item to include "Test 1" in the content, save it and publish it
- On US CD site, corresponding page displays "Test 1"
- On China CD site, same page does not display "Test 1"
- On US CM, update same content item to read "Test 2", save it and publish it
- On US CD site, page now displays "Test 2"
- On China CD site, page now displays "Test 1" (note that it is one update behind)
- Recycle app pool on China CD site
- On China CD site, page now displays "Test 2" (note that it now has the latest content)
We believe the China CD site learns that an item has been updated via the EventQueue table, which it appears to poll quite frequently. We also know that SQL replication can take a few seconds to push or pull changes.
We believe the following is happening:
- China CD site gets notified via EventQueue.
- China CD site immediately pulls the content item from the China Web database (before SQL Replication has updated it) and caches it.
- User requests the page and gets the cached item with old content.
We tried a test where we waited a few minutes before requesting the page on the China CD site. Before requesting the page, we also checked the item content in the China Web database and verified that it had the latest content. But when we requested the page, we still got the old content!
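To make the hypothesis concrete, here is a minimal Python simulation of the China side only. It is a toy model, not Sitecore code: `ReplicatedDatabase`, `ChinaCD`, `publish` and the item key `"home"` are all hypothetical, and it assumes (as we suspect above) that the CD re-reads an item into its cache as soon as it sees the EventQueue entry. Under that assumption it reproduces both the "one update behind" sequence and the result of our waiting test.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedDatabase:
    """Hypothetical stand-in for the China Web database: writes arrive late."""
    rows: dict
    pending: list = field(default_factory=list)

    def enqueue(self, item_id: str, value: str) -> None:
        # Replication has picked up the change but not applied it yet.
        self.pending.append((item_id, value))

    def apply_pending(self) -> None:
        # Replication catches up a few seconds later.
        for item_id, value in self.pending:
            self.rows[item_id] = value
        self.pending.clear()


@dataclass
class ChinaCD:
    db: ReplicatedDatabase
    event_queue: list  # shared EventQueue table, visible to every role at once
    cache: dict = field(default_factory=dict)
    last_seen: int = 0

    def process_events(self) -> None:
        # Assumed behaviour: for each new publish event, re-read the item from
        # the *local* database right away and put the result in the cache.
        for item_id in self.event_queue[self.last_seen:]:
            self.cache[item_id] = self.db.rows[item_id]
        self.last_seen = len(self.event_queue)

    def render(self, item_id: str) -> str:
        if item_id not in self.cache:
            self.cache[item_id] = self.db.rows[item_id]
        return self.cache[item_id]

    def recycle_app_pool(self) -> None:
        self.cache.clear()


def publish(value: str, cn_db: ReplicatedDatabase, events: list) -> None:
    cn_db.enqueue("home", value)  # China copy is only queued for replication
    events.append("home")         # but the event lands in WebShared immediately


def run_scenario() -> list:
    cn_db = ReplicatedDatabase(rows={"home": "Original"})
    events: list = []
    cd = ChinaCD(cn_db, events)
    seen = [cd.render("home")]        # warm cache
    publish("Test 1", cn_db, events)
    cd.process_events()               # event processed before replication
    cn_db.apply_pending()             # DB now holds "Test 1"...
    seen.append(cd.render("home"))    # ...but the cache still says "Original"
    publish("Test 2", cn_db, events)
    cd.process_events()               # re-reads "Test 1": one update behind
    cn_db.apply_pending()
    seen.append(cd.render("home"))
    cd.recycle_app_pool()             # cache cleared, DB has "Test 2"
    seen.append(cd.render("home"))
    return seen


if __name__ == "__main__":
    print(run_scenario())  # ['Original', 'Original', 'Test 1', 'Test 2']
```

The key point the model illustrates is that the stale value is captured at event time, not at request time, so waiting before requesting the page does not help; only clearing the cache (here, the app pool recycle) does.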
Does Sitecore have any way to tell when the update of an item's data via SQL replication has been completed? If not, it seems like a fundamental flaw for the trigger mechanism (detecting an update in the shared EventQueue table) to be much faster than the mechanism that transfers the actual data (SQL replication). In our old Sitecore 8 environment, we think the EventQueue and the item data were both updated via SQL replication, so maybe that's why that environment didn't have these issues?
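For illustration, one generic pattern that would close this gap is to gate the cache refresh on the replicated data actually having arrived. The Python sketch below is hypothetical and is not a Sitecore API: it assumes each publish event can be tied to a revision identifier (`PublishEvent.revision`) that also exists in the replicated row, and `GatedCD` and `run_gated_scenario` are made-up names. Events whose revision has not reached the local database yet are kept and re-checked on the next EventQueue poll instead of caching stale data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PublishEvent:
    item_id: str
    revision: str  # hypothetical: revision the CM published for this item


@dataclass
class GatedCD:
    # Local replicated DB, modelled as item_id -> (revision, content).
    rows: dict
    cache: dict = field(default_factory=dict)
    # Latest not-yet-satisfied event per item; a newer event supersedes an older one.
    pending: dict = field(default_factory=dict)

    def process_events(self, new_events: list) -> None:
        for event in new_events:
            self.pending[event.item_id] = event
        for item_id, event in list(self.pending.items()):
            local_revision, content = self.rows[item_id]
            if local_revision == event.revision:
                # Replication has delivered the published revision: refresh the cache.
                self.cache[item_id] = content
                del self.pending[item_id]
            # Otherwise leave the cache alone and re-check on the next poll.

    def render(self, item_id: str) -> str:
        if item_id not in self.cache:
            self.cache[item_id] = self.rows[item_id][1]
        return self.cache[item_id]


def run_gated_scenario() -> list:
    cd = GatedCD(rows={"home": ("r0", "Original")})
    seen = [cd.render("home")]
    cd.process_events([PublishEvent("home", "r1")])  # event beats replication
    seen.append(cd.render("home"))                   # still Original, but not stuck
    cd.rows["home"] = ("r1", "Test 1")               # replication catches up
    cd.process_events([])                            # next EventQueue poll
    seen.append(cd.render("home"))
    return seen


if __name__ == "__main__":
    print(run_gated_scenario())  # ['Original', 'Original', 'Test 1']
```

This sketch leaves out a timeout for events whose revision never arrives (for example, deleted items), which a real implementation would need.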
Any thoughts on how to resolve this? We submitted this problem to Sitecore Support over a month ago and haven't made any significant progress towards a solution.