62

Quite often in my work the idea of 2-way data synchronisation between database systems crops up. The classic example is two slightly different CRM systems (say, Raiser's Edge and Salesforce) and the need to have a two-way sync of Contact data between them.

API considerations aside, assuming you have a shared key to sync on, and purely thinking of the algorithm/pattern to be employed, this is a task that is often underestimated by non-techies.

For example, you have to watch out for:

  • Can you easily detect which records have changed in both systems (or will you have to compare all records between the systems to detect changes)
  • If you're going for a once-every-N-hours sync, how to deal with conflicts where the same record changes at more-or-less the same time in both systems
  • If you're going for a real-time sync (ie an update in one system immediately triggers an update to the other system) how to handle divergence over time due to bugs or system crashes

Personally I can think of ways to tackle all this but I'm wondering if there are any well known patterns, literature or best practices that I could refer to.

4
  • what you describe sounds pretty close to Federated database system - is that correct?
    – gnat
    Commented Feb 17, 2012 at 8:03
  • @gnat: Thanks for the link, some of the concerns are similar (eg dealing with heterogeneity), but I'm talking about syncing a subset of data from two autonomous databases whereas that seems to be more about creating a fully integrated view of everything across multiple dbs.
    – codeulike
    Commented Feb 20, 2012 at 11:48
  • 5
    7 years later, 50 upvotes but only 1 decent answer. There must be some syncronisation patterns or best practices out there?
    – codeulike
    Commented Feb 14, 2019 at 19:56
  • Seems to me that, in addition to the shared key, you need at least a last time updated stamp on each system’s records. That or you could hash the shared fields and compare hashes. It is a pretty low catch for a good Q (at least if MS’s stack cant be applied).
    – JL Peyret
    Commented Jun 5, 2023 at 1:43

2 Answers 2

10

Yes, a hard problem, easily underestimated. And could be a lot of work. If you are on Microsoft technologies, you may want to have a look at Microsoft Sync Framework here and here.

3
  • 1
    Thanks, thats interesting. I'd heard of Ms Sync Framework but hadn't realised it was so generalised. It basically is a pattern for handling sync issues in general.
    – codeulike
    Commented Feb 20, 2012 at 11:35
  • 2
    Microsoft Sync Framework was replaced by Microsoft Sync Framework Toolkit. Commented Dec 9, 2014 at 20:56
  • I am frustrated with the docs, which not that clear, specially for Non SQL-Server ADO.NET data providers, which is my case. Besides, my workplace is looking for something that does not require adding infrastructure tables/making changes in the Production environment. So I am about to discard this one.
    – Veverke
    Commented May 14, 2017 at 13:24
0

There are many theories about remote site DB synchronization. First start with INSERT. handling this one is easy - as you can create a unique ID for every site (for example an initial of the site name + ID (number): site_a_177 vs. site_b_53)

So insert should not create any conflicts. the problem is the update. I don't believe that there's a 100% failure proof method, but you can start an update by "locking" the record in the remote DB, and only after you got the handle - continue with the update, and finish by syncing the update and only then release the lock.

2
  • 1
    Thanks, I think you're talking about distributed dbs with the same schema and dealing with distributed transactions. I'm thinking more of scenarios where the two DBs are completely autonomous (e.g. they assign unique ids in completely different ways and the schemas differ) but you want to sync a subset of the data in them.
    – codeulike
    Commented Feb 20, 2012 at 11:51
  • It sounds like there shouldn't be any conflicts. In that case it should be very simple - just save the "last record-id" that was synced for each table and continue from there.
    – Nir Alfasi
    Commented Feb 20, 2012 at 16:19

Not the answer you're looking for? Browse other questions tagged or ask your own question.