
I want to synchronise the local database on mobile devices with the server's database. This synchronisation will happen, for example, when the user comes online, to update both the local and the remote database with possible changes.

I have thought about two approaches and already implemented one of them, but it's quite complicated and I'm a bit worried about its performance and error-proneness.

Approach 1.

The approach I implemented does all the synchronisation on the server. I store in the client's database the last server update timestamp for each item and a flag for items that are marked for removal. I send all my items, with their timestamps and flags, to the server, and the server then decides which items have to be inserted, which have to be updated (only if the last update timestamp sent by the client equals the one stored on the server, meaning the client has the latest version of the item), and which have to be deleted. Then I do a fresh query to the database and send everything back to the client. The client overwrites its local database with the result. In detail:

  1. The client sends all its items to the server (this could be refined by only sending items that need a create/update/delete, but for now it sends everything). Each item carries the server's last update timestamp and a flag marking it for deletion.

  2. The server queries all the items for the user.

  3. A programmatic algorithm goes through all the items, checking timestamps to determine whether each item can be updated/deleted or not. If the last server update timestamp in the client's item differs from the one stored on the server (meaning someone else updated the item in between), the update/delete is rejected. I build several lists here: items that have to be inserted (items not in the server's query result), items to be updated, items to be deleted, items that can't be updated because the timestamp differs, and items that can't be deleted because the timestamp differs.

  4. A transaction performs the inserts/updates/deletes, using the lists constructed in 3.

  5. Query again to get all the items for the user.

  6. Send the result of this query to the client, together with the lists of items that couldn't be updated or deleted built in 3., so the client knows what failed.
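The partitioning in step 3 might look roughly like the following sketch. All names and shapes here are illustrative assumptions, not taken from the actual app:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical item shape: id, the server-side last-update timestamp as last
# seen by the client (None for items created offline), and a deletion flag.
@dataclass
class ClientItem:
    id: str
    last_update: Optional[int]
    deleted: bool
    payload: dict

def partition(client_items, server_items):
    """Split the client's upload into insert/update/delete/reject lists by
    comparing the timestamp the client saw with the server's current one."""
    server_by_id = {it["id"]: it for it in server_items}
    inserts, updates, deletes = [], [], []
    rejected_updates, rejected_deletes = [], []
    for item in client_items:
        server = server_by_id.get(item.id)
        if server is None:
            if not item.deleted:           # new item created on the client
                inserts.append(item)
        elif item.last_update != server["last_update"]:
            # someone else changed it in between: reject the operation
            (rejected_deletes if item.deleted else rejected_updates).append(item)
        elif item.deleted:
            deletes.append(item)
        else:
            updates.append(item)
    return inserts, updates, deletes, rejected_updates, rejected_deletes
```

The rejected lists are what would be sent back in step 6 so the client knows which of its changes lost the race.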

The problem with this approach is the multiple queries and the programmatic processing: another client (some items can have multiple users) could update something between these queries, and updates can get lost. I assume I have to compact my queries (somehow do everything in a single transaction, maybe using stored procedures?) and lock rows to avoid synchronisation issues. My knowledge of SQL doesn't go much further than select...where and join, so I may be missing a better way to do this.
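One standard way out of the check-then-write race is to push the timestamp check into the UPDATE's WHERE clause, so the compare and the write happen in one atomic statement (usually called optimistic locking). A minimal sketch, using SQLite in memory and made-up table and column names:

```python
import sqlite3

# Illustrative schema; table and column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, body TEXT, last_update INTEGER)")
conn.execute("INSERT INTO items VALUES ('a', 'milk', 1)")

def try_update(conn, item_id, new_body, seen_ts, new_ts):
    # The WHERE clause makes check and write one atomic statement: the row
    # is only touched if nobody updated it since this client read it.
    cur = conn.execute(
        "UPDATE items SET body = ?, last_update = ? WHERE id = ? AND last_update = ?",
        (new_body, new_ts, item_id, seen_ts))
    return cur.rowcount == 1   # 0 rows affected means the update was rejected

# Both clients read the item at timestamp 1; only the first write succeeds.
assert try_update(conn, "a", "milk, eggs", seen_ts=1, new_ts=2)
assert not try_update(conn, "a", "bread", seen_ts=1, new_ts=3)
```

The same pattern works in MySQL; the server then learns which updates were rejected from the affected-row count instead of from a separate read-compare pass.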

Approach 2.

Because of these complications, I was wondering whether there's a simpler way. The other idea I had was to do the synchronisation on the client. I'd first query all my data and do an update (for which I still need the last update timestamps and delete flags, only this time the sync logic is in the client and I don't have to worry about multiple users here), and then send the sync result to the server, which would just do an overwrite. In detail:

  1. Download the items for the user.

  2. Run the sync algorithm on the client, similar to what I do on the server in 3. (Approach 1).

  3. Upload the items to the server; the server overwrites the user's items with them.

The problem here, of course, is that another client can update items on the server while this client is syncing, and when it uploads the results, the former client's updates will be lost. I thought maybe I could work around this with a hash: the downloaded data has a hash, and when the database is about to be overwritten with the sync result, I first do a query and check it's still the same hash; if not, I return an error. But then again I'm back to multiple queries on the server, and I'm not sure this is a recommended way.
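The hash idea is essentially a compare-and-swap over the user's whole item set. A minimal sketch of what that could look like, with an in-memory dict standing in for the server's storage (all names are made up):

```python
import hashlib
import json

def items_hash(items):
    """Stable hash over a user's items; key order and list order are
    normalised so equal data always hashes the same."""
    canonical = json.dumps(sorted(items, key=lambda it: it["id"]),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def overwrite_if_unchanged(store, user, new_items, hash_at_download):
    """Server side: overwrite the user's data only if nobody changed it
    since the client downloaded it; otherwise signal a conflict."""
    if items_hash(store[user]) != hash_at_download:
        return False   # conflict: client must re-download and re-run its sync
    store[user] = new_items
    return True
```

On a real server the check-and-overwrite would still need to run inside one transaction (or the hash could be stored in its own row and swapped with a conditional UPDATE), otherwise the same race reappears between the hash query and the overwrite.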

I'd appreciate thoughts on this. Is my current approach the best? Or should I rather go with the client-side sync (this would also reduce the server's workload, which is good)? And then how do I avoid possible overwrites; is the hash a good idea? Note that my app doesn't have super strict requirements; it's something like shared todo lists with a few additional features, nothing generic (like a file-sharing service) or for health, science, etc., so if once in a (long) while an update is lost, it's not the end of the world. I care a little more about performance and ease of implementation than about 100% correctness. Of course, the less error-prone, the better.

P.S. I'd also appreciate reading recommendations on this (maybe as a comment): any good resource about sync strategies. I'm using Scala, Play 2.4 and Akka in some parts, so a recommendation in this direction would also be very useful.

  • The problem with my approach is that I'm doing many different queries and programmatic processing - is this not the problem? What transactions are being made against the DB on the server? I don't like the sound of sync'ing to a very 'active' server. I'd favour your first approach, and you don't need to worry about whether there are concurrent users sync'ing or not - update the server DB based on the latest date against each record. Commented Dec 15, 2015 at 14:43
  • I query the items for the user first, then do all the timestamp & delete flag checks programmatically (building separate lists for items that have to be inserted, updated, and deleted), and then do these inserts/updates/deletes in a single transaction. After that I do a new query to send the items back to the user.
    – User
    Commented Dec 15, 2015 at 14:47
  • The problem there is that since the checks are done programmatically, another client could perform an update just after I finish my programmatic checks, and when I run the transaction I'll overwrite the other client's updates. I'm not very fluent in (My)SQL yet; is there a way I can do the whole thing in the database? If so, does that solve everything?
    – User
    Commented Dec 15, 2015 at 14:52
  • The question as it is now is a text wall. Would you format it a little bit to give it structure? Commented Dec 15, 2015 at 14:55
  • Ok, give me some mins
    – User
    Commented Dec 15, 2015 at 15:00

2 Answers

  • Syncing between multiple clients is a very complex task.

  • Let's not call it syncing, let's call it batch transaction processing.

  • Let's get rid of the "last server update timestamp"; let's just have every transaction inserted into a pre-process table with a timestamp generated by the server.

  • Let's make it asynchronous. If data is processed fast enough, users shouldn't notice.

  • Client generates a job id (hash of userid plus timestamp plus some other random data), and pushes it to the server.

  • Server saves the job id in a jobs table and assigns it a timestamp.

  • Client sends the transactions that will be in that job (screenshot of an example transactions table omitted).

    ITEM ID is client-generated using a hash function, the same way the job id was generated.

  • A process that runs, say, every few seconds gets the most recent pending job and processes all of its transactions.

  • Transactions that couldn't be applied because the item was previously erased are inserted into a rejects table.

  • The current data after the transaction is pushed to the client, including rejections (the client's data is deleted and the new data inserted; the client perceives it as a refresh).

  • A job can be transaction-less, meaning it's only a refresh request.

  • Overwrites cannot be avoided, but you can opt to duplicate entries when a conflict arises instead of overwriting the older version of an item.
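The single-worker job processing described above could be sketched like this. The in-memory dict, deque and tuple shapes are stand-ins for the jobs, pre-process and rejects tables; everything here is an illustrative assumption:

```python
from collections import deque

items = {}      # item_id -> data (stand-in for the real data table)
rejects = []    # transactions that referenced an erased item
jobs = deque()  # pending jobs; processed one at a time by a single worker

def submit_job(job_id, transactions):
    """Client pushes a job id plus its batch of transactions."""
    jobs.append((job_id, transactions))

def worker_tick():
    """One pass of the single worker: take a pending job and apply all of
    its transactions in order. Returns the processed job id, or None."""
    if not jobs:
        return None
    job_id, transactions = jobs.popleft()
    for op, item_id, data in transactions:
        if op == "put":
            items[item_id] = data
        elif op == "delete":
            items.pop(item_id, None)
        elif op == "update":
            if item_id in items:
                items[item_id] = data
            else:
                # item was previously erased: record it in the rejects table
                rejects.append((job_id, item_id))
    return job_id   # caller would now push fresh data + rejects to the client
```

Because a single worker drains the queue, transactions from different clients can never interleave mid-job, which is what removes the need for the per-item timestamp checks.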

  • wow, this is entirely different from everything I implemented so far... so serialise all the operations into a queue, so to say? I assume there would be only one worker thread then? Why do you say overwrites cannot be avoided? (the -1 is not from me btw)
    – User
    Commented Dec 15, 2015 at 16:41
  • Yes, one thread. Overwrites are unavoidable since you cannot prevent a later transaction from overwriting the change you previously made, with either my approach or yours, although I've seen that Apple just duplicates some entries when there's a conflict when syncing notes, which is weird since nobody else is syncing my notes. I'll add an edit to the overwrites part. Commented Dec 15, 2015 at 17:51
  • @ixxzz The gist of my answer is that by doing it asynchronously you take out a lot of complexity. Commented Dec 15, 2015 at 17:55
  • Mhh, interesting. Yeah, update overwrites would be unavoidable without a timestamp check, but I think that's ok. I wonder about the performance; my code is already asynchronous (the database access too), but not limited to a single thread. Also, to represent these transactions, is a NoSQL database probably suitable? I have a lot of items with very different structures. Not sure if it's worth rewriting my sync instead of trying to improve what I have already. For now +1; I'm curious if there are other approaches.
    – User
    Commented Dec 15, 2015 at 19:08
  • I'd argue that updating data needs to be synchronous if the data can be partially updated. Otherwise, data integrity cannot be guaranteed. Commented Dec 15, 2015 at 22:00

There may be another approach depending on whether it's ok for your use case. Instead of trying to synchronise between different versions from different clients, lock them out for the duration of the update.

For example, you could do something like the following:

update locktable set current_user='user61852' where current_user=''

where locktable is, of course, a single-row (single-column) table. On failure (current_user is still not your id), sleep for a bit and try again.

On success, (re)query the data as necessary and do your inserts etc. Then make absolutely sure that in every eventuality current_user is set back to '' (so you probably want to handle a timeout as well).
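The acquire/retry/release cycle around that UPDATE could be sketched as follows, here against an in-memory SQLite database (the table mirrors the one in the answer; the retry parameters are made up):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locktable (current_user TEXT)")
conn.execute("INSERT INTO locktable VALUES ('')")   # single row; '' = unlocked

def acquire(conn, user, attempts=3, wait=0.01):
    """Try to claim the lock row; retry a few times before giving up."""
    for _ in range(attempts):
        cur = conn.execute(
            "UPDATE locktable SET current_user = ? WHERE current_user = ''",
            (user,))
        if cur.rowcount == 1:
            return True        # we own the lock now
        time.sleep(wait)       # someone else holds it; back off and retry
    return False

def release(conn, user):
    # Only the holder may release; call this in a finally block so the
    # lock is freed in every eventuality.
    conn.execute("UPDATE locktable SET current_user = '' WHERE current_user = ?",
                 (user,))
```

A production version would also store a timestamp next to current_user so a crashed client's stale lock can be expired, per the timeout caveat above.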

It has some disadvantages, of course, especially when everyone wants to sync at 9 am and there's a lot to transfer.
