I want to synchronise the local database of mobile devices with the server's database. This synchronisation happens, for example, when the user comes online, to update both the local and the remote database with any changes.
I thought about two approaches and have already implemented one of them, but it's quite complicated and I'm a bit worried about its performance and error-proneness.
Approach 1.
The approach I implemented does all the synchronisation on the server. I store in the client's database the last server update timestamp for each item, plus a flag for items that are marked for removal. I send all my items - with their timestamps and flags - to the server, and the server then decides which items have to be inserted, which have to be updated - only if the client's last update timestamp equals the last update timestamp stored on the server (meaning the client has the latest version of the item) - and which have to be deleted. Then I do a fresh query to the database and send everything back to the client. The client overwrites its local database with the result. In detail:
1. Client sends all its items to the server (this could be refined by only sending items that need create/update/delete, but for now it sends everything). Each item has the server's last update timestamp and a flag if it's marked to be deleted.
2. Server queries all the items for the user.
3. A programmatic algorithm goes through all the items, checking timestamps to determine whether each item can be updated/deleted or not. If the last server update timestamp in the client's item differs from the one stored on the server (meaning someone else updated the item in between), the update/delete is rejected. I build many lists here: items to be inserted (items not in the server's query result), items to be updated, items to be deleted, items that can't be updated because the timestamp differs, and items that can't be deleted because the timestamp differs.
4. Transaction for the inserts/updates/deletes, using the lists constructed in 3.
5. Query again to get all the items for the user.
6. Send the result of this query to the client, together with the lists of items that couldn't be updated or deleted built in 3., so the client knows what failed.
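To make the reconciliation in step 3 concrete, here is a minimal sketch (in plain Java, since the exact Scala model isn't shown; all class and field names are assumptions) of an algorithm that compares the client's last-seen server timestamp against the server's stored one and sorts each item into the five lists described:

```java
// Hypothetical sketch of the server-side reconciliation (step 3), assuming each
// item carries the server's last-update timestamp and a deletion flag.
import java.util.*;

class Item {
    final long id;
    final String payload;
    final long serverUpdatedAt; // last server update timestamp the client saw
    final boolean deleted;      // client marked this item for removal
    Item(long id, String payload, long serverUpdatedAt, boolean deleted) {
        this.id = id; this.payload = payload;
        this.serverUpdatedAt = serverUpdatedAt; this.deleted = deleted;
    }
}

class SyncPlan {
    final List<Item> toInsert = new ArrayList<>();
    final List<Item> toUpdate = new ArrayList<>();
    final List<Item> toDelete = new ArrayList<>();
    final List<Item> rejectedUpdates = new ArrayList<>();
    final List<Item> rejectedDeletes = new ArrayList<>();

    static SyncPlan build(List<Item> clientItems, List<Item> serverItems) {
        Map<Long, Item> serverById = new HashMap<>();
        for (Item s : serverItems) serverById.put(s.id, s);
        SyncPlan p = new SyncPlan();
        for (Item c : clientItems) {
            Item s = serverById.get(c.id);
            if (s == null) {
                // Not in the server's query result: a new item, unless it was
                // created and deleted on the client without ever syncing.
                if (!c.deleted) p.toInsert.add(c);
            } else if (c.serverUpdatedAt == s.serverUpdatedAt) {
                // Timestamps match: the client had the latest version.
                if (c.deleted) p.toDelete.add(c); else p.toUpdate.add(c);
            } else {
                // Someone else updated the item in between: reject the change.
                if (c.deleted) p.rejectedDeletes.add(c); else p.rejectedUpdates.add(c);
            }
        }
        return p;
    }
}
```

The plan's rejected lists are exactly what step 6 would send back so the client knows what failed.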
The problem with this approach is the multiple queries and the programmatic processing: another client (some items can have multiple users) could update something between these queries, and updates can get lost. I assume I have to compact my queries (somehow do everything in a single transaction, maybe using stored procedures?) and lock rows to avoid synchronisation issues. My knowledge of SQL doesn't go much further than select...where and join, so I may be missing a better way to do this.
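One common alternative to explicit locking is to push the timestamp check into the write itself, i.e. an optimistic update along the lines of `UPDATE items SET ... WHERE id = ? AND updated_at = ?` and then inspecting the affected-row count. The following is an in-memory illustration of that idea only (not a real DB call; the class and method names are made up) - with JDBC you would check the count returned by `executeUpdate()` instead:

```java
// Illustration (not a real DB call) of the optimistic-concurrency idea behind
// "UPDATE items SET ... WHERE id = ? AND updated_at = ?": the write only
// succeeds if the stored timestamp still matches the one the client saw.
import java.util.HashMap;
import java.util.Map;

class OptimisticTable {
    static class Row {
        String payload; long updatedAt;
        Row(String payload, long updatedAt) { this.payload = payload; this.updatedAt = updatedAt; }
    }
    final Map<Long, Row> rows = new HashMap<>();

    // Returns true iff the row still carried expectedTs (i.e. "1 row affected").
    boolean conditionalUpdate(long id, long expectedTs, String newPayload, long newTs) {
        Row r = rows.get(id);
        if (r == null || r.updatedAt != expectedTs) return false; // updated in between: 0 rows affected
        r.payload = newPayload;
        r.updatedAt = newTs;
        return true;
    }
}
```

With this shape, the check-and-write for each item is a single atomic statement, so the race between "check timestamp" and "apply update" disappears without row locks.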
Approach 2.
Because of these complications I was wondering if there's maybe a simpler way. The other idea I had was to do the synchronisation in the client. I'd first query all my data and do the update (for which I still need the last update timestamps and delete flags - only this time the sync logic is in the client, and I don't have to worry about multiple users there), and then send the sync result to the server, which just does an overwrite. In detail:
1. Download the items for the user.
2. Run the sync algorithm in the client, similar to what the server does in 3. (Approach 1).
3. Upload the items to the server; the server overwrites the user's items with this.
The problem here, of course, is that another client can update items on the server while this client is syncing, and when it uploads the results, the former client's updates will be lost. I thought maybe I could work around this with a hash: the downloaded data has a hash, and before the database is overwritten with the sync result I do a query first and check that it's still the same hash; if not, I return an error. But then again I'm back to multiple queries on the server, and I'm not sure this is a recommended way.
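The hash idea amounts to fingerprinting the server's state at download time and rejecting the overwrite if the fingerprint changed. A minimal sketch (names and the exact hashed fields are assumptions; hashing each item's id and timestamp is enough, since any update bumps a timestamp):

```java
// Hypothetical sketch of the hash check: fingerprint the server's current
// state from (id, timestamp) pairs, so an overwrite can be rejected when the
// state changed since the client downloaded it.
import java.security.MessageDigest;
import java.util.*;

class StateHash {
    // Hash of (id, timestamp) pairs, sorted by id so iteration order doesn't matter.
    static String of(Map<Long, Long> idToTimestamp) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            List<Long> ids = new ArrayList<>(idToTimestamp.keySet());
            Collections.sort(ids);
            for (long id : ids) {
                md.update((id + ":" + idToTimestamp.get(id) + ";").getBytes("UTF-8"));
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

On upload, the server recomputes the hash over its current rows and accepts the overwrite only if it matches the hash the client sent; this is essentially the same optimistic check as the per-item timestamp comparison, just applied to the whole item set at once.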
I'd appreciate thoughts on this matter. Is my current approach the best? Or should I rather go with the client sync (this would also reduce the workload on the server, which is good)? But then how do I avoid possible overwrites - is the hash a good idea? Note that my app doesn't have super strict requirements; it's something like shared todo lists with a few additional features, nothing generic (like a file sharing service) or for health, science etc., so if once in a (long) while an update is lost it's not the end of the world. I care a little more about performance and ease of implementation than about 100% correctness. Of course, the less error prone, the better.
P.S. I would also appreciate reading recommendations about this (maybe as a comment) - any good resource about sync strategies. I'm using Scala, Play 2.4 and Akka in some parts, so a recommendation in this direction would also be very useful.
"The problem with my approach is that I'm doing many different queries and programmatic processing" - is this not the problem? What transactions are being made against the DB on the server? I don't like the sound of syncing to a very 'active' server. I'd favour your first approach, and you don't need to worry about whether there are concurrent users syncing or not - update the server DB based on the latest timestamp against each record.
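The per-record rule suggested here ("latest timestamp against each record wins") could look like the following sketch - last-write-wins, with all names illustrative. Concurrent syncs then converge without explicit locking, at the cost of silently dropping the older of two conflicting writes, which matches the question's relaxed correctness requirements:

```java
// Sketch of a last-write-wins merge: per record, keep whichever version was
// updated most recently. Names are illustrative.
import java.util.*;

class LastWriteWins {
    static class Rec {
        final long id; final String payload; final long updatedAt;
        Rec(long id, String payload, long updatedAt) {
            this.id = id; this.payload = payload; this.updatedAt = updatedAt;
        }
    }

    static Map<Long, Rec> merge(List<Rec> server, List<Rec> incoming) {
        Map<Long, Rec> merged = new HashMap<>();
        for (Rec r : server) merged.put(r.id, r);
        for (Rec r : incoming) {
            Rec cur = merged.get(r.id);
            if (cur == null || r.updatedAt > cur.updatedAt) merged.put(r.id, r); // newer wins
        }
        return merged;
    }
}
```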